Online chatbots like OpenAI’s ChatGPT and Google’s Gemini often struggle with basic math problems, and the code they generate is frequently buggy or incomplete.
Sometimes, they even fabricate responses. However, on Thursday, OpenAI introduced a new version of ChatGPT, which aims to address these issues.
Powered by a new AI system called OpenAI o1, the chatbot is now capable of “reasoning” through tasks related to math, coding, and science.
“With earlier models like ChatGPT, when you asked a question, they would immediately begin responding,” explained Jakub Pachocki, OpenAI’s chief scientist.
“This new model, however, can take its time, think through the problem — in English — and break it down systematically to provide the most accurate answer.”
In a demonstration, Dr. Pachocki, alongside OpenAI technical fellow Szymon Sidor, showcased the chatbot solving an acrostic, a word puzzle much more complex than a standard crossword.
The chatbot also successfully tackled a Ph.D.-level chemistry question and diagnosed an illness based on a detailed patient report.
The new technology is part of a broader effort to create AI systems capable of reasoning through complex tasks. Competitors like Google and Meta are developing similar technologies, while Microsoft and GitHub are working to integrate OpenAI’s latest system into their products.
The objective is to build systems that solve problems through a series of logical steps, akin to human reasoning. Such advancements could significantly benefit computer programmers who rely on AI for code generation and improve the effectiveness of automated tutoring systems for math and other subjects.
According to OpenAI, the technology could also assist physicists in generating complex mathematical formulas and help healthcare researchers in their experiments.
Since ChatGPT’s initial release in late 2022, OpenAI has demonstrated that machines can respond to requests in a more human-like manner, answering questions, drafting essays, and even writing computer code. However, some responses have been flawed.
ChatGPT developed its capabilities by analyzing vast amounts of text from the internet, including Wikipedia, books, and chat logs. By identifying patterns within this text, it learned to generate its own.
However, because the internet contains a significant amount of inaccurate information, the AI sometimes perpetuated these falsehoods or even invented information.
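The pattern-learning idea described above can be illustrated with a deliberately tiny toy: a bigram model that counts which word tends to follow which, then samples continuations from those counts. This is only a sketch of the general principle of learning to generate text from statistical patterns; ChatGPT's actual training uses large neural networks on internet-scale text, and the function names here are illustrative inventions.

```python
import random
from collections import defaultdict

def build_model(corpus):
    """Record, for each word, the words observed to follow it."""
    follows = defaultdict(list)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)
    return follows

def generate(model, start, length=5, seed=0):
    """Extend `start` by repeatedly sampling a plausible next word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        choices = model.get(out[-1])
        if not choices:
            break  # no observed continuation; stop
        out.append(rng.choice(choices))
    return " ".join(out)

if __name__ == "__main__":
    model = build_model("the cat sat on the mat")
    print(generate(model, "cat", length=3))
```

Even this miniature version shows the failure mode the article describes: the model can only reproduce patterns present in its training text, so errors and falsehoods in the corpus propagate into its output.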
To mitigate such issues, Dr. Pachocki, Mr. Sidor, and their team have worked to reduce the flaws in the system by employing a process known as reinforcement learning.
Through this method, which can last weeks or even months, the system learns through trial and error. For example, by repeatedly solving math problems, it learns which strategies yield correct results.
Over time, as the system processes a vast number of problems, it identifies patterns, although it still cannot reason exactly like a human. Mistakes and hallucinations can still occur.
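The trial-and-error process described above can be sketched, in highly simplified form, as a bandit-style reinforcement-learning loop: an agent tries one of two hypothetical answering strategies on arithmetic problems, receives a reward only when its answer is correct, and gradually learns which strategy is reliable. This is a minimal illustration of the general idea, not a description of OpenAI's actual training method; all names and the epsilon-greedy scheme are assumptions for the example.

```python
import random

def correct_add(a, b):
    return a + b  # reliable strategy: always right

def sloppy_add(a, b):
    return a + b + random.choice([0, 1])  # unreliable: often off by one

STRATEGIES = [correct_add, sloppy_add]

def train(episodes=2000, epsilon=0.1, seed=0):
    """Learn each strategy's value purely from reward signals."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # estimated average reward per strategy
    counts = [0, 0]
    for _ in range(episodes):
        # epsilon-greedy: usually exploit the best-looking strategy,
        # occasionally explore the other one
        if rng.random() < epsilon:
            i = rng.randrange(2)
        else:
            i = max(range(2), key=lambda j: values[j])
        a, b = rng.randrange(100), rng.randrange(100)
        reward = 1.0 if STRATEGIES[i](a, b) == a + b else 0.0
        counts[i] += 1
        values[i] += (reward - values[i]) / counts[i]  # running mean
    return values

if __name__ == "__main__":
    print(train())
```

After training, the agent's value estimates favor the reliable strategy, mirroring the article's point: over many problems, the system discovers which approaches yield correct results, without ever being told why.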
“It’s not perfect,” Mr. Sidor acknowledged. “But you can trust it will work harder, and it’s much more likely to give the right answer.”
Starting Thursday, consumers and businesses subscribing to the ChatGPT Plus and ChatGPT Team services gained access to this new technology. OpenAI is also offering the system to software developers and companies interested in building their own AI applications.
OpenAI reported that the new system performed better than its predecessors in certain standardized tests.
On the qualifying exam for the International Mathematical Olympiad (IMO), a premier math competition for high school students, the earlier version of the technology scored 13 percent, whereas OpenAI o1 achieved a score of 83 percent.
Still, standardized test results don’t always predict real-world performance; a system that excels at solving a math problem may still struggle to teach the subject effectively.
“There’s a difference between problem-solving and providing assistance,” said Angela Fan, a research scientist at Meta. “New models that reason can solve problems. But that’s not the same as helping someone work through their homework.”