For OpenAI, o1 represents a step toward its broader goal of human-like artificial intelligence. More practically, it does a better job at writing code and solving multistep problems than previous models. But it’s also more expensive and slower to use than GPT-4o. OpenAI is calling this release of o1 a “preview” to emphasize how nascent it is.
The training behind o1 is fundamentally different from its predecessors, OpenAI’s research lead, Jerry Tworek, tells me, though the company is being vague about the exact details. He says o1 “has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it.”
OpenAI taught previous GPT models to mimic patterns from its training data. With o1, it trained the model to solve problems on its own using a technique known as reinforcement learning, which teaches the system through rewards and penalties. It then uses a “chain of thought” to process queries, similarly to how humans process problems by going through them step-by-step.
At the same time, o1 is not as capable as GPT-4o in a lot of areas. It doesn’t do as well on factual knowledge about the world. It also doesn’t have the ability to browse the web or process files and images. Still, the company believes it represents a brand-new class of capabilities. It was named o1 to indicate “resetting the counter back to 1.”
I think this is the most important part (emphasis mine):
As a result of this new training methodology, OpenAI says the model should be more accurate. “We have noticed that this model hallucinates less,” Tworek says. But the problem still persists. “We can’t say we solved hallucinations.”
While truly defining pretty much any aspect of human intelligence is functionally impossible with our current understanding of the mind, we can create some very usable “good enough” working definitions for these purposes.
At a basic level, “reasoning” would be the act of drawing logical conclusions from available data. And that’s not what these models do. They mimic reasoning by mimicking human communication. Humans communicate the process by which we reason, and have developed a lot of specialized language for doing so, which means LLMs can basically replicate the appearance of reasoning by replicating the language around it.
The way you can tell that they’re not actually reasoning is simple: their conclusions often bear no actual connection to the facts. There’s an example I linked elsewhere where the new model is asked to list states with W in their name. It does a bunch of preamble where it spells out very clearly what the requirements and process are: assemble a list of all states, then check each name for the presence of the letter W.
And then it includes North Dakota, South Dakota, North Carolina and South Carolina in the list.
Any human being capable of reasoning would absolutely understand that that was wrong, if they were taking the time to carefully and systematically work through the problem in that way. The AI does not, because all this apparent “thinking” is smoke and mirrors. These are machines built to give the appearance of intelligence, nothing more.
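For reference, the procedure the model spelled out (assemble the list, then check each name for a W) is trivial to actually carry out. Here is a minimal Python sketch; the list and the function name are my own, not anything from the linked example:

```python
# All 50 US state names, typed out by hand for this illustration.
US_STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def states_containing(letter, names=US_STATES):
    """Return the names that contain the given letter, ignoring case."""
    letter = letter.lower()
    return [name for name in names if letter in name.lower()]


w_states = states_containing("w")
print(w_states)
```

Following the stated steps literally gives eleven states, and neither of the Dakotas nor either of the Carolinas is among them.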
When real AGI, or even something approaching it, actually becomes a thing, I will be extremely excited. But this is just snake oil being sold as medicine. You’re not required to buy into their bullshit just to prove you’re not a technophobe.