16 Sep OpenAI o1 surpassed human PhDs
OpenAI has introduced the o1 model, a large language model trained with reinforcement learning to perform complex reasoning. Unlike earlier models, o1 produces a long internal chain of thought before responding, allowing it to work through a problem step by step rather than answering immediately.
In competitive programming, o1 ranks in the 89th percentile on Codeforces questions, places among the top 500 students in the U.S. on a qualifier for the USA Mathematical Olympiad, and exceeds PhD-level accuracy on physics, biology, and chemistry problems. While work to make the model as easy to use as current models is still ongoing, an early version, known as o1-preview, is available now in ChatGPT and to select API users.
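For API users, querying o1-preview works like any other chat completion. Below is a minimal sketch using the official openai Python client; the prompt is illustrative, and note that o1-preview at launch accepts plain user messages rather than system prompts:

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# o1-preview reasons internally before producing the visible answer,
# so responses can take noticeably longer than with GPT-4o.
response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "Prove that the sum of two odd integers is even."},
    ],
)
print(response.choices[0].message.content)
```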
The model’s reasoning capabilities are driven by a reinforcement learning process that allows it to improve both with more training and with more time spent thinking at inference. Across a rigorous set of human exams and machine learning benchmarks, o1 has consistently outperformed previous models, including GPT-4o. Its performance on math problems, as seen on the AIME exams, shows a significant improvement, reaching 83% accuracy when the final answer is decided by consensus among multiple samples.
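The consensus evaluation mentioned above amounts to a majority vote over independently sampled answers, a technique often called self-consistency. Here is a minimal sketch of that aggregation step; the `sample` helper and the string-based answer normalization are assumptions for illustration, not OpenAI's actual evaluation harness:

```python
from collections import Counter
from typing import Callable

def consensus_answer(sample: Callable[[], str], n_samples: int = 64) -> str:
    """Draw n independent answers and return the most common one.

    `sample` is any zero-argument function that queries the model once
    and returns its final answer as a string (a hypothetical helper).
    """
    answers = [sample().strip() for _ in range(n_samples)]
    most_common, _count = Counter(answers).most_common(1)[0]
    return most_common
```

Majority voting helps because an individual sample may go down a wrong reasoning path, while errors tend to be less correlated than correct solutions, so the right answer is usually the modal one.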
On scientific benchmarks, o1 has also surpassed PhD-level human experts on chemistry, physics, and biology problems, setting a new standard for AI reasoning. With its chain-of-thought approach, o1 has proven capable of refining its strategies, recognizing its mistakes, and trying alternative methods, making it a groundbreaking advancement in AI reasoning.