
Yao Shunyu's Google debut: new Gemini model breaks SOTA, leaving only 7 humans to defend carbon-based programming

Google has launched the Gemini 3 Deep Think model, which achieved an Elo score of 3455 on Codeforces, ranking 8th globally and surpassing the previous high of 2727. The model scored 84.6% on the ARC-AGI-2 benchmark, setting a new SOTA and far exceeding Claude Opus 4.6's 68.8%. The new model aims to advance the frontiers of intelligence and tackle research and engineering challenges, and can analyze sketches and generate files for 3D printing. Yao Shunyu, winner of the special award from Tsinghua University's Department of Physics, participated in the project.
In the face of the fierce offensive from Claude Opus 4.6 and GPT Codex 5.3, Google has responded with a significant upgrade of Gemini 3 Deep Think.

On Codeforces (a platform hosting competitive programming contests), it achieved an astonishing Elo score of 3455, equivalent to 8th place worldwide.

Now, only 7 people in the world can out-program it. The previous highest score among models was 2727 Elo, set by o3 a year ago.
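For a sense of scale: Codeforces ratings are Elo-style, so the gap between the new 3455 and the previous 2727 can be translated into a head-to-head expected score using the standard Elo formula. This is a back-of-the-envelope illustration, not an official Codeforces statistic:

```python
# Standard Elo expected score: the probability-like score a player rated r_a
# is expected to achieve against a player rated r_b (draws count as half).
def elo_expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Gemini 3 Deep Think (3455) vs. the previous model SOTA, o3 (2727):
p = elo_expected_score(3455, 2727)
print(f"{p:.1%}")  # roughly 98.5% expected score per head-to-head contest
```

A 728-point rating gap is enormous by Elo standards: the higher-rated side is expected to win almost every encounter.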

Gemini 3 Deep Think's capabilities don't stop there: it also scored an unprecedented 84.6% on ARC-AGI-2, a widely recognized benchmark for testing AI reasoning ability.
Notably, the previous strongest models hovered between 60% and 70%, with Claude Opus 4.6 reaching only 68.8%.
On Humanity's Last Exam (HLE), Gemini 3 Deep Think also set a new SOTA with a score of 48.4%.

The official statement indicates that the new version of Deep Think is a reasoning model specifically developed by Google, aimed at advancing the frontiers of intelligence and addressing modern challenges in science, research, and engineering.
Another "Yao Shunyu": Shunyu Yao, the legendary special-award winner from Tsinghua University's Department of Physics, joined Google DeepMind last September and is among the contributors to the new Deep Think model.

The new Deep Think has already entered the laboratory
How powerful is the upgraded Gemini 3 Deep Think?
Its ambition goes beyond winning benchmark tests; it aims to enter the fields of research and engineering to help engineers tackle complex tasks.
The new version of Deep Think can analyze sketches, model complex shapes, and directly generate physical files for 3D printing. Here is a laptop stand it printed:
Google VP Josh Woodward shared the printed results on X, which look quite faithful to the sketch:

Mathematician Lisa Carbone from Rutgers University used Gemini 3 Deep Think to review a highly specialized mathematical paper.
As a result, Gemini 3 Deep Think successfully identified a subtle logical flaw that had gone unnoticed in previous human peer reviews.
Duke University's Wang An Laboratory used Gemini 3 Deep Think to optimize its preparation method for complex crystal growth in search of new semiconductor materials.
As a result, Gemini 3 Deep Think successfully designed a process capable of growing films thicker than 100 microns, achieving precision targets that were difficult to reach with previous methods.
On X, XiaoKang Chen, a researcher from the DeepSeek multimodal team, also stated that Gemini 3 Deep Think excels at handling long-tail tasks in the scientific field.
He input a picture of a complex molecular structure into Deep Think, and the model accurately calculated the molecular formula.

Three new SOTAs, with inference costs down 82%
Last year, the specialized version of Deep Think won gold medals in international competitions such as IMO. Now, the newly upgraded Deep Think has set new SOTAs in multiple high-difficulty benchmark tests:
- Achieved a new SOTA of 48.4% in HLE without using any tools;
- Achieved an unprecedented score of 84.6% in the ARC-AGI-2 test, verified by the ARC Prize Foundation;
- Achieved an astonishing Elo score of 3455 on Codeforces;
- Reached gold medal level in the 2025 International Mathematical Olympiad.

Among them, ARC-AGI-2 is hailed as the "Turing Test" of the AI world, aimed at measuring the model's ability to handle novel reasoning tasks that it has never encountered before.
Notably, the initial version of Deep Think, released last December, scored only 45.1%; in under three months it has soared to 84.6%, surpassing Opus 4.6.
On ARC-AGI-1, Gemini 3 Deep Think scored 96%, essentially hitting the benchmark's ceiling.

While performance has improved, reasoning costs have dropped sharply: the initial Deep Think cost $77.16 per task, and this upgrade cuts that by 82%, to just $13.62 per task.
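A quick sanity check shows the quoted per-task prices are consistent with the reported percentage:

```python
old_cost = 77.16  # USD per task, initial Deep Think
new_cost = 13.62  # USD per task, upgraded Deep Think

reduction = 1 - new_cost / old_cost
print(f"{reduction:.1%}")  # → 82.3%, matching the reported ~82% cut
```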

With both ARC-AGI-1 and ARC-AGI-2 now dominated by Gemini, the ARC Prize Foundation is already building ARC-AGI-3...
In addition to mathematics and programming, the upgraded Deep Think also performs excellently in a wide range of scientific fields such as chemistry and physics.
In the 2025 International Physics Olympiad and Chemistry Olympiad, Gemini 3 Deep Think achieved gold medal level scores in the written exams.
Furthermore, it has demonstrated capabilities in advanced theoretical physics, scoring 50.5% in the CMT-Benchmark test.

Led by Chinese researchers: building the strongest reasoning model
The Gemini 3 Deep Think research and development team includes many Chinese members. Among the core members is Yi Tay, a post-95 scientist who works on reinforcement learning and reasoning within the Gemini team.

Previously, he co-led early large language model projects at Google Brain, including PaLM-2, UL2, and Flan-2.
After working at Google Brain for over three years, Yi Tay briefly left Google between 2023 and 2024 to co-found a unicorn AI startup—Reka.
Reka AI was founded by researchers from DeepMind, Google, and Meta, with the aim of creating powerful and efficient foundational models, and is now also developing tools for interface design, application logic, and other application aspects.
After a year and a half of entrepreneurship, Yi Tay returned to Google DeepMind as a senior research scientist, continuing his research in artificial intelligence and large language models.
Yao Shunyu, a Tsinghua alumnus who moved from Anthropic to Google DeepMind just last year, also participated in developing the new Deep Think model.

Yao Shunyu studied physics at Tsinghua University for his undergraduate degree and was awarded the special scholarship for outstanding undergraduate students at Tsinghua (the highest scholarship honor awarded to outstanding undergraduates).
During his undergraduate studies, he published high-level papers in Physical Review Letters (one of the top academic journals in the field of physics), providing the first international theoretical framework for the topological energy bands of non-Hermitian systems, accurately predicting related phenomena and defining two new physical concepts.
After graduating, he went to Stanford University for his PhD, focusing on frontier problems such as quantum many-body chaos and the dynamics of open quantum systems, studying under renowned scholars including Douglas Stanford (an American theoretical physicist regarded by peers as one of the top young scientists with the potential to change the direction of physics) and Zhenbin Yang (Yang Zhenbin, a Chinese theoretical physicist).
After completing his PhD, he first did postdoctoral research at UC Berkeley and then joined Anthropic, where over the course of a year he helped establish the foundational reinforcement learning team, responsible for the Claude 3.7 Sonnet framework and the fundamental reinforcement learning theory behind the Claude 4 series. After leaving Anthropic, Yao Shunyu moved to Google DeepMind to continue his AI research; this new Deep Think model is his debut work at Google.
Risk Warning and Disclaimer
The market carries risk; invest with caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article fit their specific circumstances. Any investment made on this basis is at one's own risk.
