The changes in AI over the past month have been groundbreaking

Wallstreetcn
2024.12.24 11:07

Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania, observes that artificial intelligence has made breakthrough progress over the past month: smarter AI is emerging, most notably with the release of the Gen3 and o1 models; with the addition of vision, ChatGPT and Gemini can now watch live video while holding a voice conversation; and AI-generated video has suddenly become very good, with the real leap coming from the AI text-to-video generators that appeared last week.

In the past month, AI has made significant leaps in multiple fields.

Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania, summarized the latest advances in artificial intelligence and their implications in a recent article. Mollick specializes in entrepreneurship, innovation, and artificial intelligence, and focuses on what AI means for work and education.

Mollick observed that there have been groundbreaking advancements in artificial intelligence over the past month:

(1) GPT-4-class AI has become ubiquitous, and even smarter AI is emerging, most notably with the release of the Gen3 and o1 models;

(2) With the addition of vision, ChatGPT and Gemini can now watch live video while holding a voice conversation. Models that interact with humans in real time through the most common human senses (sight and sound) turn AI into a companion that feels like it is in the room with you, rather than an entity trapped in a chat box on a computer;

(3) AI-generated video has suddenly become very good, with the real leap coming from the AI text-to-video generators that appeared last week.

Mollick writes that we are far from seeing the end of advances in artificial intelligence; what is striking is not just the individual breakthroughs, but the speed and breadth of the transformation. AI is making significant leaps at an uneven pace, outrunning our ability to easily measure its impact.

Below is the full translation of Mollick's article:

Last month, the state of artificial intelligence changed, and just last week, the pace of change accelerated dramatically. AI labs launched a plethora of new products, some revolutionary and some incremental, making it hard to keep up. I believe that several of these changes are true breakthroughs that will reshape the future of AI (and perhaps ours as well). Here is our current situation:

Intelligent AI is now everywhere

At the end of last year, there was only one publicly available GPT-4/Gen2-class model: GPT-4 itself. Now there are six to ten such models, some of them open-source, meaning anyone can use or modify them for free. From the United States, there are OpenAI's GPT-4o, Anthropic's Claude Sonnet 3.5, Google's Gemini 1.5, Meta's open Llama 3.2, Elon Musk's Grok 2, and Amazon's new Nova. Chinese companies have released three open multilingual models that appear to perform at the GPT-4 level, notably Alibaba's Qwen, DeepSeek's R1, and 01.ai's Yi. Europe has only one participant in this field, France's Mistral. This confusing array of names means that building capable AI is no longer about OpenAI's unique magic formula; any company with computer-science talent and access to the chips and computing power needed to train models can do it.

In fact, GPT-4-level artificial intelligence, which was shocking at its release and raised serious concerns about the future, can now run on my home computer. Meta's latest small model, released this month, is called Llama 3.3; it offers similar performance and runs completely offline on my gaming computer. Microsoft's newly launched, tiny Phi 4 is at the GPT-4 level and can almost run on a mobile phone, while its slightly less capable predecessor, Phi 3.5, can do so easily. Some degree of intelligence is now available on demand.

Llama 3.3, running on my home computer, passed the "rhyme involving cheese puns" benchmark, with only a few strained puns.
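If you want to try the kind of local, offline setup Mollick describes, here is a minimal sketch using the Ollama Python client. The tooling is an assumption (Mollick does not say what runtime he uses), and the sketch presumes Ollama is installed and the llama3.3 weights have already been pulled.

```python
# Minimal local-inference sketch, assuming the Ollama runtime is installed
# and the weights have been fetched with `ollama pull llama3.3`.
# Mollick does not say which runtime he uses; this is just one common setup.
import ollama

response = ollama.chat(
    model="llama3.3",
    messages=[{"role": "user", "content": "Write a short rhyme about cheese, full of puns."}],
)
# Everything above runs entirely offline once the model has been downloaded.
print(response["message"]["content"])
```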

Moreover, as I have discussed (and will soon discuss again), these ubiquitous AIs are now starting to power agents: autonomous AIs that can pursue their own goals. You can see what this means in my previous articles, where I used early agents to comparison-shop and to monitor construction sites.

Very Smart AI Has Now Emerged

All of this means that even if GPT-4-level performance were the best AI could achieve, it might still be enough to give us five to ten years of continuous change as we gradually got used to its capabilities. But there is no sign that AI development is about to slow down significantly. We know this because of two other important releases last month: the first Gen3 models (which you can think of as GPT-5-class models) and the o1 models, which can "think" before answering, making them significantly stronger at reasoning than other LLMs. We are in the early stages of the Gen3 rollout, so I won't go into much detail about those models here, but I do want to talk about o1.

When o1 was released in its early o1-preview form, I discussed it, but the two more advanced versions, o1 and o1-pro, are significantly more capable. These models take time to do invisible "thinking" before answering a question, mimicking the way humans work through hard problems. This approach, called test-time compute, has proven to be key to getting models to solve problems better. In fact, these models are now smart enough to make meaningful contributions to research in a variety of ways.

Here's an interesting example. I read about a recent social-media panic: a scholarly paper suggested that black plastic kitchen utensils might be poisoning people because they are partly made from recycled electronic waste. The paper reported that a compound called BDE-209 leaches from these utensils at a rate close to the safety limit set by the U.S. Environmental Protection Agency. Many people threw away their spatulas, but Joe Schwarcz of McGill University thought this didn't add up and found a math error: on page seven of the article, the authors mistakenly multiplied the BDE-209 dose by a factor of ten, an error missed by both the paper's authors and its peer reviewers. I was curious whether o1 could spot the mistake. So I pasted the text of the PDF from my phone and prompted: "Check the math in this article carefully." Just like that, o1 immediately identified the error (other AI models did not).
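Mollick ran this check by pasting the paper's text into the ChatGPT app from his phone. For readers who want to script the same kind of experiment, here is a rough sketch against the OpenAI API; the programmatic setup, the placeholder paper text, and the exact model name are assumptions for illustration, not details from the article.

```python
# Rough sketch of scripting a "check the math" pass over a paper's text.
# Assumptions: OPENAI_API_KEY is set in the environment, the "o1" model is
# available on your account, and `paper_text` holds the extracted PDF text.
from openai import OpenAI

client = OpenAI()

paper_text = "...full extracted text of the paper goes here..."  # placeholder

response = client.chat.completions.create(
    model="o1",  # a reasoning model; substitute another reasoning model if needed
    messages=[
        {
            "role": "user",
            "content": f"Check the math in this article carefully.\n\n{paper_text}",
        }
    ],
)

print(response.choices[0].message.content)
```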

When a model can not only handle an entire academic paper but also understand the significance of "checking the math," and then successfully verify the results, the capabilities of artificial intelligence undergo a fundamental change. In fact, my experiments, along with those of others, have sparked interest in studying how frequently o1 can find errors in scientific literature. We do not know how often o1 can accomplish this feat, but finding the answer seems important as it points to a new frontier of capability.

Indeed, even the early version of o1, the preview model, seems to represent a leap in scientific capability. Researchers from Harvard University, Stanford University, and others published a shocking medical working paper concluding that "o1-preview demonstrates superhuman performance in differential diagnosis, diagnostic clinical reasoning, and management reasoning, outperforming previous generations of models and human doctors in multiple domains." This paper has not yet been peer-reviewed, and it does not suggest that artificial intelligence can replace doctors, but it, along with the results above, does indicate that the world is changing, and not considering artificial intelligence as a second opinion may soon become a mistake.

Perhaps more importantly, more and more researchers are telling me that o1, especially o1-pro, is generating novel ideas and solving unexpected problems in their fields. The problem is that only experts can evaluate whether artificial intelligence is right or wrong. For example, my very smart Wharton colleague Daniel Rock challenged me to have o1-pro “prove the universal approximation theorem for neural networks using proofs not found in the literature, without 1) assuming infinitely wide layers and 2) more than 2 layers.” Here is its response:

Is this correct? I don't know; it's beyond my area of expertise. Daniel and other experts who have seen it also cannot judge its correctness at a glance, but they find it interesting enough to warrant further investigation. It turns out that the proof is flawed (though more interaction with o1-pro might fix the errors). Nevertheless, the result still introduces some new approaches that invite further thought. As Daniel pointed out to me, when researchers use o1, it doesn't need to be correct to be useful: "Asking o1 to complete the proof in a creative way is essentially asking it to be a research colleague. The model doesn't have to be correct to be useful; it just needs to help us become better researchers."
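For readers unfamiliar with the theorem in Rock's challenge, the classical single-hidden-layer statement (Cybenko 1989; Hornik et al. 1989) says, roughly, that a network can approximate any continuous function provided the hidden layer may be made as wide as needed; that unbounded-width escape hatch is what the challenge takes away. In standard notation, for any continuous f on a compact set K in R^n, any tolerance epsilon > 0, and a sigmoidal activation sigma, there exist a width N, output weights alpha_i, input weights w_i, and biases b_i such that

$$
\sup_{x \in K}\left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon .
$$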

The artificial intelligence we have now seems capable of solving very difficult PhD-level problems, or at least effectively collaborating with researchers to tackle these issues. Of course, the problem is that unless you are a PhD in a particular field yourself, you don’t actually know whether these answers are correct, which brings a new set of challenges for evaluating artificial intelligence. Further testing is needed to understand how useful it is and in which areas it is beneficial, but this new frontier of AI capabilities is worth watching.

AI Can Observe You and Talk to You

For months we have been able to talk to AI through voice models, but last week a new capability arrived: vision. ChatGPT and Gemini can now watch live video while holding a voice conversation. For example, I can now share my screen in real time with Gemini's new small Gen3 model, Gemini 2.0 Flash.

Or better yet, you can try it yourself for free. Seriously, it's worth experiencing what this system can do. Gemini 2.0 Flash is still a small model with limited memory, but you start to get the idea. Models that can interact with humans in real time through the most common human senses (sight and sound) turn AI into a companion that is in the room with you, rather than an entity trapped in a chat box on a computer. ChatGPT's advanced voice mode offers the same capability on mobile phones, which means millions of users already have broad access to it. As AI becomes an ever more constant presence in our lives, the impact will be profound.
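The screen-sharing demo above uses Google's live streaming interface, which is hard to compress into a few lines. As a rough, single-frame stand-in, here is a sketch that grabs one screenshot and asks a Gemini model to describe it; the SDK choice, the model name, and the overall setup are assumptions for illustration, not the streaming feature Mollick demonstrates.

```python
# Single-frame stand-in for the live screen-sharing demo: grab one screenshot
# and ask a Gemini model about it. Assumptions: the google-generativeai and
# Pillow packages are installed, GOOGLE_API_KEY is set, and the model name
# below is available on your account.
import os

import google.generativeai as genai
from PIL import ImageGrab

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

screenshot = ImageGrab.grab()  # captures the current screen as a PIL image
model = genai.GenerativeModel("gemini-2.0-flash")

response = model.generate_content(
    ["Describe what is on this screen and what I seem to be working on.", screenshot]
)
print(response.text)
```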

AI Video Suddenly Becomes Very Good

In the past year, artificial intelligence image creation technology has made impressive progress, with models running on my laptop capable of generating images that are difficult to distinguish from real photos. They have also become easier to manipulate, responding appropriately to prompts like "otters on a plane using Bluetooth" and "otters on a plane using Wi-Fi." If you want to try it yourself, Google's ImageFX is a very simple interface that uses the powerful Imagen 3 model released last week.

But the real leap last week came from AI text-to-video generators. Previously, AI models from Chinese companies typically represented the cutting edge of video generation, including impressive systems like Kling and some open-source models. However, the situation is changing rapidly. First, OpenAI released its powerful Sora tool, followed by Google launching the more powerful Veo 2 video creator, which has become a recent topic of discussion. If you subscribe to ChatGPT Plus, you can now use Sora, which is worth trying, but I got early access to Veo 2 (reportedly set to launch in a month or two), and it... is stunning.

Showing is always better than telling, so take a look at this compilation of 8-second clips (currently the length limit, although it can clearly be used to assemble longer films). I provided the exact prompt for each clip, and the clips were taken from the first batch Veo 2 produced (it creates four clips at a time), so there was no cherry-picking from a large pool of examples. Notice the apparent weight and heft of objects, the shadows and reflections, the consistency of hairstyle and detail across scenes, and how closely the scenes match what I asked for (if you look closely, the red balloon is there). There are errors, but they are hard to spot at first glance (although it still struggles with gymnastics, which is very challenging for video models). It's truly impressive.

What does all this mean?

I will elaborate on my views in future articles, but the lesson to draw is that, for better or worse, we are far from seeing the end of advances in artificial intelligence. What is striking is not any individual breakthrough: AI grading math exams, generating video clips close to movie quality, or running on gaming computers. Rather, it is the speed and breadth of the transformation. A year ago, GPT-4 felt like a glimpse of the future. Now it essentially runs on mobile phones, and new models are catching errors missed in academic peer review. This is not steady progress; we are watching AI make significant leaps at an uneven pace, outrunning our ability to easily measure its impact. It suggests that while things are still in flux, the opportunity to shape how these technologies change your field exists now, not after the transformation is complete.