Bard, Claude 2, and OpenAI are all upgrading, and nobody is idle; large models are catching up with OpenAI, while OpenAI prepares to become the expert at opting out of the arms race; Meta beats Midjourney; Stability AI teams up with Tencent to launch Stable Doodle;
What major events happened in the AI field this week?
While large models are catching up with ChatGPT, OpenAI is preparing to become the expert at opting out of the arms race;
This week, right after OpenAI updated its Code Interpreter plugin, two major competitors, Anthropic and Google, announced updates to Claude and Bard respectively;
Both competitors are now pitching themselves as a way to get ChatGPT Plus (GPT-4) level features for free, and even to surpass them;
Meanwhile, the originator of large AI models remains calm: far from rushing to build ever larger models, it appears ready to pause and wait for the other large models to make progress.
"According to reports, OpenAI is preparing to create multiple low-cost small-scale GPT-4 models, each smaller expert model trained in different tasks and subject areas."
In short, OpenAI plans to cut costs and go lightweight, and its next step is likely to be promoting a range of specialized models.
In the view of "Hard AI," OpenAI's "mixture of experts" approach may indeed sacrifice some answer quality, but it may be a more practical path toward industrial applications.
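The "expert model" idea described above is usually called a mixture of experts: a small gating function scores each expert for a given input, and only the best-scoring experts are run. The sketch below is a toy illustration of that routing step, not OpenAI's design (which has not been published); the experts, the gate weights, and the function names are all hypothetical.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k experts and blend their outputs.

    x            -- list of floats (the input features)
    gate_weights -- one weight vector per expert (hypothetical values)
    experts      -- list of callables, each mapping x to a float
    """
    # Linear gate: one score per expert.
    scores = [sum(w * v for w, v in zip(weights, x)) for weights in gate_weights]
    probs = softmax(scores)
    # Keep only the top_k experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen

# Three toy "experts" standing in for small specialized models (assumption).
EXPERTS = [lambda x: sum(x), lambda x: max(x), lambda x: min(x)]
GATE_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

if __name__ == "__main__":
    result, used = moe_forward([2.0, 1.0], GATE_WEIGHTS, EXPERTS, top_k=2)
    print("experts used:", used, "blended output:", round(result, 4))
```

Because only the selected experts run, the cost per query depends on top_k rather than on the total number of experts, which is where the cost saving comes from.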
In this weekly report, you can also obtain the following information:
1. Bard, Claude 2, and ChatGPT are all upgrading; no one is idle
2. The AI drawing field keeps moving:
Meta beats Midjourney; Stability AI teams up with Tencent to launch Stable Doodle; the video segmentation model "SAM-PT" appears;
3. Major events in the domestic model field:
Insurance for domestic large models; JD.com releases the Yanxi large model; BAAI (Zhiyuan) surpasses DeepMind; Wang Xiaochuan's large model upgrades again;
4. Overseas hot news
Oxford and Cambridge lift the ban on ChatGPT; Meta is about to release a commercial version of its AI model; Musk reverses course, from resisting AI to founding "xAI";
Previously, Bard only supported English queries; it now supports more than 40 languages, including Chinese, and is newly available in the European Union and Brazil.
Not only that, Bard has also updated the following features:
- Uploading and understanding images (note: English version only for now);
- Asking questions by voice;
- Saving conversation history and sharing conversation links (as in ChatGPT);
- Customizing the length and style of replies;
- Exporting code.
The upgraded second generation of Claude goes head to head with the ChatGPT Plus membership: it supports uploading documents and can find and summarize the relationships between multiple documents (formats such as txt and pdf are supported, with a maximum size of 10MB).
GPT-4's latest plugin is Code Interpreter. It was initially billed as "making everyone a data analyst" (it is especially strong at data processing and plotting);
Recently, however, users testing it have uncovered new uses, such as making short videos, simple games, emojis, and so on;
What else this plugin can do is still being explored by users.
Meta has launched a single multimodal large model, CM3leon. Did it reach the top of the field the moment it launched?
Everyone now says CM3leon is even better than Stable Diffusion, Midjourney, and DALL-E 2. Why is that?
[How powerful is it]
CM3leon uses an autoregressive model and is ahead of earlier models such as Stable Diffusion in computational efficiency, cutting computation to roughly one fifth;
It can handle more complex prompts and complete image generation tasks;
It can edit existing images based on free-form text instructions, such as changing the color of the sky or adding objects at specific locations.
To be objective: CM3leon's capabilities do put it at the forefront of the multimodal field. It not only produces sharper images but also breaks through earlier bottlenecks in image generation, such as detailed rendering of hands and laying out objects and space according to language prompts;
This may be thanks to CM3leon's multifunctional architecture, which lets a single multimodal model switch freely between text, image, video, and other tasks, something earlier multimodal models could not do.
In simple terms, with Stable Doodle you hand the model a sketch to control the generated image, similar in effect to ControlNet;
[How powerful is it]
Stable Doodle combines the Stable Diffusion XL model with the T2I-Adapter.
The T2I-Adapter is an image-text control module developed by Tencent ARC Lab; at only 70M it is very compact, yet it helps SDXL understand the outline of a sketch and control image generation more precisely;
Some time ago, Meta AI open-sourced a very powerful foundation model for image segmentation, the Segment Anything Model (SAM), which instantly ignited the AI community. Now, researchers from ETH Zurich, the Hong Kong University of Science and Technology, and EPFL have released the SAM-PT model, which extends SAM's zero-shot capability to tracking and segmentation tasks in dynamic videos.
In other words, fine-grained segmentation now works on video as well.
The "Interim Measures for the Management of Generative Artificial Intelligence Services," jointly announced by seven departments, will take effect on August 15, 2023.
The main points include:
Requirements for categorized and tiered supervision;
Clear requirements for handling training data and annotations;
Clear requirements for providing and using generative AI services;
The introduction of these "Interim Measures" amounts to an insurance policy for companies that use and provide generative AI services in China. From now on, even if problems arise, they will know where to file complaints.
Last month, Tmall Genie and the Tongyi large model team jointly released 100PoisonMpts, an open-source dataset for large model governance also known as "100 Bottles of Poison for AI." Its aim is to steer AI away from the traps of discrimination and bias that even ordinary people find hard to avoid.
Here is the evaluation result after "poisoning" multiple large models: on depression-related questions, GPT-4, GPT-3.5, and Claude still have the higher overall scores;
What is alignment for?
Simply put, large model alignment research aims to make AI give answers that better match human intentions, mainly answers that are more emotionally aware, empathetic, and in line with human values. We hope that AI will also learn humanistic care in the future.
JD.com has officially released the Yanxi large model and the Yanxi AI development and computing platform, aiming to be the service tool that understands industry best.
Yanxi is currently open for reservations and is expected to launch officially in August.
The "Wudao·Vision" research team at the Beijing Academy of Artificial Intelligence (BAAI, also known as Zhiyuan) has open-sourced a new unified multimodal pre-training model called Emu. It performs excellently on eight benchmarks and surpasses many previous SOTA models.
The biggest feature of this pre-training model is that it bridges multimodal input and multimodal output;
It achieves content completion for any multimodal image-text task by autoregressively predicting the next step of the task;
What can this pre-training model do?
It can be used to train a large model comparable to Meta's freshly released CM3leon. (The method is provided; the rest depends on individual effort.)
Baichuan Intelligence has once again upgraded its large model: the new Baichuan-13B raises the parameter count from 7 billion to 13 billion. The conversational model Baichuan-13B-Chat debuted alongside it, together with two quantized versions, INT4 and INT8.
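For readers unfamiliar with what an INT8 quantized version means, here is a minimal sketch of symmetric 8-bit quantization: each weight is stored as an integer between -127 and 127 plus one shared scale factor, which cuts storage to roughly a quarter of 32-bit floats at the cost of a small rounding error. This is a generic textbook scheme shown for illustration; it is an assumption, not necessarily the method Baichuan uses, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

if __name__ == "__main__":
    original = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_int8(original)
    restored = dequantize(q, scale)
    print("stored integers:", q)
    print("largest rounding error:", max(abs(a - b) for a, b in zip(original, restored)))
```

The rounding error per weight is at most half of the scale, which is why quantized models usually lose only a little accuracy while needing far less memory.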
Baichuan-13B sets a new record for open-source training data:
The training data for Baichuan-13B reaches a staggering 1.4 trillion tokens! That is 40% more than LLaMA-13B (Meta's well-known model). It directly outperforms GPT in Chinese-language evaluations, especially in natural sciences, medicine, arts, mathematics, and other fields.
Other AI News from Overseas
- Oxford and Cambridge have lifted the ban on ChatGPT.
- Meta is planning to release a commercial version of their AI model.
- Elon Musk, who previously voiced strong opposition to generative AI, has reversed his stance and announced the establishment of "xAI".