AI "multimodal" battle: "Drawing new artifact" Dall-E 3 detonates Bing, Microsoft urgently deploys thousands of servers over the weekend.

Wallstreetcn
2023.10.04 11:27

OpenAI's multimodal technology is truly impressive. Will AGI be far behind?

Still stuck in the era of AI writing poetry and solving math problems? The world of generative AI has changed, and multimodality is becoming the mainstream trend.

At the end of September, OpenAI announced its new-generation image generation model, DALL·E 3, and integrated it into ChatGPT. With just a few simple prompts it can produce detailed illustrations and even complex animations, impressing netizens and posing a genuine challenge to Midjourney.
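
For readers who want to experiment with DALL·E 3 outside the ChatGPT interface, below is a minimal sketch of an image-generation request via OpenAI's Python SDK. The model name, parameters, and availability of API access are assumptions based on OpenAI's public documentation, not details from this article.

```python
# Minimal sketch: requesting an image from DALL·E 3 via the OpenAI Python SDK.
# Assumes an OPENAI_API_KEY environment variable and that the account has
# access to the "dall-e-3" model; these details are not from the article.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor illustration of a lighthouse at dawn, soft pastel tones",
    size="1024x1024",
    n=1,
)

# The API returns a URL pointing to the generated image.
print(response.data[0].url)
```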

Multimodal update boosts Bing traffic

Because DALL·E 3 is also integrated directly into the Bing search engine, Bing is now arguably the easiest free way on the Internet to create high-quality AI images.

According to The Information, Microsoft employees say that since DALL·E 3 was integrated, Bing's traffic has surged far beyond its usual weekly levels. The spike reportedly knocked Bing's AI chatbot offline for several hours over the weekend. Bing's leadership had to ask Microsoft's top management for more AI-dedicated servers, and engineers spent the weekend bringing thousands of servers online.

Multimodality will accelerate the arrival of AGI, further increasing demand for computing power

Beyond image generation, multimodal AI is also seen as one of the important paths to AGI, that is, general-purpose intelligent systems whose capabilities match or surpass human intelligence.

On the one hand, multimodal AI can integrate and process many types of information, such as text, images, audio, and video. This richer information processing gives a system more knowledge and deeper understanding, pushing intelligent systems toward AGI.

On the other hand, multimodal AI can draw on multiple data sources at once, enabling more comprehensive reasoning and decision-making. This mirrors the human ability to weigh several sources of information, bringing such systems closer to human cognitive patterns.
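
As a concrete illustration of what processing several modalities at once can look like in practice, here is a minimal sketch of a combined image-and-text request using OpenAI's Python SDK. The model name and the example image URL are illustrative assumptions and are not taken from this article.

```python
# Minimal sketch: sending text and an image together to a multimodal chat model.
# The model name ("gpt-4o") and the image URL are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart and summarize its main trend."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)

# The model reasons over both the text instruction and the image content.
print(response.choices[0].message.content)
```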

In addition, Guosheng Securities notes that multimodal AI's inputs and outputs today are mainly text and images, with application scenarios centered on smart office tools and various AIGC functions. Over the next 1-5 years, as multimodal GPT models mature and improve AI's ability to generalize, general-purpose vision, general-purpose robotic arms, industry-specific service robots, and truly smart homes are expected to enter daily life. Over the next 5-10 years, large models combined with sophisticated multimodal systems are expected to interact fully with the physical world, opening up broad applications in fields such as general-purpose robotics.

Furthermore, the growth of applications and the rising complexity of multimodal data processing are driving up demand for computing power. If models are the "traffic gateway" of the next era, then computing power is the engine behind those models. As competition among the major companies' models intensifies, the arms race for computing power is expected to escalate further.