Google's recent progress on large AI models has drawn plenty of attention. But just as everyone was waiting to see how Google would turn the tide, OpenAI, the dominant player in large AI models, announced major news of its own. According to The Information, OpenAI is about to launch a multimodal model, GPT-Vision, and the article's title states bluntly that the move is meant to hit back at Google. The new release has not actually arrived yet, but it is enough to give us a glimpse of the next battleground in this race: multimodality.

01# Where has "GPT-5" progressed to?

According to The Information, OpenAI is preparing to launch GPT-Vision, an image-understanding capability built on GPT-4. This in effect adds buffs to GPT-4, improving it step by step; after all, GPT-4 is still widely regarded as the top AI model. The report also mentions that OpenAI may follow GPT-Vision with a large model code-named "Gobi". Unlike GPT-4, the reportedly more powerful Gobi is being built as a multimodal model from the ground up. Outside observers have pegged this new model as a strong candidate for GPT-5, because most people do not believe the denial that Sam Altman, CEO and co-founder of OpenAI, offered at an MIT event:
After all, that statement was mainly a response to the open letter "Pause Giant AI Experiments". On March 29, thousands of people in the technology industry, including Tesla CEO Elon Musk, Apple co-founder Steve Wozniak, and Turing Award winner Yoshua Bengio, jointly called for a six-month pause on developing AI systems more powerful than GPT-4, to allow time to address AI safety and ethics issues. Earlier this month, Mustafa Suleyman, co-founder of DeepMind and now CEO of Inflection AI, said in an interview that he believed OpenAI was secretly training GPT-5. Suleyman put what many had been quietly speculating on the table, and the pressure landed on OpenAI once again. Still, it may be too early to talk about GPT-5: OpenAI has not responded to the reports, and beyond the possibility that the model code-named Gobi is the rumored GPT-5, nothing else is known. According to the same reports, OpenAI may not even have started training Gobi yet. GPT-Vision, by comparison, is easier to trace. Many speculate that GPT-Vision is the multimodal capability demonstrated at the GPT-4 launch in March, when GPT-4 generated web-page code from a simple handwritten sketch and stunned the world. But after that momentary surprise, apart from the capability being provided to Be My Eyes, a company that builds technology for the blind, there has been no further news of feature updates or real-world use, including text-to-image features. The likely reason can be inferred from a July New York Times report: OpenAI worried the feature could be abused for facial recognition and other purposes. Combine that with Sam Altman's earlier rumor-denying remark that "OpenAI is addressing various security issues based on GPT-4 that were ignored in the open letter," and it appears the relevant safety concerns may since have been resolved.
That would also mean the hold on the feature is likely to be lifted. According to The Information, OpenAI hopes to offer image understanding more broadly under the name "GPT-Vision", which would open up many new image-based applications for GPT-4, such as generating text to match pictures. Meanwhile, there are rumors that DALL-E 3 is also in development and may be integrated into ChatGPT or GPT-4. Both it and GPT-Vision may be announced at the OpenAI developer conference on November 6, as OpenAI CEO Sam Altman once said:
In general, although GPT-5 has not yet arrived, GPT-4 is set to focus on multimodality, and a new round of AI fervor that refreshes the tech world's outlook may not be far off.

02# OpenAI and Google are competing

In covering OpenAI's new move, Chinese and foreign media were surprisingly unanimous: it is aimed at Google's Gemini. According to a September 14 media report citing three people directly familiar with the matter, Google has provided an early version of Gemini to a small number of companies and is selling it to enterprises through its cloud computing services, which suggests Google is considering building it into consumer services and that Gemini's release may be imminent. Gemini is billed as the culmination of Google's work. Since April there have been reports that the project's participants include big names such as DeepMind founder Demis Hassabis, and Google co-founder Sergey Brin has personally joined in Gemini's training. At the end of last month, analysts Dylan Patel and Daniel Nishball of SemiAnalysis revealed more details. Based on the available information, we can summarize Gemini as follows:

1. The first-generation Gemini was likely trained on TPUv4, using a smaller number of chips to ensure chip reliability and hot-swapping. Training has now moved to TPUv5 Pods, with roughly five times the compute used to train GPT-4.
2. Gemini's training data includes 9.36 billion minutes of YouTube video subtitles, and the total dataset is about twice the size of GPT-4's.
3. Gemini consists of a group of large language models and may use an MoE (Mixture of Experts) architecture along with speculative sampling: small models generate tokens in advance and pass them to the large model for evaluation, improving overall inference speed.
4. Gemini supports chatbots, summarizing text, generating original text (such as email drafts, lyrics, or news articles), generating original images, and more.
5. Gemini can help engineers write code. Google hopes it will improve developers' code generation enough to catch up with Microsoft's GitHub Copilot code assistant, which relies on OpenAI.
6. Google employees have also discussed using Gemini for features such as chart analysis, for example asking the model to explain a chart's meaning, or using text or voice commands to navigate web pages and other software.
7. Gemini comes in different sizes, and developers can purchase simplified versions to handle simple tasks; the smallest versions can run on personal devices.

It is worth noting that, compared with GPT-4, Gemini has one advantage: besides public information from the internet, it can draw on a large amount of proprietary data from Google's consumer products. For that reason, some observers believe:
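The "speculative sampling" mentioned in item 3 is a known inference-speedup technique, though we do not know how Gemini actually implements it. The toy sketch below illustrates only the control flow: a cheap draft model proposes several tokens ahead, and the expensive target model verifies the whole draft, keeping the accepted prefix. Both "models" here are deterministic stand-ins over a five-word vocabulary, purely for illustration.

```python
VOCAB = ["the", "cat", "sat", "on", "mat"]

def draft_model(context):
    # Toy stand-in for a small, fast model: a deterministic next-token guess.
    return VOCAB[len(context) % len(VOCAB)]

def target_model_accepts(context, token):
    # Toy stand-in for the large model's check: accept a proposed token
    # only if it matches the large model's own greedy choice.
    return token == VOCAB[len(context) % len(VOCAB)]

def speculative_decode(prompt, k=3, max_tokens=5):
    """Draft k tokens with the small model, verify them with the large
    model, keep the accepted prefix, and on the first rejection let the
    large model supply its own token instead."""
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        # 1. The small model cheaply drafts k tokens ahead.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. The large model verifies the draft left to right.
        accepted, ctx = [], list(out)
        for t in draft:
            if not target_model_accepts(ctx, t):
                break
            accepted.append(t)
            ctx.append(t)
        out.extend(accepted)
        # 3. If a draft token was rejected, the large model emits its own.
        if len(accepted) < len(draft):
            out.append(VOCAB[len(out) % len(VOCAB)])
    return out[len(prompt):][:max_tokens]
```

In this toy both models always agree, so every draft is accepted; the speedup in real systems comes from step 2 verifying all k drafted tokens in a single forward pass of the large model instead of k sequential passes.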
Although Gemini has not officially launched, many people are already optimistic about it. The article by Dylan Patel and Daniel Nishball mentioned above expresses similar views:
Notice that every aspect of Gemini is measured against GPT-4, which is inevitable: before ChatGPT arrived, Google was the one holding the sword of AI. So the consensus among the public is...
Against this backdrop, Google has to work harder to prove it can still score points in AI. It chose to strike at its opponent's base, trying to plant its flag on the high ground before OpenAI ships a true multimodal model. OpenAI, of course, has no intention of letting Google catch up, which is why GPT-Vision and Gobi were born. This also marks the focus of the next stage of AI competition: the multimodality every company is now racing toward. After all, text-based generative AI is no longer new; however smart a new text model is, it can hardly match the splash ChatGPT made. Today, though, the AI battlefield is no longer a two-army affair; Google and OpenAI are merely the most prominent giants in a melee. Both need to turn a profit and have added commercial components to their large-model projects, such as enterprise offerings. Meanwhile Meta, a newcomer taking a different approach, has gone the open-source route, continually releasing new capabilities with an emphasis on volume and zero cost. It is hard to judge whether people will choose Meta on cost alone. The AI melee, it is fair to say, has reached a white-hot stalemate. Who breaks out next? Let the bullets fly a while longer.