Google's recent progress on large AI models has drawn plenty of attention. But just as everyone was waiting to see how Google would turn the tide, OpenAI, the dominant player in large AI models, announced major news of its own. According to The Information, OpenAI is about to launch a multimodal model, GPT-Vision, and the article's title states bluntly that the move is meant to hit back at Google. The new release has not actually arrived yet, but it is enough to give us a glimpse of the next battleground in this race: multimodality.

01# Where has "GPT-5" progressed to?

According to The Information, OpenAI is preparing to launch GPT-Vision, an image-understanding capability built on GPT-4. This in effect adds buffs to GPT-4, improving it step by step; after all, GPT-4 is still widely regarded as the top AI model. The report also mentions that OpenAI may follow GPT-Vision with a large model code-named "Gobi". Unlike GPT-4, the reportedly more powerful Gobi is being built as a multimodal model from the ground up. Outside observers have pegged this new model as a strong candidate for GPT-5, because most people do not believe the denial that Sam Altman, CEO and co-founder of OpenAI, offered at an MIT event:
After all, that statement was mainly a response to the open letter "Pause Giant AI Experiments". On March 29, thousands of people in the technology industry, including Tesla CEO Elon Musk, Apple co-founder Steve Wozniak, and Turing Award winner Yoshua Bengio, jointly called for a six-month pause on developing AI systems more powerful than GPT-4, to allow time to address AI safety and ethics issues. Earlier this month, Mustafa Suleyman, co-founder of DeepMind and now CEO of Inflection AI, said in an interview that he believed OpenAI was secretly training GPT-5. Suleyman put what many had been quietly speculating on the table, and the pressure landed on OpenAI once again. Still, it may be too early to talk about GPT-5: OpenAI has not responded to the reports, and beyond the possibility that the model code-named Gobi is the rumored GPT-5, nothing else is known. According to the same reports, OpenAI may not even have started training Gobi yet. GPT-Vision, by comparison, is easier to trace. Many speculate that GPT-Vision is the multimodal capability demonstrated at the GPT-4 launch in March, when GPT-4 generated web-page code from a simple handwritten sketch and stunned the world. But after that momentary surprise, apart from the capability being provided to Be My Eyes, a company that builds technology for the blind, there has been no further news of feature updates or real-world use, including text-to-image features. The likely reason can be inferred from a July New York Times report: OpenAI worried the feature could be abused for facial recognition and other purposes. Combine that with Sam Altman's earlier rumor-denying remark that "OpenAI is addressing various security issues based on GPT-4 that were ignored in the open letter," and it appears the relevant safety concerns may since have been resolved.
That would also mean the hold on the feature is likely to be lifted. According to The Information, OpenAI hopes to offer image understanding more broadly under the name "GPT-Vision", which would open up many new image-based applications for GPT-4, such as generating text to match pictures. Meanwhile, there are rumors that DALL-E 3 is also in development and may be integrated into ChatGPT or GPT-4. Both it and GPT-Vision may be announced at the OpenAI developer conference on November 6, as OpenAI CEO Sam Altman once said:
In general, although GPT-5 has not yet arrived, GPT-4 is set to focus on multimodality, and a new round of AI fervor that refreshes the tech world's outlook may not be far off.

02# OpenAI and Google are competing

In covering OpenAI's new move, Chinese and foreign media were surprisingly unanimous: it is aimed at Google's Gemini. According to a September 14 media report citing three people directly familiar with the matter, Google has provided an early version of Gemini to a small number of companies and is selling it to enterprises through its cloud computing services, which suggests Google is considering building it into consumer services and that Gemini's release may be imminent. Gemini is billed as the culmination of Google's work. Since April there have been reports that the project's participants include big names such as DeepMind founder Demis Hassabis, and Google co-founder Sergey Brin has personally joined in Gemini's training. At the end of last month, analysts Dylan Patel and Daniel Nishball of SemiAnalysis revealed more details. Based on the available information, we can summarize Gemini as follows:

1. The first-generation Gemini was likely trained on TPUv4, using a smaller number of chips to ensure chip reliability and hot-swapping. Training has now moved to TPUv5 Pods, with roughly five times the compute used to train GPT-4.
2. Gemini's training data includes 9.36 billion minutes of YouTube video subtitles, and the total dataset is about twice the size of GPT-4's.
3. Gemini consists of a group of large language models and may use an MoE (Mixture of Experts) architecture along with speculative sampling: small models generate tokens in advance and pass them to the large model for evaluation, improving overall inference speed.
4. Gemini supports chatbots, summarizing text, generating original text (such as email drafts, lyrics, or news articles), generating original images, and more.
5. Gemini can help engineers write code. Google hopes it will improve developers' code generation enough to catch up with Microsoft's GitHub Copilot code assistant, which relies on OpenAI.
6. Google employees have also discussed using Gemini for features such as chart analysis, for example asking the model to explain a chart's meaning, or using text or voice commands to navigate web pages and other software.
7. Gemini comes in different sizes, and developers can purchase simplified versions to handle simple tasks; the smallest versions can run on personal devices.

It is worth noting that, compared with GPT-4, Gemini has one advantage: besides public information from the internet, it can draw on a large amount of proprietary data from Google's consumer products. For that reason, some observers believe:
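The "speculative sampling" mentioned in item 3 is a known inference-speedup technique, though we do not know how Gemini actually implements it. The toy sketch below illustrates only the control flow: a cheap draft model proposes several tokens ahead, and the expensive target model verifies the whole draft, keeping the accepted prefix. Both "models" here are deterministic stand-ins over a five-word vocabulary, purely for illustration.

```python
VOCAB = ["the", "cat", "sat", "on", "mat"]

def draft_model(context):
    # Toy stand-in for a small, fast model: a deterministic next-token guess.
    return VOCAB[len(context) % len(VOCAB)]

def target_model_accepts(context, token):
    # Toy stand-in for the large model's check: accept a proposed token
    # only if it matches the large model's own greedy choice.
    return token == VOCAB[len(context) % len(VOCAB)]

def speculative_decode(prompt, k=3, max_tokens=5):
    """Draft k tokens with the small model, verify them with the large
    model, keep the accepted prefix, and on the first rejection let the
    large model supply its own token instead."""
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        # 1. The small model cheaply drafts k tokens ahead.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. The large model verifies the draft left to right.
        accepted, ctx = [], list(out)
        for t in draft:
            if not target_model_accepts(ctx, t):
                break
            accepted.append(t)
            ctx.append(t)
        out.extend(accepted)
        # 3. If a draft token was rejected, the large model emits its own.
        if len(accepted) < len(draft):
            out.append(VOCAB[len(out) % len(VOCAB)])
    return out[len(prompt):][:max_tokens]
```

In this toy both models always agree, so every draft is accepted; the speedup in real systems comes from step 2 verifying all k drafted tokens in a single forward pass of the large model instead of k sequential passes.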
Although Gemini has not officially launched, many people are already optimistic about it. The article by Dylan Patel and Daniel Nishball mentioned above expresses similar views:
Notice that every aspect of Gemini is measured against GPT-4, which is inevitable: before ChatGPT arrived, Google was the one holding the sword of AI. So the consensus among the public is...
Against this backdrop, Google has to work harder to prove it can still score points in AI. It chose to strike at its opponent's base, trying to plant its flag on the high ground before OpenAI ships a true multimodal model. OpenAI, of course, has no intention of letting Google catch up, which is why GPT-Vision and Gobi were born. This also marks the focus of the next stage of AI competition: the multimodality every company is now racing toward. After all, text-based generative AI is no longer new; however smart a new text model is, it can hardly match the splash ChatGPT made. Today, though, the AI battlefield is no longer a two-army affair; Google and OpenAI are merely the most prominent giants in a melee. Both need to turn a profit and have added commercial components to their large-model projects, such as enterprise offerings. Meanwhile Meta, a newcomer taking a different approach, has gone the open-source route, continually releasing new capabilities with an emphasis on volume and zero cost. It is hard to judge whether people will choose Meta on cost alone. The AI melee, it is fair to say, has reached a white-hot stalemate. Who breaks out next? Let the bullets fly a while longer.