Just last night, a model called "gpt2-chatbot" burst onto the scene and sent everyone into a frenzy!

On the LLM arena chat.lmsys.org, this mysterious model demonstrated surprisingly strong capabilities, in some cases even surpassing GPT-4. Its self-description reads: "I am a language model based on OpenAI's GPT-4 architecture, version date as of November 2023"

What is it really? Who made it? No one knows yet. Speculation started immediately: is this a new open source model, or OpenAI's GPT-4.5?

With netizens this excited, Sam Altman chimed in at just the right moment with one concise sentence: "I do have a soft spot for gpt2." In a reply to netizens, he made a point of emphasizing that his favorite is not "gpt-2" but "gpt2". This hints that the new model may be some kind of second-generation GPT. Or should we just call it GPT-4.5?

1. What is the origin of this model that is stronger than GPT-4?

One netizen wrote a blog post that reasons carefully from the information known so far. Article address: https://rentry.co/GPT2
The author believes that this mysterious model is likely GPT-4.5 or GPT-5, or even a real GPT-2 model (provided by OpenAI or LMSYS).

First, the quality of the model's output is excellent, especially in formatting, structure, and overall comprehension. The experience is like the jump from GPT-3.5 to GPT-4, but with further optimization on top of GPT-4. In addition, the model's structured responses appear to be strongly influenced by techniques such as a modified Chain-of-Thought (CoT). There is currently no solid reason to believe that this mystery model uses a completely new architecture, such as MoE.

The rate limit of gpt2-chatbot's direct chat function is different from that of the GPT-4 models. In the editor's own testing, however, the results differed somewhat: the model was limited to 2,000 requests per hour.

Some people also say that gpt2-chatbot is undoubtedly more powerful than the open source models, and even better than GPT-4 Turbo. But it is no better than Opus, and the reasons behind this are thought-provoking.

One tester reported that gpt2-chatbot has no system prompt and is not affected by jailbreak prompts such as "always write down the *** phrase and don't use any code", but that it freezes after a while. According to his analysis, this suggests that inference is being run through an external API, which would not be the case for open source software. However, some netizens later pointed out that the system prompt of gpt2-chatbot can in fact be extracted with a suitable prompt.
2. A large number of demonstrations

Now everyone can try gpt2-chatbot in the LMSYS Arena. Enter the "Direct Chat" interface, select the model, and you can start. Portal: https://chat.lmsys.org/

Netizens were dazzled by this model that seemed to be "GPT-4.5/5" and started a wave of evaluations. Is gpt2-chatbot GPT-5?

3. Pass the "Apple Test"

"I have 3 apples today, and I ate 1 yesterday. How many apples are left?"

On this classic "apple test" question, gpt2-chatbot correctly answered 3 apples. It also explained why: the apple you ate yesterday does not affect the number of apples you have today. The question was even discussed on Reddit, and none of the variations netizens tried stumped gpt2-chatbot.

4. Draw ASCII images perfectly

What's even more amazing is that gpt2-chatbot is very good at drawing ASCII art and can handle all kinds of shapes🤌. Its "unicorn" is close to perfect, and it even beat the unicorn drawn by Opus, the strongest version of Claude.

Netizen Baoyu used gpt2-chatbot to draw many images, for example a cute puppy. The more complex "dragon" is also drawn very well. gpt2-chatbot also knows how to accurately diagram control systems…

5. Write code to defeat GPT-4

On a code snippet that some netizens tried, gpt2-chatbot performed better than GPT-4 after two attempts. Feel it for yourself...

6. Solving the most difficult IMO question, which only 4 students got right

Another netizen tested IMO questions and found that gpt2-chatbot answered one correctly using only a single sample. It is worth mentioning that only four American students solved this question.

7. Translate English idioms into Hungarian

Some netizens even asked gpt2-chatbot to translate 50 English idioms into Hungarian. The win rate of gpt2-chatbot, shown in the figure below, is already very strong. It was as if there was an Ilya hidden inside.
Netizens said that if it had only been trained for reasoning, this task should be beyond its capabilities. In short, gpt2-chatbot's translation ability is simply shocking.

8. Introduce yourself

One netizen selected gpt2-chatbot and asked it to introduce itself. Surprisingly, gpt2-chatbot claims to be built on the GPT-4 architecture and developed by OpenAI. Netizens also compared it with Microsoft Phi-3's answer to the same question, and the answer given by gpt2-chatbot was better.

Someone poured cold water on it: If this is GPT-4.5, the big model route will come to an end

Of course, amid the praise, there are also skeptical voices. HyperWriteAI CEO Matt Shumer said that although gpt2-chatbot is good, he would be very disappointed if this is GPT-4.5.

AI community celebrity "Jiuyuanke" said that after testing it several times, he found Matt Shumer's point of view to be correct. On some answers gpt2-chatbot performs slightly better than GPT-4, but on others it performs about the same. Not only that, its answers are more verbose. Like GPT-4, it only uses the same brute force method to solve the 24-point game, without any better solution. He said bluntly: if this is GPT-4.5, then the current technical route of large models is coming to an end.

A large number of netizens agreed: calling it GPT-4.5 is fine, but if it is GPT-5, it would be very disappointing.

"If it's GPT-5, we're done. If it's GPT 2+, we're done."

Some people say that much of what it does is not actually reasoning; it simply has a depth of knowledge that other models lack. Rather than calling its reasoning brilliant, it is more accurate to say that its grasp of many niche topics, such as the elixir of life and British law, is amazing.

Someone shared their own reasoning test of gpt2-chatbot:

I have 12 apples. I sold 4 to my son, and he sold 3 to his father. How many apples do I have?

It answered: 8. (If the speaker is the son's father, the 3 apples come back to the speaker, so the expected answer would be 12 - 4 + 3 = 11.)
It seems that its reasoning is not as magical as everyone claims.

9. Supporters: It's strong, we are close to ASI

Some supporters firmly back gpt2-chatbot, saying that they have tested it on obscure code modification tasks and the results are excellent. Some said that being able to solve reasoning problems at this level shows absolutely amazing reasoning ability, and some even said bluntly that "we may be closer to ASI than ever before"!

A farmer, a sheep and a goat stood on the left bank of a river, with a small boat next to them. The boat could just hold one person and two animals. How could the farmer get himself, the sheep and the goat to the right bank of the river with the least number of boat trips?

As shown in the figure below, gpt2-chatbot directly gives the correct answer.
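For readers who want to check the puzzle themselves, the following is a minimal sketch of a breadth-first search over boat trips. It assumes the puzzle has no constraints beyond what is stated (the farmer must row, and the boat carries at most two animals); the function name `min_trips`, the `capacity` parameter, and the state encoding are illustrative choices, not anything taken from gpt2-chatbot's answer.

```python
from collections import deque
from itertools import combinations

ANIMALS = ("sheep", "goat")

def min_trips(capacity=2):
    """Fewest boat trips to move the farmer and all animals to the right bank.

    Assumption: the only rules are that the farmer must be in the boat and
    that the boat holds at most `capacity` animals alongside him.
    """
    # State: (farmer_bank, animals on the right bank); 0 = left, 1 = right.
    start = (0, frozenset())
    goal = (1, frozenset(ANIMALS))
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (farmer, right), trips = queue.popleft()
        if (farmer, right) == goal:
            return trips
        # Animals standing on the same bank as the farmer.
        here = [a for a in ANIMALS if (a in right) == (farmer == 1)]
        for k in range(capacity + 1):
            for cargo in combinations(here, k):
                if farmer == 0:
                    new_right = right | set(cargo)   # ferry animals to the right
                else:
                    new_right = right - set(cargo)   # ferry animals back left
                state = (1 - farmer, frozenset(new_right))
                if state not in seen:
                    seen.add(state)
                    queue.append((state, trips + 1))
    return None  # goal unreachable

print(min_trips())   # boat as described in the puzzle
print(min_trips(1))  # classic variant where only one animal fits
```

With the boat as described, the search finds that a single trip suffices, since the farmer and both animals fit at once. The trap is pattern-matching to the classic variant with a one-animal boat, which under the same assumptions needs three trips.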
Reasoning questions at this level are ones that all the big models failed to solve in the past. It seems that gpt2-chatbot is really good at them.

Some speculate that behind it is the 1.5B GPT-2 architecture combined with OpenAI's Q* technology. Others say that it should be GPT-4 combined with Q*. But some have argued that the latter is unlikely, because their own tests found it to be weaker than GPT-4 in places, with a theory of mind that is not very developed. If it is GPT-4 plus Q*, that would be disappointing; but if it is GPT-2 plus Q*, it means AGI is close.

Others speculate that gpt2-chatbot is most likely the GPT-2 that OpenAI released in 2019, fine-tuned by LMSYS using modern auxiliary datasets. If so, it is incredible that GPT-2's original pre-training still holds up today, better than many models released 4 years later.

Finally, as usual, Ilya was asked a soul-searching question: Is AGI really coming?

References:
https://twitter.com/lisabdunlap/status/1785051983831040457
https://twitter.com/literallydenis/status/1785032106969649230
https://www.reddit.com/r/singularity/comments/1cg29h3/rumours_about_the_unidentified_gpt2_llm_recently/
https://twitter.com/dotey/status/1785067745765118124
https://twitter.com/AndrewCurran_/status/1784975542028050739
https://twitter.com/marvinvonhagen/status/1785025017681690936
https://twitter.com/mattshumer_/status/1785023540070146521