Is Sora an opportunity or a challenge for domestic manufacturers?

At the beginning of this year, OpenAI released Sora, a new AI model for text-to-video generation. Is this an opportunity or a challenge for big domestic companies? Let's take a look at the author's analysis.

Setting the technical route aside and looking only at results, do domestic large-model companies have a similar "opportunity to take off" in video generation?

In the first month of the Year of the Dragon, much as with ChatGPT last year, OpenAI opened the year with another blockbuster: Sora, a text-to-video model.

Faced with such AI generation capabilities, practitioners in almost every field have been deeply shaken. A film producer with an IT background told Lujiu Business Review that Sora's stunning performance has given the practitioners around him a strong sense of crisis: film production costs will fall sharply, and it will be easier than ever for new filmmakers to emerge.

However, when Lujiu Business Review asked questions such as "Is Sora ready for commercialization?" and "Does text-to-video demand more computing power, and how can that be solved?", the producer simply replied that "problems that arise in development will be solved through development."

This is obviously too optimistic. More practitioners believe that even Sora still has many immature aspects and a long way to go from concept to mature, industrial-scale commercialization.

So, technical route aside and judging by results alone, do the domestic large-model manufacturers that have invested in general models such as text-to-video have a similar "opportunity to rise"? What substantial leaps has today's text-to-video made over the video generation of the past? This is a very interesting topic.

1. Sora, a revolution or a bubble?

It must be admitted that the emergence of Sora has brought artificial general intelligence (AGI) one step closer, because it can already simulate the movement of the real physical world, such as the motion and interaction of objects. However, this improvement alone is not "amazing". According to OpenAI's official report, Sora's "revolutionary" nature shows mainly in the following points.

First, duration. As a general-purpose text-to-video model, Sora can generate a 60-second video from the text description a user provides. The output is not only high in quality but also follows the user's prompt more completely and accurately.

Second, there are breakthroughs in scene complexity and character generation. Sora can now generate scenes with multiple characters, specific types of movement, precise subjects, and complex background details. The camera language has also become more sophisticated, so the video itself begins to carry a narrative function, which is exactly what the short video field currently needs.

Third, in addition to generating videos from text, Sora can animate still images, or generate new video from existing video to fill in missing frames or extend the content.

A senior tech journalist told Lujiu Business Review that the emergence of AI products such as Sora is an opportunity to "democratize ideas". Tech journalists who have followed the industry for a long time often have imaginative ideas but lack the right tools to implement them. With AI tools such as GPT and Sora, once journalists spot an opportunity, AI may help them build the product, and what remains is to verify its feasibility.

However, after speaking with many industry insiders, Lujiu Business Review found that even Sora, at the height of its popularity, may be overvalued.

Li Mingshun, chairman of Xingxing AI, takes a more sober view. In his opinion, Sora is largely a staged technical iteration that extends general text-to-text models into the video field. Its current qualitative leap owes much to nearly unlimited investment in computing power and funds, coupled with continuous, repeated training on massive training sets. It is a case of "great effort brings miracles".

More than its technical lead, Sora's advantage in "resource endowment" has opened up a wider gap with the many domestic manufacturers that are short of computing power. Domestic large-model manufacturers will find this gap hard to close for quite a long time.

From an investment perspective, "general-purpose models" for vertical fields, such as Sora, are not popular targets.

A primary market practitioner told Lujiu Business Review that pure primary market investors usually back only big concepts and high-valuation targets. The main reason is that a primary market fund has a 7-year life with a 2-year investment period, so exiting within 5 years is the most likely scenario. However, no one can say whether vertical text-to-video models can be industrialized and commercialized within 5 years.

In addition, the only public information about Sora so far is the technical report released on February 15, yet news of financing came out just three days later. Although Sora is not open to the public and outsiders cannot judge its actual level, OpenAI's valuation approached 80 billion US dollars in a financing round led by venture capital firm Thrive Capital. The primary market practitioner told Lujiu Business Review that this technology release is likely part of OpenAI's "valuation management."

Zhou Yahui, chairman of Kunlun Wanwei, wrote in his WeChat Moments, "Scientists and engineers here (in Silicon Valley) don't recognize the value of any startup's stock other than OpenAI's, and think it is all paper wealth. They would rather take a 1 million package (half of it in stock) from OpenAI, Google, FB, or Microsoft than a 3 million offer (80% of it in stock) from a startup."

It can be seen that after Sora, OpenAI has further widened the gap with other major AI companies.

2. Domestic large models: dangers and opportunities for manufacturers

Although Meta, Google, and Microsoft are all poised to act, domestic large-model manufacturers appear much calmer than a capital market gone wild over Sora. Most large domestic companies still choose to develop large models around their own applications rather than pursue so-called AI-native large-model upgrades. ByteDance is one of them, and its conservative attitude toward generative AI was already visible in the text-to-text stage. Judging by when it entered the field, ByteDance was not late: according to a report by LatePost, after OpenAI released GPT-3 in June 2020, ByteDance trained a large generative language model with billions of parameters.

Had development proceeded according to plan, ByteDance would not have been far behind OpenAI's GPT by 2023. However, in a business system that prioritizes ROI, ByteDance evidently could not make the numbers on this investment add up. As a result, its exploration of generative AI has consistently been slower than that of its competitors.

In terms of release timing, Baidu's Wenxin Yiyan was released in March 2023 and upgraded to version 4.0 in October of the same year. Alibaba's Tongyi Qianwen and Tencent's Hunyuan Assistant followed closely behind, and ByteDance released its Skylark (Yunque) model in August 2023.

One consequence of being a latecomer is a lack of users. Wenxin Yiyan's monthly active users exceeded 100 million last year, while ByteDance's Doubao still has fewer than 10 million. However, now that ByteDance has appointed Zhang Nan to lead Jianying, it is expected to make faster progress in generative AI.

While ByteDance has yet to show a ready-to-use product in text-to-video, Baidu and Alibaba are in a different position. At last year's Baidu World Conference, Baidu already demonstrated the text-to-video capabilities of Wenxin Yiyan, which are mainly integrated into the "Yijing Liuying" plug-in.

Of course, the generated video shown at the World Conference was just one successful result among Yijing Liuying's countless "card draws". After testing it, Lujiu Business Review found that Yijing Liuying still has some limitations.

The first is the material library. Yijing Liuying currently uses a non-copyrighted material library, so it cannot be used for industrial-scale commercial work for specific brands.

The second is that, possibly because of concerns over portrait rights, it currently cannot generate videos featuring people's likenesses, although it can generate product videos without trademarks.

The third is that the videos it currently generates are all about 30 seconds long. To achieve an effect similar to Sora's, two clips have to be spliced together, and keeping their content and style consistent is obviously a difficult task.

Tongyi Qianwen's most popular feature at present is image-to-video generation, represented by "National Dance King". Given just a full-body photo, it can make the subject perform various popular dance moves. On Bilibili, derivative videos of historical figures such as Empress Dowager Cixi dancing to "Subject 3" have drawn about 10 million views in total.

Although these domestic products have not yet reached industrial scale and have not closed the gap with Sora, Sora has not achieved industrialization either. At least in terms of commercialization, then, the two sides are still not far apart. What remains is to keep catching up.

Li Mingshun, chairman of Xingxing AI, holds a similar view. He told Lujiu Business Review that OpenAI still occupies the top position in the industry, but this rests largely on its earlier computing power reserves and accumulated technology. Domestic general-purpose large-model manufacturers such as BAT and ByteDance will keep catching up. The reason is simple: to some extent, general-purpose large models have become a symbol of an Internet company's basic capabilities.

The competition seems to have just begun.

3. Text-to-video: where is the real winning factor?

Of course, whether it is OpenAI's Sora or the many large-model manufacturers in China, the ultimate goal is still to industrialize and streamline the production of high-quality video content. But at present, even a model as powerful as Sora has many immature aspects that keep it from industrial use. The product architect of Zhixingyuan (www.creatlyai.cn), an AI dynamic video solution product, told Lujiu Business Review that Sora currently looks very convenient: it generates high-quality video directly from text, needs only a few prompt words to control, and seems to place very little mental or operational burden on users.

However, because Sora's understanding of the real physical world is limited, problems still occur in some scenes, such as candle flames pointing in inconsistent directions, incorrect object counts, and objects distorting as they enter or leave the frame. These details are difficult to fix in post-editing.

This is not without a solution. Because Sora currently has video extension and video splicing functions, users can generate clips a few seconds long and edit them in post-production. For users without enough prompt engineering knowledge, multiple rounds of generation plus manual post-production are inevitable.

In addition, industrial-grade promotional videos usually feature a customer's new products, such as new down jackets, new cars, or new mobile phones. Because these materials do not exist in the video model's training set, the only option is to generate a similar product and then rework it, which means film and television post-production.

Professional and non-professional users also have different needs. For casual users with no commercial requirements, the model is something to try out, and every newly generated work is a pleasant surprise. But for professional users such as directors, an unsatisfactory first result means multiple rounds of generation and post-production, which is a considerable burden on both computing power and manpower.

The aforementioned film producer told Lujiu Business Review that in the film and television production process, the biggest cost in the post-production stage is the labor cost for editing and special effects, that is, the secondary processing. If the workflow is not advanced enough, it is likely to increase costs in the post-production process, thus affecting the ROI of the project.

If today's text-to-video output still requires heavy manual adjustment, and camera work and the physical world cannot be reproduced 1:1, then using AI to generate video material is not actually very cost-effective.

With this in mind, a film and television post-production practitioner told Lujiu Business Review that, in his opinion, AI can directly replace mid-stage work such as set construction and shooting, because with continuous training AI's simulation of the physical world can come close to reality.

The above are just some of the changes Sora implies for the film and television industry. In sub-sectors such as games, advertising, and short video creation, the changes will far outweigh the problems, and the application of AI will bring revolutionary change. Domestic large companies are clearly more willing to invest effort in the commercial exploration of AI applications.

Similarly, according to Zhou Yahui's WeChat Moments, "OpenAI will soon release GPT-4.5, and it is estimated that it will deliberately choose to release it when Anthropic releases Claude 3." Beyond Sora's generative video, what other striking innovations will OpenAI's latest iteration bring? This should be the question of greatest concern to the large-model strategy and business departments of domestic large companies.

Finally, for text-to-video, the question is whether to take a "+AI" approach and develop large-model applications on top of existing businesses, or an "AI+" approach and train and upgrade one's own native large models. Obviously, large American and Chinese companies have made their own choices.

Author: Hu Jiaming WeChat public account: Lujiu Business Review