Can videos be generated from a prompt, too? That is now becoming reality. After OpenAI released its video generation model Sora, Chinese companies rushed into the market, and domestic large video models entered an accelerated phase. Over the past six months, AI-generated video has been advancing in fits and starts: Vidu, which claims to be China's first self-developed large video model, and the video generation models subsequently launched by ByteDance, Tencent, and many other domestic vendors have repeatedly drawn outside attention.

Recently another domestic large video model joined the fray, as the official website of Kuaishou's "Keling" video generation model went live. On the 21st, Kuaishou's Keling released a major update: the image-to-video function was officially opened, supporting the conversion of a static image into a 5-second video, with users controlling the movement of objects in the image through text prompts. At the same time, a video continuation function was launched, supporting one-click continuation (and multiple successive continuations) of generated videos, producing up to about 3 minutes of footage.

Unlike the large video models previously released by various companies, which mostly existed only as demo reels, the Keling model unveiled this time not only matches Sora's effect but has also been opened for invitation-based testing in Kuaishou's Kuaiying app. According to Kuaishou, the Keling model was developed by Kuaishou's AI team. It follows a technical route similar to Sora's, combined with a number of self-developed innovations. It generates video at 1080p resolution with a maximum length of 2 minutes (at 30 fps) and supports free aspect ratios. Kuaishou also claims that the model can generate large-scale yet plausible motions that conform to the objective laws of movement.
In the official example video, an astronaut runs on the moon; as the camera slowly rises, the astronaut's gait and shadow remain plausible throughout.

Almost at the same time, Meitu announced that it will launch a new product, MOKI, at the end of July. Built on the video generation capabilities of Meitu's large model, the product helps users generate AI short films.

Still, some argue that compared with large language models, which have emerged in droves, large video models have been slower to heat up and lack the presence of the giants. Why is that? Are the big companies not interested? Moreover, in the last round of large-language-model competition, Kuaishou and Meitu kept a low profile. So in the field of large video models, what are these two companies' biggest advantages? Beijing Business Daily reporters Wei Wei and Shu Le discussed these questions.

My view: big companies still cramming for the "college entrance exam" will not leap straight to the "postdoctoral" level. Making a video is not just stitching a pile of pictures into a PPT. The big companies are in no hurry to push into this area, because doing so is not very practical; it amounts to a show of muscle. After all, video generation is not merely stringing a bunch of AI drawings together into a cartoon. Beyond handling details such as image consistency, fidelity to the prompt, light and shadow, and shot composition, it requires the ability to understand and recreate a plot. All of this demands deep learning across vertical fields such as video structure, content analysis, shooting technique, and narrative method. The difficulty far exceeds that of chatting, painting, or playing chess, which can be mastered by accumulating data and correcting errors with user feedback. Even masters of film and television often stumble here.
One can imagine, then, how difficult it is for artificial intelligence still at the "college entrance exam stage" to make a film. But Kuaishou and Meitu need to flex their muscles, even if only for show.

The biggest advantage both Kuaishou and Meitu hold in the large-video-model field is that they possess rich "learning materials" for the deep learning of artificial intelligence, and relying on these materials also sidesteps certain copyright issues. In addition, years of content accumulation, vertical segmentation, and labeling in the video field help the large model "retrieve" knowledge more effectively and lend a degree of video expertise to its algorithm design. But that is all: technically, they still lack original accumulation in artificial intelligence algorithms.

Besides, even once large video models mature, it will be hard for them to achieve a major breakthrough in the film and television industry. Short dramas, advertisements, long videos, and movies can all have "blockbuster special effects", but what ultimately attracts audiences is the content, from the screenplay to the camera work and the actors' performances. Those are the keys to large-scale commercial monetization. I believe large video models may find business opportunities more easily in the animation field.