Tongyi Qianwen: What does dancing "Subject Three" have to do with AI e-commerce?

In 2023, large AI models all but upended our lives. Now you can dance "Subject Three" without moving a muscle, because a single photo is enough to generate a video. How does this tie in with e-commerce? Let's take a look!

The staff at Haidilao sprained their ankles many times over while dancing Subject Three.

Yet for all their effort, they could not have expected that, with the "National Dance King" feature quietly launched by Tongyi Qianwen, anyone can now pull off Subject Three with just a photo.

At the current level of large models, generating text from text and images from text has become child's play. However fiercely everyone competes on the leaderboards, the real contest has already moved to cross-modal generation. The image-to-video generation behind "National Dance King" is one example, and the videos that make foreign celebrities such as Musk and Zuckerberg speak Chinese are another.

Moreover, the Animate Anyone model that "National Dance King" relies on is no longer just a "toy". Combined with Alibaba Cloud's latest Outfit Anyone "one-click fitting" model, it means that when we buy clothes on Taobao in the future, we could upload a photo and watch a video of ourselves wearing them.

Alibaba’s dream of “AI e-commerce” now has another piece of the puzzle.

1. It has been popular abroad for three months

Subject Three counts as a "phenomenal" dance not because of its popularity on domestic short-video platforms (the social dance of five or six years ago already achieved that), but because it has become a symbol of cultural export and thereby earned a ticket into the mainstream.

Judging by results alone, Subject Three has long since moved beyond short-video platforms: it has been incorporated into games and has reached the international stage.

For example, at the World Sports Dance Competition on December 9 last year, world champion Christina and several other dancers performed a "national standard" (ballroom) version of "Subject Three". And in "Mean Dream Star", Tencent's "genuine family party game" launched just one month ago, Lao Hu also found a character-exclusive "Subject Three" emote.

To become a "National Dance King" today, you only need to upload a full-body photo in the Tongyi Qianwen app and wait about 10 minutes for it to generate a 10-second dance video.

The template area of "National Dance King" offers 12 popular dance templates, including DJ Slow Rock, Just Want to Say "I Love You" to You, Ghost Step Dance, Mongolian Dance, Subject Three, and Paddling Step. Despite the many options, Subject Three undoubtedly holds center stage in both traffic and attention.

Looking back at the timeline, we find that Subject Three began to go viral among foreign short-video creators around October to November last year. At about the same time, the Alibaba Cloud team released the technical paper on the large model behind "National Dance King".

Bear in mind that, unlike ordinary hand-gesture dances, this dance requires some dance training and limb coordination. In other words, not everyone abroad can pick it up easily.

The paper released by Alibaba Cloud, however, describes how the "Animate Anyone" model converts a still image of a person into an animated video controlled by a given pose sequence. In other words, people with no dance skills can now get started with a single picture, which gives every dance novice a chance to "take the stage" in short videos.

It is therefore no surprise that the tweet introducing "Animate Anyone" drew more than 50 million views in less than a month.

2. Making a picture dance "Subject Three"

Before "Animate Anyone" emerged, turning a static image into a dynamic video still faced many hurdles.

The first is detail consistency. In the image-to-video and text-to-video products currently on the market, the areas outside the main subject often suffer from local deformation, blurred details, and uncontrollable frame rates, all of which lower the quality of the generated video.

The second is motion control and continuity. For a video to be used commercially, the actions of the people in it must be controllable. In AI generation, those actions are driven mainly by a pre-entered action sequence, yet prompts alone cannot control the actions in an AI video with full accuracy.

Generating video from a picture also involves the conversion from image to video itself, and the spatial and temporal consistency of the image must be preserved throughout that conversion.

Before this, AIGC products such as Stable Diffusion and Midjourney already offered early cross-modal generation capabilities such as text-to-image, image-to-image, and image-to-video, but the problems above remained unsolved in AI video generation.

"Animate Anyone" goes a long way toward addressing these problems. First, the team uses an auxiliary model called "ReferenceNet" to capture the spatial details of the reference image, which keeps the character's appearance consistent in every frame.

Second, the team uses an efficient Pose Guider to control the character's movements. In the video, the character moves through the specified poses with stable transitions, which keeps the motion continuous and smooth.

The details of earlier AI-generated videos were hard to control largely because the temporal relationship between frames was too loose, so many details failed to carry over to the next frame. "Animate Anyone" uses a temporal module to model the relationship across video frames, so many high-resolution details are retained throughout the video.
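To make the division of labor among these three parts more concrete, here is a minimal toy sketch in Python. It is not the actual Animate Anyone model: the names `Frame`, `build_frames`, `temporal_smooth`, and `max_jump` are hypothetical, plain lists of numbers stand in for image features and poses, and a simple moving average stands in for the temporal module. It only illustrates the idea that every frame shares one set of appearance features, receives its own pose, and is tied to its neighboring frames.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class Frame:
    appearance: Vector  # stand-in for the reference-image details (the ReferenceNet role)
    pose: Vector        # stand-in for the per-frame pose signal (the Pose Guider role)


def build_frames(reference: Vector, poses: List[Vector]) -> List[Frame]:
    """Give every frame the same appearance features and its own pose."""
    return [Frame(appearance=list(reference), pose=list(p)) for p in poses]


def temporal_smooth(frames: List[Frame], window: int = 1) -> List[Frame]:
    """Average each pose with its neighbors (a moving average, used here
    only as a toy substitute for a learned temporal module)."""
    smoothed = []
    for i, frame in enumerate(frames):
        lo = max(0, i - window)
        hi = min(len(frames), i + window + 1)
        neighbors = frames[lo:hi]
        avg = [
            sum(n.pose[d] for n in neighbors) / len(neighbors)
            for d in range(len(frame.pose))
        ]
        smoothed.append(Frame(appearance=frame.appearance, pose=avg))
    return smoothed


def max_jump(frames: List[Frame]) -> float:
    """Largest pose change between two consecutive frames."""
    return max(
        abs(a - b)
        for f1, f2 in zip(frames, frames[1:])
        for a, b in zip(f1.pose, f2.pose)
    )


# A deliberately jittery one-dimensional pose sequence.
frames = build_frames([1.0, 2.0], [[0.0], [10.0], [0.0], [10.0]])
smoothed = temporal_smooth(frames)
print(max_jump(frames), max_jump(smoothed))
```

In this toy, the appearance features never change from frame to frame, while linking each frame to its neighbors shrinks the jumps between consecutive poses. That is the intuition behind "consistent details, smooth motion", even though the real model learns these relationships rather than averaging them.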

Even with so many technical problems solved, what "National Dance King" generates still differs somewhat from a real person on camera. For example, the generated dance keeps an even rhythm, whereas most real music speeds up and slows down. This inevitably weakens the generated "Subject Three".

But compared with its predecessors, "Animate Anyone" has solved the key problems of image consistency, pose stability, and multi-frame relationship control, taking image-to-video generation from 10 points to 60-plus.

3. Another piece of the puzzle of AI e-commerce?

What does it mean for image-to-video generation to go from 10 points to 60?

It means the model can not only fully preserve a person's face, body proportions, clothing details, and background, but also accurately control the generated movements, with no technical limit on the length of the generated video. Compared with text-to-video products such as Gen-2 and Pika, Animate Anyone focuses more on the person.

In other words, at least in image-to-video generation, "Animate Anyone" has moved AI video from "toy" to "early commercial application". Add Alibaba's recently released "Outfit Anyone", and users can try on tops and bottoms using nothing more than a flat-lay image of the garment.

If the two are combined, users can not only try on most of the clothes they like by uploading a photo, but also preview how the clothes look in motion. It is less that AI puts Taobao models out of work than that AI lets everyone become their own model.

Of course, "Animate Anyone" has uses beyond this. Game developers can use the algorithm to turn static character images into animated characters with different movements and poses, reusing the same asset many times while making the characters more immersive and believable.

What do people fear most when shopping on Taobao? The loss of credibility caused by the gap between the "seller's show" and the "buyer's show". Once your AI avatar becomes the Taobao model, this concern is largely dispelled, and the biggest trust cost in the transaction disappears with it.

I still remember that when Pinduoduo's market value had just surpassed Alibaba's, Mr. Ma Yun not only congratulated Pinduoduo but also specifically mentioned the concept of "AI e-commerce": "The era of AI e-commerce has just begun, and it is an opportunity and a challenge for everyone."

In pricing and transactions, this may mean AI-driven real-time price comparison that helps consumers buy the products they want. In service and experience, it may mean better digital after-sales support and a more immersive shopping experience.

An "AI model" that tries on clothes for users is only a small step for the clothing shopping experience. Within Alibaba's overall AI e-commerce strategy it may be just one of countless basic puzzle pieces, but it is an extremely important one.

Author: Lao Hu, Supervisor: Daman, Layout: Yuqi

Source public account: IQ Tax Research Center (ID: gh_c55b3561ece1), the world is full of tricks, I will step into the trap for you!
