2023 Revelation丨The Year of Virtual Humans

In 2023, the emergence of large models gave the industry new hope. Drawing on conversations with many vendors, this article reviews the year of virtual humans in 2023 on three levels: technology, products, and commercialization. Let's take a look.

In 2023, large models "revived" many industries. Most surprisingly, they pulled virtual digital humans (hereinafter "virtual humans") back from the dead.

Before the year began, the metaverse craze of 2022 had cooled rapidly. Virtual humans, left orphaned by the metaverse, could not escape the chill. Many virtual human startups entered a difficult stretch, struggling to raise funding and ship products; even large companies began cutting platforms that had once received heavy investment and were built specifically for producing virtual humans.

Just when everyone thought virtual humans were in for a long winter, large models arrived.

The arrival of large models first meant there were new concepts and stories beyond the metaverse. More importantly, their capabilities have had a genuine, profound impact on virtual human technology. Problems left unsolved in the metaverse era, such as high costs, slow production cycles, and high barriers to entry, began to be cracked one by one. At the same time, large models allowed virtual humans to be deployed at scale and integrated into industries for the first time, a key step toward a mature industrial chain.

But the year was still chaotic.

At the beginning of the year, the revived virtual humans began frantically searching for deployment scenarios. By mid-year, agents who smelled a business opportunity pushed thousands of virtual humans into live-stream rooms and began reaping profits under the guise of technology, which for a time drove corporate customers away from virtual humans altogether.

By year's end, as the bubble deflated and the technology matured, the market calmed down. The industrial chain began to differentiate, the division of labor among upstream, midstream, and downstream players became clearer, and industries stopped chasing the form and started genuinely asking what virtual humans could actually deliver.

There is no doubt that the most important scenario for virtual humans is not what we are experiencing now: they are meant to be the super-entrance connecting the real and virtual worlds, true NPCs in the game of life. But achieving that is still far off, both technically and ecologically.

At the end of the year, together with multiple vendors, we review the year of virtual humans in 2023 on three levels: technology, products, and commercialization.

1. Large models make virtual humans "come alive"

Before the advent of large models, virtual humans had long been held back by high costs.

In 2022, virtual humans were usually custom-built one-to-one, with prices ranging from tens of thousands to hundreds of thousands of yuan. Even so, the results were often unsatisfactory.

The brand manager of a well-known liquor company told Zixiangxian that the company once tried using a virtual human in advertisements on large airport screens. Even after hiring a well-known domestic vendor and spending nearly 600,000 yuan, the result still "looked fake at first glance."

The direct reason for the high costs and poor results was, naturally, immature technology.

Before large models, virtual human production was mostly driven by real people: dedicated actors wore equipment while motion-capture technology collected data over long sessions to complete the 3D modeling. This approach carried high labor costs and long production cycles.

A virtual human practitioner told Zixiangxian that data collection for a single virtual human requires a dedicated production team for several months, followed by professional technicians making specialized adjustments.

This naturally drives up production costs. A virtual human vendor told us: "When hiring actors for filming, we lose money on almost every sale."

Besides real-person driving, there are also virtual humans driven by algorithms. This type, however, requires large amounts of data up front to train the various driving models, and the final effect also depends on technologies such as speech synthesis, NLP, speech recognition, and CG rendering.

Although this type of virtual human had relatively stable technology in certain specific directions before large models, it still lacked a powerful "brain" to unify the various modules and let the virtual human achieve the ideal effect.

In addition, before large models, virtual humans driven by traditional algorithms usually relied on preset parameters and limited models for training; with large models, generative capability gives virtual humans what amounts to an almost unlimited supply of training parameters.

▲Image source: AVIC Securities Research Institute

Today, large models have penetrated the entire virtual human production chain as a form of productive capacity, driving production costs down sharply, from hundreds of thousands of yuan to around a thousand yuan, and shortening production cycles from months to hours.

Silicon Intelligence, one of the earliest companies in China to develop AI digital humans, told Zixiangxian: "Since launch, Silicon Intelligence's digital human image-cloning products have been priced at 8,000 yuan. We are expanding market share through standardized prices and services. Prices on the market currently vary widely, from a few hundred to a few thousand yuan. In the new year, we are considering lowering the 8,000 yuan threshold, further reducing costs, and adopting a new business model."

Cost reduction and efficiency gains are the first layer of change brought by large models. Large model technology not only lowers the difficulty of making virtual humans, it also makes them look more human.

For example, large models have changed the traditional CG-based approach to 3D modeling of virtual humans, replacing it with video-generation tools that produce 3D models efficiently through algorithms. Facial details become more realistic, and the synchronization of expressions and lip shapes improves, so virtual humans look more natural when speaking.

In addition, large models improve virtual humans' interactive abilities, moving them from one-way output to real interaction with people.

According to IDC, the automation level of virtual humans can be divided into stages L1-L5. As the figure below shows, we are currently somewhere between L3 and L4: in a live-stream room or on an interactive device, users can communicate with a virtual human in real time via text, and the virtual human can introduce products, answer questions, and so on.

▲Source: Screenshot of IDC report

A 2D virtual human vendor told Zixiangxian: "At present, relatively intelligent virtual humans can already interact live. The technical principle behind this is that the company builds a template library or knowledge base in advance; once the corresponding keywords are triggered in the live stream, the virtual human retrieves the matching content in real time to answer."

Judged by results, however, the technology is not yet mature. Some users have complained: "It takes ten minutes for the virtual human to reply to a question asked in the live-stream room; I don't have the patience to wait."

Finally, AIGC’s production capacity also gave virtual humans a “soul”.

SenseTime Intelligence Research Institute has sorted out the three major characteristics of AI digital virtual humans, including multimodal interaction, deep learning capabilities, and AIGC productivity.

Compared with early virtual human production, which relied heavily on manual labor, AIGC has greatly improved production efficiency and lowered the production threshold. Large models let virtual humans learn more knowledge and skills in depth, and recognize multimodal content including images, video, and audio, laying the groundwork for natural interaction between virtual humans and real people.

▲Image original to Zixiangxian. Please credit the source when reprinting.

To a large extent, large models solve virtual humans' problems with natural language understanding and content generation. In a live-stream scenario, for example, a virtual human can rely on a large model's generative capabilities to write its own spoken lines, scripts, and even screenplays. This dramatically lowers the threshold across the industrial chain and turns the virtual human from a mere image into a production tool.
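The script-writing step described above typically amounts to prompt construction plus a model call. The sketch below shows the shape of such a pipeline under stated assumptions: `call_llm` is a stand-in for whatever chat-completion API a vendor actually uses, and the prompt wording is purely illustrative.

```python
# Hypothetical sketch of LLM-driven live-stream script drafting.
# `call_llm` is a placeholder, not a real vendor API.

def build_prompt(product: str, selling_points: list) -> str:
    """Assemble a generation prompt from product info."""
    points = "; ".join(selling_points)
    return (
        f"Write a 30-second live-stream script introducing {product}. "
        f"Highlight: {points}. Keep the tone conversational."
    )

def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call (model, keys, etc. omitted).
    return f"[LLM output for prompt: {prompt[:50]}...]"

def draft_script(product: str, selling_points: list) -> str:
    return call_llm(build_prompt(product, selling_points))
```

In a real deployment the generated text would then feed the speech-synthesis and lip-sync stages, which is what lets one operator run the "one-click" workflows mentioned below.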

At present, some virtual human vendors have begun extending across the entire generation chain, offering features such as "one-click video copywriting," "one-click explainer video footage," and "rapid multilingual translation," transforming from virtual human technology providers into more holistic solution providers.

Microsoft XiaoIce CEO Li Di even proposed that the future of virtual humans will be a hybrid model.

"Self-quadrant" believes that the next stage of virtual humans will enter the "virtual humans +" stage. Virtual humans + RPA will create digital employees within the enterprise; virtual humans + AI Agent will create companion robots on the C-end; virtual humans + AR/VR will create visible game NPCs in 3D space; virtual humans + embodied intelligence will give humanoid robots souls.

Only when multiple technologies are fully integrated can virtual humans truly become a gateway-level application. At that point, competition will center on a set of comprehensive capabilities: openness, ecosystem building, and scenario expansion.

2. Will virtual humans go 2D or 3D?

With the explosion of virtual humans driven by large models, the types on the market have grown steadily richer: 2D and 3D, real-person-driven and algorithm-driven, a dazzling variety. Behind this lie the market's differing classification standards for virtual humans.

After sorting through the commonly used standards and classifications, however, Zixiangxian found that virtual humans are in practice classified mainly by the visual differences in their product form: 2D virtual humans and 3D virtual humans. In terms of technology, industrial chain, and application scenarios, the two have already taken distinctly different paths.

▲Image original to Zixiangxian. Please credit the source when reprinting.

In essence, the ultimate goal is to make virtual humans as close to real people as possible, because only then can they give users the same feelings and interactive experience as a real person. By that standard, 3D virtual humans better fit the goal and are the direction of future applications.

But in comparison, 2D virtual humans have lower production costs, are easier to implement, and can be used on a large scale for commercial purposes in a short period of time.

▲The left picture is a 2D virtual human, and the right picture is a 3D virtual human

From a technical perspective, the technical architecture behind the two types of virtual humans is completely different.

2D virtual humans pay more attention to facial expressions, lip shape and tone of speech; 3D virtual humans pay more attention to overall coordination, body movements when speaking, and the geometric relationship between the virtual human and space, etc.

Specifically, 2D virtual humans can be generated quickly through image processing, while 3D virtual humans require modeling, animation, rendering, and other steps. 2D virtual humans need less data and less computing power; 3D virtual humans are the opposite. And 2D virtual humans do not demand high precision, while 3D virtual humans must achieve a highly realistic, anthropomorphic effect.

The difference in technology means that the production cost of 3D virtual humans is much higher than that of 2D virtual humans.

Producing the popular 3D virtual human Liu Yexi cost as much as one million yuan. So in the last metaverse wave, 3D virtual humans gained fame as the protagonists but remained far out of reach for ordinary users, and 2D virtual humans seized the opportunity to fill that gap in demand.

According to market research, companies such as Baidu, SenseTime, and Mofa Technology are currently involved in the field of 3D virtual humans; Tencent, JD.com, Kuaishou, Zego Technology, Wondershare, and Yilan Technology are updating their products and services in the direction of 2D virtual humans. In addition, companies represented by Sugar Planet, TrueVision, iResearch International, and Dimensity Technology are using virtual space as a starting point to improve supporting facilities for virtual humans.

Mofa Technology, a vendor focused on 3D virtual humans, told Zixiangxian that its full-stack technology spans four dimensions: AIGC 3D realistic imagery, AIGC 3D animation, AIGC voice, and AIGC text. Its Youyan product additionally involves technologies such as AIGC 3D camera movement and AIGC lighting, an extremely complex set of considerations.

As for the difficulties of 3D virtual humans, Mofa admitted that while the generation technology is gradually being mastered, high-quality 3D data is an extremely scarce resource in the industry, and one of the barriers to 3D virtual humans.

AIGC text-to-video generation relies on high-quality 3D training data to produce stable videos with correct geometric and spatial relationships. It is almost impossible for 2D virtual human vendors to build such 3D data from zero to one.

Mofa Technology believes that: "3D virtual humans are a kind of character form carrier. To be able to use them, they need to be implemented in the form of products to solve practical problems of enterprises."

This means that virtual human manufacturers not only need to provide 3D virtual human products, but also need to integrate them with the company's business scenarios, give the company a certain degree of freedom, and create a 3D virtual human production platform.

Currently, Mofa Technology has built an end-to-end hyper-realistic 3D virtual human industrial production line and the "3D virtual human AIGC platform - Nebula Platform", providing enterprises with a series of generation tools. Enterprises can more flexibly adjust the details, structure and adaptive usage scenarios of virtual humans according to their own needs.

SenseTime has also built the Ruying virtual human production platform on its SenseNova ("Daily New") large model. Drawing on the 30,000 algorithm models it has accumulated in vision and speech, the virtual human can quickly recognize, respond, and hold a dialogue with the user. Nor does it "lose its memory" afterwards: it continues to learn and iterate.

In comparison, producing a 2D virtual human is much simpler. A 2D virtual human vendor told us: "There are currently two ways to produce 2D virtual humans. One is to have a real person record footage and then customize an image from it. The other is to extract the image from videos the user provides and reuse it across scenes. The logic of the 2D customization tools on the market is basically the same: input text material and you can generate a simple virtual human image."

Large models significantly improve the efficiency of 2D virtual humans. AI can cut the manual processing time of the traditional video production process by 90% and model training time by 60%; training a customized digital human now takes only 48 hours. Generating a text-to-video AI digital human clip currently takes about a few minutes, and as the technology advances, text-to-video efficiency will keep improving.

The large model has reduced the cost of producing virtual humans and also reduced the price of virtual humans.

Because they are cheap, 2D virtual humans have flooded into live-stream rooms and short videos. Many top influencers now have digital avatars: Liu Run and Zhou Hongyi have both unveiled theirs, which pitch products in live-stream rooms 24 hours a day, 7 days a week. From large enterprises to small and medium merchants, enthusiasm for virtual humans is running higher than ever. This has also attracted a crowd of dubious agents and shell vendors with no real technology, simply cheating under its banner.

From July to August this year, virtual humans selling for "99 yuan," "299 yuan," and "499 yuan" began appearing on platforms such as Xiaohongshu, Xianyu, Taobao, and Douyin. According to Zixiangxian's investigation, these sellers typically lure users into placing orders with assorted success stories, then provide no after-sales service regardless of how the product performs. Trap after trap inflated the popularity of virtual humans for "selling" rather than "using." Buyers looking for a shortcut thought they had caught the trend and ended up badly burned; sellers looking for quick money moved in and harvested the first crop of "leeks."

A brand merchant told Zixiangxian: "At the peak of the virtual human craze, we bought virtual human anchors. The first few live streams went fairly well, and we roughly broke even for the first three months. But in July and August this year, the electricity bill alone topped 10,000 yuan, and adding the traffic spend for the virtual anchors, we lost about 100,000 yuan."

The mixed quality of the industry discouraged many companies that had wanted to try. By the end of the year, however, a flurry of regulatory policies and tighter platform conditions for virtual human live streaming brought the chaos temporarily under control. The industry entered a cooling-off period and began to think seriously about what virtual humans can actually do.

3. Virtual humans in vogue: commercialization in many forms

In fact, it was not until this year that digital humans truly got onto the commercialization track.

Sima Huapeng, founder and CEO of Silicon Intelligence, mentioned in an interview that "in 2019, no one used digital humans even when they were free." This year the company has felt the shift: "Last year we cloned more than 100,000 digital humans. Some customers used to be reluctant to say publicly that they were using AI to assist their work; this year everyone is willing to say it."

"After the emergence of ChatGPT, the mentality of the entire industry has changed greatly."

In 2023, the business model of virtual humans differentiated into three relatively mature types:

  1. The first is the IP type, dominant in the metaverse period, positioned around idols, entertainment, research, and education. It spawned roles such as virtual idols, digital astronauts, and brand spokespersons, giving the idol identity concrete form and operating it as IP; examples include Liu Yexi and Luo Tianyi.
  2. The second is the functional virtual human, also called the service virtual human, represented by digital employees, virtual anchors, and digital customer service. These assist human work in finance, culture and tourism, retail, live streaming, and other fields, cutting enterprise costs and delivering automated, standardized, and intelligent services.
  3. The third, still being explored, is the virtual avatar, also known as the virtual-space identity agent, which creates specific game identities, virtual concert audiences, digital-immortality images, and the like for users. Used mostly in games, VR, and the metaverse, it is the interactive entrance between virtual and real space. Players can not only own virtual images but also drive the production of virtual content, which is the ultimate state virtual humans aim for.

According to data from Sullivan, TouBao Research Institute and others, in 2023, many brands are exploring how to use virtual people to create greater value. Among them, virtual singers, celebrity clones, and virtual spokespersons for consumer brands have quickly become popular on the Internet.

Guo Degang speaking English, Taylor Swift speaking Sichuanese, AI "resurrecting" Leslie Cheung and Anita Mui and staging online concerts for a number of singers: the spread of short videos has accelerated virtual humans' entry into public view, and consumer enthusiasm in turn reflects demand on the business side.

At the beginning of the year, Douyin influencer Liu Run took the lead in making digital human short videos; at the end of the year, Yan Bojun, a science blogger with 12 million followers, began publishing digital human short videos produced by Silicon Intelligence across multiple social media platforms.

Yan Bojun said in an interview: "When I first released AI-produced content, some viewers pointed out: 'Why don't you blink?' In fact, from movements, expressions, and language to ideas, AI is constantly learning and imitating every feature of me. It is a process of continuous evolution."

▲Image source: provided by the interviewed companies

It is understood that in August 2023, Silicon Intelligence formed a joint venture, Qianyu Intelligence, with the top-influencer MCN company Qianxun, and released an AI digital human live-streaming solution that creates digital avatars for Qianxun's anchors. Outside an anchor's own eight hours of live streaming, the avatar continues broadcasting in their place, extending total airtime.

Mofa Technology, by contrast, focuses more on the capabilities of virtual humans themselves. The marketing center of one medical institution needs thousands of topical, trending, and science-popularization videos every month to run its accounts across video platforms and its marketing campaigns, yet its team of dozens could produce only hundreds of videos per month.

Using Mofa Youyan's one-stop AIGC video creation platform, the center can generate 3D videos from text and graphic content with one click, eliminating steps such as shooting and post-production. After adopting Youyan, the marketing center solved its capacity shortfall and independently ran its full-platform video matrix, while mass-producing high-quality medical science-popularization content for customer acquisition quickly and at scale. Team output rose sharply, and customer-acquisition ROI rose with it.

More importantly, in 2023 virtual humans began entering thousands of industries, moving from film, television, and entertainment into the deep waters of digitalization: finance, culture and tourism, education, government, and enterprise.

▲Image source: Tencent's "Digital Human Industry Development Trend Report"

To give a few examples: in virtual humans + education, NetEase Youdao released an AI spoken-language tutor that gives students an open chat scenario closer to a real conversational environment and quickly generates a results report after the conversation; iFlytek released the iFlytek Spark cognitive model, covering graded Chinese and English homework, simulated real-life conversations with a spoken-language tutor, and more.

In virtual humans + government affairs, Xiamen, Shenzhen, Jiangxi, and other localities have introduced digital employees. Their work includes interpreting policies in multiple languages, offering the public digital government services that can be handled while chatting, and completing business consultation, information push, service guidance, and other government services through the virtual human's intelligent service entrance.

In 2024, some vendors are also gradually testing the waters of digital humans + cross-border e-commerce. Silicon Intelligence told us: "Since multilingual short videos and live streams have a high threshold overseas, we developed the Silicon Language Translation mini-program and the professional version of Anylang, which solve this through real-time translation combined with digital humans, helping cross-border e-commerce companies go overseas one-stop."

Overall, after the turbulence of 2023, virtual humans have reached a new node in technology, products, and industry structure. In 2024, as multimodal large models mature, virtual humans may take another step forward, and their gradual penetration into thousands of industries will open more doors to digitalization.

Author: Cheng Xin, Editor: Luo Ji

Source public account: Zixiangxian (ID: zixiangxian), between the squares, there is a quadrant. Care about science and technology, economy, humanities, and life.
