How China is building a parallel generative AI universe


The gigantic technological leap that machine learning models have shown in the last few months is getting everyone excited about the future of AI — but also nervous about its uncomfortable consequences. After text-to-image tools from Stability AI and OpenAI became the talk of the town, ChatGPT’s ability to hold intelligent conversations has become the new obsession across sectors.

In China, where the tech community has always watched progress in the West closely, entrepreneurs, researchers, and investors are looking for ways to make their dent in the generative AI space. Tech firms are devising tools built on open source models to attract consumer and enterprise customers. Individuals are cashing in on AI-generated content. Regulators have responded quickly to define how text, image, and video synthesis should be used. Meanwhile, U.S. tech sanctions are raising concerns about China’s ability to keep up with AI advancement.

As generative AI takes the world by storm towards the end of 2022, let’s take a look at how this explosive technology is shaking out in China.

Chinese flavors

Thanks to viral art creation platforms like Stable Diffusion and DALL-E 2, generative AI is suddenly on everyone’s lips. Halfway across the world, Chinese tech giants have also captivated the public with their equivalent products, adding a twist to suit the country’s tastes and political climate.

Baidu, which made its name in search engines and has in recent years been stepping up its game in autonomous driving, operates ERNIE-ViLG, a 10-billion-parameter model trained on a data set of 145 million Chinese image-text pairs. How does it fare against its American counterpart? Below are the results from the prompt “kids eating shumai in New York Chinatown” given to Stable Diffusion, versus the same prompt in Chinese (纽约唐人街小孩吃烧卖) for ERNIE-ViLG.

As someone who grew up eating dim sum in China and Chinatowns, I’d say the results are a tie. Neither got the right shumai, which, in the dim sum context, is a succulent shrimp-and-pork dumpling in a half-open yellow wrapper. While Stable Diffusion nails the atmosphere of a Chinatown dim sum eatery, its shumai is off (though I see where the machine is going). And while ERNIE-ViLG does generate a type of shumai, it’s a variety more commonly seen in eastern China than the Cantonese version.

The quick test reflects the difficulty of capturing cultural nuances when the underlying data sets are inherently biased — presumably, Stable Diffusion has more data on the Chinese diaspora, while ERNIE-ViLG is likely trained on a greater variety of shumai images that are rare outside China.

Another Chinese tool that has made noise is Tencent’s Different Dimension Me, which can turn photos of people into anime characters. The AI generator exhibits its own bias. Intended for Chinese users, it took off unexpectedly in other anime-loving regions like South America. But users soon realized the platform failed to recognize Black and plus-size individuals, groups that are noticeably missing from Japanese anime, leading to offensive AI-generated results.

Aside from ERNIE-ViLG, another large-scale Chinese text-to-image model is Taiyi, the brainchild of IDEA, a research lab led by renowned computer scientist Harry Shum, who co-founded Microsoft’s largest research branch outside the U.S., Microsoft Research Asia. The open source AI model has one billion parameters and is trained on 20 million filtered Chinese image-text pairs.

Unlike Baidu and other profit-driven tech firms, IDEA is one of a handful of institutions backed by local governments in recent years to work on cutting-edge technologies. That means the center probably enjoys more research freedom without the pressure to drive commercial success. Based in the tech hub of Shenzhen and supported by one of China’s wealthiest cities, it’s an up-and-coming outfit worth watching.

Rules of AI

China’s generative AI tools aren’t just characterized by the domestic data they learn from; they are also shaped by local laws. As MIT Technology Review pointed out, Baidu’s text-to-image model filters out politically sensitive keywords. That’s expected, given censorship has long been a universal practice on the Chinese internet.

What’s more significant to the future of the fledgling field is the new set of regulatory measures targeting what the government dubs “deep synthesis tech,” which denotes “technology that uses deep learning, virtual reality, and other synthesis algorithms to generate text, images, audio, video, and virtual scenes.” As with other types of internet services in China, from games to social media, users are asked to verify their names before using generative AI apps. The fact that prompts can be traced to one’s real identity inevitably has a restrictive impact on user behavior.

But on the bright side, these rules could lead to more responsible use of generative AI, which is already being abused elsewhere to churn out NSFW and sexist content. The Chinese regulation, for example, explicitly bans people from generating and spreading AI-created fake news. How that will be enforced, though, lies with the service providers.

TECHCRUNCH