China Unveils Vidu: A Powerful Text-to-Video Generator

By Chris McKay, April 27, 2024

China's Shengshu Technology and Tsinghua University have unveiled Vidu, a text-to-video model capable of generating 16-second clips at 1080p resolution with a single click. The announcement was made at the 2024 Zhongguancun Forum in Beijing, where the two organizations positioned Vidu as a strong competitor to OpenAI's Sora.

Vidu's 16-second clips are shorter than Sora's, which can run up to 60 seconds. Vidu is based on a Universal Vision Transformer (U-ViT) architecture, which the company says allows it to simulate the real physical world with multi-camera view generation. The Shengshu Technology team reportedly developed this architecture in September 2022, which would mean it predates the diffusion transformer (DiT) architecture used by Sora.

According to the company, Vidu can generate videos of complex scenes that adhere to real-world physics, with realistic lighting and shadows and detailed facial expressions. The company also says the model demonstrates a rich imagination, creating surreal content that does not exist in the real world, with depth and complexity. Vidu's multi-camera capabilities allow it to generate dynamic shots, transitioning seamlessly between long shots, close-ups, and medium shots within a single scene.

In its demo, the company attempted to recreate scenes similar to those OpenAI shared when it released Sora. While Vidu is an impressive accomplishment and a testament to China's rapid progress in AI research, a side-by-side comparison shows that its generated videos are not at Sora's level of realism and fall short in visual fidelity. Don't take my word for it; here are some examples from Sora:

That said, the temporal consistency Vidu achieves is commendable, and the technology has room for further refinement over time.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.