中文
 

Follow us 

Hot Topic
Market News
Events & Promo
Career Tips
Education News
Health & Life
ShengShu Launches Vidu Q3 Reference-to-Video with Expanded Visual and Audio Capabilities
PRNewswire

ShengShu Launches Vidu Q3 Reference-to-Video with Expanded Visual and Audio Capabilities

Publish date: 13 Apr 2026

Follow us

Stay updated on the job market

Popular Articles

【網民熱話】錢大媽8萬高薪招聘「豬肉分割師」  網民: 仲高人工過3大畢業生
【網民熱話】錢大媽8萬高薪招聘「豬肉分割師」 網民: 仲高人工過3大畢業生
旅行唔止放鬆  更能提升工作效率
旅行唔止放鬆 更能提升工作效率
研究指工作電郵亂加Emoji反而令你失去專業感
研究指工作電郵亂加Emoji反而令你失去專業感
對工作愈不滿  40歲後健康出現警號
對工作愈不滿 40歲後健康出現警號
7種最難有拖拍職業
7種最難有拖拍職業

Alibaba Leads RMB 2 Billion Series B to Support General World Model Development

SINGAPORE, April 13, 2026 /PRNewswire/ -- ShengShu Technology, a global leader in multimodal generative AI, today announced the global launch of Vidu Q3 Reference-to-Video, a new model capability designed for story-driven video creation through flexible reference-based generation.

Built for storytelling, Vidu Q3 Reference-to-Video enables creators to generate high-quality videos by referencing and combining a wide range of inputs—including subjects, environments, costumes, props, and visual styles—within a single workflow, significantly improving creative control, consistency, and efficiency.

The release expands capabilities across visual effects, audio, and scene composition. It supports six types of cinematic visual effects, including particle systems, fluid simulation, dynamic motion, camera movement, transitions, and lighting, enabling more expressive visual outputs. In parallel, the model enhances audio generation with five categories of sound capabilities for more natural and expressive results, covering ambient sound, motion-driven audio, atmospheric layers, foley effects, and emotion-driven cues. Together, these improvements enable greater scene diversity and more immersive, production-ready video outputs.

Designed for use cases including short-form series, animation, film and television, as well as advertising and e-commerce, Vidu Q3 Reference-to-Video enables faster production of high-quality video content for both creators and enterprises. This performance is further reflected in third-party benchmarks, where Vidu Q3 ranked No.1 in the first global Reference-to-Video leaderboard released by SuperCLUE.

Built on the Vidu Q3 model foundation, Vidu has been fully integrated across its product ecosystem, including Vidu Agent, Vidu Claw, and the Vidu App. This creates a unified system supporting the full workflow from creative ideation and content production to deployment, enabling greater efficiency and consistency across diverse use cases.

The release reflects ShengShu's broader progress in advancing its world model capabilities across digital environments.

In parallel with the product release, ShengShu announced it has raised RMB 2 billion in its Series B financing round, led by Alibaba Cloud, with participation from Andon Haitang, China Internet Investment Fund, TAL Education Group, Luminous Ventures, and others. Existing investors, including LINK-X CAPITAL, Delta Capital, and Baidu Ventures, also increased their investment.

The funding will support ShengShu's broader vision of building a general world model that bridges the digital and physical worlds. The company is advancing both its World Generation Model (WGM), which powers digital content creation through the Vidu model family, and its World Action Model (WAM), designed for physical-world interaction. Together, these systems aim to enable unified modeling, prediction, and action across environments.

ShengShu is among the first globally to pursue a unified world model architecture that connects digital and physical domains. At the core of this system is its Foundation World Model, which underpins both WGM and WAM.

Within this framework, the Vidu model family focuses on content generation and interaction in digital environments. It supports synchronized audio-visual generation, extended video duration, strong temporal and spatial consistency, and cinematic-quality visuals. Its proprietary reference-based video generation capability addresses consistency challenges in multi-subject video production.

At launch, Vidu Q3 ranked No.1 globally on the benchmark published by Artificial Analysis. It supports up to 16 seconds of synchronized audio and video generation, multi-shot composition and camera control, background music and sound effect generation, and multilingual dialogue. The launch of Reference-to-Video further expands its capabilities across visual effects, audio, and application scenarios.

Vidu is available to global developers, creators, and enterprises through both MaaS (Vidu API platform) and SaaS offerings, and has been integrated into Alibaba Cloud Model Studio to support text-to-video, image-to-video, and reference-to-video generation across industries including internet, advertising, film and animation, education, and cultural tourism.

Dr. Zhu Jun, Founder of ShengShu Technology, said:
"At its core, a world model gives AI a unified way to represent and predict the real world.
Video plays a critical role in this, as it naturally captures time, space, motion, and causality.
By building a unified model architecture, we aim to connect perception and action—creating a complete loop from understanding the world, to generating it, to acting within it, and ultimately making the world model a true bridge between the digital and physical worlds."

Follow us

Stay updated on the job market

Popular Articles

【網民熱話】錢大媽8萬高薪招聘「豬肉分割師」  網民: 仲高人工過3大畢業生
【網民熱話】錢大媽8萬高薪招聘「豬肉分割師」 網民: 仲高人工過3大畢業生
旅行唔止放鬆  更能提升工作效率
旅行唔止放鬆 更能提升工作效率
研究指工作電郵亂加Emoji反而令你失去專業感
研究指工作電郵亂加Emoji反而令你失去專業感
對工作愈不滿  40歲後健康出現警號
對工作愈不滿 40歲後健康出現警號
7種最難有拖拍職業
7種最難有拖拍職業

Hottest Tags

#豬肉分割師
#錢大媽
#高薪
#技術工種
#3大畢業生
#Dream Job
#有薪年假
#壓力
#減壓
#Emoji
#工作能力
#職場電郵

Contact Us
Notice
Back to Top
We use cookies to enhance your experience on our website. Please read and confirm your agreement to our Privacy Policy and Terms and Conditions before continue to browse our website. Read and Agreed