
After Google’s Veo 3 and OpenAI’s Sora showed the world how AI can generate hyper-realistic videos from text prompts, Chinese search giant Baidu has launched its first video generation model, MuseSteamer.
It is the first AI video generation tool to produce videos with synchronised Chinese audio.
MuseSteamer is a Vision Language Model (VLM), a type of AI model that combines computer vision with natural language processing. VLMs let machines understand and process information from both images and text, and they can perform tasks that require a joint understanding of visual and textual data.
The model lets users generate visuals, sound effects, and spoken Chinese dialogue simultaneously. This will reportedly benefit advertisers, marketers, and anyone who wants to make high-quality videos without spending millions on production or working through long timelines. MuseSteamer is essentially a business-only AI tool that turns images into short videos. Baidu has also upgraded its search offerings, making them smarter, more multimodal, and more personalised.
MuseSteamer can create 10-second clips in 1080p resolution with fully synced visuals, spoken dialogue, and sound effects. Those who have tried MuseSteamer seem to be raving about its output. Here are some stunning video samples shared by X users.