Google has introduced Veo, its first AI model that generates video from text prompts. It can create 1080p videos longer than a minute and competes with OpenAI's Sora.
Veo was launched at the Google I/O event in the early morning of May 15 (Hanoi time). The product was introduced by Demis Hassabis, CEO of Google DeepMind, as being able to create "high quality" 1080p videos with many different visual and cinematic styles.
Veo was announced three months after Sora's debut caused a stir in the community.
According to a Google representative, the AI is capable of understanding natural language and can "accurately capture the tone of a prompt," thereby creating videos that closely reflect the user's creative vision. The model also understands cinematic terms like "timelapse" video or "aerial landscape photography," and can create consistent and coherent footage, with human subjects, animals, and objects moving realistically throughout the shot.
Demonstration videos of Veo’s capabilities are around eight seconds long, but Google says users can request durations of up to 1 minute and 10 seconds and refine the results with additional prompts. That exceeds the one-minute maximum OpenAI previously announced for Sora.
According to Google, Veo builds on earlier video generation models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere, combined with many other techniques to improve output quality and resolution.
Google says it has improved the techniques its models use to understand video content, render high-resolution images, simulate real-world physics, and more.
“These insights will fuel advances in our AI research and enable us to build even more useful products that help people interact and communicate in new ways,” Google said.
At the event, the US tech giant also introduced an image-generating AI called Imagen 3. The product is advertised as generating realistic, lifelike images with "incredible levels of detail" and fewer distracting artifacts than previous models.
Imagen 3 also understands natural language better, predicts the user's intent behind a prompt, and can generate images in a range of styles.
Like many other video- and image-generating AIs, Veo and Imagen 3 are not yet widely available. Google says the new products are available to a limited number of creators to try out, and interested users need to join a waiting list. The company also plans to bring some of Veo’s features to YouTube Shorts and other products.
TH (according to VnExpress)