Google DeepMind Unveils Veo 2: A New Video-Generation AI to Rival OpenAI’s Sora

Google DeepMind, Google’s AI research lab, has announced Veo 2, its next-generation video-generation model and the successor to Veo. Positioned as a rival to OpenAI’s Sora, Veo 2 can, according to DeepMind, generate two-minute video clips at 4K resolution, roughly four times the resolution and six times the duration of what Sora can achieve.

Impressive as these specifications are, they remain theoretical for now. Veo 2 is currently accessible only through Google’s experimental VideoFX tool, where output is capped at 720p resolution and eight seconds, compared with Sora’s 1080p and 20 seconds.
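
To make the comparison with Sora concrete, the arithmetic can be sketched in a few lines of Python. The pixel dimensions below are assumptions, since the article names only "4K", "1080p", and "720p"; 4K is taken here to mean UHD (3840 x 2160), and the variable names are illustrative only.

```python
# Assumed pixel dimensions; the article gives only the labels.
RESOLUTIONS = {
    "4K": (3840, 2160),     # assumption: UHD; DCI 4K would be 4096 x 2160
    "1080p": (1920, 1080),
    "720p": (1280, 720),
}


def pixels(name: str) -> int:
    """Total pixel count per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


# Claimed Veo 2 ceiling vs. Sora: 4K and 120 seconds vs. 1080p and 20 seconds.
claimed_resolution_ratio = pixels("4K") / pixels("1080p")
claimed_duration_ratio = 120 / 20

# Veo 2 as exposed in VideoFX vs. Sora: 720p and 8 seconds vs. 1080p and 20 seconds.
videofx_resolution_ratio = pixels("720p") / pixels("1080p")
videofx_duration_ratio = 8 / 20

print(f"Claimed:  {claimed_resolution_ratio:.2f}x pixels, {claimed_duration_ratio:.1f}x duration")
print(f"VideoFX:  {videofx_resolution_ratio:.2f}x pixels, {videofx_duration_ratio:.1f}x duration")
```

Under these assumptions, the claimed ceiling is four times Sora's pixel count and six times its clip length, while the VideoFX version delivers under half of Sora's pixels and duration.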

Expanded Access and Future Plans

VideoFX is available only to users on a waitlist, though Google plans to expand access this week. According to Eli Collins, VP of product at DeepMind, the company also intends to integrate Veo 2 into Vertex AI, Google’s developer platform, as the model becomes ready for broader use.

“Over the coming months, we’ll continue iterating based on user feedback and look for compelling use cases across Google’s ecosystem,” Collins told TechCrunch. Updates are expected in 2025.

Improved Capabilities

Like its predecessor, Veo 2 generates videos based on text prompts (e.g., “A car racing down a freeway”) or a combination of text and reference images. However, the new model introduces several enhancements:

  • Sharper Textures and Images: Clips are clearer and handle scenes with significant motion more effectively.
  • Advanced Camera Controls: Veo 2 enables precise positioning of the virtual “camera” and dynamic movement, allowing for diverse angles and perspectives.
  • Realistic Motion and Effects: The model simulates fluid dynamics (e.g., coffee pouring) and lighting effects (e.g., reflections and shadows) with improved realism. It also supports cinematic effects and nuanced human expressions.

DeepMind shared sample videos, which highlighted Veo 2’s strengths, including realistic liquids like syrup and Pixar-style animations. However, challenges persist, such as lifeless eyes in characters and inconsistencies in complex scenes, like pedestrians blending into backgrounds or physically implausible building facades.

Addressing Limitations

Collins acknowledged the areas needing improvement:

  • Coherence and Consistency: Adherence to complex prompts over extended durations remains a challenge.
  • Detail and Realism: Fast motions and intricate details require further refinement.
  • Character Consistency: Maintaining character traits across frames is still under development.

DeepMind is working with artists, including Donald Glover and The Weeknd, to refine the model and ensure it aligns with creative workflows.

Training and Ethical Concerns

Veo 2 was trained on a vast dataset of video-description pairs, though DeepMind has not disclosed specific sources. YouTube content, given Google’s ownership, is a likely contributor.

While Google provides tools for webmasters to block data scraping, DeepMind does not currently allow creators to remove their works from training datasets. The company maintains that training on public data constitutes fair use, a stance contested by some artists and filmmakers.
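
The article does not name the webmaster tools in question. One likely candidate is the Google-Extended robots.txt token, which site owners can disallow to opt their pages out of certain Google AI training uses; whether it governs Veo 2's training data is an assumption here, not something DeepMind has confirmed. The sketch below uses Python's standard urllib.robotparser to show how such a rule is read, with example.com as a placeholder site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a placeholder site. It blocks the
# Google-Extended token while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/videos/clip.html"  # placeholder URL

# The AI-training token is denied; ordinary search crawling is unaffected.
ai_training_allowed = parser.can_fetch("Google-Extended", url)
search_allowed = parser.can_fetch("Googlebot", url)

print("Google-Extended allowed:", ai_training_allowed)
print("Googlebot allowed:", search_allowed)
```

A rule like this covers only pages on a site the owner controls. It does not help creators whose work is hosted on platforms they do not administer, which is the gap the article describes.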

DeepMind has implemented safeguards, including filters for explicit or violent content and SynthID watermarking, which marks clips as AI-generated to deter misuse, though no watermarking technology is foolproof.

Updates to Imagen

Alongside Veo 2, DeepMind announced enhancements to Imagen 3, its commercial image-generation model. The updated version, rolling out to users of Google’s ImageFX, creates more vivid, detailed images in styles like photorealism, impressionism, and anime.

A new “chiplet” feature in the ImageFX interface will let users refine prompts with suggested descriptors, improving usability and creative control.

Looking Ahead

With Veo 2 and Imagen 3, DeepMind is strengthening its position in generative AI, competing with OpenAI and other rivals. As Veo 2 integrates into Google’s platforms and matures, it could play a central role in reshaping video generation and creative industries, though ethical and technical hurdles remain to be addressed.
