Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start


TechCrunch May 19, 2026

Google introduced Gemini Omni, a multimodal model that generates high-quality video from text, image, and audio inputs.


Summary

Google has unveiled Gemini Omni, a new family of multimodal AI models capable of reasoning across text, audio, images, and video to generate high-quality video output. By synthesizing these diverse inputs, the model aims to simulate reality with an understanding of physics and context. Currently available as Gemini Omni Flash, the tool allows consumers to create personalized content, such as digital avatars and creative videos, while incorporating SynthID watermarking for safety and accountability. Future iterations, including a more powerful Pro version, are expected to expand utility for professional filmmakers and advertisers.

(Source: TechCrunch)

