Gemini Omni Flash
One Model. Video In, Video Out, Any Direction.
Gemini's multimodal reasoning applied to video — generate from text, animate an image, blend reference images, or just describe the edit you want.
Omni Flash treats video as a conversation, not a one-shot render. It carries Gemini's real-world knowledge — history, biology, narrative logic — into every frame, and lets you refine a clip in plain language instead of re-prompting from scratch. Text, a single photo, or up to ten reference images: the same model handles all of it.
Four Ways to Create
One model, four entry points — start from nothing, from one image, from several, or from a finished clip.
Generate from a Prompt
Describe the shot, the pacing, even the soundtrack — Omni Flash controls audio directly through the prompt, so "calm background music" or "no dialogue" just works.
Best for: concepting, storyboards, quick social clips
Animate a Single Image
Bring one reference image to life with a text prompt. The model decides how to move the subject based on what you describe.
Best for: product shots, portraits, static-to-motion conversions
Combine Multiple References
Feed in up to ten reference images and bind them to roles inline with tags like '<IMAGE_REF_0>' and '<IMAGE_REF_1>' for multi-subject, style-consistent scenes.
Best for: brand consistency, multi-product scenes, character casts
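As a rough illustration of the inline tags described above, the sketch below fills a prompt template with '<IMAGE_REF_n>' tags. It is not an official client: the helper names (image_ref_tag, build_reference_prompt) are hypothetical, and it assumes that tag numbers are zero-based and follow the order in which the images are supplied, which this page implies but does not state outright.

```python
MAX_REFERENCE_IMAGES = 10  # limit stated on this page


def image_ref_tag(index: int) -> str:
    """Return the inline tag for a reference image (zero-based index is an assumption)."""
    if not 0 <= index < MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"index must be between 0 and {MAX_REFERENCE_IMAGES - 1}, got {index}"
        )
    return f"<IMAGE_REF_{index}>"


def build_reference_prompt(template: str, roles: list[str]) -> str:
    """Replace {role} placeholders in the template with tags, in the order roles are listed.

    Hypothetical helper: the role names exist only on the caller's side and
    are never sent to the model; only the resulting tags appear in the prompt.
    """
    if not roles:
        raise ValueError("at least one reference image role is required")
    if len(roles) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")
    if len(set(roles)) != len(roles):
        raise ValueError("role names must be unique")
    # Map each role name to the tag for its position in the list.
    tags = {role: image_ref_tag(i) for i, role in enumerate(roles)}
    return template.format(**tags)
```

With roles ["hero", "shop"], the template "{hero} walks a dog past {shop} at dusk." becomes "<IMAGE_REF_0> walks a dog past <IMAGE_REF_1> at dusk.", so each subject in the prompt points at one specific image.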
Edit with Plain Instructions
Upload an existing clip and describe the change — "make this video anime, keep everything else the same." No timeline, no manual masking.
Best for: style transforms, quick fixes, iterating on a rough cut
The Essentials
Simple, predictable specs across every operation.
3–10 second clips per generation, with longer durations on Google's roadmap.
16:9 landscape and 9:16 portrait, ready for both widescreen and social formats.
Up to 10 images per generation, addressable inline in your prompt.
One flat rate of 5 credits per second across text-to-video, image-to-video, reference-to-video, and video editing.
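The specs above can be checked and priced before submitting a job. The following is a minimal sketch: ClipRequest, validate, and estimate_credits are hypothetical names, not part of any published API. It assumes durations are billed in whole seconds, since this page does not say how fractional seconds are handled, and it applies the 3–10 second range to generation requests only.

```python
from dataclasses import dataclass

CREDITS_PER_SECOND = 5            # flat rate stated on this page
MIN_SECONDS, MAX_SECONDS = 3, 10  # clip length range stated on this page
ASPECT_RATIOS = ("16:9", "9:16")  # landscape and portrait
MAX_REFERENCE_IMAGES = 10


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical container for the parameters listed in The Essentials."""
    duration_seconds: int
    aspect_ratio: str
    reference_image_count: int = 0


def validate(request: ClipRequest) -> list[str]:
    """Return a list of human-readable problems; an empty list means the request fits the specs."""
    problems = []
    if not MIN_SECONDS <= request.duration_seconds <= MAX_SECONDS:
        problems.append(
            f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds, "
            f"got {request.duration_seconds}"
        )
    if request.aspect_ratio not in ASPECT_RATIOS:
        problems.append(
            f"aspect ratio must be one of {ASPECT_RATIOS}, got {request.aspect_ratio!r}"
        )
    if not 0 <= request.reference_image_count <= MAX_REFERENCE_IMAGES:
        problems.append(
            f"reference images must number 0-{MAX_REFERENCE_IMAGES}, "
            f"got {request.reference_image_count}"
        )
    return problems


def estimate_credits(request: ClipRequest) -> int:
    """Estimate cost as duration times the flat rate (whole-second billing is an assumption)."""
    problems = validate(request)
    if problems:
        raise ValueError("; ".join(problems))
    return request.duration_seconds * CREDITS_PER_SECOND
```

Under these assumptions, an 8-second portrait clip costs 8 × 5 = 40 credits, and a maximum-length 10-second clip costs 50 credits regardless of which operation produced it.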
Built for Fast-Moving Teams
Omni Flash trades render complexity for speed and a natural-language workflow.
E-Commerce & Product
Turn a single product photo into a moving showcase, or blend multiple product shots into one consistent scene.
Social & Short-Form Content
Go from prompt to a vertical, sound-aware clip in one pass — no separate audio generation step.
Rapid Video Editing
Skip the timeline. Describe a style change or fix and let conversational editing apply it directly.
Storytelling & Concepting
Use Gemini's real-world knowledge to keep narrative logic, settings, and details coherent across a scene.
Ready to Create with Gemini Omni Flash?
Start generating and editing video today with free credits.
Get Started Now
Free credits to try us
Frequently Asked Questions
What is Gemini Omni Flash?
Gemini Omni Flash is Google's cost-efficient video model, combining Gemini's multimodal reasoning with video generation and editing. It's available on faktry for text-to-video, image-to-video, reference-to-video, and instruction-based video editing.
How does the conversational video editing work?
Upload a source video and describe the change you want in plain language — for example "make this video anime, keep everything else the same." The model applies the edit directly, without manual masking or a timeline. Voice editing and scene extension aren't supported yet.
What's the maximum video duration?
Text-to-video, image-to-video, and reference-to-video generations currently support 3–10 second clips. Google has said longer durations are on its roadmap.
How many reference images can I use?
The reference-to-video operation accepts up to 10 images, which you can bind to specific roles in your prompt using tags like '<IMAGE_REF_0>' and '<IMAGE_REF_1>' for multi-subject consistency.
Is generated video watermarked?
Yes. Video generated with Gemini Omni Flash carries Google's SynthID watermark, and safety filters are applied to both inputs and outputs.