Google

Gemini Omni Flash

One Model. Video In, Video Out, Any Direction.

Gemini's multimodal reasoning applied to video — generate from text, animate an image, blend reference images, or just describe the edit you want.


Omni Flash treats video as a conversation, not a one-shot render. It carries Gemini's real-world knowledge — history, biology, narrative logic — into every frame, and lets you refine a clip in plain language instead of re-prompting from scratch. Text, a single photo, or up to ten reference images: the same model handles all of it.

Built for

Text to Video, Image to Video, Reference to Video, Video Editing, Conversational Refinement, In-Prompt Audio Control

Four Ways to Create

One model, four entry points — start from nothing, from one image, from several, or from a finished clip.

Text to Video

Generate from a Prompt

Describe the shot, the pacing, even the soundtrack. Omni Flash controls audio directly through the prompt, so an instruction such as "calm background music" or "no dialogue" is all it takes.

Best for: concepting, storyboards, quick social clips

Image to Video

Animate a Single Image

Bring one reference image to life with a text prompt. The model decides how to move the subject based on what you describe.

Best for: product shots, portraits, static-to-motion conversions

Reference to Video

Combine Multiple References

Feed in up to ten reference images and bind them to roles inline with tags like '<IMAGE_REF_0>' and '<IMAGE_REF_1>' for multi-subject, style-consistent scenes.

Best for: brand consistency, multi-product scenes, character casts
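As a rough illustration of the inline tags, the sketch below assembles a prompt from named roles. The helper names and the placeholder template are hypothetical, and it assumes tags are numbered from zero in the order the images are supplied; the page shows only '<IMAGE_REF_0>' and '<IMAGE_REF_1>', so treat that mapping as an assumption rather than documented behavior.

```python
from typing import Sequence

MAX_REFERENCE_IMAGES = 10  # limit stated on this page


def image_ref_tag(index: int) -> str:
    """Return the inline tag for a zero-based reference image index."""
    if not 0 <= index < MAX_REFERENCE_IMAGES:
        raise ValueError(f"index must be between 0 and {MAX_REFERENCE_IMAGES - 1}")
    return f"<IMAGE_REF_{index}>"


def build_reference_prompt(template: str, roles: Sequence[str]) -> str:
    """Fill {role} placeholders in the template with image tags.

    Hypothetical helper: roles are listed in the order the images are
    supplied, and that order is assumed to set the tag number.
    """
    if not roles:
        raise ValueError("at least one reference image is required")
    if len(roles) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    if len(set(roles)) != len(roles):
        raise ValueError("role names must be unique")
    tags = {role: image_ref_tag(i) for i, role in enumerate(roles)}
    return template.format(**tags)


prompt = build_reference_prompt(
    "{barista} hands {cup} to a customer in a sunlit cafe",
    ["barista", "cup"],
)
print(prompt)
# <IMAGE_REF_0> hands <IMAGE_REF_1> to a customer in a sunlit cafe
```

Naming the roles first keeps the prompt readable while it is being written, and the numbered tags appear only in the final string.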

Video Editing

Edit with Plain Instructions

Upload an existing clip and describe the change — "make this video anime, keep everything else the same." No timeline, no manual masking.

Best for: style transforms, quick fixes, iterating on a rough cut

The Essentials

Simple, predictable specs across every operation.

Duration

3–10 second clips per generation, with longer durations on Google's roadmap.

Aspect Ratios

16:9 landscape and 9:16 portrait, ready for both widescreen and social formats.

Reference Images

Up to 10 images per generation, addressable inline in your prompt.

Pricing

One flat rate — 5 credits per second — across text, image, reference, and edit.
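To make the specs concrete, here is a small sketch that checks a request against the limits listed above and prices it at the flat rate. The function name is hypothetical, and it assumes durations are whole seconds; the page does not say whether fractional durations are accepted or how they would be billed.

```python
CREDITS_PER_SECOND = 5           # flat rate stated on this page
MIN_SECONDS, MAX_SECONDS = 3, 10  # supported clip durations
ASPECT_RATIOS = ("16:9", "9:16")  # supported output formats


def clip_cost(seconds: int, aspect_ratio: str = "16:9") -> int:
    """Return the credit cost of one clip, or raise if it is out of spec.

    Assumption: whole-second durations only.
    """
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect ratio must be one of {ASPECT_RATIOS}")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    return seconds * CREDITS_PER_SECOND


print(clip_cost(3))            # 15
print(clip_cost(10, "9:16"))   # 50
```

Because the rate is the same for text, image, reference, and edit operations, the operation type does not appear in the calculation.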

Built for Fast-Moving Teams

Omni Flash trades render complexity for speed and a natural-language workflow.

E-Commerce & Product

Turn a single product photo into a moving showcase, or blend multiple product shots into one consistent scene.

Social & Short-Form Content

Go from prompt to a vertical, sound-aware clip in one pass — no separate audio generation step.

Rapid Video Editing

Skip the timeline. Describe a style change or fix and let conversational editing apply it directly.

Storytelling & Concepting

Use Gemini's real-world knowledge to keep narrative logic, settings, and details coherent across a scene.

Ready to Create with Gemini Omni Flash?

Start generating and editing video today with free credits.


Free credits to try it out

100 credits included — no card required

Text, image, reference, and edit — one flat rate

16:9 and 9:16 output, 3–10 second durations

Commercial usage rights included
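For a sense of how far the free credits go, the arithmetic can be sketched as follows, using the 5 credits per second rate and the 100 included credits stated on this page. The function name is hypothetical, and whole-second clips are assumed.

```python
FREE_CREDITS = 100       # included credits stated on this page
CREDITS_PER_SECOND = 5   # flat rate stated on this page


def clips_within_budget(seconds_per_clip: int, credits: int = FREE_CREDITS) -> int:
    """Return how many whole clips of the given length the credits cover."""
    if seconds_per_clip <= 0:
        raise ValueError("seconds_per_clip must be positive")
    return credits // (seconds_per_clip * CREDITS_PER_SECOND)


print(FREE_CREDITS // CREDITS_PER_SECOND)  # 20 seconds of video in total
print(clips_within_budget(10))             # 2 clips of 10 seconds
print(clips_within_budget(3))              # 6 clips of 3 seconds
```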

Frequently Asked Questions

What is Gemini Omni Flash?

Gemini Omni Flash is Google's cost-efficient video model, combining Gemini's multimodal reasoning with video generation and editing. It's available on faktry for text-to-video, image-to-video, reference-to-video, and instruction-based video editing.

How does the conversational video editing work?

Upload a source video and describe the change you want in plain language — for example "make this video anime, keep everything else the same." The model applies the edit directly, without manual masking or a timeline. Voice editing and scene extension aren't supported yet.

What's the maximum video duration?

Text-to-video, image-to-video, and reference-to-video generations currently support 3–10 second clips. Google has said longer durations are on its roadmap.

How many reference images can I use?

The reference-to-video operation accepts up to 10 images, which you can bind to specific roles in your prompt using tags like '<IMAGE_REF_0>' and '<IMAGE_REF_1>' for multi-subject consistency.

Is generated video watermarked?

Yes. Video generated with Gemini Omni Flash carries Google's SynthID watermark, and safety filters are applied to both inputs and outputs.
