Gemini Omni Flash API Review: What the Preview Really Does

Gemini Omni Flash is Google's first "any-to-any" multimodal model that outputs video: you send text, images, or a previous result, and it returns a short 720p clip with sound that you can keep editing by describing changes in plain language. Google shipped it in public preview on June 30, 2026, together with Nano Banana 2 Lite, and the API model id is gemini-omni-flash-preview.

Quick answer, if you only came for one thing: the Gemini Omni Flash API supports text-to-video, image-to-video, reference-to-video with up to 3 subject images, and conversational editing. It does not support 1080p, duration control, seeds, negative prompt parameters, video extension, or turning the audio off. If you saw Omni Flash demos inside Google Flow and expected the same toolkit from the API, you'll be disappointed. We integrated it into our generator during launch week, so this review is based on what the endpoint actually returns, not on the marketing page.

What the Gemini Omni Flash API actually gives you

The API exposes Omni Flash through the Interactions API (interactions.create), not through the generateVideos endpoint that Veo uses. That detail matters, because interactions are stateful. Each response carries an id you can reference later.

According to Google's Omni documentation, four tasks work today:

Text-to-video. A prompt in, a 720p clip out, soundtrack included.
Image-to-video. Your picture becomes the opening frame and the prompt describes the motion.
Reference-to-video. Up to three subject images keep a character or product consistent in a new scene.
Conversational editing. Pass previous_interaction_id plus a short instruction, and the model changes the clip while keeping the rest intact.
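
To make the stateful part concrete, here is a minimal sketch of how the request bodies for a first generation and a follow-up edit might differ. It only builds plain dictionaries and makes no SDK call. The model id and previous_interaction_id come from the text above; the "model" and "input" field names and the build_interaction_request helper are assumptions for illustration, so check the current Interactions API reference before relying on them.

```python
from typing import Optional

MODEL_ID = "gemini-omni-flash-preview"  # model id quoted in this review


def build_interaction_request(
    prompt: str,
    previous_interaction_id: Optional[str] = None,
) -> dict:
    """Build an illustrative payload for an interactions.create call.

    "model" and "input" are assumed field names; only the model id and
    previous_interaction_id are taken from the article.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    request = {"model": MODEL_ID, "input": prompt.strip()}
    if previous_interaction_id is not None:
        # Conversational edit: reference the earlier result instead of re-uploading.
        request["previous_interaction_id"] = previous_interaction_id
    return request


# First generation, then an edit that points back at it ("abc123" is a made-up id).
first = build_interaction_request("A paper sailboat drifting through spilled ink")
edit = build_interaction_request("Make the ink blue", previous_interaction_id="abc123")
```

The only difference between the two payloads is the back-reference, which is the whole point of a stateful endpoint.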

The clip above came straight out of gemini-omni-flash-preview through our own pipeline, so you're looking at an actual sample, not a press asset. One text prompt: a paper sailboat drifting through spilled ink, "single continuous scene" to stop the model from cutting, and an "Audio:" line asking for paper rustle and ripples. It chose 10 seconds on its own. Turn the sound on; the audio is generated too.

Every clip runs 3 to 10 seconds at 24 fps. And here's the first surprise: you don't pick the length. There is no duration parameter. The model decides how long the clip should be based on the prompt. Ask for "a slow pan across a foggy harbor at dawn" and you might get 9 seconds. You might also get 4. For storyboarding that's fine; for anything cut to music it's genuinely annoying.

Audio is the second surprise, and a pleasant one. Every generation ships with sound: ambient noise, effects, sometimes speech if you ask for it. You can't switch it off, though. The only control you have is text, so we ended up appending an "Audio: ..." line to prompts and it works reasonably well.
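
Since text is the only lever, a small prompt helper keeps the "Audio:" convention consistent across requests. This is a rough sketch of the kind of helper we mean, not code from Google's docs: build_prompt and its parameters are hypothetical names, and the "Audio:" line is our own prompt habit, not an API field.

```python
def build_prompt(scene: str, audio: str = "", single_scene: bool = False) -> str:
    """Assemble a text prompt with optional cut-prevention and audio hints."""
    parts = [scene.strip()]
    if single_scene:
        # Phrase we use to discourage the model from cutting between shots.
        parts.append("Single continuous scene.")
    if audio.strip():
        # Audio cannot be turned off, so the prompt is the only place to steer it.
        parts.append(f"Audio: {audio.strip()}")
    return "\n".join(parts)


prompt = build_prompt(
    "A paper sailboat drifting through spilled ink",
    audio="paper rustle and ripples",
    single_scene=True,
)
```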

What does Flow have that the API doesn't?

Google Flow is a full production tool wrapped around several models, and that wrapping is exactly what the raw API lacks. Flow gives you a Scene Builder for multi-shot sequences and preset camera controls (shot type, angle, movement). One nuance people get wrong about resolution: Flow renders at 720p by default too. The 1080p and 4K versions appear at the download step, as an export choice on top of the Veo render, not as a different generation mode. None of that tooling exists in gemini-omni-flash-preview.

Here's the honest comparison after a week of poking at both:

Capability	Flow / Gemini app	Gemini Omni Flash API
Resolution	720p render; 1080p / 4K offered at download	720p only, 24 fps
Clip length	Scene Builder chains shots	3–10 s, model decides
Camera controls	Preset shot types, angles, movement	Text prompt only
Extend a video	Yes (through Veo)	Not supported
Audio	On, with UI controls	Always on, prompt-only control
Seed / negative prompt	Hidden from user anyway	Not supported
Clips per request	One at a time	One per interaction, no sampleCount

The documentation is upfront about some of this. Google's docs state that video extension and interpolation "are not supported," and the same goes for system instructions, temperature, and negative prompt parameters (they suggest writing negatives into the regular prompt, which in my experience works maybe half the time). Video references are listed in the API schema with a 3-second cap, but the docs admit the model doesn't process them correctly yet.
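
With no negative prompt parameter, the negatives have to travel inside the prompt itself. One way to do that consistently is sketched below. The fold_negatives name and the "Avoid:" phrasing are assumptions of ours; Google's docs only say to write negatives into the regular prompt, and as noted the model honors them inconsistently.

```python
from typing import Iterable


def fold_negatives(prompt: str, negatives: Iterable[str]) -> str:
    """Append unwanted elements to the prompt as a plain-language line."""
    cleaned = [item.strip() for item in negatives if item.strip()]
    if not cleaned:
        return prompt.strip()
    # "Avoid:" is an arbitrary wording choice, not a recognized keyword.
    return f"{prompt.strip()}\nAvoid: {', '.join(cleaned)}."


text = fold_negatives("A foggy harbor at dawn", ["text overlays", " camera cuts "])
```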

So the preview API is a narrow slice of what the model can do on Google's own surfaces. That's normal for a preview. It still stings when a client asks for a 1080p export and the ceiling is 720p.

[Illustration: a full director control desk beside a single small remote, a metaphor for Flow versus the Omni Flash API]


Conversational editing is the feature that actually delivers

Editing is where Omni Flash earns its place. You generate a clip, look at it, then send a follow-up like "make the jacket red and slow down the camera" with the previous_interaction_id of the first result. The model rebuilds the video with your change applied and most of the original preserved. No re-uploading, no masks, no timeline.

Google's Gemini Enterprise documentation says you can stack up to three sequential edits while the session keeps context. In practice each edit is a fresh 720p render, so consistency drifts a little with every pass. Faces survive better than text on signs. Complex motion sometimes changes when you only asked for a color tweak.
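
If you build on this, it helps to track the chain of ids yourself so you always know which one to pass next and when the limit is reached. The sketch below is purely client-side bookkeeping under the three-edit figure cited above. EditChain and its methods are hypothetical names, and the ids in the usage lines are made up.

```python
from dataclasses import dataclass, field
from typing import List

MAX_SEQUENTIAL_EDITS = 3  # limit cited from the Gemini Enterprise documentation


@dataclass
class EditChain:
    """Tracks the interaction ids of one clip and its follow-up edits."""

    root_id: str
    edit_ids: List[str] = field(default_factory=list)

    @property
    def latest_id(self) -> str:
        # The id to send as previous_interaction_id on the next edit.
        return self.edit_ids[-1] if self.edit_ids else self.root_id

    @property
    def edits_left(self) -> int:
        return MAX_SEQUENTIAL_EDITS - len(self.edit_ids)

    def record_edit(self, new_id: str) -> None:
        if self.edits_left <= 0:
            raise RuntimeError("edit limit reached; start from a fresh generation")
        self.edit_ids.append(new_id)


chain = EditChain(root_id="gen-1")
chain.record_edit("edit-1")
```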

One caveat we hit in production: parent interactions expire on Google's side. Try to edit a clip generated a while ago and the API may return a 404 for the referenced interaction. On BananaBanana we surface that as an expired video and refund the attempt, but if you build on the raw API, plan for it.
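
Planning for it mostly means catching the 404 and turning it into something your UI can explain. Here is a minimal sketch of that pattern with the network call injected as a function, so nothing here depends on a particular client library. ApiError, ParentExpired, and edit_or_flag_expired are all hypothetical names; swap ApiError for whatever error type your HTTP client or SDK raises.

```python
from typing import Callable


class ApiError(Exception):
    """Hypothetical stand-in for the error type your HTTP client raises."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


class ParentExpired(Exception):
    """The interaction being edited no longer exists upstream."""


def edit_or_flag_expired(
    send_edit: Callable[[str, str], str],
    parent_id: str,
    instruction: str,
) -> str:
    """Run send_edit and translate a 404 on the parent into ParentExpired."""
    try:
        return send_edit(parent_id, instruction)
    except ApiError as err:
        if err.status_code == 404:
            # Treat as "video expired" so the caller can refund or regenerate.
            raise ParentExpired(parent_id) from err
        raise  # Other failures are not expiry; let them surface unchanged.
```

The caller catches ParentExpired to show an "expired video" message and offer a fresh generation, while every other error keeps its normal handling.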

[Illustration: a chat conversation where each reply bubble shows the same video frame with one small change, illustrating conversational video editing]

How much does Gemini Omni Flash cost?

Google's launch announcement prices Omni Flash at $0.10 per second of output video. Since the model chooses the duration, your bill is partly the model's mood: a 3-second result costs $0.30, a 10-second one costs $1.00. Same prompt, same settings, different invoice.

On BananaBanana we flattened that. One Omni Flash generation costs $1.00 regardless of what length the model picks, and an edit costs the same $1.00 because it's a full re-render. Nano Banana 2 Lite, which launched the same day, runs $0.03 per image here (Google's API rate is $0.034 for a 1K image). The full price list is in the pricing section.
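
The arithmetic behind both pricing models is easy to check. The sketch below uses only the figures quoted above ($0.10 per second, 3 to 10 second clips, $1.00 flat); the function names are ours, and treating durations as whole seconds is an assumption, since the API may well return fractional lengths.

```python
from decimal import Decimal

PER_SECOND_USD = Decimal("0.10")  # Google's launch price per second of output
FLAT_PRICE_USD = Decimal("1.00")  # BananaBanana's flat price per generation
MIN_SECONDS, MAX_SECONDS = 3, 10  # clip length range reported for the preview


def metered_cost(seconds: int) -> Decimal:
    """Cost of one clip at Google's per-second rate."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"expected {MIN_SECONDS} to {MAX_SECONDS} seconds")
    return PER_SECOND_USD * seconds


def flat_minus_metered(seconds: int) -> Decimal:
    """How much more the flat price is than the metered cost for this length."""
    return FLAT_PRICE_USD - metered_cost(seconds)


shortest = metered_cost(3)   # Decimal("0.30")
longest = metered_cost(10)   # Decimal("1.00")
```

At 10 seconds the two models cost the same; the shorter the clip the model happens to pick, the bigger the gap between flat and metered.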

Is a dollar per clip good value? For a sound-on draft that would otherwise need a video generation pass plus separate audio work, probably yes. For silent b-roll, no. Veo 3.1 Fast makes cheaper silent clips with exact durations.

Should you use Omni Flash or Veo 3.1?

Short version: they split the work, but not along the "draft vs final" line you might expect.

One thing to know before choosing: the quality gap between them isn't really about resolution. Veo 3.1 renders motion and physics more convincingly. Water behaves, fabric folds where it should, objects collide instead of ghosting through each other. Omni Flash gets these right often, not reliably, and on some clips you'll notice it without looking for it.

Pick Veo 3.1 when you need a real 4K export, an exact clip length (you set whole seconds, 4 to 8; fractions aren't a thing), silent output, extension later, or control over both the first and the last frame. That last pair is underrated: feed the same image as first and last frame and you get a seamless loop, which is the entire trick behind looping background video. It's the production tool for anything widescreen or physics-heavy.

Pick Omni Flash when the destination is a phone screen. For TikTok, Shorts or Reels, 720p is honestly all the platform preserves after its own compression, sound arrives baked into the same $1 pass, and conversational editing beats re-prompting from scratch while you chase the idea. In that lane it's not the sketching tool. It's simply the better pick.
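
If you route requests in code, the rule of thumb above can be boiled down to a checklist. This sketch is a deliberate simplification: it only encodes the hard requirements that the Omni Flash preview can't meet and ignores the softer judgment calls about physics and framing. ClipNeeds and pick_model are hypothetical names, and the return values are labels, not API model ids.

```python
from dataclasses import dataclass


@dataclass
class ClipNeeds:
    """Hard requirements for a clip; everything defaults to 'not needed'."""

    needs_4k_export: bool = False
    needs_exact_duration: bool = False
    needs_silent_output: bool = False
    needs_extension: bool = False
    needs_first_and_last_frame: bool = False


def pick_model(needs: ClipNeeds) -> str:
    """Return 'Veo 3.1' if any requirement rules out Omni Flash, else 'Omni Flash'."""
    veo_only = (
        needs.needs_4k_export
        or needs.needs_exact_duration
        or needs.needs_silent_output
        or needs.needs_extension
        or needs.needs_first_and_last_frame
    )
    return "Veo 3.1" if veo_only else "Omni Flash"


social_clip = pick_model(ClipNeeds())
looping_background = pick_model(ClipNeeds(needs_first_and_last_frame=True))
```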

My personal workflow so far: for widescreen client work I still rough the concept out in Omni Flash for $1 a take, then redo the keeper in Veo at final quality. For a vertical clip that's going straight to social, the Omni take often ships as is.

[Illustration: two camera drones of different sizes flying side by side over a pastel valley, representing the choice between Omni Flash and Veo 3.1]

Both models are live in the BananaBanana generator, so you can compare them on the same prompt without writing a line of code or setting up a Google Cloud project.
