Nano Banana Prompt Guide: Describe Scenes, Not Tags

A hands-on Nano Banana prompt guide: narrative scene structure, photography language, text rendering on Pro, editing prompts, and real per-image prices.

A Nano Banana prompt is a written scene description that Google's image models (Nano Banana 2, Nano Banana Pro, and the budget Lite tier) turn into a picture. These models are built on Gemini, which means they read your prompt as language rather than as a bag of keywords. That single fact changes how you should write. According to Google's image generation docs, rich narrative descriptions consistently beat comma-separated tag lists, and our own generations back that up daily.

The short version, if you only want the template: "A photorealistic [shot type] of [subject] in [setting]. [Lighting description]. Shot from [camera angle] with [lens type]." Write complete sentences. Name the light source and the lens like a photographer would. That habit alone closes most of the gap between a generic render and an image you'd actually publish.
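If you generate prompts from a script, the template is easy to fill programmatically. Here's a minimal sketch in Python; `photo_prompt` and its parameter names are our own hypothetical helper, not part of any Google SDK, and the filled-in values are only an illustration.

```python
def photo_prompt(shot_type, subject, setting, lighting, angle, lens):
    """Fill the narrative template with complete sentences.

    Hypothetical helper: the slots mirror the bracketed fields in the
    template above. `lighting` is expected to be a full sentence
    without its final period.
    """
    return (
        f"A photorealistic {shot_type} of {subject} in {setting}. "
        f"{lighting}. "
        f"Shot from {angle} with {lens}."
    )


prompt = photo_prompt(
    shot_type="close-up",
    subject="an elderly watchmaker with wire-rim glasses",
    setting="a narrow, cluttered shop",
    lighting="A single warm desk lamp lights him from the left",
    angle="slightly above",
    lens="an 85mm lens",
)
print(prompt)
```

The point of the helper is less the string formatting than the function signature: it has no default values, so you can't call it without deciding on light, angle, and lens.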

This is the image-side companion to our Veo 3.1 prompt guide. Same philosophy, different vocabulary: video prompts revolve around motion and camera moves, image prompts revolve around light, lens, and composition. Every demo below was generated on BananaBanana with the exact prompt shown, first attempt, no rerolls.

Why do narrative prompts work better than keyword lists?

Old diffusion habits die hard. If you learned prompting on Stable Diffusion or Midjourney, you probably write things like "perfume bottle, product photo, 8k, dramatic lighting, masterpiece". Nano Banana models parse that, but you're wasting their main strength: they understand relationships between things, not just the things themselves.

"A barista laughing at something off-frame while steam rises from the espresso machine behind her" is a scene. The keyword version ("barista, laughing, steam, espresso machine") loses who is doing what, where the steam is, and what the emotional register of the frame should be. The model fills those gaps with defaults, and defaults are what make AI images look like AI images.

A working Nano Banana prompt usually carries five or six ingredients:

| Ingredient | What it controls | Example phrase |
| --- | --- | --- |
| Subject | Who or what is in frame | "an elderly watchmaker with wire-rim glasses" |
| Setting | Where the scene happens | "at a cluttered workbench in a narrow shop" |
| Style | Overall look | "photorealistic" / "flat vector illustration" |
| Lighting | Light source, direction, mood | "single warm desk lamp from the left" |
| Camera | Angle and framing | "close-up, shot from slightly above" |
| Lens | Optical character | "85mm lens, shallow depth of field" |

You won't need all six every time. But every ingredient you skip is a decision you hand back to the model.
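One way to make those handed-back decisions visible is a small checklist that reports which ingredients a draft leaves empty. This is a sketch under our own assumptions: the `Ingredients` class and the `skipped` function are hypothetical names for illustration, not anything the models or the API require.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Ingredients:
    """The six prompt ingredients; None means 'left to the model'."""
    subject: Optional[str] = None
    setting: Optional[str] = None
    style: Optional[str] = None
    lighting: Optional[str] = None
    camera: Optional[str] = None
    lens: Optional[str] = None


def skipped(ingredients: Ingredients) -> List[str]:
    """Return the names of ingredients the draft does not specify."""
    return [
        f.name for f in fields(ingredients)
        if getattr(ingredients, f.name) is None
    ]


draft = Ingredients(
    subject="an elderly watchmaker with wire-rim glasses",
    setting="at a cluttered workbench in a narrow shop",
    lighting="single warm desk lamp from the left",
)
print(skipped(draft))  # ['style', 'camera', 'lens']
```

Skipping an ingredient is fine when it's deliberate; the checklist only makes sure it was a choice.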

Six ingredients of a Nano Banana prompt shown as floating cards feeding into a single photographic frame: subject, setting, style, lighting, camera angle and lens

Photorealistic prompts: write like a photographer

For realistic photos the highest-leverage vocabulary is camera language, same as with video. Google's docs give the template I quoted above, and it maps directly onto how photographers brief a shoot: shot type, subject, environment, light, angle, glass.

Here's the difference in practice. Both images below came from Nano Banana 2 at 1K resolution, $0.06 each on our pricing. First, the prompt everyone writes on day one, just "A perfume bottle on a table":

Nano Banana 2 result for the weak prompt: a generic perfume bottle on a table with default lighting

Pretty, honestly. The model picked a bright lifestyle scene, window light, flowers, a jewelry dish, and it's a perfectly fine stock photo. But every one of those decisions was the model's, not yours, and the next run will make different ones. Now the same subject with the ingredients filled in: "A photorealistic close-up of an amber glass perfume bottle standing on wet black slate, fine water droplets across the glass. Warm golden backlight from the right, a faint cool rim light from the left. Shot from a low angle with an 85mm macro lens, shallow depth of field, dark blurred background":

Nano Banana 2 result for the structured prompt: amber perfume bottle on wet slate with golden backlight and macro depth of field

Same model, same price, same four-second render. The droplets, the two-tone light, and the low angle were all words in the prompt rather than luck. One detail worth zooming in on: the label. I never asked for one, the model invented it, and it misspelled its own invented text ("EAU DE PARFUIM"). Unprompted text is still Nano Banana 2's weak spot, which is a clean segue to the next section.

Lighting names are worth memorizing because they read reliably: golden hour, overcast softbox, three-point studio lighting, hard noon sun, neon spill, candlelight. Vague mood words ("beautiful", "dramatic") read as roughly nothing. If your subject is a product rather than a person, we push this same vocabulary a lot further in the product photography guide. In my experience the lens choice matters almost as much; "85mm macro" and "wide-angle 24mm" produce visibly different geometry, which is exactly the control tag-style prompts never give you.

How do you get readable text from Nano Banana Pro?

Text rendering is the reason Nano Banana Pro exists. The docs are direct about it: for professional assets where typography has to be precise, use Pro (gemini-3-pro-image). Nano Banana 2 handles a short word or two; Pro handles layouts.

The documented template: "Create a [image type] for [brand] with the text '[exact text]' in a [font style]. The design should be [style], with a [color scheme]." Two habits improve the hit rate. Put the exact text in quotes, and describe the font conversationally ("a chunky rounded sans-serif", "a thin elegant serif") instead of naming a specific typeface it may not know.

This poster came from Nano Banana Pro with the prompt: "Create a retro poster for a coffee stand called Banana Brew with the text 'BANANA BREW' in a chunky rounded sans-serif across the top and the slogan 'cold roast, warm people' in a small handwritten script underneath, flat illustration of a single banana-yellow espresso cup in the center, cream background, amber and dark-ink color scheme":

Nano Banana Pro text rendering demo: retro coffee poster with the exact phrase BANANA BREW rendered in a rounded sans-serif

Every character landed on the first try. To be fair, that stops being guaranteed past a couple of lines; on dense menu-style layouts Pro still occasionally drops or doubles a letter, so proofread anything with more than about ten words before you ship it.
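The documented template and the ten-word rule of thumb can be sketched together in a few lines of Python. Both function names are hypothetical, the `design_style` value is an illustrative assumption rather than the exact wording of the poster prompt above, and the limit of ten is just our "about ten words" guideline turned into a constant.

```python
PROOFREAD_WORD_LIMIT = 10  # our rule of thumb, not a documented model limit


def text_asset_prompt(image_type, brand, exact_text, font_style,
                      design_style, color_scheme):
    """Fill the text-rendering template; the exact text goes in quotes."""
    return (
        f"Create a {image_type} for {brand} with the text '{exact_text}' "
        f"in a {font_style}. The design should be {design_style}, "
        f"with a {color_scheme}."
    )


def needs_proofreading(*quoted_texts):
    """True when the total rendered text exceeds the word limit."""
    total_words = sum(len(text.split()) for text in quoted_texts)
    return total_words > PROOFREAD_WORD_LIMIT


poster = text_asset_prompt(
    image_type="retro poster",
    brand="a coffee stand called Banana Brew",
    exact_text="BANANA BREW",
    font_style="chunky rounded sans-serif",
    design_style="flat and illustrated",  # illustrative assumption
    color_scheme="cream, amber and dark-ink color scheme",
)
print(poster)
print(needs_proofreading("BANANA BREW", "cold roast, warm people"))  # False
```

The headline and slogan from the poster add up to six words, so the check passes; a dense menu would trip it and remind you to read the render letter by letter.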

Editing prompts: change one thing, keep the rest

Nano Banana models edit images as well as they generate them, and the prompt pattern is different enough to trip people up. The mistake is re-describing the whole scene. Describe only the change.

Google's docs call this semantic masking, and their template is worth copying verbatim: "Using the provided image, change only the [specific element] to [new element]. Keep everything else exactly the same, preserving the original style, lighting, and composition." That closing sentence does real work; without it, edits tend to drift the palette or recompose the frame.
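Because the template is fixed apart from two slots, it's a good candidate for a tiny helper. A minimal sketch, assuming you build edit prompts in Python; `edit_prompt` is a hypothetical name, and the mug-to-glass swap is only an example.

```python
def edit_prompt(element, replacement):
    """Build a single-change edit prompt from the documented template.

    The closing 'keep everything else' sentence is always included,
    since it is what holds palette and composition in place.
    """
    return (
        f"Using the provided image, change only the {element} "
        f"to {replacement}. "
        "Keep everything else exactly the same, preserving the original "
        "style, lighting, and composition."
    )


print(edit_prompt("ceramic mug", "a clear glass tumbler"))
```

The helper takes exactly one element and one replacement on purpose: if you want two changes, call it twice and run two edits.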

The same logic runs the iterative workflow. Generate, look, then ask for one adjustment at a time: "make the light warmer", "replace the mug with a glass", "same scene at dusk". Each small step preserves more of what you liked than one giant corrective prompt. It feels slower. It's usually faster, because you stop losing good frames to overcorrection.

Reference images stack on top of this. Per the Gemini API docs, Nano Banana 2 accepts up to 4 character references and 10 object references per request, while Pro takes up to 5 character and 6 object references. If you're trying to keep one face consistent across a whole series, that's a separate craft with its own tricks, and we wrote it up in the character consistency guide.

A pair of tweezers swapping a single puzzle piece inside an otherwise finished picture frame, illustrating targeted Nano Banana image editing

Negative prompts, aspect ratios, and what an image costs

Here's a fact that surprises people coming from other tools: the Nano Banana API has no negative prompt parameter. Nothing like Veo's negativePrompt field exists for images, and the docs never mention one. The working substitute is positive framing. Don't write "no cars"; write "an empty, deserted street". Don't write "no text"; describe the composition so completely that stray text has nowhere to live. Framing the absence as a presence sounds like a word game, but in our generations it works.
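If old habits keep slipping negations into your prompts, a rough lint pass can catch them before you spend a generation. This is a heuristic sketch, not a feature of any API: `find_negations` is a hypothetical helper, it only looks for "no" or "without" followed by one word, and it will produce the occasional false positive.

```python
import re

# Matches "no <word>" or "without <word>" as whole words, case-insensitive.
NEGATION = re.compile(r"\b(no|without)\s+(\w+)", re.IGNORECASE)


def find_negations(prompt):
    """Return negative phrases that should be rewritten as positive framing."""
    return [f"{m.group(1)} {m.group(2)}" for m in NEGATION.finditer(prompt)]


print(find_negations("A city street at dawn, no cars, no text"))
# ['no cars', 'no text']
print(find_negations("An empty, deserted street at dawn"))
# []
```

The rewrite itself stays a human job: the lint only tells you where a "no" is hiding, and you still have to describe what should be there instead.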

Aspect ratios are generous: ten options from square 1:1 through 3:2, 4:5, 9:16, 16:9, all the way to cinematic 21:9. Resolutions run 512px, 1K, 2K, and 4K, with one catch we covered in the Lite guide: Nano Banana 2 Lite outputs 1K only.

Current BananaBanana prices per image:

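| Model | 512px | 1K | 2K | 4K |
| --- | --- | --- | --- | --- |
| Nano Banana 2 Lite | - | $0.03 | - | - |
| Nano Banana 2 | $0.03 | $0.06 | $0.09 | $0.13 |
| Nano Banana Pro | - | $0.11 | $0.11 | $0.20 |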

My default workflow mirrors what I do with video: draft on the cheap tier, ship on the expensive one. Iterate the prompt on Lite or Nano Banana 2 at 1K until the composition holds, then rerun the final prompt once on Pro at 2K, which costs the same $0.11 as 1K and is the quiet bargain in that table. Prompt quality transfers across tiers almost perfectly. New accounts get $0.10 free, which covers your first three Lite drafts. Full grid is on the pricing page.
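To see what a draft-cheap, ship-expensive session actually costs, here is a small calculator using the BananaBanana prices listed above. The short model keys (`lite`, `nb2`, `pro`) and the function name are our own hypothetical labels, and the session itself is a made-up example; prices are kept in whole cents to avoid floating-point rounding.

```python
# Per-image prices in cents, keyed by (model, resolution).
PRICE_CENTS = {
    ("lite", "1K"): 3,
    ("nb2", "512px"): 3,
    ("nb2", "1K"): 6,
    ("nb2", "2K"): 9,
    ("nb2", "4K"): 13,
    ("pro", "1K"): 11,
    ("pro", "2K"): 11,
    ("pro", "4K"): 20,
}


def session_cost_cents(runs):
    """Sum the cost of (model, resolution, count) runs.

    Raises KeyError for a combination that has no listed price,
    for example Lite at 2K.
    """
    return sum(
        PRICE_CENTS[(model, resolution)] * count
        for model, resolution, count in runs
    )


# Example session: four Lite drafts, two Nano Banana 2 drafts, one Pro final.
session = [("lite", "1K", 4), ("nb2", "1K", 2), ("pro", "2K", 1)]
cents = session_cost_cents(session)
print(f"${cents / 100:.2f}")  # $0.35
```

Six drafts plus the final come to $0.35, and the Pro render is less than a third of that.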

FAQ

What is the best prompt structure for Nano Banana?

A few narrative sentences covering subject, setting, style, lighting, camera angle, and lens: "A photorealistic [shot type] of [subject] in [setting]. [Lighting]. Shot from [angle] with [lens]." Full sentences beat keyword lists because the models are built on Gemini and parse relationships, not just nouns.

Does Nano Banana 2 support negative prompts?

No. There's no negative prompt parameter in the Gemini image API. Use positive framing instead: describe the scene you want ("an empty street at dawn") rather than the thing you don't ("no cars").

How do I make Nano Banana render text correctly?

Use Nano Banana Pro, put the exact wording in quotes, and describe the font style conversationally ("a chunky rounded sans-serif"). Keep it under about ten words per image; longer layouts still need proofreading.

Should I prompt Nano Banana 2 and Pro differently?

The structure is identical. Pro justifies its price on text rendering, complex multi-element layouts, and instructions with many constraints; Nano Banana 2 handles everyday photorealism and illustration at roughly half the cost at 1K ($0.06 versus $0.11).

How much does one Nano Banana image cost?

On BananaBanana, from $0.03 (Lite at 1K or Nano Banana 2 at 512px) to $0.20 (Pro at 4K). The sweet spot for finals is Pro at 2K for $0.11, priced the same as Pro at 1K.
