Nano Banana Image Generation Skill

Generate and edit images using Google's Nano Banana (Gemini 2.5 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image) APIs.

Model Selection

Model	ID	Best For	Resolution	Cost
Nano Banana	`gemini-2.5-flash-image`	Fast generation, iteration, basic edits	Up to 1024px	~$0.039/image
Nano Banana Pro	`gemini-3-pro-image-preview`	Professional assets, text rendering, complex compositions	Up to 4K	Higher cost

Selection Guide:

Use Nano Banana for: rapid prototyping, simple edits, high-volume generation
Use Nano Banana Pro for: text in images, 4K output, up to 14 reference images, Google Search grounding

Core Capabilities

Text-to-Image: Generate images from text descriptions
Image Editing: Add, remove, modify elements in existing images
Multi-Image Composition: Blend up to 14 images (Pro only)
Character Consistency: Maintain same character across multiple generations
Text Rendering: Generate legible text in images (Pro excels here)
Style Transfer: Apply artistic styles to images
Iterative Refinement: Conversational multi-turn editing

Quick Start

Generate an Image

python scripts/generate_image.py "A cozy coffee shop interior with warm lighting" --output coffee_shop.png

Edit an Image

python scripts/edit_image.py input.jpg "Add a cat sitting on the chair" --output output.png

Prompting Best Practices

Core Principle: Describe the scene, don't just list keywords.

The model understands natural language narratives better than comma-separated tags.

Prompt Structure (for best results)

Include these elements in your prompts:

Subject: Who/what is in the image (be specific)
Action: What is happening
Environment/Location: Setting and context
Lighting: Natural, studio, golden hour, etc.
Style: Photorealistic, illustration, watercolor, etc.
Composition: Camera angle, framing, perspective
Mood/Atmosphere: Emotional tone

Example - Good vs Bad Prompts

Bad: cat, hat, wizard, cute

Good: A fluffy ginger cat wearing a tiny knitted wizard hat, sitting on a wooden floor in a cozy living room. Soft natural light streams through a nearby window, creating a warm, magical atmosphere. Photorealistic, shot with an 85mm portrait lens.

For comprehensive prompting strategies, see: references/prompting-guide.md

API Configuration

Required Environment Variable

export GEMINI_API_KEY="your-api-key-here"

Get your API key from: https://aistudio.google.com/apikey

Response Modalities

Always set responseModalities: ["TEXT", "IMAGE"] to receive generated images.

Image Configuration Options

image_config = {
    "aspect_ratio": "16:9",  # Options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
    "image_size": "2K"       # Options: 1K, 2K, 4K (Pro only for 4K)
}

For complete API reference, see: references/api-reference.md

Scripts Reference

Script	Purpose
`scripts/generate_image.py`	Text-to-image generation
`scripts/edit_image.py`	Edit existing images with text prompts
`scripts/multi_image_compose.py`	Compose multiple images (Pro only)

Important Notes

All generated images include invisible SynthID watermarks
Pro model uses "thinking" mode for complex prompts (enabled by default)
For multi-turn editing, maintain conversation history
Supported input formats: JPEG, PNG, WebP (up to 5MB)
For best performance use languages: EN, es-MX, ja-JP, zh-CN, hi-IN

nano-banana