Generates AI images from text descriptions using Gemini 2.5 Flash Image API. Matches: "generate an image of", "create a picture of", "make me a photo of", "generate a photo", "AI image of", "create art of", "draw a [concrete noun]", "make an illustration of", "photorealistic image of", "artistic image of", "picture of a", "image of a", "nano banana", "make it darker", "change the background", "add more detail", "edit this image", "try variations", "generate variations". Do NOT use for: diagrams, flowcharts, charts, mind maps, org charts, architecture diagrams, data visualizations (use visualize) — e.g. "draw a flowchart", "chart this data", "org chart", "diagram this process", brainstorm or ideation (use brainstorm) — e.g. "brainstorm ideas", "explore options", prompt optimization or rewriting (use prompt-master) — e.g. "improve my prompt", "optimize this prompt".
From tandem: npx claudepluginhub binatrixai/tandem-marketplace --plugin tandem
Bundled files: evals/evals.json, references/editing-guide.md, references/presets.md, references/prompt-guide.md, template.md
Generate AI images from text descriptions using the Gemini 2.5 Flash Image API
via Cloudflare AI Gateway. Guide users through structured prompt building for
best results. Save PNGs to ~/Tandem/creative/images/ and auto-open.
Output follows ${CLAUDE_SKILL_DIR}/template.md.
For prompt building reference, load ${CLAUDE_SKILL_DIR}/references/prompt-guide.md
once per conversation during guided mode.
For style and social format presets, load ${CLAUDE_SKILL_DIR}/references/presets.md
when user selects a preset option in guided mode.
For editing and variation workflows, load ${CLAUDE_SKILL_DIR}/references/editing-guide.md
when entering editing or variation mode.
See METHODOLOGY.md language mirror rule. Reply in the user's language.
Detect which mode applies based on the user's request: Guided Mode (interactive prompt building, the default), Direct Mode (the "generate:" prefix), Editing Mode (edit the last generated image), or Variation Mode (variations of the last generated image).
Load ${CLAUDE_SKILL_DIR}/references/prompt-guide.md once per conversation.
Brand detection: Read ~/Tandem/memory/brand.md using the Read tool.
If the file exists: extract brand colors, style guidelines, and constraints.
These will be automatically appended to the prompt in Step 3.
If the user says "ignore brand" or "no brand": skip brand constraints for this generation.
If brand.md does not exist: skip silently -- no error, no mention.
Content safety note (show once per conversation): "Works best for: objects, scenes, abstract art, stylized illustrations. Note: photorealistic people and public figures may be restricted -- illustrated versions work great."
Walk through Visual Descriptor fields via AskUserQuestion:
2a. Subject (required):
AskUserQuestion: "What should be in the image? Be as specific as you can."
2b. Style:
AskUserQuestion: "What style should the image have?"
Options: ["Use a style preset", "Let me describe"]
If "Let me describe": ask for freeform style input.
If "Use a style preset": Load ${CLAUDE_SKILL_DIR}/references/presets.md.
Present the 10 style presets with their "Best for" descriptions:
AskUserQuestion: "Pick a style preset:"
Options: ["Photo-Realistic", "Illustration", "Flat Design", "Watercolor", "Oil Painting", "3D Render", "Anime/Manga", "Pixel Art", "Sketch/Pencil", "Pop Art"]
Apply the selected preset's prompt modifiers, quality markers, and negatives in Step 3 assembly.
2c. Mood/Details (optional):
AskUserQuestion: "Any specific mood, lighting, or environment details? (Skip for smart defaults)"
Options: ["Skip -- use smart defaults", "Let me describe"]
If skipped: apply sensible defaults from prompt-guide.md based on the subject and style (e.g., natural lighting for photorealistic, soft edges for watercolor).
2d. Aspect Ratio:
AskUserQuestion: "What shape should the image be?"
Options: ["Use a social format preset", "Square (1:1)", "Landscape (16:9)", "Portrait (9:16)", "Wide banner (21:9)", "Standard photo (3:2)"]
Map each option to its API value: Square (1:1) -> "1:1", Landscape (16:9) -> "16:9", Portrait (9:16) -> "9:16", Wide banner (21:9) -> "21:9", Standard photo (3:2) -> "3:2".
If "Use a social format preset": Load ${CLAUDE_SKILL_DIR}/references/presets.md (if not already loaded).
AskUserQuestion: "Pick a social format:"
Options: ["Instagram Post (1:1)", "Instagram Story (9:16)", "LinkedIn Banner (16:9)", "Twitter/X Header (3:1)", "Facebook Cover (16:9)", "YouTube Thumbnail (16:9)"]
Apply the selected format's aspect ratio and composition guidance to the prompt.
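The label-to-API-value mapping in 2d can be expressed as a small POSIX shell helper. The labels mirror the AskUserQuestion options above; the function name is illustrative.

```shell
# Map a user-facing aspect-ratio option to the API value.
# Unknown input falls back to the square default.
map_aspect_ratio() {
  case "$1" in
    "Square (1:1)")         echo "1:1" ;;
    "Landscape (16:9)")     echo "16:9" ;;
    "Portrait (9:16)")      echo "9:16" ;;
    "Wide banner (21:9)")   echo "21:9" ;;
    "Standard photo (3:2)") echo "3:2" ;;
    *)                      echo "1:1" ;;
  esac
}
```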
Do NOT ask about Quality or Negative -- apply sensible defaults from prompt-guide.md automatically.
Combine the Visual Descriptor fields into a narrative paragraph following the assembly pattern in prompt-guide.md:
Default quality: "high detail, sharp focus, professional quality"
Default negative: "No text, no watermarks, no distorted features, no blurry elements"
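One possible shape of the assembly, as a sketch: prompt-guide.md is authoritative for the real narrative pattern, and the field ordering and `assemble_prompt` name here are assumptions for illustration.

```shell
# Sketch: combine Visual Descriptor fields plus the default quality and
# negative markers into one narrative prompt string.
assemble_prompt() {
  subject="$1"; style="$2"; mood="$3"
  quality="high detail, sharp focus, professional quality"
  negative="No text, no watermarks, no distorted features, no blurry elements"
  printf '%s, %s. %s. %s. %s\n' "$subject" "$style" "$mood" "$quality" "$negative"
}
```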
Show the assembled prompt to the user for confirmation:
AskUserQuestion: "Here's the assembled prompt. Ready to generate?"
Options: ["Generate this", "Edit prompt first", "Start over"]
If "Edit prompt first": let user modify the text, then re-confirm. If "Start over": return to Step 2.
If this is the FIRST generation in the conversation AND the user did NOT use direct mode ("generate:" prefix), offer prompt optimization:
AskUserQuestion: "Would you like to optimize this prompt with Prompt Master first?"
Options: ["Yes, optimize it", "No, generate as-is"]
If "Yes": Hand off the assembled prompt to prompt-master with context: "The user wants to optimize this image generation prompt for Nano Banana. After optimization, offer to generate the image." Mark as shown -- do NOT repeat on subsequent generations in this session.
If "No" or direct mode: proceed to Step 4.
CRITICAL: No base64 data may ever appear in conversation context. When image generation is available, the entire pipeline (API call + decode + file save) must happen in ONE Bash invocation.
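When generation is available, the single-invocation rule can be honored with a decode-and-save helper like the sketch below, so base64 never leaves the shell. The response field names (`candidates`/`content`/`parts`/`inlineData`) are assumptions modeled on the public Gemini generateContent REST API; the gateway URL in the comment is a placeholder, not a real endpoint.

```shell
# Extract the first inline image from a Gemini-style generateContent JSON
# response on stdin, decode it, and write it straight to the file in $1.
# Field names are assumptions based on the public Gemini API.
save_image_from_response() {
  jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' \
    | base64 -d > "$1"
}

# In real use the whole pipeline runs in ONE Bash invocation, e.g. (untested;
# $GATEWAY_URL and the payload shape are placeholders):
#   curl -sS "$GATEWAY_URL/models/gemini-2.5-flash-image:generateContent" \
#     -H "x-goog-api-key: $GEMINI_API_KEY" -H "Content-Type: application/json" \
#     -d "$PAYLOAD" | save_image_from_response "$OUT" && open "$OUT"
```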
NOTE: Image generation via direct API calls is not available in this environment. The Cowork sandbox blocks outbound HTTP requests (see METHODOLOGY.md -- No Direct API Calls rule).
Pending API Gateway implementation (see docs/architecture/api-gateway-design.md). Image generation through the gateway is planned for v11.0.
Until then, inform the user: "Image generation is not yet available -- it will be enabled in a future update via the API Gateway (v11.0). I can help you craft the perfect prompt now so it's ready."
If the user wants to continue with prompt crafting, proceed to Step 6 (present the assembled prompt details without an image file). If the user wants to stop, acknowledge and end gracefully.
Proactive guidance (once per conversation, during Step 2): covered by the content safety note in the Step 2 intro -- do not repeat it here.
On SAFETY block from API:
Suggest alternatives based on what was requested (e.g., an illustrated or stylized version of the same subject):
Offer to regenerate with the modified prompt:
AskUserQuestion: "Would you like to try with a modified prompt?"
Options: ["Yes, try stylized version", "Try something different", "Done"]
Use the template from ${CLAUDE_SKILL_DIR}/template.md:
## Image Generated
**Prompt:** [assembled prompt]
**Style:** [style used]
**Aspect Ratio:** [ratio]
**File:** `[output file path]`
The image has been saved and opened for viewing.
Track the output file path as last_generated_image for editing mode.
Offer follow-up:
AskUserQuestion: "What would you like to do next?"
Options: ["Generate another", "Edit this image", "Try variations of this", "Different style of same subject", "Done"]
If "Edit this image": go to Editing Mode. If "Try variations of this": go to Variation Mode. If "Different style of same subject": return to Step 2b (style selection) with the same subject, skip 2a.
Log to stats.json:
Read ~/Tandem/stats.json. If it does not exist, create it as []. Append:
{
"type": "image-generation",
"action": "created",
"count": 1,
"timeSavedMinutes": 5,
"description": "Image: [slug]",
"timestamp": "<current ISO 8601 UTC>"
}
Write the updated array back to ~/Tandem/stats.json.
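The read-append-write cycle can be sketched with jq in a single Bash invocation; the slug value here is illustrative.

```shell
# Sketch: append one stats entry to ~/Tandem/stats.json, creating it as []
# if it does not exist. The description slug is a made-up example.
STATS="$HOME/Tandem/stats.json"
mkdir -p "$HOME/Tandem"
[ -f "$STATS" ] || echo '[]' > "$STATS"
jq --arg desc "Image: red-fox-watercolor" \
   --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
   '. + [{type:"image-generation",action:"created",count:1,timeSavedMinutes:5,description:$desc,timestamp:$ts}]' \
   "$STATS" > "$STATS.tmp" && mv "$STATS.tmp" "$STATS"
```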
Run /sync:
Follow the /sync workflow from tandem-skills/core/sync/SKILL.md.
If stats.json write or /sync fails: continue -- the image file is the primary deliverable.
Load ${CLAUDE_SKILL_DIR}/references/editing-guide.md for the complete editing workflow.
Overview:
Requires a previous image (the last_generated_image state).
Apply the requested edit, save the result, and update last_generated_image to the new file.
AskUserQuestion: "What would you like to do next?"
Options: ["Edit again", "Try variations", "Generate something new", "Done"]
CRITICAL: All base64 handling (read source + encode + API call + decode response) happens in ONE Bash invocation. No base64 data in conversation context.
If no previous image exists, tell user: "No recent image to edit. Let's generate one first." and redirect to Step 1.
Load ${CLAUDE_SKILL_DIR}/references/editing-guide.md for the complete variation workflow.
Overview:
AskUserQuestion: "How many variations?"
Options: ["2", "3", "4"]
AskUserQuestion: "Which variation do you prefer?"
Options: [list each filename + "None -- try again"]
Update last_generated_image to the selected variation.
AskUserQuestion: "What would you like to do next?"
Options: ["Edit the selected image", "Generate something new", "Done"]
429 (rate limit / free tier): "Image generation requires a paid Google AI Studio API key. Get one at https://aistudio.google.com/apikey (paid plan required)."
SAFETY finish reason: "This image was blocked by content safety filters." Then suggest an illustrated or stylized alternative based on the subject.
Network error / non-200 response: "Could not reach the API. Check your internet connection and try again."
No inline_data in response: "The API returned text but no image. Try a more specific or different prompt."
API key file missing: API key setup is handled by the API Gateway (v11.0). No local key file needed.
Invalid JSON response: "Received an unexpected response from the API. Try again in a moment."
Source image missing for edit:
If last_generated_image file no longer exists, ask user to generate a new image.
Variation generation partial failure: If some variations succeed and some fail, present the successful ones and note the failures. Offer to retry only the failed variations.
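The succeed-some/retry-failures bookkeeping can be sketched as a small loop; `generate_one` is a hypothetical stand-in for the real single-image pipeline, not an existing helper.

```shell
# Sketch: attempt N variations, keep successes, and report which indices
# failed so only those can be offered for retry.
run_variations() {
  n="$1"; failed=""
  i=1
  while [ "$i" -le "$n" ]; do
    if generate_one "$i"; then :; else failed="$failed $i"; fi
    i=$((i + 1))
  done
  echo "$failed"   # indices to retry; empty means all succeeded
}
```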