Generate and edit images using Gemini API (Nano Banana Pro). Supports text-to-image, image editing, multi-turn refinement, Google Search grounding for factual accuracy, and composition from multiple reference images.
Generate and edit images using Gemini 3 Pro Image (Nano Banana Pro) via API. Triggers when you need to create visuals, modify existing images, or iterate on designs through multi-turn refinement.
/plugin marketplace add agneym/agneym-claude-marketplace
/plugin install explore-with-illustrations@agneym-claude-marketplace

This skill inherits all available tools. When active, it can use any tool Claude has access to.
mise.toml
scripts/compose_images.py
scripts/edit_image.py
scripts/gemini_images.py
scripts/generate_image.py
scripts/multi_turn_chat.py

Generate professional-quality images using Google's Gemini 3 Pro Image model (aka Nano Banana Pro). The environment variable GEMINI_API_KEY must be set.
gemini-3-pro-image-preview (Nano Banana Pro)
CRITICAL FOR AGENTS: These are executable scripts in your PATH. All scripts now default to gemini-3-pro-image-preview.
scripts/generate_image.py "A technical diagram showing microservices architecture" output.png
scripts/edit_image.py diagram.png "Add API gateway component with arrows showing data flow" output.png
scripts/multi_turn_chat.py
For high-resolution technical diagrams:
scripts/generate_image.py "Your prompt" output.png --size 4K --aspect 16:9
All image generation uses the generateContent endpoint with responseModalities: ["TEXT", "IMAGE"]:
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
)

for part in response.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = part.as_image()
        image.save("output.png")
Control output with image_config:
from google.genai import types

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
            image_size="4K",      # 1K, 2K, 4K (Nano Banana Pro supports up to 4K)
        ),
    ),
)
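The accepted values in the comments above can be checked locally before spending an API call on a config the service will reject. This `validate_image_config` helper is a hypothetical convenience, not part of the SDK or the bundled scripts; the value sets are taken from the comments in this document, not queried from the API:

```python
# Allowed values, mirroring the comments above (assumed from this doc).
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"1K", "2K", "4K"}

def validate_image_config(aspect_ratio: str, image_size: str) -> None:
    """Fail fast on values the model is documented not to accept."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"unsupported image_size: {image_size!r}")
```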
Pass existing images with text prompts:
from PIL import Image

img = Image.open("input.png")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Add a sunset to this scene", img],
)
Use chat for iterative editing:
from google.genai import types

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE']),
)

response = chat.send_message("Create a logo for 'Acme Corp'")
# Save first image...

response = chat.send_message("Make the text bolder and add a blue gradient")
# Save refined image...
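The "Save first image..." steps in the chat example can be factored into a small helper. This is a sketch: `save_images` is a hypothetical name, and it assumes parts expose `inline_data` and `as_image()` as in the google-genai SDK examples above:

```python
def save_images(response, prefix):
    """Save every inline image in a generate_content/chat response.

    Works with any response whose .parts expose .inline_data and
    .as_image() (as in the google-genai SDK). Returns the saved paths.
    """
    paths = []
    for i, part in enumerate(response.parts):
        if part.inline_data:
            path = f"{prefix}_{i}.png"
            part.as_image().save(path)
            paths.append(path)
    return paths

# In the chat loop above:
#   response = chat.send_message("Create a logo for 'Acme Corp'")
#   save_images(response, "logo_v1")
#   response = chat.send_message("Make the text bolder and add a blue gradient")
#   save_images(response, "logo_v2")
```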
Keep prompts concise and specific. Research shows prompts under 25 words achieve 30% higher accuracy. Structure as:
Subject + Adjectives + Action + Location/Context + Composition + Lighting + Style
Include camera details: lens type, lighting, angle, mood.
"Photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"
Specify style explicitly:
"Kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"
Be explicit about font style and placement:
"Logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"
Describe lighting setup and surface:
"Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"
Be explicit about positions, relationships, and labels:
"Technical diagram: Component A at top, Component B at bottom. Arrow from A to B labeled 'HTTP GET'. Clean boxes, directional arrows, white background."
Generate images based on real-time data:
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Visualize today's weather in Tokyo as an infographic"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        tools=[{"google_search": {}}],
    ),
)
Combine elements from multiple sources:
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
)
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Technical diagram showing RESTful API architecture"}]}]
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
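The jq/base64 extraction in the pipeline above can be mirrored in Python when working with the raw REST response. `extract_inline_images` is a hypothetical helper operating on the parsed JSON body; the `candidates`/`content`/`parts`/`inlineData` shape matches the response the curl command filters:

```python
import base64

def extract_inline_images(response_json):
    """Pull raw image bytes out of a parsed generateContent REST response."""
    images = []
    for candidate in response_json.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline:
                images.append(base64.b64decode(inline["data"]))
    return images
```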
Note: image-only output (responseModalities: ["IMAGE"]) won't work with Google Search grounding.