This skill should be used when the user asks to "generate an image", "create a banner", "make artwork", "create an illustration", "generate a logo", "make a graphic", "design a header", "AI art", "img2img", "social share image", "OG image", "open graph image", "Twitter card", "hero image", "cover photo", "profile picture", "social media graphic", or needs AI image generation. Handles prompt rewriting and Gemini 3 Pro image generation API calls.
From gemskillsnpx claudepluginhub b-open-io/claude-plugins --plugin gemskillsThis skill uses the workspace's default tool permissions.
references/prompt-guide.mdscripts/generate.tsDesigns and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Enables AI agents to execute x402 payments with per-task budgets, spending controls, and non-custodial wallets via MCP tools. Use when agents pay for APIs, services, or other agents.
Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.
Generate images using Nano Banana Pro (gemini-3-pro-image-preview).
Use this skill when the user asks to:
If the user hasn't specified a style, present a multi-choice question before proceeding:
How would you like to handle the art style?
- Pick a style - Browse 167 styles visually and choose one
- Let me choose - I'll suggest a style based on your prompt
- What are styles? - Learn how the style system works
- No style - Generate without a specific art style
Use the AskUserQuestion tool to present this choice.
If the user picks "Pick a style", launch the style picker (pick mode is the default):
STYLE_JSON=$(bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/browsing-styles/scripts/preview_server.ts)
The picker opens a browser. The user clicks a style, views the detail modal, clicks "Select This Style". STYLE_JSON receives the selection as JSON and the server exits. Parse the id field and pass it via --style <id> to the generate command.
If the user picks "Let me choose", select the most fitting style based on their prompt and explain why. Use the style suggestions in the "Suggest Styles to Users" section below.
If the user picks "What are styles?", explain: there are 167 curated art styles across 10 categories (traditional, digital, illustration, photography, design, retro, technique, decorative, creative, cultural). Each style includes prompt hints and a visual reference tile that gets sent to Gemini for better adherence. Then re-ask the style question.
If the user already specified a style (e.g. "generate in watercolor style"), skip this step entirely and use --style directly.
Ask clarifying questions before rewriting prompts to understand the user's intent:
Simple requests like "make a cat image" can proceed with sensible defaults. Complex requests like "create a banner for my app" require clarification to avoid iteration waste.
Before generating any image, always rewrite the user's prompt using the guide in references/prompt-guide.md.
The core principle: "Describe the scene, don't just list keywords."
Transform simple prompts by adding:
User says: "banner for my app" Rewritten prompt: "A modern, full-bleed banner for a technology application. Dark gradient background transitioning from deep navy to black. Abstract geometric network visualization with glowing nodes connected by thin lines. Clean sans-serif typography positioned left of center. Professional, tech-forward aesthetic with no visible borders or edges - the design extends seamlessly to all edges. 16:9 aspect ratio."
User says: "a cat" Rewritten prompt: "A fluffy orange tabby cat lounging on a sun-drenched windowsill, soft afternoon light creating a warm glow on its fur. The cat is in a relaxed pose with half-closed eyes, conveying contentment. Shot from a low angle with shallow depth of field, the background showing a blurred garden view. Photorealistic, warm color palette."
Draft → Iterate → Final approach saves time and API costs:
Do not read the image back - the script outputs only the file path. This allows efficient iteration without Claude loading large image data.
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "prompt" [options]
--input <path> - Reference image (can specify multiple times, up to 14 images)--style <id> - Apply style from the style library (see browsing-styles skill)--size <1K|2K|4K> - Image size (default: 1K for fast drafts)--aspect <ratio> - Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4--negative <prompt> - Negative prompt (what to avoid)--count <n> - Number of images (1-4, default: 1)--guidance <n> - Guidance scale--seed <n> - Random seed for reproducibility--output <path> - Output path--model <name> - gemini (default) or grok (Grok Imagine Image via Replicate)# Simple generation
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "cyberpunk cityscape at night"
# With art style (100+ available, use short names or full IDs)
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "mountain landscape" --style impressionism
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "portrait" --style ukiy
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "city street" --style noir
# High-res with specific aspect ratio
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "mountain landscape" --size 4K --aspect 16:9
# With negative prompt
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "portrait of a cat" --negative "low quality, blurry"
# Combine style with other options
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "cat sleeping" --style wtrc --size 4K --aspect 1:1
# Generate multiple variations
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "abstract art" --count 4
# Single reference image (img2img)
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "make it look like a watercolor painting" --input photo.jpg
# Multiple reference images (character consistency, scene composition)
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "King on throne in dramatic lighting" \
--input character.png \
--input throne-room.png \
--aspect 16:9 --size 2K
# Multiple characters in a scene
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "Two warriors facing each other in combat" \
--input warrior1.png \
--input warrior2.png \
--input battlefield.png \
--aspect 16:9
Gemini supports up to 14 reference images per request:
| Format | Aspect Ratio | Recommended Size | Crop Target |
|---|---|---|---|
| OG / Social Share | 16:9 | 2K | 1200x630 via sips |
| Twitter Card | 16:9 | 2K | 1200x628 via sips |
| Hero Banner | 16:9 | 4K | Full width |
| Profile Picture | 1:1 | 1K | 512x512 or 1024x1024 |
| Story / Reel | 9:16 | 2K | 1080x1920 |
| Team Lineup | 21:9 | 2K | Full width |
All 4 steps are required. Do not skip optimization.
--size 2K (retina-ready):
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "CENTER all important elements, no content near edges. [your prompt]" --aspect 16:9 --size 2K --output social-share.png
sips -z 630 1200 -c 630 1200 social-share.png --out social-share-og.png # OG
sips -z 628 1200 -c 628 1200 social-share.png --out social-share-tw.png # Twitter
sips -s format jpeg -s formatOptions 85 social-share-og.png --out social-share-og.jpg
sips -g pixelWidth -g pixelHeight social-share-og.jpg
ls -la social-share-og.jpg # Target: <500KB
Why optimize? Gemini outputs PNG (~1-2MB). JPEG at 85% reduces to ~200-400KB with no visible quality loss. Social platforms re-compress uploads, so starting with a well-optimized JPEG prevents double-compression artifacts.
100+ styles across 9 categories: traditional art, digital, illustration, photography, design/UI, retro, techniques, decorative, and creative/material-based.
When a style has a tile reference image (assets/tiles/<id>.png), it is automatically sent as a visual reference to Gemini alongside the prompt hints — this produces much better style adherence.
When generating images, proactively suggest relevant styles:
impr, roma, wtrl, cinebarq, prer, anim, fnoibrut, nbrt, glas, flaty2k, frug, psyc, kodaghbl, kawa, chld, britcybr, nnoi, surr, exprsand, undr, orig, clay, legoBrowse all: bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/browsing-styles/scripts/list_styles.ts --table
Search: bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/browsing-styles/scripts/list_styles.ts --search "watercolor"
Uses gemini-3-pro-image-preview - Nano Banana Pro, Google's professional image generation model with thinking capabilities.
--model grok)Uses xai/grok-imagine-image via Replicate. Requires REPLICATE_API_TOKEN (or REPLICATE_API_KEY). Text-to-image only (no reference images or style tiles). This is a last-resort fallback — Gemini produces better results including likeness. Only use when:
# Grok image generation
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "A medieval girl standing on a futuristic hoverpod" --model grok --output hoverpod.jpg
Last verified: March 2026. If a newer generation exists, STOP and suggest a PR to
b-open-io/gemskills. See the ask-gemini skill'sreferences/gemini-api.mdfor current models and Google's officialgemini-api-devskill for the canonical source.
For detailed prompting strategies and techniques:
references/prompt-guide.md - Comprehensive Gemini prompting guide with 7 strategies, example transformations, and best practices from Google's official documentationreferences/gemini-api.md - Current Gemini models, SDK info, and dynamic documentation via llms.txt