Generates images from text, edits images with references, performs product placement, style transfer, and multi-image composition using OpenAI DALL-E or Google Gemini.
npx claudepluginhub michaelboeding/skills --plugin skills
How this skill is triggered (by the user, by Claude, or both):
Slash command: /skills:image-generation
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Generate and edit images using AI (Google Gemini Nano Banana Pro, OpenAI DALL-E 3).
Capabilities:
Users can specify what they want:
| User Says | Mode | What Happens |
|---|---|---|
| "Generate an image of a sunset" | Generate | Text-to-image, no reference needed |
| "Create a logo for my coffee shop" | Generate | Text-to-image with text rendering |
| "Edit this image: add a hat to the cat" | Edit | User provides image, AI modifies it |
| "Remove the background from this photo" | Edit | User provides image, AI edits it |
| "Put this product on a kitchen counter" | Product | User provides product + optional scene |
| "Make this photo look like Van Gogh painted it" | Style | User provides photo, AI applies style |
| "Combine these photos into a group shot" | Composite | User provides multiple images |
Environment variables must be configured for the APIs to work. At least one API key is required:
- OPENAI_API_KEY - For OpenAI image generation (GPT Image models)
- GOOGLE_API_KEY - For Google Gemini (Nano Banana / Nano Banana Pro)
See the repository README for setup instructions.
OpenAI GPT Image models:
- gpt-image-1.5 (state of the art, best quality)
- gpt-image-1 (great quality, cost-effective)
- gpt-image-1-mini (fastest, most affordable)
⚠️ Note: DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on 05/12/2026.
Google Gemini models:
- Nano Banana (gemini-2.5-flash-image): Fast, efficient, 1K resolution, up to 3 reference images
- Nano Banana Pro (gemini-3-pro-image-preview): Professional quality, up to 4K, thinking mode, up to 14 reference images (default)
⚠️ Use interactive questioning: ask ONE question at a time.
⚠️ Use the AskUserQuestion tool for each question below. Do not just print questions in your response — use the tool to create interactive prompts with the options shown.
Q0: Model Selection
"Which image generation model would you like to use?
- Google Gemini (Nano Banana Pro) - Up to 4K, 14 reference images, style transfer, thinking mode (Recommended)
- OpenAI GPT Image 1.5 - State of the art, transparency, streaming, up to 16 input images
- OpenAI GPT Image 1 - Great quality, transparency, image editing
- OpenAI GPT Image 1 Mini - Fastest, most affordable"
Wait for response. If user doesn't have a preference, recommend Gemini for editing/reference tasks or GPT Image 1.5 for pure generation.
Q1: Reference
"I'll generate that image for you! First — do you have any reference images?
- Product photos to include
- Style references
- Images to edit
- No, generate from scratch"
Wait for response.
Q2: Aspect Ratio
"What aspect ratio?
- 1:1 (square)
- 16:9 (landscape/widescreen)
- 9:16 (portrait/vertical)
- 4:3 / 3:4 (classic)
- Other (2:3, 3:2, 4:5, 5:4, 21:9)
- Or specify"
Wait for response.
Q3: Resolution
"What resolution?
- 1K (fast)
- 2K (balanced)
- 4K (highest quality)"
Wait for response.
Q4: Style
"Any style preferences?
- Photorealistic
- Artistic/painterly
- Cartoon/illustration
- 3D render
- Or describe your own"
Wait for response.
| Question | Determines |
|---|---|
| Reference | Generation vs editing mode |
| Aspect Ratio | Image dimensions |
| Resolution | Quality level |
| Style | Prompt enhancement direction |
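As a rough illustration of how these answers might be carried forward, the sketch below derives the mode from whether reference images were supplied and picks an OpenAI size for a requested aspect ratio. The `ImageRequest` class, the `closest_openai_size` helper, and the closest-ratio rule are assumptions made for illustration and are not part of the skill's scripts; the three sizes are the ones this document lists for the GPT Image models.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Sizes this document lists for the GPT Image models (besides "auto").
OPENAI_SIZES = ["1024x1024", "1536x1024", "1024x1536"]


def closest_openai_size(aspect_ratio: str) -> str:
    """Pick the listed size whose width/height ratio is nearest to the request.

    Hypothetical helper: the closest-ratio rule is an assumption.
    """
    w, h = (float(part) for part in aspect_ratio.split(":"))
    target = w / h

    def distance(size: str) -> float:
        sw, sh = (int(part) for part in size.split("x"))
        return abs(sw / sh - target)

    return min(OPENAI_SIZES, key=distance)


@dataclass
class ImageRequest:
    """Hypothetical container for the answers to Q1-Q4."""

    prompt: str
    references: list[str] = field(default_factory=list)
    aspect_ratio: str = "1:1"
    resolution: str = "2K"
    style: str = ""

    @property
    def mode(self) -> str:
        # Q1 decides generation vs editing: any reference image means editing.
        return "edit" if self.references else "generate"
```

For example, `closest_openai_size("16:9")` selects the landscape size, since 1536x1024 is the listed size closest to a widescreen ratio.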
Parsing:
Transform the user request into an effective image generation prompt:
Example transformation:
Use the model selected by the user in Q0:
Check which API keys are configured in environment:
- OPENAI_API_KEY → GPT Image models available
- GOOGLE_API_KEY → Gemini (Nano Banana Pro) available
If the user's selected model isn't available: Inform them and offer alternatives.
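The availability check could look roughly like the following. This is a minimal sketch, not the skill's actual code: the function names and the provider labels `"openai"` and `"gemini"` are hypothetical, and only the two environment variable names come from this document.

```python
import os
from typing import List, Mapping, Optional, Tuple

# Environment variable -> provider it unlocks, as listed in this document.
# The provider labels are hypothetical names used only in this sketch.
KEY_TO_PROVIDER = {
    "OPENAI_API_KEY": "openai",
    "GOOGLE_API_KEY": "gemini",
}


def available_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [provider for key, provider in KEY_TO_PROVIDER.items() if env.get(key)]


def check_selection(
    selected: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[bool, List[str]]:
    """Return (is_available, alternatives) for the provider chosen in Q0.

    The alternatives are the other configured providers, which can be
    offered to the user when the selected one has no key.
    """
    available = available_providers(env)
    alternatives = [p for p in available if p != selected]
    return selected in available, alternatives
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.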
Model mapping from Q0:
- Google Gemini (Nano Banana Pro) → gemini.py with gemini-3-pro-image-preview
- OpenAI GPT Image 1.5 → openai_image.py with gpt-image-1.5
- OpenAI GPT Image 1 → openai_image.py with gpt-image-1
- OpenAI GPT Image 1 Mini → openai_image.py with gpt-image-1-mini
Execute the appropriate script from ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/:
For OpenAI GPT Image - Text to Image:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "your enhanced prompt" \
--model "gpt-image-1" \
--size "1024x1024" \
--quality "high" \
--output "/path/to/output.png"
For OpenAI GPT Image - With Transparent Background:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "A product icon with no background" \
--model "gpt-image-1" \
--background "transparent" \
--quality "high" \
--output "/path/to/output.png"
For OpenAI GPT Image - Image Editing (with reference images):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "Add a wizard hat to this cat" \
--model "gpt-image-1" \
--image "/path/to/cat.jpg" \
--input-fidelity "high" \
--output "/path/to/output.png"
For OpenAI GPT Image - Multiple Reference Images:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "Create a gift basket containing these items" \
--model "gpt-image-1" \
--image "/path/to/item1.png" \
--image "/path/to/item2.png" \
--image "/path/to/item3.png" \
--output "/path/to/output.png"
For OpenAI GPT Image - With Mask (Inpainting):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "Replace the pool with a garden" \
--model "gpt-image-1" \
--image "/path/to/scene.jpg" \
--mask "/path/to/mask.png" \
--output "/path/to/output.png"
For OpenAI GPT Image - Streaming with Partial Images:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "A beautiful sunset over mountains" \
--model "gpt-image-1" \
--stream \
--partial-images 2 \
--output "/path/to/output.png"
For Google Gemini (Nano Banana Pro) - Text to Image:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
--prompt "your enhanced prompt" \
--model "gemini-3-pro-image-preview" \
--aspect-ratio "1:1" \
--resolution "2K" \
--output "/path/to/output.png"
For Google Gemini - With Reference Images (editing, product placement, etc.):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
--prompt "Add a wizard hat to this cat" \
--image "/path/to/cat.jpg" \
--aspect-ratio "1:1" \
--resolution "2K"
For Google Gemini - Multiple Reference Images (composition, style transfer):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
--prompt "Place this product on the kitchen counter in this scene" \
--image "/path/to/product.png" \
--image "/path/to/kitchen.jpg" \
--aspect-ratio "16:9" \
--resolution "2K"
For Google Gemini (Nano Banana - faster, fewer features):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
--prompt "your enhanced prompt" \
--model "gemini-2.5-flash-image" \
--aspect-ratio "1:1"
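The model mapping and the command shapes above can be sketched as a small helper that builds an argument list for `subprocess.run`. The helper name `build_command` and the dictionary keys are hypothetical; it uses only the `--prompt`, `--model`, `--image`, and `--output` flags shown in the commands above and assumes the scripts accept them in this combination.

```python
import os

# Q0 choice -> (script, model ID), following the model mapping in this document.
MODEL_MAP = {
    "Google Gemini (Nano Banana Pro)": ("gemini.py", "gemini-3-pro-image-preview"),
    "OpenAI GPT Image 1.5": ("openai_image.py", "gpt-image-1.5"),
    "OpenAI GPT Image 1": ("openai_image.py", "gpt-image-1"),
    "OpenAI GPT Image 1 Mini": ("openai_image.py", "gpt-image-1-mini"),
}


def build_command(choice, prompt, images=(), output=None, plugin_root=None):
    """Build an argument list for the chosen model (hypothetical helper)."""
    script, model = MODEL_MAP[choice]
    # Fall back to the current directory if CLAUDE_PLUGIN_ROOT is not set.
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    path = os.path.join(root, "skills", "image-generation", "scripts", script)
    cmd = ["python3", path, "--prompt", prompt, "--model", model]
    for image in images:
        cmd += ["--image", image]  # the flag is repeated once per reference image
    if output:
        cmd += ["--output", output]
    return cmd
```

Building a list rather than a shell string avoids quoting problems when the prompt contains spaces or quotes; the result can be passed as `subprocess.run(cmd, check=True)`.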
Missing API key: Inform the user which key is needed (OPENAI_API_KEY or GOOGLE_API_KEY) and how to set it up; the repository README has setup instructions.
API rate limit: Suggest waiting or trying the other API.
Content policy violation: Rephrase the prompt to be more appropriate.
Generation failed: Retry with simplified prompt or different API.
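One way to keep these four cases consistent is a lookup from error kind to suggested next step. This is only a sketch: the `ImageError` enum and `next_step` function are hypothetical, and how the scripts actually report each failure is not specified here.

```python
from enum import Enum


class ImageError(Enum):
    """Hypothetical labels for the four failure cases described above."""

    MISSING_API_KEY = "missing_api_key"
    RATE_LIMIT = "rate_limit"
    CONTENT_POLICY = "content_policy"
    GENERATION_FAILED = "generation_failed"


# Suggested next step for each case, taken from the text above.
NEXT_STEP = {
    ImageError.MISSING_API_KEY: "Inform the user which key is needed and how to set it up.",
    ImageError.RATE_LIMIT: "Suggest waiting or trying the other API.",
    ImageError.CONTENT_POLICY: "Rephrase the prompt to be more appropriate.",
    ImageError.GENERATION_FAILED: "Retry with simplified prompt or different API.",
}


def next_step(error: ImageError) -> str:
    """Return the suggested next step for a failure case."""
    return NEXT_STEP[error]
```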
Both OpenAI GPT Image and Google Gemini support reference images for advanced editing:
OpenAI GPT Image: Up to 16 input images, with input_fidelity: high for preserving faces/logos
Google Gemini: Nano Banana (up to 3), Nano Banana Pro (up to 14)
Tip: For best results with reference images, be specific about what you want to preserve vs. change.
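The reference-image limits above lend themselves to a quick check before a script is called. The sketch below assumes a hypothetical `check_reference_count` helper; the limits themselves are the ones stated in this document.

```python
# Maximum reference images per model ID, as stated in this document.
MAX_REFERENCE_IMAGES = {
    "gpt-image-1.5": 16,
    "gpt-image-1": 16,
    "gpt-image-1-mini": 16,
    "gemini-2.5-flash-image": 3,
    "gemini-3-pro-image-preview": 14,
}


def check_reference_count(model, image_paths):
    """Return the paths as a list, or raise if the model's limit is exceeded."""
    limit = MAX_REFERENCE_IMAGES[model]
    paths = list(image_paths)
    if len(paths) > limit:
        raise ValueError(
            f"{model} accepts up to {limit} reference images, got {len(paths)}"
        )
    return paths
```

Failing early like this gives the user a clear message instead of an API error after the request has been sent.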
| Feature | GPT Image 1.5 | GPT Image 1 | GPT Image 1 Mini | Nano Banana | Nano Banana Pro |
|---|---|---|---|---|---|
| Provider | OpenAI | OpenAI | OpenAI | Google | Google |
| Model ID | gpt-image-1.5 | gpt-image-1 | gpt-image-1-mini | gemini-2.5-flash-image | gemini-3-pro-image-preview |
| Best for | State of the art | Quality + value | Speed + cost | Fast generation | Professional assets |
| Sizes | 1024², 1536x1024, 1024x1536, auto | Same | Same | 1K only | Up to 4K |
| Quality options | low, medium, high, auto | Same | Same | N/A | N/A |
| Aspect ratios | 3 + auto | Same | Same | 10 options | 10 options |
| Reference images | Up to 16 | Up to 16 | Up to 16 | Up to 3 | Up to 14 |
| Image editing | Yes | Yes | Yes | Yes | Yes |
| Inpainting (mask) | Yes | Yes | Yes | Yes | Yes |
| Transparent background | Yes | Yes | Yes | No | No |
| Streaming | Yes | Yes | Yes | No | No |
| Input fidelity | high/low | high/low | low only | N/A | N/A |
| Output formats | png, jpeg, webp | Same | Same | png | png |
| Compression | 0-100% | Same | Same | No | No |
| Text rendering | Excellent | Excellent | Good | Good | Excellent |
| Thinking mode | No | No | No | No | Yes |
| Max prompt length | 32,000 chars | 32,000 chars | 32,000 chars | N/A | N/A |
| Speed | ~30-60s | ~20-40s | ~10-20s | ~10-20s | ~30-60s |
⚠️ DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on 05/12/2026. Use GPT Image models instead.
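A few rows of the comparison table can be encoded as data so that a model is chosen by required feature rather than by memory. This is a sketch with a hypothetical `models_with` helper; it copies only the transparent background, streaming, and thinking mode rows from the table above.

```python
# Selected rows of the comparison table, keyed by model ID.
FEATURES = {
    "gpt-image-1.5": {"transparent_background": True, "streaming": True, "thinking_mode": False},
    "gpt-image-1": {"transparent_background": True, "streaming": True, "thinking_mode": False},
    "gpt-image-1-mini": {"transparent_background": True, "streaming": True, "thinking_mode": False},
    "gemini-2.5-flash-image": {"transparent_background": False, "streaming": False, "thinking_mode": False},
    "gemini-3-pro-image-preview": {"transparent_background": False, "streaming": False, "thinking_mode": True},
}


def models_with(**required):
    """Return the model IDs whose features match every required value."""
    return [
        model
        for model, features in FEATURES.items()
        if all(features.get(name) == value for name, value in required.items())
    ]
```

For instance, `models_with(transparent_background=True)` narrows the choice to the three GPT Image models, while `models_with(thinking_mode=True)` leaves only Nano Banana Pro.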