From image-tools
Generate images using Google Gemini API. Use when user asks to generate, create, or make images, pictures, photos, or visual content. Also for editing images, image-to-image generation, or any AI image creation requests.
Slash command
/image-tools:generate-image
Generate high-quality AI images using Google's Gemini API.
Run the setup check to validate the environment:
"${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/check-setup.sh"
This validates Python, venv, dependencies, API key, and shows existing images.
SCRIPT_DIR="${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts"
"$SCRIPT_DIR/generate.sh" "your detailed prompt" \
--aspect-ratio 16:9 \
--resolution 4K \
--output images/slug-v1/image.png \
--user-request "user's original request verbatim" \
--composition "your reasoning: why you chose this style, composition, colors, etc."
Options:
--aspect-ratio: 1:1, 3:4, 4:3, 2:3, 3:2, 16:9, 9:16, 21:9, 9:21, 32:9, 2:1 (default: 1:1)
--resolution: 1K, 2K, 4K (default: 2K)
--output: Output filename (default: output.png)
--fast: Use faster model (lower quality)
--images: Reference image(s) for editing/fusion
--user-request: Original user request (ALWAYS pass this)
--composition: Your reasoning/composition notes explaining prompt choices (ALWAYS pass this)
--no-metadata: Skip saving metadata YAML file
Output: The script saves {image_name}_metadata.yaml alongside the image containing: user request, composition reasoning, final prompt, all parameters, token usage, timestamps, and model response.
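As a rough illustration of how these options combine, the sketch below assembles the generate.sh argument list in Python and rejects values outside the documented sets. The helper names `build_generate_command` and `metadata_path` are hypothetical and not part of the skill; only the flags, allowed values, and defaults come from the list above.

```python
from pathlib import Path

# Allowed values copied from the Options list; the helpers are illustrative only.
ASPECT_RATIOS = {"1:1", "3:4", "4:3", "2:3", "3:2", "16:9",
                 "9:16", "21:9", "9:21", "32:9", "2:1"}
RESOLUTIONS = {"1K", "2K", "4K"}


def build_generate_command(script_dir, prompt, user_request, composition,
                           output="output.png", aspect_ratio="1:1",
                           resolution="2K", fast=False):
    """Return the argv list for generate.sh (hypothetical helper)."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    cmd = [
        str(Path(script_dir) / "generate.sh"), prompt,
        "--aspect-ratio", aspect_ratio,
        "--resolution", resolution,
        "--output", output,
        # The skill asks for these two flags on every call.
        "--user-request", user_request,
        "--composition", composition,
    ]
    if fast:
        cmd.append("--fast")
    return cmd


def metadata_path(output):
    """Path of the {image_name}_metadata.yaml file saved next to the image."""
    p = Path(output)
    return p.with_name(p.stem + "_metadata.yaml")
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing arguments as a list avoids shell quoting problems with long prompts.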
Interactive (no prompt or vague prompt): Use AskUserQuestion to gather subject, style, aspect ratio, quality.
Direct (detailed prompt provided): Execute immediately with sensible defaults.
Programmatic (called from workflow): Never block, use defaults, execute immediately.
check-setup.sh script (especially on first use or errors)
--user-request
--composition
sunset-mountains)
images/{slug}-v* folders for iterations
--user-request, --composition)
image_metadata.yaml)
When the user's request lacks detail, use AskUserQuestion to gather subject, style, aspect ratio, and quality.
User wants iteration when they say: "another version", "adjust", "modify", "change", "try with", "regenerate".
Find latest version, increment, archive previous in archive/v{N}/.
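A minimal sketch of the "find latest version, increment" step, assuming iterations live in sibling folders named `{slug}-v{N}` under an `images/` directory. The function name `next_version_dir` is hypothetical. Archiving the previous version is left out because the exact location of `archive/v{N}/` is not specified here.

```python
import re
from pathlib import Path


def next_version_dir(images_root, slug):
    """Return images_root/{slug}-v{N+1}, where N is the highest existing version.

    If no matching folder exists yet, the result is {slug}-v1.
    """
    root = Path(images_root)
    pattern = re.compile(re.escape(slug) + r"-v(\d+)")
    versions = []
    if root.is_dir():
        for child in root.iterdir():
            match = pattern.fullmatch(child.name)
            if child.is_dir() and match:
                # Compare numerically so that v10 sorts after v2.
                versions.append(int(match.group(1)))
    return root / f"{slug}-v{max(versions, default=0) + 1}"
```

For example, with `images/sunset-mountains-v1` and `images/sunset-mountains-v2` on disk, `next_version_dir("images", "sunset-mountains")` points at `images/sunset-mountains-v3`.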
Gemini cannot generate transparent images. All generated images have a solid background.
When the user wants transparency, use a two-step process: generate with a chroma key background, then remove it with the image-tools:manipulate-image skill afterwards.
MANDATORY: Before picking a chroma key color, list every color the subject will contain. Think through:
Then pick the chroma key that has zero overlap with any of those colors.
| Chroma Key | Exact RGB | Hex | Gemini keyword | Use when subject does NOT contain |
|---|---|---|---|---|
| Green | RGB(0, 255, 0) | #00FF00 | "chroma key green" | Green, lime, emerald, forest tones |
| Magenta | RGB(255, 0, 255) | #FF00FF | "chroma key magenta" | Pink, purple, magenta, violet, fuchsia tones |
| Blue | RGB(0, 0, 255) | #0000FF | "chroma key blue" | Blue, sky blue, navy, cyan, teal tones |
| Yellow | RGB(255, 255, 0) | #FFFF00 | "chroma key yellow" | Yellow, gold, blonde, sunshine, amber tones |
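One way to sanity-check a choice from the table is to compare hues numerically. The sketch below is a rough heuristic, not the skill's own method: it picks the key whose hue is farthest from every saturated color in the subject. The function names, the saturation cutoff, and the fallback to green for colorless subjects are all assumptions made for illustration.

```python
import colorsys

# RGB values taken from the table above.
CHROMA_KEYS = {
    "green": (0, 255, 0),
    "magenta": (255, 0, 255),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def hue_and_saturation(rgb):
    """Return (hue in degrees, saturation in 0..1) for an 8-bit RGB tuple."""
    h, s, _v = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb))
    return h * 360.0, s


def hue_distance(a, b):
    """Shortest angular distance between two hues, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def pick_chroma_key(subject_colors, min_saturation=0.15):
    """Pick the key with the largest hue clearance from the subject colors.

    Near-gray colors carry no meaningful hue and are ignored. If nothing
    saturated remains, fall back to green (an arbitrary assumption).
    """
    hues = [h for h, s in map(hue_and_saturation, subject_colors)
            if s >= min_saturation]
    if not hues:
        return "green"

    def clearance(name):
        key_hue, _ = hue_and_saturation(CHROMA_KEYS[name])
        return min(hue_distance(key_hue, h) for h in hues)

    return max(CHROMA_KEYS, key=clearance)
```

A subject listed as forest green `(34, 139, 34)` plus red `(200, 30, 30)` would steer this heuristic toward blue. The listing step described above still matters: the heuristic is only as good as the colors it is given.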
Selection rules:
--composition flag
Append this exact block to the end of your prompt (replace [COLOR], [R,G,B], [HEX]):
The subject is placed on a chroma key [COLOR] studio backdrop (RGB [R,G,B], hex [HEX]). Plain studio setting with soft, diffused lighting on the subject only. The background is a perfectly flat, uniform solid [COLOR] with no gradients, no shadows, no texture, no patterns, no depth of field, no bokeh. Every pixel of the background must be the same [COLOR] color (RGB [R,G,B], [HEX]) from edge to edge with zero variation.
Example for green chroma key:
The subject is placed on a chroma key green studio backdrop (RGB 0,255,0, hex #00FF00). Plain studio setting with soft, diffused lighting on the subject only. The background is a perfectly flat, uniform solid green with no gradients, no shadows, no texture, no patterns, no depth of field, no bokeh. Every pixel of the background must be the same green color (RGB 0,255,0, #00FF00) from edge to edge with zero variation.
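The substitution can also be done mechanically. The following sketch fills the block from a color name, following the formatting of the green example (the hex value carries its leading `#`). The names `CHROMA_SUFFIX` and `chroma_suffix` are hypothetical; the RGB values are those of the four keys in the table.

```python
CHROMA_KEYS = {
    "green": (0, 255, 0),
    "magenta": (255, 0, 255),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}

CHROMA_SUFFIX = (
    "The subject is placed on a chroma key {color} studio backdrop "
    "(RGB {rgb}, hex {hex}). Plain studio setting with soft, diffused "
    "lighting on the subject only. The background is a perfectly flat, "
    "uniform solid {color} with no gradients, no shadows, no texture, "
    "no patterns, no depth of field, no bokeh. Every pixel of the "
    "background must be the same {color} color (RGB {rgb}, {hex}) "
    "from edge to edge with zero variation."
)


def chroma_suffix(color):
    """Return the prompt suffix for one of the four chroma key colors."""
    r, g, b = CHROMA_KEYS[color]
    return CHROMA_SUFFIX.format(
        color=color,
        rgb=f"{r},{g},{b}",
        hex=f"#{r:02X}{g:02X}{b:02X}",  # e.g. #00FF00
    )
```

The final prompt is then the detailed prompt followed by a space and `chroma_suffix("green")`.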
Gemini won't produce exact RGB values, so always sample the actual corner pixel, then use the manipulate-image scripts directly to remove the background and trim.
VENV="${CLAUDE_PLUGIN_ROOT}/scripts/venv"
MANIPULATE="${CLAUDE_PLUGIN_ROOT}/skills/manipulate-image/scripts"
# 1. Sample actual background color from corner
"$VENV/bin/python" -c "from PIL import Image; print(Image.open('<image>').getpixel((0,0)))"
# 2. Remove chroma key with HSV detection + spill suppression
"$MANIPULATE/run.sh" alpha <image> --transparent "<actual R,G,B>" --tolerance 15 --feather 40 -o <output>
# 3. Trim empty space
"$MANIPULATE/run.sh" trim <output> -o <final>
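Step 1 reads a single corner. As a hedged variation, the sketch below reads all four corners and keeps the most common value, which may help if the subject happens to touch one corner. It is not part of the manipulate-image scripts. The function name `sample_background` is hypothetical, and it takes a pixel accessor so it can be tried without an image library.

```python
from collections import Counter


def sample_background(get_pixel, width, height):
    """Return the most common RGB value among the four corner pixels.

    get_pixel is any callable taking an (x, y) tuple and returning a pixel
    tuple; an alpha channel, if present, is dropped.
    """
    corners = [(0, 0), (width - 1, 0),
               (0, height - 1), (width - 1, height - 1)]
    samples = [tuple(get_pixel(xy)[:3]) for xy in corners]
    color, _count = Counter(samples).most_common(1)[0]
    return color


def as_cli_value(color):
    """Format an RGB tuple as the "R,G,B" string used with --transparent."""
    return ",".join(str(c) for c in color)
```

With Pillow, assuming an RGB or RGBA image, this could be called as `sample_background(img.getpixel, *img.size)` after `img = Image.open(path)`.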
Use this workflow when the user mentions "transparent", "no background", "PNG with alpha", "cutout", or "sticker". Always tell the user you're using a chroma key approach, since the result won't be transparent until the second step.
For detailed setup, troubleshooting, prompt engineering, and file templates, see SETUP.md.
npx claudepluginhub dkmaker/my-claude-plugins --plugin image-tools