Skill

gemini-image-coder

Generate and edit images using Google's Gemini API via Python scripts. Supports text-to-image, image editing, multi-turn refinement, custom resolutions, and aspect ratios.

Python

ai-ml

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/majestic-creative:gemini-image-coder

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read Write Bash WebSearch

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Generate and edit images using Google's Gemini API. Requires `GEMINI_API_KEY` environment variable.

Supporting Files

scripts/edit_image.py
scripts/generate_image.py

SKILL.md

221 lines · ~1.6k tokens

Stats

Language: Shell

Parent stars: 39

Parent forks: 7

Maintenance: Good

Last Commit: Mar 22, 2026

Gemini Image Generation

Generate and edit images using Google's Gemini API. Requires the `GEMINI_API_KEY` environment variable.

Quick Reference

Setting	Default	Options
Model	`gemini-3-pro-image-preview`	Use this for all generation
Resolution	1K	1K, 2K, 4K
Aspect Ratio	1:1	1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
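
Because the options in the table are a closed set, it can help to check them before making a request. The following is a minimal sketch, not part of the skill's scripts: `validate_image_settings` and the two constants are hypothetical names, and the allowed values are copied from the table above.

```python
# Allowed values copied from the Quick Reference table.
ALLOWED_SIZES = ("1K", "2K", "4K")
ALLOWED_ASPECTS = (
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
)


def validate_image_settings(aspect="1:1", size="1K"):
    """Hypothetical helper: return (aspect, size) or raise ValueError."""
    if aspect not in ALLOWED_ASPECTS:
        raise ValueError(
            f"Unsupported aspect ratio {aspect!r}; expected one of {ALLOWED_ASPECTS}"
        )
    if size not in ALLOWED_SIZES:
        raise ValueError(f"Unsupported size {size!r}; expected one of {ALLOWED_SIZES}")
    return aspect, size
```

The defaults mirror the table (1:1 at 1K), so calling the helper with no arguments returns the default settings.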

CLI Scripts

Generate Image

python scripts/generate_image.py "A cat in space" output.jpg
python scripts/generate_image.py "Epic landscape" landscape.jpg --aspect 16:9 --size 2K
python scripts/generate_image.py "Logo for Acme Corp" logo.jpg --aspect 1:1

Edit Image

python scripts/edit_image.py input.jpg "Add a rainbow" output.jpg
python scripts/edit_image.py photo.jpg "Make it look like Van Gogh" artistic.jpg

Core API Pattern

import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)

for part in response.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = part.as_image()
        image.save("output.jpg")  # Always use .jpg!
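
The pattern above reads the key with `os.environ["GEMINI_API_KEY"]`, which raises a bare `KeyError` when the variable is unset. A small sketch of a friendlier check follows; `require_api_key` is a hypothetical helper name, not something the skill or the SDK provides.

```python
import os


def require_api_key(var_name="GEMINI_API_KEY"):
    """Hypothetical helper: return the API key or fail with a readable message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the scripts."
        )
    return key
```

Under this assumption the client would be created with `genai.Client(api_key=require_api_key())`.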

Custom Resolution & Aspect Ratio

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="2K"
        ),
    )
)

Editing Images

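from PIL import Image

# Reuses `client` and `types` from the Core API Pattern
img = Image.open("input.jpg")
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    # The instruction and the source image are passed together
    contents=["Add a sunset to this scene", img],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)
# Save the result with the same loop as in the Core API Pattern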

Multi-Turn Refinement

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE'])
)

response = chat.send_message("Create a logo for 'Acme Corp'")
# Save first image...

response = chat.send_message("Make the text bolder and add a blue gradient")
# Save refined image...
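
The "Save ... image" placeholders above could be filled with one small function reused on every turn. This is a sketch under the assumption that each response exposes `parts`, `inline_data`, and `as_image()` as in the Core API Pattern; `save_turn_image` is a hypothetical name.

```python
def save_turn_image(response, path):
    """Hypothetical helper: save the first image part of a response to `path`.

    Returns True if an image was saved, False if the response had no image.
    """
    # `parts` may be missing or empty when the model returns no content.
    for part in response.parts or []:
        if getattr(part, "inline_data", None):
            part.as_image().save(path)
            return True
    return False
```

For example, `save_turn_image(response, "logo_v1.jpg")` after the first message and `save_turn_image(response, "logo_v2.jpg")` after the refinement keeps both versions on disk.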

Prompting Best Practices

Style	Prompt Pattern
Photorealistic	Include camera: lens, lighting, angle, mood
Stylized Art	Specify style explicitly: "kawaii-style", "cel-shading"
Text in Images	Be explicit: font style, placement, colors
Product Mockups	Describe lighting setup and surface

Examples

# Photorealistic
"A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"

# Stylized
"A kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"

# Logo with text
"Create a logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"

# Product mockup
"Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"

Advanced Features

Google Search Grounding

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Visualize today's weather in Tokyo as an infographic"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        tools=[{"google_search": {}}]
    )
)

Multiple Reference Images (Up to 14)

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.jpg"),
        Image.open("person2.jpg"),
        Image.open("person3.jpg"),
    ],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)
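
Since the heading states a limit of 14 reference images, it may be worth checking the count locally before sending the request. A minimal sketch, with `build_contents` and `MAX_REFERENCE_IMAGES` as hypothetical names:

```python
MAX_REFERENCE_IMAGES = 14  # limit taken from the section heading above


def build_contents(prompt, images):
    """Hypothetical helper: return [prompt, *images], enforcing the image limit."""
    images = list(images)
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"Got {len(images)} reference images; at most {MAX_REFERENCE_IMAGES} are supported"
        )
    return [prompt, *images]
```

The returned list has the same shape as the `contents` argument in the example above, with the prompt first and the images after it.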

Critical: File Format

Gemini returns JPEG by default. Always use the .jpg extension when saving the returned image as-is.

# CORRECT
image.save("output.jpg")

# WRONG - causes "Image does not match media type" errors
image.save("output.png")  # Creates JPEG with PNG extension!

If PNG is Required

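import io
from PIL import Image

for part in response.parts:
    if part.inline_data:
        # Decode the returned bytes with Pillow, then re-encode as PNG
        img = Image.open(io.BytesIO(part.inline_data.data))
        img.save("output.png", format="PNG")  # Explicit conversion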
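
Rather than trusting the default, the extension can also be chosen from the bytes themselves. The sketch below checks the standard JPEG and PNG file signatures; `extension_for` is a hypothetical helper, and it assumes the raw bytes are available as `part.inline_data.data`.

```python
JPEG_MAGIC = b"\xff\xd8\xff"       # start of every JPEG file
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG signature


def extension_for(image_bytes):
    """Hypothetical helper: pick a file extension from the leading magic bytes."""
    if image_bytes.startswith(JPEG_MAGIC):
        return ".jpg"
    if image_bytes.startswith(PNG_MAGIC):
        return ".png"
    raise ValueError("Unrecognized image format")
```

A caller could then build the filename as `"output" + extension_for(part.inline_data.data)` so the extension always matches the content.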

Multi-Image Consistency

When generating a set of images that must look like the same scene (e.g., room makeovers, product variations, before/after sequences):

Lock the architecture, vary only the style.

Write one detailed base description: dimensions, camera angle, window count/position, door location, furniture size, ceiling height, floor type
Copy the base description identically into every prompt
Change only the style portion: colors, materials, decor, lighting fixtures

Common failures without this technique:

Windows appear/disappear between images
Room dimensions change, furniture moves
Result looks like N different rooms, not one room in N styles

Prompt structure:

[Camera/phone type]. [Detailed room architecture — identical across all images].
[Natural lighting description]. [Orientation].
**[STYLE VARIATION — only this part changes per image]**

Tips:

Include "iPhone photo" and "realistic lighting" for photorealistic output
Add signs of life (mugs, remotes, books) so spaces feel inhabited, not staged
"Before" images should look modern but tired, not derelict
Always use portrait orientation (9:16 / 2:3) for social media slideshows
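
The "lock the architecture, vary only the style" rule can be enforced mechanically by building every prompt from one shared base string. A minimal sketch, assuming the prompt structure shown above; `build_consistent_prompts` is a hypothetical name and the sample room description is purely illustrative.

```python
def build_consistent_prompts(base_parts, styles):
    """Hypothetical helper: one prompt per style, all sharing an identical base.

    base_parts: sentences shared by every image (camera, architecture,
                lighting, orientation), in that order.
    styles:     the only text that changes between images.
    """
    # Normalize each shared sentence so it ends with exactly one period.
    base = " ".join(part.strip().rstrip(".") + "." for part in base_parts)
    return [f"{base} {style.strip()}" for style in styles]


prompts = build_consistent_prompts(
    [
        "iPhone photo",
        "Small living room, one window on the left wall, door on the right",
        "Soft daylight, realistic lighting",
        "Portrait orientation",
    ],
    ["Scandinavian style, pale oak and linen.", "Industrial style, exposed brick."],
)
```

Each entry in `prompts` can then be passed as `contents=[prompt]` with the same aspect ratio for every image in the set.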

Notes

All generated images include SynthID watermarks
Default to 1K for speed; use 2K/4K when quality is critical
For editing, describe changes conversationally; the model understands semantic masking
Image-only mode won't work with Google Search grounding
