Generates text and images from prompts via a reverse-engineered Gemini Web API. Supports vision input from local images, saving generated images to disk, and multi-turn chats. Implemented as a TypeScript CLI that runs on Bun.
npx claudepluginhub xuanxuan1983/baoyu-xuanyi-skills --plugin ai-generation-skills
This skill uses the workspace's default tool permissions.
Generates text and images via a reverse-engineered Gemini Web API. Supports prompts, reference images for vision input, and multi-turn conversations. Useful for Gemini image/text generation or as a backend for other skills.
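Supports:
- Text generation
- Image generation
- Reference images for vision input (`--reference`/`--ref`)
- Multi-turn conversations via `--sessionId`
Important: All scripts are located in the `scripts/` subdirectory of this skill.
Agent Execution Instructions:
1. Determine the directory that contains this skill and treat it as `SKILL_DIR`.
2. Script paths take the form `${SKILL_DIR}/scripts/<script-name>.ts`.
3. Replace every `${SKILL_DIR}` in this document with the actual path.
Script Reference: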
| Script | Purpose |
|---|---|
| `scripts/main.ts` | CLI entry point for text/image generation |
| `scripts/gemini-webapi/*` | TypeScript port of gemini_webapi (GeminiClient, types, utils) |
Before using this skill, the consent check MUST be performed.
Step 1: Check consent file
# macOS
cat ~/Library/Application\ Support/baoyu-skills/gemini-web/consent.json 2>/dev/null
# Linux
cat ~/.local/share/baoyu-skills/gemini-web/consent.json 2>/dev/null
# Windows (PowerShell)
Get-Content "$env:APPDATA\baoyu-skills\gemini-web\consent.json" 2>$null
Step 2: If consent exists and accepted: true with matching disclaimerVersion: "1.0":
Print warning and proceed:
⚠️ Warning: Using reverse-engineered Gemini Web API (not official). Accepted on: <acceptedAt date>
Step 3: If consent file doesn't exist or disclaimerVersion mismatch:
Display disclaimer and ask user:
⚠️ DISCLAIMER
This tool uses a reverse-engineered Gemini Web API, NOT an official Google API.
Risks:
- May break without notice if Google changes their API
- No official support or guarantees
- Use at your own risk
Do you accept these terms and wish to continue?
Use the AskUserQuestion tool with two options: one to accept (continue and save consent, Step 4) and one to decline (stop, Step 5).
Step 4: On acceptance, create consent file:
# macOS
mkdir -p ~/Library/Application\ Support/baoyu-skills/gemini-web
cat > ~/Library/Application\ Support/baoyu-skills/gemini-web/consent.json << 'EOF'
{
"version": 1,
"accepted": true,
"acceptedAt": "<ISO timestamp>",
"disclaimerVersion": "1.0"
}
EOF
# Linux
mkdir -p ~/.local/share/baoyu-skills/gemini-web
cat > ~/.local/share/baoyu-skills/gemini-web/consent.json << 'EOF'
{
"version": 1,
"accepted": true,
"acceptedAt": "<ISO timestamp>",
"disclaimerVersion": "1.0"
}
EOF
Step 5: On decline, output message and stop:
User declined the disclaimer. Exiting.
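The decision made in Steps 1 to 3 can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the skill's actual code: `ConsentRecord` and `checkConsent` are hypothetical names, the field set is copied from the consent file written in Step 4, and treating an unparseable file like a missing one is an assumption.

```typescript
// Hypothetical shape, mirroring the consent.json written in Step 4.
interface ConsentRecord {
  version: number;
  accepted: boolean;
  acceptedAt: string;
  disclaimerVersion: string;
}

// Disclaimer version the check expects (from Step 2).
const CURRENT_DISCLAIMER_VERSION = "1.0";

// Takes the raw file contents, or null when the file does not exist.
// Returns the record when consent is accepted and the disclaimer version
// matches; returns null when the user has to be asked (Step 3).
function checkConsent(raw: string | null): ConsentRecord | null {
  if (raw === null) return null; // consent file missing
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // assumption: an unreadable file counts as missing
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const record = parsed as Partial<ConsentRecord>;
  if (record.accepted !== true) return null;
  if (record.disclaimerVersion !== CURRENT_DISCLAIMER_VERSION) return null;
  return record as ConsentRecord;
}
```

A caller would read the platform-specific file from Step 1, pass its contents (or null) to `checkConsent`, print the warning with `acceptedAt` when a record comes back, and otherwise show the disclaimer.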
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello, Gemini"
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Explain quantum computing"
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute cat" --image cat.png
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png
# Multi-turn conversation (agent generates unique sessionId)
npx -y bun ${SKILL_DIR}/scripts/main.ts "Remember this: 42" --sessionId my-unique-id-123
npx -y bun ${SKILL_DIR}/scripts/main.ts "What number?" --sessionId my-unique-id-123
# Simple prompt (positional)
npx -y bun ${SKILL_DIR}/scripts/main.ts "Your prompt here"
# Explicit prompt flag
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Your prompt here"
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "Your prompt here"
# With model selection
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "Hello" -m gemini-2.5-pro
# Pipe from stdin
echo "Summarize this" | npx -y bun ${SKILL_DIR}/scripts/main.ts
# Generate image with default path (./generated.png)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image
# Generate image with custom path
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute robot" --image robot.png
# Shorthand
npx -y bun ${SKILL_DIR}/scripts/main.ts "A dragon" --image=dragon.png
# Text + image -> text
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Describe this image" --reference a.png
# Text + image -> image
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Generate a variation" --reference a.png --image out.png
# Plain text (default)
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello"
# JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello" --json
| Option | Description |
|---|---|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated in order) |
| `--model <id>`, `-m` | Model: `gemini-3-pro` (default), `gemini-2.5-pro`, `gemini-2.5-flash` |
| `--image [path]` | Generate image, save to path (default: `generated.png`) |
| `--reference <files...>`, `--ref <files...>` | Reference images for vision input |
| `--sessionId <id>` | Session ID for multi-turn conversation (agent generates unique ID) |
| `--list-sessions` | List saved sessions (max 100, sorted by update time) |
| `--json` | Output as JSON |
| `--login` | Refresh cookies only, then exit |
| `--cookie-path <path>` | Custom cookie file path |
| `--profile-dir <path>` | Chrome profile directory |
| `--help`, `-h` | Show help |
CLI note: scripts/main.ts supports text generation, image generation, reference images (--reference/--ref), and multi-turn conversations via --sessionId.
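When another skill uses this CLI as a backend, it can help to build the argument list from a typed object rather than by string concatenation. The sketch below is illustrative only: `GeminiWebOptions` and `buildArgs` are hypothetical names, and only flags listed in the options table are used.

```typescript
// Hypothetical options object; each field maps to a flag from the options table.
interface GeminiWebOptions {
  prompt: string;          // --prompt <text>
  model?: string;          // --model <id>
  image?: string | true;   // --image [path]; true means "use the default path"
  reference?: string[];    // --reference <files...>
  sessionId?: string;      // --sessionId <id>
  json?: boolean;          // --json
}

// Builds the arguments to pass to `npx`, given the resolved skill directory.
function buildArgs(skillDir: string, opts: GeminiWebOptions): string[] {
  const args: string[] = ["-y", "bun", `${skillDir}/scripts/main.ts`, "--prompt", opts.prompt];
  if (opts.model) args.push("--model", opts.model);
  if (opts.reference && opts.reference.length > 0) args.push("--reference", ...opts.reference);
  if (opts.sessionId) args.push("--sessionId", opts.sessionId);
  if (opts.json) args.push("--json");
  // --image goes last so that a bare --image is never followed by another value.
  if (opts.image === true) args.push("--image");
  else if (typeof opts.image === "string") args.push(`--image=${opts.image}`);
  return args;
}
```

The resulting array could be handed to a process spawner, for example Node's `child_process.spawn("npx", args)`, which avoids shell quoting problems in prompts.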
- `gemini-3-pro`: Default, latest model
- `gemini-2.5-pro`: Previous generation pro
- `gemini-2.5-flash`: Fast, lightweight
The first run opens Chrome to authenticate with Google. Cookies are cached for subsequent runs.
# Force cookie refresh
npx -y bun ${SKILL_DIR}/scripts/main.ts --login
| Variable | Description |
|---|---|
| `GEMINI_WEB_DATA_DIR` | Data directory |
| `GEMINI_WEB_COOKIE_PATH` | Cookie file path |
| `GEMINI_WEB_CHROME_PROFILE_DIR` | Chrome profile directory |
| `GEMINI_WEB_CHROME_PATH` | Chrome executable path |
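The usual pattern for variables like these is "explicit override first, platform default otherwise". The sketch below shows that pattern for `GEMINI_WEB_DATA_DIR` only. It rests on assumptions: the default directories are inferred from the consent-file paths shown earlier in this document, and `defaultDataDir` and `resolveDataDir` are hypothetical names, not functions from the skill.

```typescript
type Env = Record<string, string | undefined>;

// Default data directory per platform. Inferred from the consent-file
// locations in this document; the real defaults may differ (assumption).
function defaultDataDir(platform: string, home: string, env: Env): string {
  if (platform === "darwin") return `${home}/Library/Application Support/baoyu-skills/gemini-web`;
  if (platform === "win32") return `${env["APPDATA"] ?? home}\\baoyu-skills\\gemini-web`;
  return `${home}/.local/share/baoyu-skills/gemini-web`;
}

// A non-empty GEMINI_WEB_DATA_DIR wins; otherwise fall back to the default.
function resolveDataDir(platform: string, home: string, env: Env): string {
  const override = env["GEMINI_WEB_DATA_DIR"]?.trim();
  return override ? override : defaultDataDir(platform, home, env);
}
```

In a Node or Bun process this would typically be called as `resolveDataDir(process.platform, os.homedir(), process.env)`.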
npx -y bun ${SKILL_DIR}/scripts/main.ts "What is the capital of France?"
npx -y bun ${SKILL_DIR}/scripts/main.ts "A photorealistic image of a golden retriever puppy" --image puppy.png
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello" --json | jq '.text'
# Concatenate system.md + content.md as prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image output.png
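Conceptually, `--promptfiles` reads each file in the order given and joins the contents into one prompt. A minimal sketch of that idea follows; `concatPromptFiles` is a hypothetical helper, and the blank-line separator and trimming are assumptions, since the document only states that files are concatenated in order.

```typescript
// Reads each path with the supplied reader, in order, and joins the results.
// The reader is injected so the function stays independent of the file system.
function concatPromptFiles(paths: string[], read: (path: string) => string): string {
  return paths
    .map((path) => read(path).trim()) // assumption: surrounding whitespace is dropped
    .join("\n\n");                    // assumption: parts are separated by a blank line
}
```

With Node or Bun, the reader could be `(p) => readFileSync(p, "utf8")` from `node:fs`. Order matters: putting `system.md` first means its instructions precede the content they apply to.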
# Start a session with unique ID (agent generates this)
npx -y bun ${SKILL_DIR}/scripts/main.ts "You are a helpful math tutor." --sessionId task-abc123
# Continue the conversation (remembers context)
npx -y bun ${SKILL_DIR}/scripts/main.ts "What is 2+2?" --sessionId task-abc123
npx -y bun ${SKILL_DIR}/scripts/main.ts "Now multiply that by 10" --sessionId task-abc123
# List recent sessions (max 100, sorted by update time)
npx -y bun ${SKILL_DIR}/scripts/main.ts --list-sessions
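Because the agent is expected to generate the session ID itself, it needs a way to produce IDs that do not collide across tasks. One possible approach, sketched here, is a UUID with a readable prefix. `newSessionId` and `isSafeSessionId` are hypothetical helpers; the character-set check is a cautious assumption based on the ID being used as a file name (`<id>.json`), not a rule the CLI is known to enforce.

```typescript
import { randomUUID } from "node:crypto";

// Builds an ID such as "task-<uuid>". The "task" prefix is only illustrative.
function newSessionId(prefix: string = "task"): string {
  return `${prefix}-${randomUUID()}`;
}

// Conservative check for IDs that will end up in a file name:
// letters, digits, dot, underscore, and hyphen only, and no leading dot.
function isSafeSessionId(id: string): boolean {
  return /^[A-Za-z0-9._-]{1,128}$/.test(id) && id.charAt(0) !== ".";
}
```

Reusing the same ID on later calls is what continues the conversation, so the caller should store the generated value for the lifetime of the task.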
Session files are stored in ~/Library/Application Support/baoyu-skills/gemini-web/sessions/<id>.json (macOS path shown) and contain:
- `id`: Session ID
- `metadata`: Gemini chat metadata for continuation
- `messages`: Array of {role, content, timestamp, error?}
- `createdAt`, `updatedAt`: Timestamps
Custom configuration is supported via EXTEND.md.
Check paths (priority order):
1. `.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` (project)
2. `~/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` (user)
If found, load before workflow. Extension content overrides defaults.
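The priority-ordered lookup for EXTEND.md can be sketched as a "first existing candidate wins" search. This is an illustration under assumptions: `extendCandidates` and `findExtendFile` are hypothetical names, and the existence check is injected by the caller rather than taken from the skill.

```typescript
// Candidate locations in priority order: project first, then user.
function extendCandidates(projectDir: string, homeDir: string): string[] {
  const relative = ".baoyu-skills/baoyu-danger-gemini-web/EXTEND.md";
  return [`${projectDir}/${relative}`, `${homeDir}/${relative}`];
}

// Returns the first candidate for which `exists` reports true, or null
// when neither file is present and the defaults should be used.
function findExtendFile(
  projectDir: string,
  homeDir: string,
  exists: (path: string) => boolean,
): string | null {
  for (const candidate of extendCandidates(projectDir, homeDir)) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

With Node or Bun, `exists` could be `existsSync` from `node:fs`. Because the project path is checked first, a project-level EXTEND.md takes precedence over the user-level one when both exist.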