From workflows
Analyzes media files (PDFs, images, diagrams, screenshots) using a vision backend to extract structured data, descriptions, or summaries instead of literal file reading.
How this skill is triggered (by the user, by Claude, or both): slash command
/workflows:look-at
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Multi-backend vision tool for PDFs, images, diagrams, and other media files. Routes to Gemini CLI (default), GitHub Copilot (GPT-5.4), or the legacy Python API.
| Scenario | Read Tool | look_at Tool |
|---|---|---|
| PDF with table | Extracts raw text (~1000 tokens), loses table structure | Extracts table as structured data (~100 tokens) |
| Screenshot | Loads entire image (~500 tokens), requires interpretation | Describes content (~50 tokens) |
| Diagram | Shows image (~800 tokens), requires analysis | Explains architecture (~100 tokens) |
| Multi-page PDF | All pages loaded (~5000 tokens) | Extracts specific sections (~200 tokens) |
look_at saves 80-95% of context tokens by extracting only relevant information.
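As a rough check on that range, the approximate token counts in the table above can be turned into percentages. This is illustrative arithmetic only: `savings_pct` is a hypothetical helper, not part of look_at.sh, and the inputs are the table's approximate figures rather than measurements.

```shell
# savings_pct BEFORE AFTER
# Prints the percentage of tokens saved, rounded to the nearest integer.
# Hypothetical helper for illustration; not part of look_at.sh.
savings_pct() {
  local before=$1 after=$2
  # Integer arithmetic with rounding: add half the divisor before dividing.
  echo $(( ((before - after) * 100 + before / 2) / before ))
}

savings_pct 1000 100   # PDF with table
savings_pct 500 50     # Screenshot
savings_pct 800 100    # Diagram
savings_pct 5000 200   # Multi-page PDF
```

With the table's figures this prints 90, 90, 88, and 96, roughly in line with the stated range given that the token counts are approximate.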
Use look_at when you need:
Never use look_at when:
look_at.sh routes to the selected backend (Gemini CLI by default).
CRITICAL - Display Requirement:
Always set the Bash tool description parameter to show a clean invocation:
description: "look-at: [goal text]"
# Default (Gemini CLI — uses bundled quota, no API key needed)
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "/path/to/file.pdf" \
--goal "Extract the title and date from this document"
# GPT-5.4 via GitHub Copilot
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "/path/to/diagram.png" \
--goal "Describe the architecture" \
--backend copilot
# Multi-model consensus (gemini + copilot in parallel)
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "/path/to/diagram.png" \
--goal "Score this diagram 0-10" \
--consensus
# Legacy Python API (uses your GOOGLE_API_KEY)
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "/path/to/file.pdf" \
--goal "Extract the table data" \
--backend api
${CLAUDE_SKILL_DIR} is substituted at skill load time, so the full path is already resolved — no per-call discovery needed.
IMPORTANT: Set description to "look-at: [goal]" for clean UX.

| Backend | CLI | Model | Cost | Best For |
|---|---|---|---|---|
| gemini (default) | gemini CLI | Gemini (CLI default) | Bundled quota | General vision, diagrams, documents |
| copilot | GitHub Copilot CLI | GPT-5.4 | Copilot subscription | Second opinions, consensus |
| api | look_at.py | Gemini API (configurable) | Your API key | Agentic mode, custom models |
--consensus runs gemini and copilot in parallel and outputs both results under labeled headers (=== GEMINI ===, === COPILOT (GPT-5.4) ===).
When to use: Visual verification of diagrams where a single model may miss defects or score the diagram too leniently. Trust the stricter score: if any backend flags BLOCKING, treat it as BLOCKING.
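The "any BLOCKING wins" rule can be applied mechanically to the combined output. The sketch below assumes each backend writes the literal word BLOCKING somewhere in its section. `consensus_verdict` and the sample text are hypothetical, and the real response format may differ.

```shell
# consensus_verdict: reads --consensus output on stdin.
# Prints BLOCKING if any backend section contains the word BLOCKING,
# otherwise PASS. Hypothetical helper; assumes backends use that literal word.
# Note: a phrase such as "NON-BLOCKING" would also match and needs extra handling.
consensus_verdict() {
  if grep -q 'BLOCKING'; then
    echo "BLOCKING"
  else
    echo "PASS"
  fi
}

# Hypothetical sample output, for illustration only.
sample_output='=== GEMINI ===
Score: 8/10. No major issues.
=== COPILOT (GPT-5.4) ===
Score: 5/10. BLOCKING: two labels overlap.'

printf '%s\n' "$sample_output" | consensus_verdict
```

Here one backend passes the diagram and the other flags it, so the stricter result is the one that counts.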
When using look_at, the response includes:
Use this extracted information directly in continued work without loading the full file into context.
| Type | Extensions | MIME Types |
|---|---|---|
| Images | .jpg, .jpeg, .png, .webp, .heic, .heif | image/* |
| Videos | .mp4, .mpeg, .mov, .avi, .webm | video/* |
| Audio | .wav, .mp3, .aiff, .aac, .ogg, .flac | audio/* |
| Documents | .pdf, .txt, .csv, .md, .html | application/pdf, text/* |
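Before calling look_at.sh it can be useful to check that a file's extension is in the table above. The following is a minimal sketch of such a check. `media_type` is a hypothetical helper, and look_at.sh may do its own detection (for example by MIME type).

```shell
# media_type FILE
# Prints images, videos, audio, documents, or unsupported, following the
# extension table above. Hypothetical helper; not part of look_at.sh.
media_type() {
  local ext
  # Lowercase the extension so that .PNG and .png are treated alike.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|webp|heic|heif) echo "images" ;;
    mp4|mpeg|mov|avi|webm)       echo "videos" ;;
    wav|mp3|aiff|aac|ogg|flac)   echo "audio" ;;
    pdf|txt|csv|md|html)         echo "documents" ;;
    *)                           echo "unsupported" ;;
  esac
}

media_type "/path/to/diagram.PNG"
media_type "/path/to/file.pdf"
```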
| Model | Use Case | Speed | Cost |
|---|---|---|---|
| gemini-2.5-flash-lite | Default - fast, cheap analysis | Fastest | Lowest |
| gemini-3-flash | More complex extraction needs | Fast | Low |
| gemini-3-flash-preview | Agentic vision with code execution | Fast | Low |
| gemini-3-pro-preview | Highest accuracy required | Medium | Medium |
Default is gemini-2.5-flash-lite for optimal speed/cost ratio.
For complex visual reasoning tasks, use the --agentic flag to enable code execution. This allows Gemini to:
When to use --agentic:
Usage:
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "photo.jpg" \
--goal "Count the number of people in this image" \
--agentic
Note: Agentic mode automatically uses gemini-3-flash-preview regardless of the --model setting.
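The model precedence described here (agentic override first, then --model, then the default) can be summarized as a small function. This sketches the documented behavior only and is not the script's actual code; `effective_model` is a hypothetical name.

```shell
# effective_model REQUESTED_MODEL AGENTIC
# Prints the model that would be used, per the rules in this document:
#   1. --agentic forces gemini-3-flash-preview
#   2. otherwise the --model value, if one was given
#   3. otherwise the default gemini-2.5-flash-lite
# Hypothetical helper for illustration; not part of look_at.sh.
effective_model() {
  local requested=$1 agentic=$2
  if [ "$agentic" = "true" ]; then
    echo "gemini-3-flash-preview"
  else
    echo "${requested:-gemini-2.5-flash-lite}"
  fi
}

effective_model "gemini-3-pro-preview" true    # agentic overrides --model
effective_model "gemini-3-pro-preview" false   # --model is honored
effective_model "" false                       # falls back to the default
```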
REMEMBER: Always use description: "look-at: [goal]" in the Bash tool call.
# Bash tool call with:
# description: "look-at: Extract the executive summary section"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "report.pdf" \
--goal "Extract the executive summary section"
# Bash tool call with:
# description: "look-at: List all UI elements and their layout"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "screenshot.png" \
--goal "List all UI elements and their layout"
# Bash tool call with:
# description: "look-at: Explain the data flow and component relationships"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "architecture.png" \
--goal "Explain the data flow and component relationships"
# Bash tool call with:
# description: "look-at: Extract the table data as JSON"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "table.pdf" \
--goal "Extract the table data as JSON with columns: name, value, date"
# Bash tool call with:
# description: "look-at: Count the number of people in the photo"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "crowd.jpg" \
--goal "Count the number of people visible in this image" \
--agentic
# Bash tool call with:
# description: "look-at: Extract specific data points from the chart"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "quarterly_chart.png" \
--goal "Extract the exact values for each quarter and calculate the year-over-year change" \
--agentic
Required environment variable (only for --backend api; the default Gemini CLI backend needs no API key):
export GOOGLE_API_KEY="your-api-key-here"
Required Python package (for look_at.py, the api backend):
pip install google-genai
For pixi-managed projects, add to pixi.toml:
[dependencies]
google-genai = ">=1.0.0"
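A quick preflight check can confirm both requirements before using --backend api. This is a minimal sketch, assuming look_at.py runs under `python3`. `check_api_backend` is a hypothetical helper, not something shipped with the skill.

```shell
# check_api_backend
# Returns 0 if the api backend's prerequisites look satisfied, 1 otherwise.
# Hypothetical helper; assumes look_at.py is run with python3.
check_api_backend() {
  local status=0
  if [ -z "${GOOGLE_API_KEY:-}" ]; then
    echo "GOOGLE_API_KEY is not set" >&2
    status=1
  fi
  # The google-genai package is imported as google.genai.
  if ! python3 -c 'import google.genai' >/dev/null 2>&1; then
    echo "google-genai is not importable by python3; try: pip install google-genai" >&2
    status=1
  fi
  return "$status"
}

# Usage:
# check_api_backend && echo "api backend ready"
```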
| Issue | Solution |
|---|---|
| API key not set | Set GOOGLE_API_KEY environment variable |
| File not found | Use absolute paths, verify file exists |
| Large file timeout | Break into smaller files or use lower-quality images |
| Rate limit errors | Add retry logic or use batch processing |
| Empty response | Check that goal is clear and specific |
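For rate limit errors, the retry logic mentioned in the table could look like the sketch below. It retries on any non-zero exit status with exponential backoff, because look_at.sh's exit codes are not documented here. `retry` and `RETRY_DELAY` are hypothetical names.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# wait between attempts. RETRY_DELAY sets the initial wait in seconds
# (default 2). Hypothetical helper; not part of look_at.sh.
retry() {
  local max=$1
  shift
  local attempt=1
  local delay=${RETRY_DELAY:-2}
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Usage (flags taken from the examples in this document):
# retry 3 "${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
#   --file "/path/to/file.pdf" \
#   --goal "Extract the table data"
```

Since every failure is retried, errors that will never succeed (such as a missing file) also consume all attempts, so keep the attempt count small.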
See examples/ directory for:
- analyze_pdf.sh - PDF document extraction
- describe_image.sh - Image analysis
- extract_table.sh - Structured data extraction

Related:
- /gemini-batch - For batch processing of many files
- Read tool - For text files needing exact contents

Install:
npx claudepluginhub edwinhu/workflows --plugin workflows