This skill should be used when the user asks to 'look at', 'analyze', 'describe', 'extract from', or 'what's in' media files like PDFs, images, diagrams, screenshots, or charts. Triggers include: 'what does this image show', 'extract the table from this PDF', 'describe this diagram', 'what's in this screenshot', 'analyze this chart', 'read this image', 'get text from this PDF', 'summarize this document', or requests for specific data extraction from visual or document files. Use when analyzed/interpreted content is needed rather than literal file reading (which uses Read tool).
npx claudepluginhub edwinhu/workflows --plugin workflows
This skill uses the workspace's default tool permissions.
Multi-backend vision tool for PDFs, images, diagrams, and other media files. Routes to Gemini CLI (default), GitHub Copilot (GPT-5.4), or the legacy Python API.
| Excuse | Reality | Do Instead |
|---|---|---|
| "I can read images directly with Read" | You'll waste thousands of context tokens showing the full image | Use look_at for analysis |
| "I'll use Read for this PDF" | You'll lose table structure and visual information by extracting raw text | Use look_at for PDFs with tables/charts/diagrams |
| "Just a quick glance at the file" | Your quick glances still consume full context tokens | Use look_at for targeted extraction |
| "I need exact text, so Read is required" | Gemini's extraction is accurate for most use cases | Use look_at first, Read only if extraction insufficient |
| "look_at adds complexity" | You gain context savings and faster processing | Use look_at for media files |
| "The file is small" | Your small files still waste context if uninterpreted | Content type, not size, determines tool choice |
| "I'll process it myself" | You waste reasoning tokens on trivial extraction | Delegate to look_at |
| Scenario | Read Tool | look_at Tool |
|---|---|---|
| PDF with table | Extracts raw text (~1000 tokens), loses table structure | Extracts table as structured data (~100 tokens) |
| Screenshot | Loads entire image (~500 tokens), requires interpretation | Describes content (~50 tokens) |
| Diagram | Shows image (~800 tokens), requires analysis | Explains architecture (~100 tokens) |
| Multi-page PDF | All pages loaded (~5000 tokens) | Extracts specific sections (~200 tokens) |
look_at saves 80-95% of context tokens by extracting only relevant information.
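As a quick check on that range, the approximate token counts from the comparison table can be run through a small calculation. This is only a sketch: `pct_saved` is a hypothetical helper name, and the inputs are the rough estimates from the table, not measurements.

```shell
# Percentage of context tokens saved, using integer arithmetic (rounds down).
# Arguments: tokens consumed via Read, tokens consumed via look_at.
pct_saved() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Approximate token counts copied from the comparison table above.
printf 'PDF with table: %s%%\n' "$(pct_saved 1000 100)"   # 90%
printf 'Screenshot:     %s%%\n' "$(pct_saved 500 50)"     # 90%
printf 'Diagram:        %s%%\n' "$(pct_saved 800 100)"    # 87%
printf 'Multi-page PDF: %s%%\n' "$(pct_saved 5000 200)"   # 96%
```

The table's figures therefore work out to roughly 87 to 96 percent, broadly in line with the stated range.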
Use look_at when you need:
Never use look_at when:
look_at.sh routes to the selected backend (Gemini CLI by default).
CRITICAL - Display Requirement:
Always set the Bash tool description parameter to show a clean invocation:
description: "look-at: [goal text]"
# Default (Gemini CLI — uses bundled quota, no API key needed)
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "/path/to/file.pdf" \
--goal "Extract the title and date from this document"
# GPT-5.4 via GitHub Copilot
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "/path/to/diagram.png" \
--goal "Describe the architecture" \
--backend copilot
# Multi-model consensus (gemini + copilot in parallel)
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "/path/to/diagram.png" \
--goal "Score this diagram 0-10" \
--consensus
# Legacy Python API (uses your GOOGLE_API_KEY)
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "/path/to/file.pdf" \
--goal "Extract the table data" \
--backend api
${CLAUDE_SKILL_DIR} is substituted at skill load time, so the full path is already resolved — no per-call discovery needed.
IMPORTANT:
Set description to "look-at: [goal]" for clean UX.

| Backend | CLI | Model | Cost | Best For |
|---|---|---|---|---|
| gemini (default) | gemini CLI | Gemini (CLI default) | Bundled quota | General vision, diagrams, documents |
| copilot | GitHub Copilot CLI | GPT-5.4 | Copilot subscription | Second opinions, consensus |
| api | look_at.py | Gemini API (configurable) | Your API key | Agentic mode, custom models |
--consensus runs gemini and copilot in parallel and outputs both results under labeled headers (=== GEMINI ===, === COPILOT (GPT-5.4) ===).
When to use: Visual verification of diagrams where a single model may miss defects or score them too leniently. Trust the stricter score: if any backend flags BLOCKING, treat it as BLOCKING.
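The "any backend flags BLOCKING" rule can be applied mechanically to a consensus report. The sketch below assumes the report is plain text with the two labeled headers described above and that a flagged defect contains the literal word BLOCKING. The layout under each header, the sample report, and the helper names `consensus_is_blocking` and `consensus_section` are illustrative assumptions, not part of look_at.sh.

```shell
# Succeeds (exit 0) if any part of a --consensus report mentions BLOCKING.
# Reads the report from standard input.
consensus_is_blocking() {
  grep -q 'BLOCKING'
}

# Print only the lines that follow the given header line, stopping at the
# next line that starts with "=== ". Reads the report from standard input.
consensus_section() {
  awk -v header="$1" '
    $0 == header { in_section = 1; next }
    /^=== /      { in_section = 0 }
    in_section   { print }
  '
}

# Hypothetical report text, for illustration only.
sample_report='=== GEMINI ===
Score: 8/10
=== COPILOT (GPT-5.4) ===
Score: 5/10 BLOCKING: arrow labels overlap'

if printf '%s\n' "$sample_report" | consensus_is_blocking; then
  echo "treat as BLOCKING"
fi
printf '%s\n' "$sample_report" | consensus_section '=== COPILOT (GPT-5.4) ==='
```

Checking the whole report for BLOCKING, rather than comparing scores, keeps the check independent of how each backend happens to phrase its score.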
When using look_at, the response includes:
Use this extracted information directly in continued work without loading the full file into context.
| Type | Extensions | MIME Types |
|---|---|---|
| Images | .jpg, .jpeg, .png, .webp, .heic, .heif | image/* |
| Videos | .mp4, .mpeg, .mov, .avi, .webm | video/* |
| Audio | .wav, .mp3, .aiff, .aac, .ogg, .flac | audio/* |
| Documents | .pdf, .txt, .csv, .md, .html | application/pdf, text/* |
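Before delegating a file, it can help to check its extension against the table above. The following is a minimal sketch of such a check; `media_category` is a hypothetical helper, and look_at.sh may well perform its own detection (for example by MIME type).

```shell
# Map a file path to the media category from the supported-types table.
# Prints the category; returns 1 for extensions not listed in the table.
media_category() {
  # Lowercase the extension so "PHOTO.JPG" and "photo.jpg" match alike.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|webp|heic|heif) echo image ;;
    mp4|mpeg|mov|avi|webm)       echo video ;;
    wav|mp3|aiff|aac|ogg|flac)   echo audio ;;
    pdf|txt|csv|md|html)         echo document ;;
    *)                           echo unsupported; return 1 ;;
  esac
}

media_category "/path/to/diagram.png"   # prints "image"
media_category "/path/to/Report.PDF"    # prints "document"
```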
| Model | Use Case | Speed | Cost |
|---|---|---|---|
| gemini-2.5-flash-lite | Default - fast, cheap analysis | Fastest | Lowest |
| gemini-3-flash | More complex extraction needs | Fast | Low |
| gemini-3-flash-preview | Agentic vision with code execution | Fast | Low |
| gemini-3-pro-preview | Highest accuracy required | Medium | Medium |
Default is gemini-2.5-flash-lite for optimal speed/cost ratio.
For complex visual reasoning tasks, use the --agentic flag to enable code execution. This allows Gemini to:
When to use --agentic:
Usage:
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "photo.jpg" \
--goal "Count the number of people in this image" \
--agentic
Note: Agentic mode automatically uses gemini-3-flash-preview regardless of the --model setting.
REMEMBER: Always use description: "look-at: [goal]" in the Bash tool call.
# Bash tool call with:
# description: "look-at: Extract the executive summary section"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "report.pdf" \
--goal "Extract the executive summary section"
# Bash tool call with:
# description: "look-at: List all UI elements and their layout"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "screenshot.png" \
--goal "List all UI elements and their layout"
# Bash tool call with:
# description: "look-at: Explain the data flow and component relationships"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "architecture.png" \
--goal "Explain the data flow and component relationships"
# Bash tool call with:
# description: "look-at: Extract the table data as JSON"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "table.pdf" \
--goal "Extract the table data as JSON with columns: name, value, date"
# Bash tool call with:
# description: "look-at: Count the number of people in the photo"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "crowd.jpg" \
--goal "Count the number of people visible in this image" \
--agentic
# Bash tool call with:
# description: "look-at: Extract specific data points from the chart"
"${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
--file "quarterly_chart.png" \
--goal "Extract the exact values for each quarter and calculate the year-over-year change" \
--agentic
Required environment variable (api backend only; the default Gemini CLI backend needs no API key):
export GOOGLE_API_KEY="your-api-key-here"
Required Python package (for the api backend's look_at.py):
pip install google-genai
For pixi-managed projects, add to pixi.toml:
[dependencies]
google-genai = ">=1.0.0"
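A short preflight check can confirm both requirements before calling the api backend. This is a sketch under two assumptions: that `python3` is the interpreter look_at.py runs under, and that the google-genai package is importable as `google.genai`. The function names are hypothetical.

```shell
# Returns 0 when GOOGLE_API_KEY is set to a non-empty value, 1 otherwise.
have_api_key() {
  [ -n "${GOOGLE_API_KEY:-}" ]
}

# Returns 0 when python3 can import the google-genai package.
have_genai_package() {
  python3 -c 'import google.genai' >/dev/null 2>&1
}

# Report the first missing requirement on stderr; return 0 if both are met.
api_backend_ready() {
  if ! have_api_key; then
    echo 'GOOGLE_API_KEY is not set' >&2
    return 1
  fi
  if ! have_genai_package; then
    echo 'google-genai is not importable (try: pip install google-genai)' >&2
    return 1
  fi
  return 0
}
```

Calling `api_backend_ready` before passing `--backend api` surfaces a missing key or package without spending a request.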
| Issue | Solution |
|---|---|
| API key not set | Set GOOGLE_API_KEY environment variable |
| File not found | Use absolute paths, verify file exists |
| Large file timeout | Break into smaller files or use lower-quality images |
| Rate limit errors | Add retry logic or use batch processing |
| Empty response | Check that goal is clear and specific |
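For the rate-limit row, retry logic can be as simple as a wrapper with exponential backoff. The sketch below retries on any non-zero exit status. Whether look_at.sh actually exits non-zero on rate-limit errors is an assumption, and `retry_with_backoff` is a hypothetical helper, not part of the skill.

```shell
# Run a command up to MAX_ATTEMPTS times, doubling the delay after each failure.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example (not executed here): up to 3 attempts, waiting 2s and then 4s.
# retry_with_backoff 3 2 "${CLAUDE_SKILL_DIR}/scripts/look_at.sh" \
#   --file "/path/to/file.pdf" \
#   --goal "Extract the table data"
```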
See examples/ directory for:
- analyze_pdf.sh - PDF document extraction
- describe_image.sh - Image analysis
- extract_table.sh - Structured data extraction

Related:
- /gemini-batch - For batch processing of many files
- Read tool - For text files needing exact contents