This skill should be used when the user asks to 'look at', 'analyze', 'describe', 'extract from', or 'what's in' media files like PDFs, images, diagrams, screenshots, or charts. Triggers include: 'what does this image show', 'extract the table from this PDF', 'describe this diagram', 'what's in this screenshot', 'analyze this chart', 'read this image', 'get text from this PDF', 'summarize this document', or requests for specific data extraction from visual or document files. Use when analyzed/interpreted content is needed rather than literal file reading (which uses Read tool).
/plugin marketplace add edwinhu/workflows/plugin install workflows@edwinhu-pluginsThis skill inherits all available tools. When active, it can use any tool Claude has access to.
README.mdexamples/analyze_pdf.shexamples/describe_image.shexamples/extract_table.shreferences/api-details.mdreferences/use-cases.mdrequirements.txtscripts/look_at.pyFast, cost-effective file analysis using Google's Gemini 2.5 Flash Lite model for PDFs, images, diagrams, and other media files.
| Excuse | Reality | Do Instead |
|---|---|---|
| "I can read images directly with Read" | You'll waste thousands of context tokens showing the full image | Use look_at for analysis |
| "I'll use Read for this PDF" | You'll lose table structure and visual information by extracting raw text | Use look_at for PDFs with tables/charts/diagrams |
| "Just a quick glance at the file" | Your quick glances still consume full context tokens | Use look_at for targeted extraction |
| "I need exact text, so Read is required" | Gemini's extraction is accurate for most use cases | Use look_at first, Read only if extraction insufficient |
| "look_at adds complexity" | You gain context savings and faster processing | Use look_at for media files |
| "The file is small" | Your small files still waste context if uninterpreted | Size doesn't determine tool choice, content type does |
| "I'll process it myself" | You waste reasoning tokens on trivial extraction | Delegate to look_at |
| Scenario | Read Tool | look_at Tool |
|---|---|---|
| PDF with table | Extracts raw text (~1000 tokens), loses table structure | Extracts table as structured data (~100 tokens) |
| Screenshot | Loads entire image (~500 tokens), requires interpretation | Describes content (~50 tokens) |
| Diagram | Shows image (~800 tokens), requires analysis | Explains architecture (~100 tokens) |
| Multi-page PDF | All pages loaded (~5000 tokens) | Extracts specific sections (~200 tokens) |
look_at saves 80-95% of context tokens by extracting only relevant information.
Use look_at when you need:
Never use look_at when:
CRITICAL - Display Requirement:
Always set the Bash tool description parameter to show a clean invocation:
description: "look-at: [goal text]"
Never display the full Python command to the user.
# Basic usage
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/file.pdf" \
--goal "Extract the title and date from this document"
# With custom model
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/diagram.png" \
--goal "Describe the architecture shown in this diagram" \
--model "gemini-2.5-flash"
IMPORTANT:
description to "look-at: [goal]" for clean UXWhen using look_at, the response includes:
Use this extracted information directly in continued work without loading the full file into context.
| Type | Extensions | MIME Types |
|---|---|---|
| Images | .jpg, .jpeg, .png, .webp, .heic, .heif | image/* |
| Videos | .mp4, .mpeg, .mov, .avi, .webm | video/* |
| Audio | .wav, .mp3, .aiff, .aac, .ogg, .flac | audio/* |
| Documents | .pdf, .txt, .csv, .md, .html | application/pdf, text/* |
| Model | Use Case | Speed | Cost |
|---|---|---|---|
gemini-2.5-flash-lite | Default - fast, cheap analysis | Fastest | Lowest |
gemini-3-flash | More complex extraction needs | Fast | Low |
gemini-3-pro-preview | Highest accuracy required | Medium | Medium |
Default is gemini-2.5-flash-lite for optimal speed/cost ratio.
REMEMBER: Always use description: "look-at: [goal]" in the Bash tool call.
# Bash tool call with:
# description: "look-at: Extract the executive summary section"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "report.pdf" \
--goal "Extract the executive summary section"
# Bash tool call with:
# description: "look-at: List all UI elements and their layout"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "screenshot.png" \
--goal "List all UI elements and their layout"
# Bash tool call with:
# description: "look-at: Explain the data flow and component relationships"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "architecture.png" \
--goal "Explain the data flow and component relationships"
# Bash tool call with:
# description: "look-at: Extract the table data as JSON"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "table.pdf" \
--goal "Extract the table data as JSON with columns: name, value, date"
Required environment variable:
export GOOGLE_API_KEY="your-api-key-here"
Required Python package:
pip install google-genai
For pixi-managed projects, add to pixi.toml:
[dependencies]
google-genai = ">=1.0.0"
| Issue | Solution |
|---|---|
| API key not set | Set GOOGLE_API_KEY environment variable |
| File not found | Use absolute paths, verify file exists |
| Large file timeout | Break into smaller files or use lower-quality images |
| Rate limit errors | Add retry logic or use batch processing |
| Empty response | Check that goal is clear and specific |
See examples/ directory for:
analyze_pdf.sh - PDF document extractiondescribe_image.sh - Image analysisextract_table.sh - Structured data extraction/gemini-batch - For batch processing of many filesRead tool - For text files needing exact contentsThis skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.