npx claudepluginhub tan-yong-sheng/ai-vision-mcp
This skill uses the workspace's default tool permissions.
AI-powered image and video analysis CLI using Google Gemini and Vertex AI models.
npm install -g ai-vision-mcp
# or use directly
npx ai-vision-mcp <command> [options]
Set your provider via environment variables:
Google AI Studio (Recommended)
export IMAGE_PROVIDER="google"
export VIDEO_PROVIDER="google"
export GEMINI_API_KEY="your-api-key"
Get your API key at aistudio.google.com/app/api-keys
Vertex AI
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
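Before a first run it can help to confirm that the variables for the chosen provider are actually set. The following is a minimal sketch, not part of the CLI: `missing_provider_vars` is a hypothetical helper name, and treating all four Vertex AI variables as required is an assumption inferred from the listing above.

```shell
# Hypothetical helper (not part of ai-vision-mcp): print the name of every
# unset or empty variable for the given provider, one per line.
# The body runs in a subshell "( ... )" so its variables stay local.
missing_provider_vars() (
  case "$1" in
    google)    required="GEMINI_API_KEY" ;;
    # Assumption: all four Vertex AI variables from the listing are required.
    vertex_ai) required="VERTEX_CLIENT_EMAIL VERTEX_PRIVATE_KEY VERTEX_PROJECT_ID GCS_BUCKET_NAME" ;;
    *)         echo "unknown provider: $1" >&2; exit 2 ;;
  esac
  for name in $required; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
)

# Usage: missing_provider_vars "$IMAGE_PROVIDER"
```

An empty result means every variable in the helper's list is set; it does not verify that the key or service account is valid.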
Audit a website or UI design for accessibility, visual quality, WCAG contrast compliance, and design best practices.
ai-vision audit-design <source> [--prompt <text>] [options]
Options:
--prompt <text>      Custom audit prompt (optional)
--temperature <num>  Temperature 0-2 (default: 0.7)
--top-p <num>        Top P 0-1
--top-k <num>        Top K 1-100
--max-tokens <num>   Max output tokens
--json               Output raw JSON
Output includes:
Examples:
ai-vision audit-design https://example.com/hero.jpg
ai-vision audit-design screenshot.png --prompt "Evaluate accessibility"
ai-vision audit-design design.jpg --json
Analyze an image with AI vision models.
ai-vision analyze-image <source> --prompt <text> [options]
Options:
--prompt <text>      Analysis prompt (required)
--temperature <num>  Temperature 0-2 (default: 0.7)
--top-p <num>        Top P 0-1
--top-k <num>        Top K 1-100
--max-tokens <num>   Max output tokens
--json               Output raw JSON
Examples:
ai-vision analyze-image https://example.com/image.jpg --prompt "describe the scene"
ai-vision analyze-image screenshot.png --prompt "extract design tokens"
ai-vision analyze-image image.jpg --prompt "analyze" --json
Compare 2-4 images to identify differences, similarities, or changes.
ai-vision compare-images <source1> <source2> [source3] [source4] --prompt <text> [options]
Options:
--prompt <text>      Comparison prompt (required)
--temperature <num>  Temperature 0-2 (default: 0.7)
--top-p <num>        Top P 0-1
--top-k <num>        Top K 1-100
--max-tokens <num>   Max output tokens
--json               Output raw JSON
Examples:
ai-vision compare-images before.jpg after.jpg --prompt "what changed?"
ai-vision compare-images v1.png v2.png v3.png --prompt "which is best?"
ai-vision compare-images baseline.png current.png --prompt "find visual bugs" --json
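Since the command is documented as taking 2 to 4 sources, a script that collects images dynamically may want to check the count before calling it. This is a minimal sketch under stated assumptions: `compare_checked` and the `AI_VISION_BIN` override are hypothetical names introduced here for illustration, not features of the CLI.

```shell
# Hypothetical wrapper: refuse to call compare-images unless 2 to 4 sources
# are given. AI_VISION_BIN is an illustrative override (e.g. for dry runs);
# it defaults to the documented "ai-vision" command.
# The body runs in a subshell "( ... )" so "prompt" stays local.
compare_checked() (
  prompt="$1"
  shift
  if [ "$#" -lt 2 ] || [ "$#" -gt 4 ]; then
    echo "compare-images expects 2 to 4 sources, got $#" >&2
    exit 1
  fi
  "${AI_VISION_BIN:-ai-vision}" compare-images "$@" --prompt "$prompt"
)

# Usage: compare_checked "what changed?" before.jpg after.jpg
```

Setting `AI_VISION_BIN=echo` prints the command that would run instead of running it, which is handy when testing a script.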
Detect and identify objects in an image with bounding boxes and confidence scores.
ai-vision detect-objects <source> --prompt <text> [--output <path>] [options]
Options:
--prompt <text>             Detection prompt (required)
--output <path>             Save annotated image (optional)
--viewport-width <number>   Logical viewport width for web screenshots
--viewport-height <number>  Logical viewport height for web screenshots
--temperature <num>         Temperature 0-2 (default: 0.7)
--top-p <num>               Top P 0-1
--top-k <num>               Top K 1-100
--max-tokens <num>          Max output tokens
--json                      Output raw JSON
Output includes:
Examples:
ai-vision detect-objects photo.jpg --prompt "find all cars"
ai-vision detect-objects scene.jpg --prompt "detect people" --output annotated.jpg
ai-vision detect-objects screenshot.png --prompt "find buttons" --viewport-width 1920 --viewport-height 1080
ai-vision detect-objects image.jpg --prompt "find text" --json
Analyze video content frame-by-frame or as a whole. Supports URLs, local files, and YouTube videos.
ai-vision analyze-video <source> --prompt <text> [options]
Options:
--prompt <text>        Analysis prompt (required)
--start-offset <time>  Start time (e.g., "40s", "2m30s", "00:02:30")
--end-offset <time>    End time (e.g., "80s", "3m", "00:03:00")
--fps <number>         Frame sampling rate (0.1-30, default: 1)
--temperature <num>    Temperature 0-2 (default: 0.7)
--top-p <num>          Top P 0-1
--top-k <num>          Top K 1-100
--max-tokens <num>     Max output tokens
--json                 Output raw JSON
Examples:
ai-vision analyze-video recording.mp4 --prompt "describe what happens"
ai-vision analyze-video https://www.youtube.com/watch?v=dQw4w9WgXcQ --prompt "summarize content"
ai-vision analyze-video video.mp4 --prompt "detect bugs" --start-offset 1m --end-offset 3m --fps 2
ai-vision analyze-video playwright-video.webm --prompt "detect interaction bugs"
ai-vision analyze-video video.mp4 --prompt "summarize" --json
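For long recordings, one option is to analyze the video in fixed-length windows using the offset flags. The sketch below only computes the window boundaries in the "40s" style shown above; `video_windows` is a hypothetical helper, and the total duration is assumed to be known in advance (the CLI is not shown to report it).

```shell
# Hypothetical helper: print one "<start>s <end>s" pair per window covering
# the range from 0 to <total> seconds, in steps of <step> seconds.
# The body runs in a subshell "( ... )" so its variables stay local.
video_windows() (
  total="$1"
  step="$2"
  # Guard against a zero or negative step, which would never terminate.
  [ "$step" -gt 0 ] || exit 1
  start=0
  while [ "$start" -lt "$total" ]; do
    end=$((start + step))
    # Clamp the last window to the end of the video.
    if [ "$end" -gt "$total" ]; then end="$total"; fi
    echo "${start}s ${end}s"
    start="$end"
  done
)

# Usage (150-second recording, 60-second windows):
# video_windows 150 60 | while read -r s e; do
#   ai-vision analyze-video recording.mp4 --prompt "detect bugs" \
#     --start-offset "$s" --end-offset "$e"
# done
```

Each window is a separate call, so results come back per segment and would need to be combined by the caller.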
--prompt <text>      Analysis prompt (required for most commands)
--json               Output raw JSON instead of formatted text
--temperature <num>  Temperature 0-2 (default: 0.7)
--top-p <num>        Top P 0-1
--top-k <num>        Top K 1-100
--max-tokens <num>   Max output tokens
--help               Show help
All commands accept multiple input formats:
https://example.com/image.jpg
./path/to/image.jpg
data:image/jpeg;base64,...
gs://bucket/path/to/image.jpg
files/... (reuse previously uploaded files)
https://www.youtube.com/watch?v=...
Images: jpg, jpeg, png, bmp, gif, webp
Videos: mp4, mov, avi, webm, flv, mpeg, mpg, wmv, 3gp
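A script that routes files to `analyze-image` or `analyze-video` could classify sources by the extensions listed above. This is a minimal sketch; `source_kind` is a hypothetical helper, and it looks only at the file extension, so sources without one (YouTube links, `data:` URIs, `files/...` references) fall through to `unknown`.

```shell
# Hypothetical helper: print "image", "video", or "unknown" based on the
# extension lists above. Extension-only check; the content is not inspected.
# The body runs in a subshell "( ... )" so its variables stay local.
source_kind() (
  # Drop any URL query string, then take the text after the last dot.
  path="${1%%[?]*}"
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|bmp|gif|webp)             echo image ;;
    mp4|mov|avi|webm|flv|mpeg|mpg|wmv|3gp) echo video ;;
    *)                                     echo unknown ;;
  esac
)

# Usage: [ "$(source_kind "$f")" = "video" ] && ai-vision analyze-video "$f" --prompt "summarize"
```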
Remote video handling:
Design System Analysis
ai-vision audit-design design-system.png
ai-vision analyze-image components.png --prompt "catalog all UI components"
Visual Regression Testing
ai-vision compare-images baseline.png current.png --prompt "identify visual differences"
Content Moderation
ai-vision detect-objects user-upload.jpg --prompt "find inappropriate content"
Video Analysis
ai-vision analyze-video playwright-recording.webm --prompt "detect UI interaction bugs"
Configure defaults via environment variables:
# Temperature settings
export TEMPERATURE=0.7
export TEMPERATURE_FOR_IMAGE=0.5
export TEMPERATURE_FOR_ANALYZE_IMAGE=0.3
# Token limits
export MAX_TOKENS=2048
export MAX_TOKENS_FOR_IMAGE=1024
export MAX_TOKENS_FOR_ANALYZE_IMAGE=512
# Sampling parameters
export TOP_P=0.9
export TOP_K=40
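The variable names suggest three levels of specificity: a global default, a per-media-type value, and a per-command value. The sketch below shows one plausible resolution order, assuming the most specific variable wins; that precedence is an assumption based on the naming, not something stated above, and `resolve_analyze_image_temperature` is a hypothetical function name.

```shell
# Assumed precedence (not confirmed by the text above), most specific first:
#   TEMPERATURE_FOR_ANALYZE_IMAGE > TEMPERATURE_FOR_IMAGE > TEMPERATURE > 0.7
# 0.7 is the documented default for --temperature.
resolve_analyze_image_temperature() {
  echo "${TEMPERATURE_FOR_ANALYZE_IMAGE:-${TEMPERATURE_FOR_IMAGE:-${TEMPERATURE:-0.7}}}"
}

# Usage: ai-vision analyze-image image.jpg --prompt "analyze" \
#          --temperature "$(resolve_analyze_image_temperature)"
```

Passing `--temperature` explicitly, as in the usage line, avoids relying on whichever order the tool actually applies.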
Use as an MCP server in Claude Desktop, Claude Code, or other MCP clients:
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "GEMINI_API_KEY": "your-api-key"
      }
    }
  }
}
MIT