OCR Skill
OCR skill using the PaddleOCR model via the SiliconFlow API. This skill should be used when the user asks to "recognize text from an image", "extract text from a photo", "OCR this image", "read text from screenshot", or mentions "PaddleOCR", "image text recognition", or "text extraction from images".
npx claudepluginhub aotenjou/silicon-paddleocr

This skill uses the workspace's default tool permissions.
Use PaddleOCR to extract text content from images. Supports single image or batch processing.
This skill provides optical character recognition (OCR) capabilities using the PaddlePaddle/PaddleOCR-VL-1.5 model via the SiliconFlow API. Extract text from JPG, PNG, WebP, BMP, and GIF images.
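Under the hood, the skill sends each image to SiliconFlow's OpenAI-compatible chat-completions endpoint as a base64 data URL. The payload builder below is a minimal sketch of that request shape; the exact message format used by `scripts/ocr_skill.py` may differ (see `references/api-configuration.md`).

```python
import base64

def build_ocr_request(image_bytes, prompt="Extract all text from this image.",
                      model="PaddlePaddle/PaddleOCR-VL-1.5", max_tokens=2000):
    """Build a chat-completions payload carrying a base64-encoded image.

    The data-URL message layout here is an assumption based on common
    OpenAI-compatible vision APIs, not a verbatim copy of the script.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
```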
Invoke this skill when the user asks to recognize or extract text from an image, photo, or screenshot, or mentions PaddleOCR or image text extraction.
Ensure the SILICONFLOW_API_KEY environment variable is set:
export SILICONFLOW_API_KEY="your_api_key"
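The key can also be passed explicitly with `-k/--api-key`. A sketch of the likely resolution order (explicit flag wins, then the environment variable); the script's actual precedence may differ:

```python
import os

def resolve_api_key(cli_key=None):
    """Resolve the SiliconFlow API key.

    An explicit -k/--api-key value takes precedence; otherwise fall back
    to the SILICONFLOW_API_KEY environment variable. Illustrative only.
    """
    key = cli_key or os.environ.get("SILICONFLOW_API_KEY")
    if not key:
        raise SystemExit(
            "error: no API key; set SILICONFLOW_API_KEY or pass --api-key")
    return key
```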
Execute the OCR script:
python3 scripts/ocr_skill.py [options] image_path
| Argument | Description |
|---|---|
| `images` | Image file path(s) or glob pattern (required) |
| `-k, --api-key` | API key (default: from `SILICONFLOW_API_KEY` env) |
| `-m, --model` | OCR model name (default: `PaddlePaddle/PaddleOCR-VL-1.5`) |
| `-p, --prompt` | Recognition prompt for custom behavior |
| `-j, --json` | Output results in JSON format |
| `-o, --output` | Save results to specified file |
| `--max-tokens` | Maximum tokens in response (default: 2000) |
Single image:
python3 scripts/ocr_skill.py /path/to/image.jpg
Multiple images with glob:
python3 scripts/ocr_skill.py /path/to/images/*.png
JSON output format:
python3 scripts/ocr_skill.py --json /path/to/image.jpg
Custom prompt for table extraction:
python3 scripts/ocr_skill.py -p "Please identify and format table content as Markdown" /path/to/table.jpg
Save to file:
python3 scripts/ocr_skill.py --json --output results.json /path/to/images/*.jpg
Text output (default):
--- image.jpg ---
[Recognized text content]
Detected X text regions
JSON output:
{
"image.jpg": {
"image_path": "/path/to/image.jpg",
"image_size": [width, height],
"texts": [
{
"text": "识别的文字",
"box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
}
],
"full_text": "所有文本的组合"
},
"image2.png": { ... }
}
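The JSON output can be consumed directly by downstream tooling. A small sketch that pulls the concatenated text per image from a results file, assuming failed images carry an `error` key instead of `texts` (an assumption; check the script's actual error schema):

```python
import json

def collect_full_text(results_json):
    """Map each image name to its concatenated recognized text.

    Entries assumed to contain an "error" key (failed images) are skipped.
    """
    results = json.loads(results_json)
    return {name: entry.get("full_text", "")
            for name, entry in results.items()
            if "error" not in entry}
```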
Coordinates explanation: the `box` field provides the four corner coordinates of each text region, in pixels.

If processing fails:
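Since the four corners describe a (possibly rotated) quadrilateral, a common first step is to reduce them to an axis-aligned bounding box for cropping or layout analysis. A minimal helper (not part of the skill itself):

```python
def quad_to_bbox(box):
    """Convert four pixel-space corners [[x1,y1], ..., [x4,y4]] of a text
    region into an axis-aligned bounding box (left, top, right, bottom)."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return (min(xs), min(ys), max(xs), max(ys))
```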
Images that fail to process report an error message; processing continues for the remaining images.
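That continue-on-failure behavior can be sketched as a per-image try/except loop; `ocr_one` below stands in for whatever function performs a single OCR call (a hypothetical name, not the script's actual API):

```python
def ocr_batch(paths, ocr_one):
    """Run OCR over many images, recording errors instead of aborting.

    One corrupt or unreadable image must not stop the rest of the batch,
    so each failure is captured as an {"error": ...} entry.
    """
    results = {}
    for path in paths:
        try:
            results[path] = {"texts": ocr_one(path)}
        except Exception as exc:
            results[path] = {"error": str(exc)}
    return results
```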
references/api-configuration.md - API configuration details
examples/sample-usage.sh - Example usage script
scripts/ocr_skill.py - The main OCR implementation