OCR Skill
OCR skill using the PaddleOCR model via the SiliconFlow API. This skill should be used when the user asks to "recognize text from an image", "extract text from a photo", "OCR this image", "read text from screenshot", or mentions "PaddleOCR", "image text recognition", or "text extraction from images".
npx claudepluginhub aotenjou/silicon-paddleocr

This skill uses the workspace's default tool permissions.
Use PaddleOCR to extract text content from images. Supports single image or batch processing.
This skill provides optical character recognition (OCR) capabilities using the PaddlePaddle/PaddleOCR-VL-1.5 model via the SiliconFlow API. Extract text from JPG, PNG, WebP, BMP, and GIF images.
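Under the hood, the skill sends each image to SiliconFlow's OpenAI-compatible chat-completions endpoint as a base64 data URL. The payload builder below is a minimal sketch of that request shape; the exact message format used by `scripts/ocr_skill.py` may differ (see `references/api-configuration.md`).

```python
import base64

def build_ocr_request(image_bytes, prompt="Extract all text from this image.",
                      model="PaddlePaddle/PaddleOCR-VL-1.5", max_tokens=2000):
    """Build a chat-completions payload carrying a base64-encoded image.

    The data-URL message layout here is an assumption based on common
    OpenAI-compatible vision APIs, not a verbatim copy of the script.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
```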
Invoke this skill when the user asks to recognize or extract text from an image, photo, or screenshot, or mentions PaddleOCR or image text extraction.
Ensure the SILICONFLOW_API_KEY environment variable is set:
export SILICONFLOW_API_KEY="your_api_key"
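The key can also be passed explicitly with `-k/--api-key`. A sketch of the likely resolution order (explicit flag wins, then the environment variable); the script's actual precedence may differ:

```python
import os

def resolve_api_key(cli_key=None):
    """Resolve the SiliconFlow API key.

    An explicit -k/--api-key value takes precedence; otherwise fall back
    to the SILICONFLOW_API_KEY environment variable. Illustrative only.
    """
    key = cli_key or os.environ.get("SILICONFLOW_API_KEY")
    if not key:
        raise SystemExit(
            "error: no API key; set SILICONFLOW_API_KEY or pass --api-key")
    return key
```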
Execute the OCR script:
python3 scripts/ocr_skill.py [options] image_path
| Argument | Description |
|---|---|
| `images` | Image file path(s) or glob pattern (required) |
| `-k, --api-key` | API key (default: from `SILICONFLOW_API_KEY` env) |
| `-m, --model` | OCR model name (default: `PaddlePaddle/PaddleOCR-VL-1.5`) |
| `-p, --prompt` | Recognition prompt for custom behavior |
| `-j, --json` | Output results in JSON format |
| `-o, --output` | Save results to specified file |
| `--max-tokens` | Maximum tokens in response (default: 2000) |
Single image:
python3 scripts/ocr_skill.py /path/to/image.jpg
Multiple images with glob:
python3 scripts/ocr_skill.py /path/to/images/*.png
JSON output format:
python3 scripts/ocr_skill.py --json /path/to/image.jpg
Custom prompt for table extraction:
python3 scripts/ocr_skill.py -p "Please identify and format table content as Markdown" /path/to/table.jpg
Save to file:
python3 scripts/ocr_skill.py --json --output results.json /path/to/images/*.jpg
Text output (default):
--- image.jpg ---
[Recognized text content]
Detected X text regions
JSON output:
{
"image.jpg": {
"image_path": "/path/to/image.jpg",
"image_size": [width, height],
"texts": [
{
"text": "识别的文字",
"box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
}
],
"full_text": "所有文本的组合"
},
"image2.png": { ... }
}
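The JSON output can be consumed directly by downstream tooling. A small sketch that pulls the concatenated text per image from a results file, assuming failed images carry an `error` key instead of `texts` (an assumption; check the script's actual error schema):

```python
import json

def collect_full_text(results_json):
    """Map each image name to its concatenated recognized text.

    Entries assumed to contain an "error" key (failed images) are skipped.
    """
    results = json.loads(results_json)
    return {name: entry.get("full_text", "")
            for name, entry in results.items()
            if "error" not in entry}
```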
Coordinates explanation: the `box` field provides the four corner coordinates of each text region, in pixels.

If processing fails:
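Since the four corners describe a (possibly rotated) quadrilateral, a common first step is to reduce them to an axis-aligned bounding box for cropping or layout analysis. A minimal helper (not part of the skill itself):

```python
def quad_to_bbox(box):
    """Convert four pixel-space corners [[x1,y1], ..., [x4,y4]] of a text
    region into an axis-aligned bounding box (left, top, right, bottom)."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return (min(xs), min(ys), max(xs), max(ys))
```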
Images that fail to process report an error message; processing continues for the remaining images.
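That continue-on-failure behavior can be sketched as a per-image try/except loop; `ocr_one` below stands in for whatever function performs a single OCR call (a hypothetical name, not the script's actual API):

```python
def ocr_batch(paths, ocr_one):
    """Run OCR over many images, recording errors instead of aborting.

    One corrupt or unreadable image must not stop the rest of the batch,
    so each failure is captured as an {"error": ...} entry.
    """
    results = {}
    for path in paths:
        try:
            results[path] = {"texts": ocr_one(path)}
        except Exception as exc:
            results[path] = {"error": str(exc)}
    return results
```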
references/api-configuration.md - API configuration details
examples/sample-usage.sh - Example usage script
scripts/ocr_skill.py - The main OCR implementation