From open-science-skills
Guides post-OCR text cleanup for research corpora using LLM correction, rule-based fixes, quality diagnostics, and multilingual handling. Covers strategy selection, prompt design, evaluation, and assurance.
npx claudepluginhub scdenney/open-science-skills --plugin open-science-skills
This skill uses the workspace's default tool permissions.
- **Choose between LLM correction, rule-based fixes, or a hybrid pipeline based on error type.** LLM correction excels at context-dependent errors (wrong but plausible characters, broken words, missing diacritics). Rule-based fixes handle deterministic patterns (control characters, Unicode normalization, repetition artifacts, whitespace) with zero risk of content alteration. Use rule-based fixe...
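The rule-based side of that choice can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: the function name `rule_based_cleanup`, the `max_run` parameter, the choice of NFC as the normalization form, and the six-character threshold for repetition artifacts are all assumptions made for the example. Runs are collapsed only for punctuation and symbols, so digits and letters (for example the zeros in `1000000`) are left untouched.

```python
import re
import unicodedata

# Control characters other than tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
# The same punctuation/symbol character repeated six or more times
# (threshold is an assumption; tune it for your corpus).
_SYMBOL_RUN = re.compile(r"([^\w\s])\1{5,}")
_SPACES = re.compile(r"[ \t]+")
_BLANK_LINES = re.compile(r"\n{3,}")


def rule_based_cleanup(text: str, max_run: int = 3) -> str:
    """Apply deterministic fixes only; no words are rewritten."""
    # Unicode normalization: compose base letters with combining marks.
    text = unicodedata.normalize("NFC", text)
    # Unify line endings before touching other whitespace.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop stray control characters.
    text = _CONTROL_CHARS.sub("", text)
    # Shorten long runs of one symbol (e.g. rules of dashes) to max_run.
    text = _SYMBOL_RUN.sub(lambda m: m.group(1) * max_run, text)
    # Collapse horizontal whitespace and trim each line.
    text = _SPACES.sub(" ", text)
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Allow at most one blank line between paragraphs.
    text = _BLANK_LINES.sub("\n\n", text)
    return text.strip()


if __name__ == "__main__":
    raw = "Cafe\u0301  au\x00 lait\r\n\r\n\r\n\r\n-----------\n"
    print(repr(rule_based_cleanup(raw)))
```

Because every step is a fixed pattern, the same input always yields the same output, which makes a pass like this a reasonable first stage before any LLM correction in a hybrid pipeline.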
Guides setup/execution of VLM-based OCR pipelines for scanned historical/multilingual documents: model selection, image/DPI handling, prompts, batch processing on HPC/SLURM, accuracy eval with CER/WER.
Polishes English academic LaTeX text for journal submission via quick-fix or guided multi-pass workflows. Supports in-place editing, change tracking, and journal style adaptation.
Detects AI-generated patterns in Korean text from LLMs like ChatGPT/Claude/Gemini and rewrites to natural human style. Analyzes 40 linguistic markers (comma overuse, spacing rigidity, POS diversity) with S1-S3 severity based on KatFishNet research (94.88% AUC).
do_sample=False) for deterministic output.