Skill

answer-processing

Use whenever the user uploads a hand-written or scanned answer PDF to be graded against a reference solution. Converts answer PDFs in `answers/*.pdf` to markdown in `answers/converted/*.md` using the pdf skill (OCR as needed), then performs strategy-based grading against `converted/solutions/*.md` or `quizzes/*_answers.md`. Invoked by `/grade`.

Invocation

Slash command: /paideia:answer-processing


SKILL.md

205 lines · ~2.4k tokens

Stats

Language: Python

Parent stars: 91

Parent forks: 3

Maintenance: Excellent

Last Commit: Jul 11, 2026

Answer Processing

When to load

User uploads an answer PDF and asks to grade it
/grade is invoked
User says "I finished the quiz, here's my work"

Core pipeline

answers/<quiz-name>.pdf      ← user uploads hand-written scan
      ↓ (vision-ocr skill, OCR)
answers/converted/<quiz-name>.md
      ↓ (this skill)
grade report → stdout (compact) + errors/log.md (append)

Step-by-step procedure

Step 1: Locate the answer file

If /grade was called with an argument, use it as a hint. Otherwise find the most recently modified file in answers/ (not answers/converted/).
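
The lookup in Step 1 can be sketched as follows. This is a minimal illustration, not the plugin's own code: the function name `latest_answer`, the substring treatment of the hint, and the restriction to `.pdf`/`.md` files are all assumptions.

```python
from pathlib import Path
from typing import Optional


def latest_answer(answers_dir: str = "answers", hint: Optional[str] = None) -> Optional[Path]:
    """Return the answer file to grade, or None if nothing is found."""
    root = Path(answers_dir)
    if not root.is_dir():
        return None
    # iterdir() is non-recursive, so files under answers/converted/ are never considered.
    files = [p for p in root.iterdir() if p.is_file() and p.suffix.lower() in {".pdf", ".md"}]
    if hint:
        # Assumption: the /grade argument is matched as a substring of the file name.
        hinted = [p for p in files if hint.lower() in p.name.lower()]
        files = hinted or files
    # The most recently modified file wins.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

If the hint matches nothing, the sketch falls back to the newest file rather than failing, which is one possible reading of "use it as a hint".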

Step 2: Convert PDF to MD (if PDF)

Use the vision-ocr skill, which delegates to a local VLM (Qwen3-VL 8B via ollama) for clean prose and LaTeX transcription, with pytesseract as an automatic fallback. The script reads INTERFACE_LANG from .course-meta so that the VLM keeps the handwriting in its original language.

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/vision_ocr.py" answers/<name>.pdf answers/converted/<name>.md

The script handles model warmup, page-by-page inference, and tier fallback. See .claude/skills/vision-ocr/SKILL.md. The output header tells the grader which tier produced the text:

 → high-confidence
 → degraded; treat results conservatively

Step 3: Graceful handling of OCR noise

Hand-written math OCR will be imperfect. Expect:

Greek letters misread as Latin ($\alpha \to a$, $\beta \to B$, $\pi \to T$ or $n$)
Fractions rendered as flattened text ($\tfrac{dU}{dT} \to dUdT$ or similar)
Subscripts/superscripts lost or inlined

Do not grade on algebraic correctness of OCR output. Instead, apply strategy-based grading:

Step 4: Strategy extraction from noisy MD

Read the converted MD file. For each problem, identify:

1. Which pattern(s) did the user invoke? Look for:
   - Named theorems / techniques the user wrote out ("Maxwell relation", "Stokes theorem", "by induction")
   - Variables they held fixed (even if notation is mangled)
   - Key intermediate objects (a chosen potential, a change of variable, an ansatz)
2. Did the reasoning reach the correct end form? Even if the algebra is wrong, the structure of the user's final expression (Does it have a log? A sqrt? A series? The correct variables?) tells you whether the approach worked.
3. Where did they stop? Incomplete work is common; note which step is the last recognizable one.
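
As a rough illustration of spotting named techniques in noisy text, the sketch below matches case-insensitive cue words instead of exact notation. The cue table `TECHNIQUE_CUES` and the function `spot_techniques` are hypothetical; a real course would presumably draw its cues from course-index/patterns.md.

```python
import re
from typing import Dict, List

# Hypothetical cue list, built from the examples named in this step.
TECHNIQUE_CUES: Dict[str, List[str]] = {
    "Maxwell relation": [r"maxwell"],
    "Stokes theorem": [r"stokes"],
    "induction": [r"\binduction\b"],
}


def spot_techniques(noisy_md: str) -> List[str]:
    """Return the techniques whose cue words appear in the OCR text."""
    found = []
    for technique, patterns in TECHNIQUE_CUES.items():
        # Match on the distinctive word only, so a misread neighbor does not hide the cue.
        if any(re.search(p, noisy_md, flags=re.IGNORECASE) for p in patterns):
            found.append(technique)
    return found
```

Keyword spotting only proposes candidates; the grader still has to judge whether the technique was actually applied.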

Step 5: Compare against reference solution

Open the reference (converted/solutions/<hw>.md for HW, quizzes/<name>_answers.md for quizzes, twins/<id>_<ts>_sol.md for twins, chain/<ts>_sol.md for chain).
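
The four reference locations can be captured in a small lookup, sketched below. The kind labels (`hw`, `quiz`, `twin`, `chain`) and the function `reference_path` are assumptions made for illustration; only the path shapes come from the list above.

```python
from typing import Dict

# Path shapes taken from the list above; the dictionary keys are hypothetical labels.
REFERENCE_TEMPLATES: Dict[str, str] = {
    "hw": "converted/solutions/{hw}.md",
    "quiz": "quizzes/{name}_answers.md",
    "twin": "twins/{id}_{ts}_sol.md",
    "chain": "chain/{ts}_sol.md",
}


def reference_path(kind: str, **parts: str) -> str:
    """Fill in the template for the given kind of assignment."""
    return REFERENCE_TEMPLATES[kind].format(**parts)
```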

For each problem/part, produce a verdict:

## P<n>
- Pattern match: ✅ / ⚠️ / ❌   [user invoked <Pk>, solution uses <Pk>]
- Variable choice: ✅ / ⚠️ / ❌  [user held <x> fixed, should be <y>]
- End form: ✅ / ⚠️ / ❌          [user's final: <form>, expected: <form>]
- Completeness: <last step user reached>
- Overall: <PASS | PARTIAL | FAIL>
- Note: <one line — what to study, which Pk to re-drill>

Do not report line-by-line algebra mistakes unless they are specifically about sign errors or notation bugs that matter on the exam (e.g., missing $-$ on $\kappa$ definition, conjugate vs. transpose confusion).

Step 6: Log errors

Canonical errors/log.md schema — single source of truth. Every command that appends here (/grade, /blind, future drills) MUST use exactly these keys. Downstream readers (statusline.py, weakmap, session_start.py) pattern-match on pattern: and problem_id: lines; any drift silently hides entries.

- problem_id: <id>
  pattern: <Pk>
  error_type: pattern-missed | wrong-variable | wrong-end-form | algebraic | sign | definition
  phase: reading | comprehension | transformation | execution | encoding   # optional (F2) — inferred from error_type when absent
  nature: slip | misconception | gap   # optional (F3) — inferred from error_type when absent
  summary: "<1 line>"
  source: answers/converted/<name>.md
  date: <ISO>
  overridden_by: <source>   # optional — present only on entries superseded by a human override

Only problem_id, pattern, error_type, summary, source, date are required (the six REQUIRED_KEYS log_tool.py enforces). phase, nature, and overridden_by are optional additive keys — omit them and downstream readers promote phase/nature from error_type via DEFAULT_PHASE/DEFAULT_NATURE (paideia_lib.iter_error_entries); write them explicitly only when the grading determined a phase/nature that differs from the inferred default.
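
To make the required/optional split concrete, here is a minimal sketch of a per-entry check. It is not the implementation in log_tool.py: `entry_problems` is a hypothetical name, and rejecting unknown keys is an assumption about what "keys" validation means.

```python
from typing import Dict, List

REQUIRED_KEYS = ("problem_id", "pattern", "error_type", "summary", "source", "date")
OPTIONAL_KEYS = ("phase", "nature", "overridden_by")
ERROR_TYPES = {
    "pattern-missed", "wrong-variable", "wrong-end-form",
    "algebraic", "sign", "definition",
}


def entry_problems(entry: Dict[str, str]) -> List[str]:
    """Return a list of schema problems; an empty list means the entry looks valid."""
    problems = []
    for key in REQUIRED_KEYS:
        if not entry.get(key):
            problems.append(f"missing required key: {key}")
    for key in entry:
        # Assumption: keys outside the schema are treated as drift and flagged.
        if key not in REQUIRED_KEYS and key not in OPTIONAL_KEYS:
            problems.append(f"unknown key: {key}")
    error_type = entry.get("error_type")
    if error_type and error_type not in ERROR_TYPES:
        problems.append(f"invalid error_type: {error_type}")
    return problems
```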

Write through log_tool.py — never hand-edit the log. Build the YAML block for every non-✅ entry of this grading, then make ONE call:

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/log_tool.py" append \
  --source="answers/converted/<name>.md" <<'YAML'
- problem_id: <id>
  pattern: <Pk>
  error_type: <type>
  summary: "<1 line>"
  source: answers/converted/<name>.md
  date: <ISO>
YAML

The tool schema-validates every entry (keys, error_type values, date shape, source: equal to --source) and rejects the whole batch on any violation — fix the block and re-run rather than writing around it.
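
Because all non-✅ entries go into one call, it can help to assemble the stdin block programmatically. The helper below is a hypothetical sketch that only builds the text; it does not invoke log_tool.py. Quoting the summary with `json.dumps` is an assumption that relies on JSON strings being acceptable as YAML double-quoted strings.

```python
import json
from typing import Dict, List

KEY_ORDER = ("problem_id", "pattern", "error_type", "summary", "source", "date")


def yaml_block(entries: List[Dict[str, str]]) -> str:
    """Render entries in the layout shown above, one list item per entry."""
    lines = []
    for entry in entries:
        for i, key in enumerate(KEY_ORDER):
            value = entry[key]
            if key == "summary":
                # Double-quote the free-text field so colons or quotes inside it stay literal.
                value = json.dumps(value, ensure_ascii=False)
            prefix = "- " if i == 0 else "  "
            lines.append(f"{prefix}{key}: {value}")
    return "".join(line + "\n" for line in lines)
```

The returned string is what would be piped to the append subcommand on stdin.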

Why the tool exists — idempotent by source:, replace don't pile up. Re-grading the same answer (fix the OCR, re-run /grade) or re-running /blind on the same problem must NOT leave two copies of that attempt's errors in the log — the weakmap histogram would then double-count and over-rank those patterns. log_tool.py append deletes every existing entry whose source: equals --source before appending, atomically, so the log stays a record of the latest grading of each source, not a transcript of every re-grade. (A genuinely new attempt belongs under a new source: — a new upload gets a new filename, so this only collapses true re-grades of the same file.) If a grading produced zero errors on a re-grade, run log_tool.py remove --source=<source> so the stale entries clear.
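
The replace-by-source behavior can be illustrated on an in-memory list of entries. This is a sketch of the idea only: `replace_by_source` is a hypothetical name, and the atomic file write that the real tool performs is not shown.

```python
from typing import Dict, List

Entry = Dict[str, str]


def replace_by_source(log: List[Entry], source: str, new_entries: List[Entry]) -> List[Entry]:
    """Drop every entry recorded for `source`, then append the new grading's entries."""
    kept = [e for e in log if e.get("source") != source]
    return kept + list(new_entries)
```

Applying the function twice with the same arguments gives the same log as applying it once, which is the idempotence the paragraph above describes. Passing an empty `new_entries` list mirrors the remove case for a clean re-grade.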

Misgrade correction — origin-preserving override. When a user disputes a verdict (OCR misread, wrong error classification), use log_tool.py override instead of append. The original entries are preserved in the log with overridden_by: <source> injected after their date: line. The correction entries are appended as the new current verdict (no marker). This is the only path that maintains an audit trail of the original ruling.

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/log_tool.py" override \
  --source="answers/converted/<name>.md" <<'YAML'
- problem_id: <id>
  pattern: <Pk>
  error_type: <corrected-type>
  summary: "<corrected 1-line description>"
  source: answers/converted/<name>.md
  date: <ISO>
YAML

Rules: override when changing the verdict; remove only when the re-grade produced zero errors; never include overridden_by: in stdin (tool assigns it — passing it is rejected). The weakmap histogram counts only entries without overridden_by: as current verdicts; original-marked entries do not double-count. The six required keys and PATTERN_RX are unchanged — overridden_by is an optional additive key, not a seventh required key.
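
A minimal sketch of the override semantics and of the counting rule follows. The function names and the `marker` parameter are hypothetical, and the exact value the tool writes into overridden_by is not specified here, so the sketch takes it as an argument.

```python
from collections import Counter
from typing import Dict, List

Entry = Dict[str, str]


def apply_override(log: List[Entry], source: str, corrections: List[Entry], marker: str) -> List[Entry]:
    """Keep the original entries for `source` but mark them, then append the corrections."""
    updated = []
    for entry in log:
        if entry.get("source") == source and "overridden_by" not in entry:
            # Copy before marking so the caller's log is not mutated.
            entry = {**entry, "overridden_by": marker}
        updated.append(entry)
    return updated + [dict(c) for c in corrections]


def current_pattern_counts(log: List[Entry]) -> Counter:
    """Count patterns over current verdicts only, skipping overridden entries."""
    return Counter(e["pattern"] for e in log if "overridden_by" not in e)
```

After an override, the original ruling is still present in the log but contributes nothing to the counts, so the audit trail and the histogram stay independent.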

Step 7: Render grade summary (chat output)

Compact table, no verbose explanations:

| Problem | Pattern | Vars | End form | Overall |
|---|---|---|---|---|
| P1 | ✅ | ✅ | ⚠️ | PARTIAL |
| P2 | ❌ | — | — | FAIL |
| P3 | ✅ | ✅ | ✅ | PASS |

Dominant issue: pattern-missed on P2 (used brute-force integration; should use residue theorem, P7).
Drill next: /blind <problem testing P7>, or /pattern P7 for quick review.

Keep this under 15 lines of output.
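
One way to produce the compact table is sketched below. The function `render_summary` and the dictionary-per-row input are assumptions; the column names and the dash for cells that were not assessed follow the example table above.

```python
from typing import Dict, List

COLUMNS = ("Problem", "Pattern", "Vars", "End form", "Overall")
NOT_ASSESSED = "\u2014"  # the dash used for unassessed cells in the example table


def render_summary(rows: List[Dict[str, str]]) -> str:
    """Render one table row per problem, filling missing cells with the dash."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * len(COLUMNS),
    ]
    for row in rows:
        cells = [row.get(col, NOT_ASSESSED) for col in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

The table costs two lines plus one per problem, so a long quiz may need its problems grouped to stay within the line budget once the closing notes are added.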

Handling edge cases

Empty or unreadable PDF

If OCR yields <100 chars total, ask the user (in INTERFACE_LANG from .course-meta, default en): "OCR returned too little. PDF quality may be low or the handwriting too small. Options: (a) re-scan brighter/larger and re-upload (b) type the answer into .md and save it to answers/converted/<name>.md, then /grade again"
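
The threshold check itself is small; a sketch is given below. The name `ocr_too_short` is hypothetical, and ignoring leading and trailing whitespace when counting is an assumption, as is counting any header line the OCR script emits.

```python
MIN_OCR_CHARS = 100  # threshold stated above


def ocr_too_short(converted_md: str, minimum: int = MIN_OCR_CHARS) -> bool:
    """True when the converted text is too short to grade."""
    # Assumption: surrounding whitespace does not count toward the total.
    return len(converted_md.strip()) < minimum
```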

User uploads .md directly

Skip PDF conversion. Read answers/<name>.md directly. Everything else is the same.

Multi-page with disordered content

Hand-written work often has margin notes, arrows, and struck-through attempts, which OCR renders chaotically. Note in the grade (in INTERFACE_LANG): "Answer ordering ambiguous. My interpretation: <interpretation>. Let me know if different."

User already in context

If the user pastes their work directly into chat (not as PDF), grade it from context. Still apply strategy-based grading.

Anti-patterns (things NOT to do)

❌ Demand pixel-perfect algebra from OCR output
❌ Mark something wrong because OCR mangled a Greek letter
❌ Require the user to retype their solution in LaTeX
❌ Produce 3-page grade reports (stay compact)
❌ Reveal the reference solution before grading (user might be asking "did I get it right" as a first pass)

Integration

Called by /grade
Uses the vision-ocr skill for OCR (see Step 2)
Reads course-index/patterns.md (pattern IDs) and converted/solutions/ or equivalent
Writes to errors/log.md and answers/converted/
