Help us improve
Share bugs, ideas, or general feedback.
From tldr-swinton
Sync interbench eval coverage with tldrs capabilities. Run when tldrs gains new formats, flags, or commands. Reads the tldrs manifest as ground truth and generates minimal targeted edits to 4 interbench files.
npx claudepluginhub mistakeknot/interagency-marketplace --plugin tldr-swintonHow this skill is triggered — by the user, by Claude, or both
Slash command
/tldr-swinton:tldrs-interbench-syncThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Sync interbench's eval coverage with the current tldrs capabilities.
Guides technical evaluation of code review feedback: read fully, restate for understanding, verify against codebase, respond with reasoning or pushback before implementing.
Share bugs, ideas, or general feedback.
Sync interbench's eval coverage with the current tldrs capabilities.
tldrs manifest --pretty
Save this output — it is the single source of truth for all tldrs capabilities.
tldrs manifest | python3 /root/projects/Interverse/infra/interbench/scripts/check_tldrs_sync.py
If exit 0: report "interbench is in sync" and stop. If exit 1: continue with the gaps listed.
Read ALL 4 files to understand existing patterns before editing:
/root/projects/Interverse/infra/interbench/scripts/regression_suite.json/root/projects/Interverse/infra/interbench/scripts/ab_formats.py/root/projects/Interverse/infra/interbench/demo-tldrs.sh/root/projects/Interverse/infra/interbench/scripts/score_tokens.pyEach query entry follows this pattern:
{
"name": "{command}_{qualifier}",
"description": "Human-readable description",
"command": ["command", "entry_or_flags...", "--format", "fmt"],
"metadata": {"tool": "tldrs", "command": "cmd", ...}
}
truncate_output as the default entry--format ultracompact--zoom Lx --format ultracompactcommand_raw (with {project} placeholder) is only for slice since it needs absolute pathsThe DEFAULT_FORMATS list should contain all formats from the context command. Add missing formats to the list in the existing style:
DEFAULT_FORMATS = ["ultracompact", "text", "cache-friendly", "packed-json", "columnar-json"]
Each demo run block follows this pattern:
# -- Run N: Description --
echo "-- Run N: description --"
"$ASHPOOL" run \
-m tool=tldrs \
-m command=context \
-m entry=truncate_output \
-m format=FORMAT_NAME \
-- $TLDRS context truncate_output \
--project "$TLDRS_PROJECT" \
--format FORMAT_NAME
echo
Add new run blocks before the # -- Summary -- section. Increment the run number.
For scoring hints with metrics, add a parse_* function following the existing pattern:
def parse_FORMAT_NAME(context: str) -> dict | None:
"""Extract SIGNAL from tldrs FORMAT output."""
# Parse the signal from context
...
Only add parsers for formats listed in scoring_hints that have non-empty metrics.
After all edits, re-run the sync check:
tldrs manifest | python3 /root/projects/Interverse/infra/interbench/scripts/check_tldrs_sync.py
Report the result. Exit 0 means all gaps are covered.
json and json-pretty which context does not