Sync interbench eval coverage with tldrs capabilities. Run when tldrs gains new formats, flags, or commands. Reads the tldrs manifest as ground truth and generates minimal targeted edits to 4 interbench files.
From tldr-swinton
npx claudepluginhub mistakeknot/interagency-marketplace --plugin tldr-swinton
Sync interbench's eval coverage with the current tldrs capabilities.
tldrs manifest --pretty
Save this output — it is the single source of truth for all tldrs capabilities.
tldrs manifest | python3 /root/projects/Interverse/infra/interbench/scripts/check_tldrs_sync.py
If exit 0: report "interbench is in sync" and stop. If exit 1: continue with the gaps listed.
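The exit-code handling above could be scripted roughly as follows. This is a minimal sketch, not part of the skill itself: `run_sync_check` is a hypothetical helper, and it assumes the check script prints its gap list on stdout.

```python
import subprocess
import sys


def run_sync_check(manifest_cmd, check_cmd):
    """Pipe the manifest command's stdout into the check command.

    Returns (in_sync, report). in_sync is True on exit 0 and False on
    exit 1; report is whatever the check command printed on stdout
    (assumed here to be the gap list).
    """
    manifest = subprocess.run(
        manifest_cmd, capture_output=True, text=True, check=True
    )
    check = subprocess.run(
        check_cmd, input=manifest.stdout, capture_output=True, text=True
    )
    if check.returncode not in (0, 1):
        # Anything other than 0 or 1 is treated as a failure of the check itself.
        raise RuntimeError(f"sync check failed unexpectedly: {check.stderr}")
    return check.returncode == 0, check.stdout


# Intended call, using the commands from the steps above:
# run_sync_check(
#     ["tldrs", "manifest"],
#     [sys.executable,
#      "/root/projects/Interverse/infra/interbench/scripts/check_tldrs_sync.py"],
# )
```

When `in_sync` is true, report "interbench is in sync" and stop; otherwise work through the gaps in `report`.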
Read ALL 4 files to understand existing patterns before editing:
/root/projects/Interverse/infra/interbench/scripts/regression_suite.json
/root/projects/Interverse/infra/interbench/scripts/ab_formats.py
/root/projects/Interverse/infra/interbench/demo-tldrs.sh
/root/projects/Interverse/infra/interbench/scripts/score_tokens.py
Each query entry follows this pattern:
{
  "name": "{command}_{qualifier}",
  "description": "Human-readable description",
  "command": ["command", "entry_or_flags...", "--format", "fmt"],
  "metadata": {"tool": "tldrs", "command": "cmd", ...}
}
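As a rough illustration of the entry pattern above, the sketch below builds one entry as a Python dict. `make_query_entry` and its parameters are hypothetical names introduced for this example; the metadata keys beyond `tool` and `command` are left to the caller because the pattern elides them.

```python
import json


def make_query_entry(command, qualifier, description, args, fmt, extra_metadata=None):
    """Build one query entry in the shape shown above (hypothetical helper)."""
    metadata = {"tool": "tldrs", "command": command}
    # Any further metadata keys are caller-supplied; the pattern does not name them.
    metadata.update(extra_metadata or {})
    return {
        "name": f"{command}_{qualifier}",
        "description": description,
        "command": [command, *args, "--format", fmt],
        "metadata": metadata,
    }


entry = make_query_entry(
    "context",
    "ultracompact",
    "Context for truncate_output in ultracompact format",
    ["truncate_output"],
    "ultracompact",
)
print(json.dumps(entry, indent=2))
```

Generating entries this way only helps keep naming consistent; the existing entries in the file remain the reference for style.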
truncate_output as the default entry
--format ultracompact
--zoom Lx --format ultracompact
command_raw (with {project} placeholder) is only for slice since it needs absolute paths
The DEFAULT_FORMATS list should contain all formats from the context command. Add missing formats to the list in the existing style:
DEFAULT_FORMATS = ["ultracompact", "text", "cache-friendly", "packed-json", "columnar-json"]
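One way to work out which formats to append is sketched below. It assumes you have already extracted the context command's format names from the manifest into a plain list; the manifest's actual layout is not shown here, and `add_missing_formats` and `"some-new-format"` are hypothetical.

```python
DEFAULT_FORMATS = ["ultracompact", "text", "cache-friendly", "packed-json", "columnar-json"]


def add_missing_formats(default_formats, manifest_formats):
    """Return default_formats plus any manifest formats not yet listed.

    Existing entries keep their order; new ones are appended in manifest order.
    """
    merged = list(default_formats)
    for fmt in manifest_formats:
        if fmt not in merged:
            merged.append(fmt)
    return merged


# "some-new-format" is a made-up name used only to show the behaviour.
updated = add_missing_formats(DEFAULT_FORMATS, ["text", "some-new-format"])
print(updated)
```

Appending rather than re-sorting keeps the diff to the existing list minimal.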
Each demo run block follows this pattern:
# -- Run N: Description --
echo "-- Run N: description --"
"$ASHPOOL" run \
    -m tool=tldrs \
    -m command=context \
    -m entry=truncate_output \
    -m format=FORMAT_NAME \
    -- $TLDRS context truncate_output \
    --project "$TLDRS_PROJECT" \
    --format FORMAT_NAME
echo
Add new run blocks before the # -- Summary -- section. Increment the run number.
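The placement and numbering rule above can be sketched as a small text transformation. This is an illustrative helper, not an existing script: the function names are hypothetical, and the description text used in the generated header (`context FORMAT_NAME`) is an assumption.

```python
import re

SUMMARY_MARKER = "# -- Summary --"


def next_run_number(script_text):
    """Return one more than the highest existing '# -- Run N:' number."""
    numbers = [
        int(n)
        for n in re.findall(r"^# -- Run (\d+):", script_text, flags=re.MULTILINE)
    ]
    return max(numbers, default=0) + 1


def render_run_block(run_number, format_name):
    """Render one demo run block following the pattern shown above."""
    return "\n".join([
        f"# -- Run {run_number}: context {format_name} --",
        f'echo "-- Run {run_number}: context {format_name} --"',
        '"$ASHPOOL" run \\',
        "    -m tool=tldrs \\",
        "    -m command=context \\",
        "    -m entry=truncate_output \\",
        f"    -m format={format_name} \\",
        "    -- $TLDRS context truncate_output \\",
        '    --project "$TLDRS_PROJECT" \\',
        f"    --format {format_name}",
        "echo",
        "",
    ])


def insert_run_block(script_text, format_name):
    """Insert a new run block immediately before the summary section."""
    if SUMMARY_MARKER not in script_text:
        raise ValueError("summary marker not found")
    block = render_run_block(next_run_number(script_text), format_name)
    head, tail = script_text.split(SUMMARY_MARKER, 1)
    return head + block + "\n" + SUMMARY_MARKER + tail
```

Review the generated block against the neighbouring runs in demo-tldrs.sh before committing, since the real descriptions may be worded differently.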
For scoring hints with metrics, add a parse_* function following the existing pattern:
def parse_FORMAT_NAME(context: str) -> dict | None:
    """Extract SIGNAL from tldrs FORMAT output."""
    # Parse the signal from context
    ...
Only add parsers for formats listed in scoring_hints that have non-empty metrics.
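The selection rule above might be applied like this. The sketch assumes `scoring_hints` is a mapping from format name to a dict with a `metrics` list, which is a guess at the manifest's shape rather than its documented structure. The `parse_<format>` naming with hyphens turned into underscores is likewise an assumption, and `"some-format"`, `"other-format"`, and `"signal"` are made-up values.

```python
def formats_needing_parsers(scoring_hints, existing_parsers):
    """List formats that have non-empty metrics but no parse_* function yet.

    scoring_hints: assumed shape {format_name: {"metrics": [...]}}.
    existing_parsers: names of parse_* functions already defined.
    """
    needed = []
    for fmt, hint in scoring_hints.items():
        # Hyphens are not valid in Python identifiers, so map them to underscores.
        parser_name = f"parse_{fmt.replace('-', '_')}"
        if hint.get("metrics") and parser_name not in existing_parsers:
            needed.append(fmt)
    return needed


# Made-up hints used only to show the behaviour.
hints = {
    "some-format": {"metrics": ["signal"]},
    "other-format": {"metrics": []},
}
print(formats_needing_parsers(hints, existing_parsers=set()))
```

Formats with empty metrics are skipped, so no parser stub is created for them.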
After all edits, re-run the sync check:
tldrs manifest | python3 /root/projects/Interverse/infra/interbench/scripts/check_tldrs_sync.py
Report the result. Exit 0 means all gaps are covered.
json and json-pretty which context does not