From skill-steward
Maintains repo-local action contracts and harness repos where CLI/MCP adapters are thin over core libraries. Use for steward.yaml actions, probes, benchmarks, adapter refactors, and parity work.
Slash command: `/skill-steward:mcp-harness-repo-maintainer`

File patterns: `AGENTS.md`, `docs/**`, `plugin/**`, `packages/**`, `src/**`, `mcp_server_*/**`, `Makefile`, `makefile`, `Justfile`, `justfile`, `package.json`, `Cargo.toml`, `pyproject.toml`, `pubspec.yaml`, `**/mcp.json`, `**/mcp*.json`, `.github/workflows/**`, `tool/**`, `scripts/**`

Supporting files:
- `evals/cases/adoption-promotion-trigger.yaml`
- `evals/cases/capability-classification-trigger.yaml`
- `evals/cases/cold-start-contract-trigger.yaml`
- `evals/cases/cooking-recipe-dormant.yaml`
- `evals/cases/generational-skeptic-promotion-trigger.yaml`
- `evals/cases/goal-first-detour-stop-trigger.yaml`
- `evals/cases/interface-split-compress-trigger.yaml`
- `evals/cases/native-gate-promotion-trigger.yaml`
- `evals/cases/portable-invocation-trigger.yaml`
- `evals/cases/protected-local-state-trigger.yaml`
- `evals/cases/sibling-layout-trigger.yaml`
- `evals/cases/single-transcript-no-promotion.yaml`
- `references/cli-mcp-pattern.md`
- `references/core-and-interfaces.md`
- `references/evals.md`
- `references/harness-principles.md`
- `references/maintainer-checklists.md`
- `references/mcp-production-practices.md`
- `references/preferred-tooling.md`
- `references/repo-archetypes.md`

Build and maintain repo-local action contracts and harnesses where agents execute and humans steer. The historical `mcp-` name remains because many adopters arrive through MCP work, but this skill is not MCP-only. For general app, library, tool, plugin, or meta-repo stewardship baselines, use `repo-quality-system-lifecycle` first; use this skill only when typed actions, probes, benchmarks, or CLI/MCP parity are in scope.
MCP and CLI are thin interfaces—APIs for agents and CI. Core contains the real logic, schemas, and registries. Adapters parse wire format (argv, MCP JSON-RPC); they delegate immediately.
    Agents / CI   → CLI ──┐
                          ├──► Core (logic, contracts, tests)
    Agents / chat → MCP ──┘
Full layering: core-and-interfaces.md. Parity: every MCP tool must call the same core entrypoint as its CLI twin.
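The parity rule can be illustrated with a minimal sketch. Everything named here (`contract_status`, `StatusResult`, both adapter functions) is hypothetical and not part of Steward, and the MCP response is a simplified JSON-RPC envelope rather than the full MCP tool-result shape. The point is only that each adapter parses its wire format and delegates to one shared core entrypoint.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StatusResult:
    """Structured outcome produced by the core (hypothetical shape)."""
    ok: bool
    checked: int


def contract_status(paths: list[str]) -> StatusResult:
    """Core entrypoint: the only place where behavior lives."""
    return StatusResult(ok=all(p.endswith(".yaml") for p in paths), checked=len(paths))


def cli_adapter(argv: list[str]) -> str:
    """CLI adapter: take argv, delegate immediately, serialize as JSON."""
    return json.dumps(asdict(contract_status(argv)), sort_keys=True)


def mcp_adapter(request: dict) -> dict:
    """MCP adapter: unpack JSON-RPC params, delegate immediately, wrap the result."""
    result = contract_status(request["params"]["arguments"]["paths"])
    return {"jsonrpc": "2.0", "id": request["id"], "result": asdict(result)}


# Parity check: both adapters must report the same structured outcome.
example_paths = ["steward.yaml"]
example_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "contract_status", "arguments": {"paths": example_paths}},
}
assert json.loads(cli_adapter(example_paths)) == mcp_adapter(example_request)["result"]
```

Because neither adapter contains logic of its own, a parity test reduces to comparing the two serialized outcomes for the same input.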
Progressive Automation (Agent-Driven Workflows): Harnesses should let agents turn repeated friction into reviewed, durable capability. If an agent discovers a complex fix or command sequence, it should capture an unknown case or typed action candidate with owner, risk class, inputs, outputs, effects, provenance, and verification. Permanent steward.yaml changes must go through reviewable diffs and validation; do not teach agents to save raw bash permanently from MCP.
Goal-first adoption: The original user goal remains the acceptance check. Tool repair, install work, wrappers, action candidates, evals, and refactors are detours unless they directly solve that goal or preserve a reusable lesson. After two failed repair/setup attempts, stop tool restoration, use a type-native command or portable fallback when possible, record the friction, return to the task, and do not promote from that same detour.
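The stop rule above can be sketched as a small state tracker. The `Detour` class and its method names are illustrative assumptions, not a Steward API; only the threshold of two failed attempts and the no-promotion consequence come from the text.

```python
from dataclasses import dataclass, field

# From the text: stop tool restoration after two failed repair/setup attempts.
MAX_REPAIR_ATTEMPTS = 2


@dataclass
class Detour:
    """Tracks one tool-repair detour away from the user's goal (hypothetical)."""
    reason: str
    failed_attempts: int = 0
    friction_log: list[str] = field(default_factory=list)

    def record_failure(self, note: str) -> None:
        """Count a failed attempt and keep the friction note for later review."""
        self.failed_attempts += 1
        self.friction_log.append(note)

    def next_step(self) -> str:
        """Return to the task once the stop rule is hit; otherwise allow a retry."""
        if self.failed_attempts >= MAX_REPAIR_ATTEMPTS:
            return "return_to_goal"
        return "retry_repair"

    def may_promote(self) -> bool:
        """A detour that hit the stop rule must not feed a promotion."""
        return self.failed_attempts < MAX_REPAIR_ATTEMPTS
```

Keeping the friction notes on the detour object mirrors the instruction to record the friction before returning to the task.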
Product experiment ownership: When the goal is visual quality, shader
behavior, loader correctness, renderer throughput, or performance, the product
repo owns the high-throughput experiment runner and oracle. Steward can validate
or summarize an experiment-campaign-summary/v1 artifact after the product loop
has produced captures/metrics. Do not promote a Steward action, benchmark, or
MCP tool as product acceleration unless the product campaign names what surface
changed or was directly proven.
Skeptic before promotion: A missing capability is a harness gap only after smaller layers fail. First ask whether the fix belongs in a native command, error message, FAQ, docs map, public API, schema/codegen, or deletion/collapse. Promote a Steward action, MCP tool, or benchmark only when it improves a real proof path and carries a falsifier.
Evolutionary simplicity for interfaces: Split core entrypoints when ownership, proof, effects, cadence, or audience diverge. Compress CLI/MCP/help surfaces when one user or CI intent remains, but preserve structured child outcomes so wrappers do not flatten proof, effects, or non-claims.
repository-governance-lifecycle; record an accepted ADR after agreement.

    Human intent (prompt, plan, review)
              │
              ▼
    ┌───────────────────┐
    │  Skills + AGENTS  │  Map & procedures (when to do what)
    └─────────┬─────────┘
              ▼
    ┌───────────────────┐
    │        CLI        │  doctor, exec, validate, contracts (deterministic)
    └─────────┬─────────┘
              ▼
    ┌───────────────────┐
    │    MCP server     │  fmt_* / tools for chat agents (same schemas)
    └─────────┬─────────┘
              ▼
    ┌───────────────────┐
    │   App / runtime   │  Legible UI, logs, metrics per worktree (optional)
    └───────────────────┘
Route by primary artifact when the repo is harness or action-contract shaped. For the broader app/library/tool taxonomy, use repo-quality-system-lifecycle.
| Expert lens | Repo examples | Owns | Does not own |
|---|---|---|---|
| A: Plugin/MCP | <plugin_repo> | plugin/mcp.json, tool prefixing, init utility | Harness scripts, visual comparisons |
| B: Library | <library_repo> | Platform packages/modules, adapters | Shippable plugin tree, dogfood apps |
| C: Harness/CLI | <harness_repo> | Harness engine, app registry, fixture lint | MCP server binary, marketplace manifests |
| D: Visual sidecar | <visual_sidecar> | Profile configs, compare/deconstruct CLI | VM/MCP, dynamic registry |
| E: Meta/governance | skill_steward | skills/, plugins/, validator CLI, docs | Product MCP, domain tools |
| F: Security/Ops | all remotes | OAuth gateway, token brokering | Feature code |
- docs/NORTH_STAR.mdx (or root pointer); AGENTS.md = map only.
- VERSION or release-please manifest.
- npx skills, git marketplace). Avoid proprietary ecosystem lock-in. Ensure CLI tools natively compile governance to upstream IDE formats (e.g., steward bundle generating .cursor/rules and .clinerules).
- skills/
- `--json` output, stable error codes, action effects, limits, and redaction policy; document in DX_FAQ.
- AGENTS.md / docs_map row.
- pnpm run validate or project contract tests.

Use this loop before claiming a repo is harness-ready or diagnosing repo-specific symptoms. A fresh repo has no meaningful symptom catalog yet; first prove that agents can discover and safely execute declared contracts.
Use steward <command> only after installing the released CLI or activating a local clone as a global command. Do not teach absolute local paths, private SDK paths, or sibling checkout paths as adoption instructions.
Preferred order:
1. `curl -fsSL https://raw.githubusercontent.com/Arenukvern/skill_steward/main/install.sh | bash`, then `steward <command>`.
2. `cd packages/steward_cli && dart run :steward <command>`.
3. `dart pub global activate --source path packages/steward_cli`, then `steward <command>`.

Raw `dart --packages=... bin/steward.dart` commands are local provenance only. If evidence needs them, pair them with a portable command block and label the machine-specific path as non-copyable.
Declare a small contract — Add or update steward.yaml with one quick-safe action. The first action should inspect state, not mutate it.
Expose the action — Put the action under probes.quick.actions only when it passes quick policy: default_policy: auto, no confirmation, no shell, no network/secrets/destructive effects, no repo mutation, no fs_write.
Add a scenario manifest — Put committed scenarios under steward/scenarios/*.yaml; use a precise name such as contract-status-smoke until the scenario proves navigation or diagnosis.
Run the proof loop:
steward doctor --json
steward schema check-outputs --json
steward schema drift --json
steward actions list --json
steward action inspect <action-id> --json
steward probe --profile quick --json
steward benchmark --scenario <scenario-id> --strict --output .steward/benchmark-summaries/<scenario-id>.json --json
Interpret honestly — doctor/actions list prove discovery, schema check-outputs catches machine-readable output drift, schema drift catches generated contract/schema drift, action inspect proves the executable boundary, probe proves the safe first observation, and benchmark proves durable execution only when it returns result: "pass". durability_blocked is truthful blocked evidence when strict benchmark inputs are modified or untracked; it is not H2 proof.
- `steward blocked explain --stdin --json` to choose config repair, unknown-case capture, or same-benchmark rerun. Use `steward blocked explain --input <result.json> --json` only for an existing persisted summary.
- `--output`. A current ledger or rerun route decides whether historical summaries are still current proof.
- `steward.yaml`, dirty declared inputs, schema/output drift, or native launch failures through the owning surface before adding a new action, benchmark scenario, PDSA note, or evidence packet from the same detour.

Protect local state: Strict benchmark inputs must be tracked and clean before execution: steward.yaml, file-backed scenario manifests, and any declared action inputs the benchmark reads. Local run outputs such as `.steward/benchmark-summaries/*.json`, observations, unknown cases, and action candidates stay local unless a review intentionally promotes a redacted artifact. If the repo has temporary dirty files that must remain in place, write a do-not-touch exception and keep those files out of action inputs. Protected local state is not a benchmark blocker unless it is declared as a contract or scenario input.
Grow from evidence — If the probe exposes an unknown failure, capture an unknown case first. Promote a typed action candidate only after owner, effects, limits, redaction, validation command, and benchmark evidence exist. Do not promote diagnostics from the same run that discovered them.
- status, evidence_type, claim_tested, proof_level, limitations, non_claims, next_disposition, and current_status_pointer envelope from docs/core/evidence-artifacts.mdx. Do not preserve raw logs, secrets, or private relational memory as evidence.

Prefer useful native gates over Steward-only scorekeeping. A repo may promote an existing deterministic script, test, or validation command into a Steward action when it has:
For visual/performance campaign work, a native gate must be able to run variants without rebuild-per-hypothesis when the product stack allows it, keep warm browser/server state for runtime sweeps, and emit the winning evidence rather than raw screenshot piles. If two harness loops do not move the product oracle or metric, stop harness work and return to product-native experimentation.
This is the path for turning evidence into a tool improvement packet. If the result cannot teach a later agent how to maintain or improve the repo's real tool surface, keep it as an observation or unknown case instead of promoting it.
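As a rough illustration of the promotion guard described in the cold-start steps, the check below lists what still blocks a typed action candidate. The dictionary keys and the `run_id` field are assumptions made for this sketch, not the Steward candidate schema, and requiring evidence from a run other than the discovering one is one reading of "do not promote diagnostics from the same run that discovered them".

```python
# Descriptive fields the text requires before promotion (key names are assumed).
REQUIRED_FIELDS = ("owner", "effects", "limits", "redaction", "validation_command")


def promotion_blockers(candidate: dict, discovering_run_id: str) -> list[str]:
    """Return the reasons a candidate cannot be promoted yet; empty means ready."""
    blockers = [f"missing {name}" for name in REQUIRED_FIELDS if candidate.get(name) is None]
    evidence = candidate.get("benchmark_evidence") or []
    # Evidence produced by the run that found the problem does not count.
    independent = [item for item in evidence if item.get("run_id") != discovering_run_id]
    if not independent:
        blockers.append("no benchmark evidence from a run other than the discovering one")
    return blockers
```

A candidate with blockers stays an observation or unknown case until a later run supplies the missing pieces.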
Use the adoption-run/v2 evidence shape before making S/H claims. Record:
- `user_goal`: original prompt, requested outcome, acceptance check, status, and evidence.
- `capability`: id, class, scope, user value, and native owner.
- `direct_problem_path`: declared surfaces and native gates used before raw shell exploration.
- `tool_detour`: reason, attempts, artifacts, stop rule, and return-to-goal step.
- `generational_architecture_check`: repeated pattern, smaller layer considered, deletion/collapse option, selected pattern layer, maintenance delta, and promotion guard.
- `outcome`: continue, refactor, stop, abandon, or promote.
- `hot_path_claim`: problem class, created surface, falsifier, positive proof, observed effect, held-out or future task, and non-claims.

Do not use "fully adopted repo" language for one polished proof. Say "capability-level H5" or "capability-level S5/H5" and name the capability. Repo maturity remains a separate, broader claim.
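A completeness check over the fields listed above might look like the following sketch. It assumes every section is required and that `outcome` is stored as a plain string; the real adoption-run/v2 schema may differ on both points, and `adoption_run_problems` is a hypothetical helper.

```python
# Top-level sections named in the text for an adoption-run/v2 record.
REQUIRED_SECTIONS = (
    "user_goal",
    "capability",
    "direct_problem_path",
    "tool_detour",
    "generational_architecture_check",
    "outcome",
    "hot_path_claim",
)
ALLOWED_OUTCOMES = {"continue", "refactor", "stop", "abandon", "promote"}


def adoption_run_problems(record: dict) -> list[str]:
    """List missing sections and an invalid outcome; empty means the shape is complete."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in record]
    outcome = record.get("outcome")
    if outcome is not None and outcome not in ALLOWED_OUTCOMES:
        problems.append(f"unknown outcome: {outcome}")
    return problems
```

A check like this only confirms the shape is filled in; it says nothing about whether the recorded evidence supports an S/H claim.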
Do not call a repository harness-ready until the proof stage matches the claim.
| Stage | Name | Proof |
|---|---|---|
| H0 | Skills installed | Agent can discover the relevant Skill Steward skills. |
| H1 | Local contract declared | steward.yaml exists with one quick-safe action and docs point to it. |
| H2 | Smoke loop proven | Cold-start proof loop produces a durable benchmark summary with result: "pass"; durability_blocked keeps the repo below H2 until rerun cleanly. |
| H3 | Repo feedback loop | Benchmark summaries, unknown cases, and action candidates accumulate from real work. |
| H4 | Fresh-agent workflow | A fresh agent completes one repo workflow without raw shell spelunking. |
| H5 | Promoted harness capability | Repeated evidence, including at least one held-out benchmark or future-agent repeat, promotes a diagnostic, action, eval, or local harness feature. |
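The stage table can be read as a ladder, sketched below under the assumption that stages are cumulative, so a repo can claim a stage only when every earlier stage is also proven. The function names and the evidence dictionary are hypothetical. The only rule taken directly from the text is that a benchmark summary counts toward H2 when its `result` is `"pass"`, while `durability_blocked` does not.

```python
from typing import Optional

STAGES = ("H0", "H1", "H2", "H3", "H4", "H5")


def summary_supports_h2(summary: dict) -> bool:
    """Only result == "pass" is H2 proof; durability_blocked is blocked evidence."""
    return summary.get("result") == "pass"


def highest_claimable_stage(proven: dict) -> Optional[str]:
    """Walk the ladder in order and stop at the first stage without proof."""
    reached = None
    for stage in STAGES:
        if not proven.get(stage, False):
            break
        reached = stage
    return reached
```

For example, a repo with skills installed, a declared contract, and a summary whose result is `durability_blocked` stays at H1 until the benchmark is rerun cleanly.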
- plugin/ is SSOT. Ship a custom server, do not patch community servers for product logic.
- `profiles/*.yaml`.
- mcp.json.

    <workspace>/
      <plugin_repo>/            # toolkit + MCP/plugin init
      <library_repo>/           # SDK platform/library packages
      <harness_repo>/           # CLI/harness runner
      <visual_sidecar>/         # D: comparison sidecar
      <meta_governance_repo>/   # meta skills & validation
Common proof:
repository-governance-lifecycle)

Archetype-specific proof:
| Archetype | Required proof |
|---|---|
| Product MCP | MCP tools and CLI commands share the same core schema/validation entrypoints. |
| Platform libs | Protocol adapters are thin, core tests cover behavior, and no product-specific fork is embedded. |
| CLI harness | CLI command exists for CI/gates; MCP parity is not required unless the repo exposes MCP. |
| Visual sidecar | Profile/config schemas and compare/deconstruct commands are validated; MCP parity is not required. |
| Meta steward | Skill/plugin/validator surfaces pass skill validation and T1 behavior-critical routing cases; no product runtime is bundled. |
| Security/Ops | Mutation surfaces require explicit risk class, redaction, and authorization policy. |
npx skills add arenukvern/skill_steward --skill mcp-harness-repo-maintainer
See references/sources.md. When researching, follow skill-source-citations.
npx claudepluginhub arenukvern/skill_steward --plugin skill-steward