Performs Phase 4 quality gate for cli-web Python CLIs: 3-agent implementation review, 75-check checklist, pip package publishing, and read/write smoke tests.
From cli-anything-webnpx claudepluginhub itamarzand88/cli-anything-web --plugin cli-anything-webThis skill uses the workspace's default tool permissions.
references/quality-checklist.mdExecutes pre-written implementation plans: critically reviews, follows bite-sized steps exactly, runs verifications, tracks progress with checkpoints, uses git worktrees, stops on blockers.
Guides idea refinement into designs: explores context, asks questions one-by-one, proposes approaches, presents sections for approval, writes/review specs before coding.
Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.
Quality gate for cli-web-* CLIs. This skill owns the complete Phase 4: independent implementation review, structural quality checklist, publishing, and end-user smoke testing. Nothing ships until this phase passes.
Do NOT start unless:
<APP>.md (API map) exists and documents all endpointsIf tests are not passing, invoke the testing skill first.
Not all checks apply to every CLI. When evaluating, consider the site profile:
auth login takes a key argument, not playwright-cli.
auth refresh is not applicable — use auth logout instead.Mark inapplicable checks as "N/A — [reason]" rather than creating dead-code stubs.
Before checking structure or publishing, verify the code actually does the right thing. Tests prove it runs; this step proves it's correct.
Dispatch 3 plugin agents in the same message using the Agent tool:
traffic-fidelity-reviewer — API coverage (reads <APP>.md + client.py + commands/)harness-compliance-reviewer — Code conventions (reads HARNESS.md + all source)output-ux-reviewer — User experience (runs --help, checks REPL, validates JSON)Pass each agent: APP_PATH={app}/agent-harness, APP_NAME={app}, and site
profile (auth_type, is_read_only). The agents are defined in the plugin's
agents/ directory.
| Agent | Focus | What it reads | What it catches |
|---|---|---|---|
| Traffic Fidelity | API coverage | <APP>.md + client.py + commands/ | Missing endpoints, wrong params, broken response parsing, dead client methods, stale API map |
| HARNESS Compliance | Code quality | HARNESS.md + checklist + all source | click.ClickException bypass, missing to_dict(), retry_after lost, auth retry missing, stderr UTF-8 |
| Output & UX | User experience | --help output, --json output, REPL | Protocol leaks, stale REPL help, dead command files, broken entry points |
Each agent scores findings on a 0-100 confidence scale. When all 3 return:
Gate: Do not proceed to Step 2 until Critical count = 0.
Run the 75-check quality checklist from references/quality-checklist.md.
This covers directory structure, required files, CLI patterns, core modules,
tests, documentation, packaging, code quality, REPL, error handling, and UX.
setup.py with:
find_namespace_packages for cli_web.*console_scripts entry point: cli-web-<app>click>=8.0, httpxextras_require={"browser": ["playwright>=1.40.0"]}pip install -e .which cli-web-<app>cli-web-<app> --helpThis is the most critical verification step. The agent MUST simulate what a real
end user would do after pip install cli-web-<app>. If this fails, the pipeline
is NOT complete -- go back and fix the issue.
If no-auth site: Skip steps 5-6 (auth). Go directly to step 7 (READ).
If read-only site: Skip step 8 (WRITE). Verify reads return real data.
5. Authenticate as an end user would:
cli-web-<app> auth login
This uses Python sync_playwright() -- opens a browser, user logs in, cookies saved. This is what end users will run. If this fails, the CLI is broken for end users.
6. Verify auth status shows LIVE VALIDATION OK:
cli-web-<app> auth status
Must show: cookies present, tokens valid. If it shows "expired", "redirect", or any auth failure -- STOP. Fix auth before proceeding.
7. Run a READ operation and verify real data:
cli-web-<app> --json <first-resource> list
This must return real data from the live API -- NOT an error, NOT empty, NOT "auth not configured". Verify the JSON response contains expected fields.
8. Run a WRITE operation and verify it actually worked: This is the step the agent most commonly skips. Reading data is easy -- the real test is whether the CLI can CREATE, UPDATE, or GENERATE something.
# For CRUD apps (Monday, Notion, Jira):
cli-web-<app> --json <resource> create --name "smoke-test-$(date +%s)"
cli-web-<app> --json <resource> list # verify the created item appears
cli-web-<app> --json <resource> delete --id <id-from-create>
# For generation apps (Suno, Midjourney, NotebookLM audio):
cli-web-<app> --json <resource> generate --prompt "test" --wait
# Verify: JSON response contains a real ID, status=complete, not an error
# If the command has --output, verify the file was downloaded and size > 0
# For search/query apps:
cli-web-<app> --json search "test query"
# Verify: results array is non-empty
If ANY write/generate command fails, the pipeline is NOT complete. Reading a list of existing items only proves auth works -- it does NOT prove the CLI can actually do useful work. The whole point is to CREATE things, not just read them.
9. Only after steps 5-8 ALL pass, declare the pipeline complete.
auth login works (Python playwright, API key, or N/A for no-auth)auth status shows valid (or N/A for no-auth)--json output (see below)Run every command with --json and check for raw protocol leaks (wrb.fr, af.httprm,
empty [], null required fields). See methodology/SKILL.md "Mandatory Smoke Check" for
the full red flags list.
#1 gap to watch for: Agent runs list (GET with auth — easy), declares done, but
never tests create/generate (POST with CSRF, encoding). Always test at least one write.
After smoke tests pass, these tasks remain — all independent, dispatch in parallel:
┌─ Agent 1: Generate Claude Skill (.claude/skills/<app>-cli/SKILL.md)
│ ALSO copy to cli_web/<app>/skills/SKILL.md (package-portable)
├─ Agent 2: Update repository README.md (add CLI to examples table)
├─ Agent 3: Write/update cli_web/<app>/README.md (package docs)
├─ Agent 4: Update registry.json + CLAUDE.md Generated CLIs table
└─ Agent 5: Add CLI to CI test matrix (.github/workflows/tests.yml)
│ + Add entry to CHANGELOG.md under [Unreleased]
All are independent — launch in one message with run_in_background: true
Use the templates at cli-anything-web-plugin/templates/ as the canonical
structure for SKILL.md and README.md — fill in the {{placeholders}} with
actual CLI data from <app> --help and <APP>.md.
Goal: Create a project-local Claude skill so that Claude can use this CLI automatically in future conversations — no manual lookup required.
IMPORTANT: The skill must exist in TWO locations:
.claude/skills/<app>-cli/SKILL.md — for Claude Code discovery (project-level)<app>/agent-harness/cli_web/<app>/skills/SKILL.md — portable with pip install
(included via package_data in setup.py)Create the skill once, then copy it to both locations.
Create <git-root>/.claude/skills/<app>-cli/SKILL.md:
cli-web-<app> --help + <resource> --help<app>-cli, description with specific trigger phrases
("whenever the user asks about X, Y, Z. Always prefer cli-web-<app> over manually
fetching the website.")--jsonnotebooklm-cli, futbin-cli) as reference examplesAdd the new CLI to the examples table in README.md (CLI name, website, protocol,
auth type, description) and add a quick-start example in the "Try Them" section.
Add the new CLI to registry.json at the repo root:
{
"name": "cli-web-<app>",
"website": "<website>",
"protocol": "<detected protocol>",
"auth": "<auth type>",
"directory": "<app>/agent-harness",
"namespace": "cli_web.<app>",
"commands": ["<cmd1>", "<cmd2>", ...],
"install": "pip install -e <app>/agent-harness"
}
Also add to the Generated CLIs table in CLAUDE.md.
The pipeline is NOT done until ALL of these are checked:
.claude/skills/<app>-cli/SKILL.md exists (Claude Code discovery)cli_web/<app>/skills/SKILL.md exists (portable with pip install)cli-anything-web-plugin/templates/SKILL.md.template as starting pointsetup.py has package_data={"": ["skills/*.md", "*.md"]}__main__.py exists for python -m cli_web.<app> supportcli_web/<app>/README.md exists (used templates/README.md.template)<APP>.md API map existstests/TEST.md has Part 1 (plan) + Part 2 (results)README.md — new row in examples table + "Try them" sectionREADME.md — badge count updated (CLIs_generated-N and N_CLIs hero badge)CLAUDE.md — new row in Generated CLIs tableregistry.json — entry with name, website, protocol, auth, commands, installdocs/registry/index.html — entry added to JS data array with correct categoryCHANGELOG.md — entry added under [Unreleased] → Added.github/workflows/tests.yml — new CLI added to CI test matrix (see below)Every new CLI MUST be added to .github/workflows/tests.yml so unit tests run
on every push/PR. Do both steps — missing either blocks merges.
Step 1: Add to test matrix in .github/workflows/tests.yml:
- { name: <app>, dir: <app>/agent-harness, pkg: <app_underscore> }
Where <app_underscore> replaces hyphens with underscores (e.g., gh-trending → gh_trending).
Step 2: Add to branch protection required checks so PRs require the new check:
# Get current checks, append the new one, update
gh api repos/<owner>/<repo>/branches/main/protection/required_status_checks \
-X PATCH --input - <<EOF
{"strict": true, "contexts": [...existing..., "<app>"]}
EOF
Verify the entry runs: python -m pytest <dir>/cli_web/<pkg>/tests/test_core.py -v
All key rules (naming, auth, --json, REPL, rate limits) are defined in HARNESS.md "Critical Rules" and CLAUDE.md "Critical Conventions".
| Relationship | Skill |
|---|---|
| Preceded by | testing (Phase 3) |
| Followed by | None — this is the final phase |
| References | HARNESS.md (Generated CLI Structure, Naming Conventions) |
testing skill -- Phase 3 test planning/writing/documentationmethodology skill -- Phase 2 analyze/design/implementcapture skill -- Phase 1 traffic recording/cli-anything-web:validate -- Command to run the full 75-check validation