Analyzes captured HTTP traffic from raw-traffic.json, identifies protocols and endpoints, designs Click CLI architecture, and implements Python CLI package for API wrappers.
From cli-anything-web. Install: npx claudepluginhub itamarzand88/cli-anything-web --plugin cli-anything-web. This skill uses the workspace's default tool permissions.
Analyze captured traffic, design the CLI command structure, and implement the complete Python CLI package. This skill owns the core transformation from raw HTTP traffic to a production-ready CLI.
Do NOT start unless:
- raw-traffic.json exists (with WRITE operations, or read-only GET-only traffic)

If raw-traffic.json is missing or has no WRITE operations, invoke the capture skill first.
Exception for read-only sites: If the site is genuinely read-only (search engine,
dashboard, analytics viewer with no create/update/delete), the trace may contain only
GET requests. In this case, note "read-only site — no write operations" in <APP>.md
and proceed. The generated CLI will have read-only commands (list, get, search) but
no create/update/delete commands. This is valid.
No-auth sites: If the target site requires no authentication (public API,
no login needed), the "Auth state captured" prerequisite does not apply. Note
"no-auth site" in <APP>.md and proceed.
Goal: Map raw traffic to a structured API model.
Process:
Read traffic-analysis.json first (if it exists alongside raw-traffic.json).
This file is auto-generated by parse-trace.py or mitmproxy-capture.py → analyze-traffic.py and contains
pre-detected protocol type, auth pattern, endpoint grouping, GraphQL operations,
batchexecute RPC IDs, and suggested CLI commands. Use it as a starting point —
verify its findings and fill in anything marked "unknown" by reading raw-traffic.json
manually.
Enhanced analysis (v1.3.0, when captured via mitmproxy-capture.py):
- request_sequence: Timeline-ordered requests with auth flow detection (login → token → API calls)
- session_lifecycle: Cookie inventory, auth cookie identification, session pattern (cookie_auth/token_refresh/no_session)
- endpoint_sizes: Response body size classification per endpoint (small/medium/large) and total data transferred

These fields are only present when mitmproxy-capture.py was used. If missing (has_timestamps: false), rely on manual analysis.

If traffic-analysis.json doesn't exist, run the analyzer:
python ${CLAUDE_PLUGIN_ROOT}/scripts/analyze-traffic.py \
<app>/traffic-capture/raw-traffic.json --summary
Parse raw-traffic.json (for details the analyzer couldn't extract)
Group requests by base path (e.g., /api/v1/boards/, /api/v1/items/)
For each endpoint group, identify:
- Path parameters (e.g., :id)

Identify RPC protocol type -- classify the API transport:
| Protocol | Detection Signal | Client Pattern |
|---|---|---|
| REST | Resource URLs (/api/v1/boards/:id), standard HTTP methods | client.py with method-per-endpoint |
| GraphQL | Single /graphql endpoint, query/mutation in body | client.py with query templates |
| gRPC-Web | application/grpc-web content type, binary payloads | Proto-based client |
| Google batchexecute | batchexecute in URL, f.req= body, )]}'\n prefix | rpc/ subpackage (see references/google-batchexecute.md) |
| Custom RPC | Single endpoint, method name in body, proprietary encoding | Custom codec module |
| Public REST API | Documented /api/ endpoints, OpenAPI spec, JSON responses | Standard client.py with httpx |
| Plain HTML (no framework) | No SPA root, no framework globals, data in <table>/<div> | client.py with httpx + BeautifulSoup4 |
This determines client architecture in Step B -- REST uses simple client.py,
non-REST protocols need a dedicated rpc/ subpackage with encoder/decoder/types.
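As a quick illustration, the detection signals in the table above can be encoded as a heuristic pre-classifier over captured request records. This is a sketch only — the function name and the record fields (url, response_headers) are assumptions for illustration, not the actual schema produced by analyze-traffic.py:

```python
def classify_protocol(requests: list) -> str:
    """Heuristic transport classification from captured request records."""
    urls = [r.get("url", "") for r in requests]
    if any("batchexecute" in u for u in urls):
        return "batchexecute"
    if any(u.split("?")[0].rstrip("/").endswith("/graphql") for u in urls):
        return "graphql"
    if any(r.get("response_headers", {}).get("content-type", "")
           .startswith("application/grpc-web") for r in requests):
        return "grpc-web"
    if any("/api/" in u for u in urls):
        return "rest"
    return "html"  # fall back to HTML scraping with httpx + BeautifulSoup
```

Always verify the heuristic against the raw traffic — a single /graphql hit on an otherwise REST site should not flip the classification.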
Detect data model:
Detect auth pattern:
- Browser-delegated auth: tokens embedded in page data (e.g., WIZ_global_data), not in HTTP headers. Requires CDP for initial cookies, HTTP for token extraction. See references/auth-strategies.md "Browser-Delegated Auth" section.

Write <APP>.md -- the software-specific SOP document.
Output: <APP>.md with API map, data model, auth scheme.
References: traffic-patterns.md, google-batchexecute.md, ssr-patterns.md
Before implementing, read an existing CLI that uses the same protocol as your target. These are battle-tested implementations that solved the same problems you'll face.
| Protocol | Reference CLI | Key files to read |
|---|---|---|
| Google batchexecute | notebooklm/agent-harness/cli_web/notebooklm/ | core/rpc/encoder.py, core/rpc/decoder.py, core/client.py, core/auth.py |
| GraphQL + WAF | booking/agent-harness/cli_web/booking/ | core/client.py (curl_cffi + GraphQL), core/auth.py (WAF tokens) |
| HTML scraping | futbin/agent-harness/cli_web/futbin/ | core/client.py (httpx + BS4), commands/players.py |
| HTML + Cloudflare | producthunt/agent-harness/cli_web/producthunt/ | core/client.py (curl_cffi impersonate) |
| REST API | unsplash/agent-harness/cli_web/unsplash/ | core/client.py, commands/photos.py |
| Simple HTML | gh-trending/agent-harness/cli_web/gh_trending/ | Minimal structure example |
How to use reference CLIs:
- core/client.py — understand the request/response pattern
- core/auth.py — copy the login_browser() pattern exactly for Google apps
- core/rpc/ (for batchexecute) — understand encoder/decoder, DO NOT reinvent
- commands/ — see how Click commands are structured, how --json works
- utils/helpers.py — see handle_errors(), _resolve_cli(), repl patterns

For batchexecute apps specifically, the notebooklm CLI is your bible.
The agent implementing the CLI MUST read these files before writing code. Use the
Agent tool to dispatch a research agent that reads
the reference implementation while you design the command structure.
Before writing any code, note the command structure in <APP>.md (10 minutes max):
- /api/v1/boards/* → boards command group
- /api/v1/items/* → items command group
- Map HTTP methods to verbs (GET collection → list, GET single → get, POST → create, PUT/PATCH → update, DELETE → delete)
- auth login, auth status, auth refresh; credentials at ~/.config/cli-web-<app>/auth.json
- Copy repl_skin.py from the plugin

Goal: Generate the complete Python CLI package.
See HARNESS.md "Generated CLI Structure" for the complete package template.
Key points: cli_web/ namespace (NO __init__.py), <app>/ sub-package (HAS __init__.py),
core/, commands/, utils/, tests/ directories.
Before writing implementation code, read ${CLAUDE_PLUGIN_ROOT}/skills/boilerplate/SKILL.md
and follow its instructions to scaffold the core/ modules. This generates exceptions.py,
client.py skeleton, helpers.py, config.py, and (for batchexecute) the rpc/ subpackage.
After scaffolding, review the generated files and customize client.py with actual
endpoint methods from <APP>.md.
exceptions.py -- implement first. Required types: AppError (base), AuthError (recoverable flag), RateLimitError (retry_after), NetworkError, ServerError (status_code), NotFoundError. See references/exception-hierarchy-example.py for the complete template.
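A minimal sketch of that hierarchy — constructor signatures here are illustrative; references/exception-hierarchy-example.py remains the authoritative template with the full HTTP status mapping:

```python
class AppError(Exception):
    """Base class for all CLI errors."""

class AuthError(AppError):
    def __init__(self, message, recoverable=False):
        super().__init__(message)
        self.recoverable = recoverable  # True → client may refresh and retry once

class RateLimitError(AppError):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds, from Retry-After header if present

class NetworkError(AppError):
    """DNS failures, timeouts, connection resets."""

class ServerError(AppError):
    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code

class NotFoundError(AppError):
    """404 -- resource does not exist."""
```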
client.py -- HTTP client with exception mapping and auth retry:
- httpx (default) — for most sites (REST, GraphQL, batchexecute)
- curl_cffi — for Cloudflare-protected sites. Uses Chrome TLS fingerprint impersonation to bypass bot detection without cookies or auth:
from curl_cffi import requests as curl_requests
resp = curl_requests.get(url, impersonate="chrome")
Use curl_cffi when Phase 1 detects Cloudflare (cf-ray header, challenge page).
- Add curl_cffi, beautifulsoup4 to setup.py instead of httpx.
- Map HTTP status codes to exceptions: 401→AuthError, 404→NotFoundError, 429→RateLimitError, 5xx→ServerError
- On AuthError(recoverable=True), refresh tokens and retry once
- Use exponential backoff for retries (see references/polling-backoff-example.py)
- Expose namespaced sub-clients (client.notebooks.list(), client.sources.add())
- See references/client-architecture-example.py for the full pattern

auth.py -- handles token storage, refresh, expiry. Implementation depends on auth type:
For no-auth sites: DO NOT create auth.py, session.py, or auth command groups.
These files are dead code for public APIs and confuse users. The CLI should have
NO auth-related files or commands. The only exception is if the site has optional
auth (e.g., API key for write operations) — in that case, implement a minimal
auth module.
For browser-delegated auth (Google, Microsoft, etc.): Full playwright-cli login flow with cookie domain priority for international users.
See references/auth-strategies.md for all patterns (browser login, cookie priority, API key, env var, context commands).
Store cookies at ~/.config/cli-web-<app>/auth.json with chmod 600.
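The storage step can be sketched like this — save_auth and its base parameter are hypothetical names for illustration, not part of any reference module:

```python
import json
import os
from pathlib import Path

def save_auth(app, data, base=None):
    """Write credentials to ~/.config/cli-web-<app>/auth.json with
    owner-only permissions. `base` overrides ~/.config (useful in tests)."""
    base = Path(base) if base else Path.home() / ".config"
    path = base / f"cli-web-{app}" / "auth.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2))
    os.chmod(path, 0o600)  # chmod 600: owner read/write only
    return path
```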
Anti-bot resilient client construction (when detected in Phase 2):
- Never hardcode dynamic values such as build labels (bl), session IDs (f.sid), or CSRF tokens -- extract dynamically at runtime
- Send required protocol headers (x-same-domain: 1 for Google apps)
- See references/google-batchexecute.md for the complete Google pattern

RPC codec subpackage (for non-REST protocols like batchexecute):
When the API uses a non-REST protocol, add core/rpc/ with:
- types.py -- method ID enum, URL constants
- encoder.py -- request encoding (protocol-specific format)
- decoder.py -- response decoding (strip prefix, parse chunks, extract results)
The client.py still exists but delegates encoding/decoding to rpc/.

Progress feedback -- Use rich>=13.0 spinners for operations >2s (suppress in --json mode). See references/rich-output-example.py.
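To make the decoder's job concrete, here is a simplified sketch for batchexecute-style responses. Field positions in the envelope are illustrative — the real layout is documented in references/google-batchexecute.md:

```python
import json

def decode_batchexecute(raw):
    """Strip the )]}' anti-hijacking prefix, skip chunk-length lines,
    and pull the JSON payload out of each wrb.fr envelope."""
    if raw.startswith(")]}'"):
        raw = raw[4:]
    results = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # blank line or chunk-length marker
        for entry in json.loads(line):
            if entry and entry[0] == "wrb.fr":
                payload = entry[2]  # payload is itself a JSON string
                results.append(json.loads(payload) if payload else None)
    return results
```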
JSON error output -- --json mode errors are JSON too, not plain text. Standard codes: AUTH_EXPIRED, RATE_LIMITED, NOT_FOUND, SERVER_ERROR, NETWORK_ERROR. Implement via utils/output.py json_error().
All commands use handle_errors(json_mode) context manager — centralizes error handling, exit codes (1=user, 2=system, 130=interrupt), and JSON errors. See references/helpers-module-example.py.
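A minimal sketch of that context manager — the AppError stand-in below is a placeholder for the real base exception from exceptions.py:

```python
import contextlib
import json
import sys

class AppError(Exception):
    """Stand-in for the CLI's base exception (real one lives in exceptions.py)."""
    code = "SERVER_ERROR"

@contextlib.contextmanager
def handle_errors(json_mode):
    """Centralized error handling: JSON errors in --json mode,
    exit code 1 for user-level errors, 130 on Ctrl-C."""
    try:
        yield
    except KeyboardInterrupt:
        sys.exit(130)
    except AppError as exc:
        if json_mode:
            print(json.dumps({"error": {"code": exc.code, "message": str(exc)}}))
        else:
            print(f"Error: {exc}", file=sys.stderr)
        sys.exit(1)
```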
Generation commands support --wait, --retry N, --output path — for agent-scriptable end-to-end workflows. See references/polling-backoff-example.py.
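The --wait behaviour boils down to a backoff loop like this sketch — names and defaults are illustrative; see references/polling-backoff-example.py for the full version:

```python
import time

def poll_until_complete(check_fn, timeout=300, base_delay=1.0, max_delay=30.0):
    """Call check_fn() with exponential backoff until it returns a truthy
    result or the timeout elapses."""
    deadline = time.monotonic() + timeout
    delay = base_delay
    while time.monotonic() < deadline:
        result = check_fn()
        if result:
            return result
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
    raise TimeoutError("operation did not complete in time")
```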
Windows UTF-8 fix — Add at the top of <app>_cli.py before any imports that print:
import sys
if sys.stdout.encoding and sys.stdout.encoding.lower() not in ("utf-8", "utf8"):
    try: sys.stdout.reconfigure(encoding="utf-8", errors="replace")
    except AttributeError: pass
HTML table parsers MUST extract ALL visible columns — not just name/price,
because missing fields in --json output make the CLI useless for filtering and analysis.
If the site shows version, club, nation, stats, skills, weak foot — parse all of them.
Empty fields in --json output = incomplete parser.
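A sketch of an all-columns parser with BeautifulSoup4 — rows are keyed by header text so --json output carries every visible field; the flat table shape assumed here is illustrative:

```python
from bs4 import BeautifulSoup

def parse_table(html):
    """Extract every column of the first HTML table as header-keyed dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip the header row, which has no <td>
            rows.append(dict(zip(headers, cells)))
    return rows
```

Because the dicts are built from the header row, adding a column on the site automatically surfaces it in --json output instead of silently dropping it.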
Entry point: cli-web-<app> via setup.py console_scripts
Namespace: cli_web.*
Copy repl_skin.py from plugin for consistent REPL experience
utils/helpers.py -- shared CLI helpers (generate for every CLI):
- resolve_partial_id(partial, items) — prefix-match UUIDs for get/rename/delete
- handle_errors(json_mode) — context manager replacing try/except in all commands
- require_notebook(notebook_arg) — gets notebook ID from arg or persistent context
- sanitize_filename(name) — safe filenames from artifact titles
- poll_until_complete(check_fn) — exponential backoff polling
- get_context_value(key) / set_context_value(key, value) — persistent context.json

See references/helpers-module-example.py for the complete module.

Not all helpers apply to every CLI. Include only what the CLI uses:
- handle_errors and print_json are always needed.
- resolve_partial_id only for UUID-based apps.
- require_notebook / context helpers only for apps with persistent context.
- poll_until_complete only for generation/async operations.
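For example, resolve_partial_id can be sketched as follows, assuming items are dicts with an "id" key (see references/helpers-module-example.py for the real module):

```python
def resolve_partial_id(partial, items):
    """Expand a unique ID prefix (e.g., the first 8 chars of a UUID)
    to the full ID; error on zero or multiple matches."""
    matches = [item["id"] for item in items if item["id"].startswith(partial)]
    if not matches:
        raise ValueError(f"no item matches id prefix {partial!r}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous id prefix {partial!r}: {matches}")
    return matches[0]
```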
These four bugs appear in almost every generated REPL. Get them right the first time:
1. Use shlex.split(), never line.split()
# ✓ Correct — handles quoted args: players search "messi" -> ['players', 'search', 'messi']
import shlex
args = shlex.split(line)
# ✗ Wrong — produces: ['players', 'search', '"messi"'] — quotes become part of the value
args = line.split()
2. Never pass **ctx.params to cli.main() in REPL dispatch
# ✓ Correct — preserve --json flag by prepending to args
repl_args = ["--json"] + args if ctx.obj.get("json") else args
cli.main(args=repl_args, standalone_mode=False)
# ✗ Wrong — ctx.params = {"json_mode": False} gets passed to Context.__init__()
# which doesn't accept it → TypeError: Context.__init__() got an unexpected
# keyword argument 'json_mode'
cli.main(args=args, standalone_mode=False, **ctx.params)
3. Keep _print_repl_help() in sync with the actual command surface
The _print_repl_help() function in <app>_cli.py is the user's first discovery surface — it's what they see when they type help in the REPL. It must mirror the real commands, including all key options. A REPL that shows outdated or incomplete help is confusing and makes the CLI feel broken.
# ✓ Correct — help lists actual options users can pass
def _print_repl_help():
_skin.info("Available commands:")
print(" players list [OPTIONS]")
print(" --position <GK|ST|CM|...> Filter by position")
print(" --rating-min N --rating-max N Rating range")
print(" --cheapest Sort cheapest first")
# ✗ Wrong — stale help doesn't mention new --position, --rating-min, etc.
def _print_repl_help():
print(" players list [--min-price N] List players with filters")
Rule: every time you add options to a command, update _print_repl_help() in the same commit.
4. Use @click.argument for positional REPL params, not @click.option("--x", required=True)
REPL commands show players search <query> in help. If query is a --query option,
users typing players search messi get "Error: Missing option '--query'".
Use positional arguments for natural command-line style:
# ✓ Correct — users type: players search messi OR players get 21610
@players.command()
@click.argument("query")
def search(query): ...
@players.command()
@click.argument("player_id", type=int)
def get(player_id): ...
# ✗ Wrong — users get an error unless they type: players search --query messi
@players.command()
@click.option("--query", required=True)
def search(query): ...
Rule of thumb: if a command takes a single required value that would be a positional arg
in a shell command (git checkout main, grep pattern), use @click.argument.
Use @click.option only for optional or named parameters (--rating-min, --platform).
When the CLI has 3+ command groups (e.g., notebooks, sources, chat, artifacts), dispatch parallel subagents -- one per command module. Each agent gets:
- The <APP>.md API spec for its resource
- The client.py and auth.py interfaces it depends on
- A concrete deliverable (e.g., "commands/notebooks.py with list, get, create, delete")

Parallelization opportunities:
| Modules | Dispatch in parallel? |
|---|---|
| commands/notebooks.py, commands/sources.py, commands/chat.py | Yes -- each command file only depends on client.py |
| rpc/encoder.py and rpc/decoder.py | Yes -- encoder doesn't depend on decoder |
| auth.py and models.py | Yes -- no shared logic |
| client.py and commands/* | No -- commands depend on client |
| <app>_cli.py (entry point) | Last -- imports all commands, write after they're done |
Implementation order (with maximum parallelism):
Phase A (sequential): Write core foundation
exceptions.py → client.py → auth.py (if needed) → models.py
Phase B (parallel): Dispatch ALL independent work simultaneously
┌─ Agent 1: commands/notebooks.py
├─ Agent 2: commands/sources.py
├─ Agent 3: commands/chat.py
├─ Agent 4: commands/artifacts.py
├─ Agent 5: rpc/encoder.py + rpc/decoder.py (if non-REST)
└─ Agent 6 (background): test_core.py (unit tests for core modules)
All run concurrently — each only depends on Phase A modules
Phase C (sequential): Wire everything together
utils/helpers.py → <app>_cli.py → __main__.py → setup.py → copy repl_skin.py
Key parallelism rules:
- One agent per commands/*.py file
- Entry point files (<app>_cli.py, setup.py) must come last (depends on all commands)

Before invoking testing, install (pip install -e .) and verify:
- cli-web-<app> --help loads
- cli-web-<app> auth status --json shows valid (if auth-required)
- cli-web-<app> <resource> list --json returns real data

Red flags — fix before testing:
- Raw wrb.fr or af.httprm in output → decoder broken
- [] or null where data expected → wrong params or client-side operation (see references/google-batchexecute.md "Client-Side Operations")

Update phase state:
python ${CLAUDE_PLUGIN_ROOT}/scripts/phase-state.py complete <app> \
--phase methodology --output <app>/agent-harness/
When implementation is complete and the smoke check passes, invoke the testing
skill to plan and write tests.
Do NOT skip testing -- every CLI must have comprehensive tests before publishing.
| Skill | When it activates |
|---|---|
| capture | Phase 1 -- traffic recording (prerequisite for this skill) |
| testing | Phase 3 -- test writing, documentation |
| standards | Phase 4 -- publish, verify, smoke test |
| Relationship | Skill |
|---|---|
| Preceded by | capture (Phase 1) |
| Followed by | testing (Phase 3) |
| References | traffic-patterns.md, auth-strategies.md, google-batchexecute.md, ssr-patterns.md, exception-hierarchy-example.py, client-architecture-example.py, polling-backoff-example.py, rich-output-example.py |
- references/traffic-patterns.md -- Common API patterns (REST, GraphQL, RPC)
- references/auth-strategies.md -- Auth implementation strategies
- references/google-batchexecute.md -- Google batchexecute RPC protocol spec
- references/ssr-patterns.md -- SSR framework patterns and data extraction strategies
- references/exception-hierarchy-example.py -- Complete exception hierarchy with HTTP status mapping
- references/client-architecture-example.py -- Namespaced sub-client pattern with auth retry
- references/polling-backoff-example.py -- Exponential backoff polling and rate-limit retry
- references/rich-output-example.py -- Rich progress bars, JSON error responses, table formatting