From agent-almanac
Probes runtime state of feature flags in CLI binaries using binary strings, live invocation, on-disk state, and platform cache. Classifies as LIVE/DARK/INDETERMINATE/UNKNOWN for rollout verification, dark-launch audits, and prior probe refresh.
npx claudepluginhub pjt222/agent-almanacThis skill is limited to using the following tools:
Determine whether a named feature flag in a shipped CLI binary is LIVE, DARK, INDETERMINATE, or UNKNOWN, using a four-pronged evidence protocol that pairs every state claim with a specific observation.
Exhaustively extracts feature flag candidates from binary namespaces via grep and bash, inventories occurrences with gate/telemetry tags, cross-references documented sets, tracks completeness until zero undocumented remainder. For verifiable full catalogs before probing.
Checks and configures OpenFeature feature flag infrastructure with providers (GOFF, flagd, LaunchDarkly, Split) in JavaScript, Python, Go, Rust projects. Audits setup, adds SDKs, relay proxies for Docker/Kubernetes.
Adds PostHog feature flags to gate new functionality for safe rollouts after feature implementation or PR reviews. Handles initial SDK setup if needed across multiple platforms.
Share bugs, ideas, or general feedback.
Determine whether a named feature flag in a shipped CLI binary is LIVE, DARK, INDETERMINATE, or UNKNOWN, using a four-pronged evidence protocol that pairs every state claim with a specific observation.
monitor-binary-version-baselines) markers and need to classify each candidate flag's rollout state before moving to Phase 4 wire capture.Extract the candidate flag name from the bundle to confirm it actually exists as a string literal. Without this, all later prongs are probing thin air.
# Locate the bundle (common shapes: .js, .mjs, .bun, packaged binary)
BUNDLE=/path/to/cli/bundle.js
FLAG=acme_widget_v3 # synthetic placeholder — replace with the candidate
# Confirm the literal exists
grep -c "$FLAG" "$BUNDLE"
# Capture every line where it appears, with surrounding context for Step 2
grep -n -C 3 "$FLAG" "$BUNDLE" > /tmp/flag-context.txt
wc -l /tmp/flag-context.txt
Inspect /tmp/flag-context.txt and tag each occurrence as one of:
gate("$FLAG", default), isEnabled("$FLAG"), flag("$FLAG", ...)).process.env.X (or equivalent) lookup.Expected: at least one occurrence of the flag string in the bundle, and each occurrence tagged with its call-site role.
On failure: if grep -c returns 0, the flag is not in this build. Either the input name is wrong (typo, wrong namespace) or the flag was removed in this version. Re-check Phase 1 marker output, then either correct the input or classify as REMOVED and stop.
The same string can appear as a gate, a telemetry event name, an env var, or all three. The classification depends on call-site, not on the string. Mistaking a telemetry name for a gate produces nonsense reasoning ("this gate must be off") about something that was never a gate.
For each tagged occurrence from Step 1:
gate("$FLAG", false) defaults the flag to off; gate("$FLAG", true) defaults it to on). Record both the literal default and the gate function name.UNKNOWN (name present but not a gate).if (process.env.X) { return null; } is a kill switch; if (process.env.X) { enable(); } is an opt-in.Expected: for every occurrence, a definite call-site role and (for gate-calls) the recorded default value.
On failure: if a gate-call's surrounding context is too minified to read the default, expand the grep context (-C 10) and inspect the full callee. If the default still cannot be determined, record it as default=? and downgrade any LIVE/DARK conclusion to INDETERMINATE.
Run the harness in an authenticated session you control and observe whether the gated capability surfaces. This is the single highest-signal prong: the bundle says what can happen, the runtime shows what does happen.
Pick a probe action that would reveal the gate-pass — typically the user-visible behavior the gate guards (a tool appearing in a tool list, a command flag becoming valid, a UI element rendering, an output field appearing in a response).
# Example shape — adapt to the harness
$CLI --list-capabilities | grep -i widget # does the gated capability appear?
$CLI --help 2>&1 | grep -i "$FLAG" # is a flag-related option exposed?
$CLI run-some-command --debug 2>&1 | tee probe-runtime.log
Record one of three outcomes:
LIVE.DARK; default-true → re-check, this is suspicious).INDETERMINATE.Expected: a recorded probe action, the observed outcome, and the candidate classification it points to.
On failure: if the probe action itself errors (auth failure, network unreachable, wrong subcommand), the runtime prong is unusable for this round. Fix the session or pick a different probe action; do not infer DARK from a runtime that never ran.
Many harnesses persist gate evaluations or override values to disk so they need not be re-fetched. Inspecting this state shows what the harness believed about the flag at last evaluation.
Common locations (adapt to the harness — these are shapes, not specific paths):
# User-level config
ls ~/.config/<harness>/ 2>/dev/null
ls ~/.<harness>/ 2>/dev/null
# Per-project state
ls .<harness>/ 2>/dev/null
# Cache directories
ls ~/.cache/<harness>/ 2>/dev/null
# Search any of these for the flag name
grep -r "$FLAG" ~/.config/<harness>/ ~/.cache/<harness>/ .<harness>/ 2>/dev/null
Record each hit's path, the value associated with the flag, and the file's last-modified time. A recently-modified cache entry overriding a binary default is the strongest possible evidence either way.
Expected: either a confirmed override value with timestamp, or a confirmed absence (no on-disk state mentions this flag).
On failure: if you find the flag mentioned but cannot tell whether the recorded value is a cached server response, a user override, or a stale value, flag the entry for Step 5 (platform cache) reconciliation rather than guessing.
If the harness uses an external feature-flag service (LaunchDarkly, Statsig, GrowthBook, vendor-internal, etc.), the locally-cached service response is the authoritative current rollout state. Inspect it where available.
# Look for service-shaped cache files
find ~/.cache ~/.config -name "*flag*" -o -name "*feature*" -o -name "*config*" 2>/dev/null | head
# If a cache file is present, parse it for the flag name
jq ".[] | select(.key == \"$FLAG\")" ~/.cache/<harness>/flags.json 2>/dev/null
Record the cached value, the cache timestamp, and (if present) the cache TTL. A platform cache that says false overrides a binary default of true; a platform cache that says true overrides a binary default of false.
Expected: either a definite cached value with timestamp, or confirmed absence of a flag-service cache for this harness.
On failure: if the harness has no flag-service or you cannot locate the cache, this prong contributes nothing — that is acceptable. Note "Prong D: not applicable" in the evidence table; do not guess.
Some capabilities are guarded by multiple flags that must all be true: gate("A") && gate("B") && gate("C"). Any one being DARK is sufficient to make the capability DARK, but the per-flag classification still belongs to each flag individually.
# After finding the gate-call site for the primary flag in Step 2, scan the
# enclosing predicate for other gate(...) calls
grep -n -C 5 "$FLAG" "$BUNDLE" | grep -oE 'gate\("[^"]+"' | sort -u
For each co-gate string surfaced:
Expected: every conjunct identified and individually classified, plus a derived capability-level classification.
On failure: if the predicate is too minified to enumerate cleanly (call site is inlined or wrapped), record the conjunction as "≥1 additional gate, structure unreadable" and downgrade the capability-level classification to INDETERMINATE even if the primary flag looks LIVE.
A flag may legitimately be DARK while the user-facing capability it would unlock is reachable through a different, fully-supported route — a different command, a user-invocable skill, an alternate API. The honest finding "flag DARK, capability LIVE via substitution" is common and important; missing it produces panicked dark-launch reports about capabilities users actually have.
For any candidate classification of DARK or INDETERMINATE, ask:
If yes to any, append a substitution: note to the evidence row recording the alternate route and its observability (how a user reaches it, whether it is documented).
Expected: for every DARK / INDETERMINATE classification, an explicit substitution check — either the route, or the explicit note "no substitution route identified."
On failure: if you suspect a substitution exists but cannot confirm the route, mark "substitution suspected; not confirmed" rather than asserting either way.
Combine the four prongs into a single table. Every state claim must be paired with the observation that supports it; re-running the probe at a new version produces a diff-able artifact.
| Field | Value |
|---|---|
| Flag | acme_widget_v3 (synthetic placeholder) |
| Binary version | <version-id> |
| Probe date | YYYY-MM-DD |
| Prong A — strings | present (3 occurrences: 1 gate-call default=false, 2 telemetry) |
| Prong B — runtime | gate-pass not observed in capability list |
| Prong C — on-disk | no override found in ~/.config/<harness>/ |
| Prong D — platform cache | service cache absent / not applicable |
| Conjunction | none — single-gate predicate |
| Substitution | user-invokable widget slash command delivers equivalent UX |
| Final state | DARK (capability LIVE via substitution) |
Apply the classification rules:
false, no prong observed gate-pass, no override flips it on.Save the table as a probe artifact (e.g., probes/<flag>-<version>.md) so future probes diff against it.
Expected: a complete evidence table covering all four prongs, conjunction status, substitution status, and a single final classification.
On failure: if no prong yields a usable signal (binary cannot be read, runtime cannot be invoked, on-disk and platform cache both absent), do not invent a classification. Record INDETERMINATE with the reason "no prong yielded signal" and stop.
redact-for-public-disclosure).emit("$FLAG", ...) is a label, not a gate. A flag that is "telemetry-only" has no rollout state and should be classified UNKNOWN, not DARK.default=false) is not the same as runtime evidence (the capability did not appear). A flag with default-false in the binary may be flipped to true by a server-side override; only the runtime probe shows what the session actually got.default=true while ignoring the surrounding && gate("B") && gate("C") produces a falsely confident LIVE for a capability that is actually gated by B or C.monitor-binary-version-baselines — Phase 1 of the parent guide; the marker tracking this skill builds on supplies the candidate flag inventory.conduct-empirical-wire-capture — Phase 4; deeper runtime evidence (network capture, lifecycle hooks) when Prong B's surface-level probe is insufficient.security-audit-codebase — dark-launched code is part of attack-surface archaeology; this skill is the discovery half of that audit.redact-for-public-disclosure — Phase 5; the redaction discipline that decides which probe artifacts can leave the private workspace.