From harness-claude
Detects feature flag providers and usage in code, catalogs definitions and categories, audits staleness, recommends naming conventions, A/B testing, and rollout strategies.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude

This skill uses the workspace's default tool permissions.
> Flag lifecycle management, A/B testing infrastructure, and gradual rollout design. Ship features safely with controlled exposure and clean retirement.
Identify flag provider. Scan the project for feature flag infrastructure:
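- SDK packages: LaunchDarkly (`launchdarkly-node-server-sdk`), Unleash (`unleash-client`), Flagsmith (`flagsmith-nodejs`), Split (`@splitsoftware/splitio`)
- Files and directories matching `*feature-flag*`, `*toggle*`, `*flags*`
- Configuration files: `flags.json`, `features.json`, `feature-flags.yaml`
- Environment variables (`FEATURE_*`, `FF_*`)

Catalog all flag definitions. For each flag found, record: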
Map flag usage in code. For each flag, find all evaluation points:
Detect flag categories. Classify each flag:
Present detection summary:
Feature Flag Detection:
Provider: LaunchDarkly (SDK v7.x)
Flags defined: 23
Flags evaluated in code: 19 (4 defined but unused)
Categories: 12 release, 5 experiment, 4 ops, 2 permission
Stale candidates: 6 (release flags older than 90 days)
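
The "defined but unused" count in a summary like this comes from comparing the flag catalog against the evaluation points found in code. The following is a minimal sketch of that comparison, assuming flags are evaluated through calls that take the flag key as a string literal. The method names in the regular expression are illustrative assumptions; real call shapes vary by provider and SDK version.

```python
import re

# Assumed call shapes for illustration only; adjust to the SDK actually in use.
FLAG_CALL = re.compile(
    r"(?:variation|isEnabled|is_enabled)\(\s*['\"]([\w.\-]+)['\"]"
)


def find_flag_references(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each flag key to the files that evaluate it."""
    refs: dict[str, list[str]] = {}
    for path, text in sources.items():
        for key in FLAG_CALL.findall(text):
            files = refs.setdefault(key, [])
            if path not in files:
                files.append(path)
    return refs


def compare(defined: set[str], refs: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Return (defined but never evaluated, evaluated but never defined)."""
    used = set(refs)
    return defined - used, used - defined


if __name__ == "__main__":
    # Hypothetical file contents standing in for a real repository scan.
    sources = {
        "checkout.ts": 'if (client.variation("payments.stripe-v2", user, false)) {}',
        "search.py": "if flags.is_enabled('search.reranking'): pass",
    }
    unused, undefined = compare({"payments.stripe-v2", "checkout.express"},
                                find_flag_references(sources))
    print("defined but unused:", sorted(unused))
    print("used but undefined:", sorted(undefined))
```

A literal-string scan like this misses flag keys built dynamically or passed through constants, so its output is best treated as a list of candidates to confirm by hand.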
Evaluate flag naming convention. Check for consistency:
- Consistent pattern: `{team}.{feature}.{variant}` (e.g., `payments.stripe-v2.enabled`)
- No non-descriptive names (e.g., `flag_123` or `test_flag`)
- Category prefixes (e.g., `ops.` for operational flags)

Design flag evaluation architecture. Recommend patterns for:
Design rollout strategy. For release flags:
Design A/B testing infrastructure. For experiment flags:
Design fallback behavior. Ensure resilience:
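
A percentage rollout and a safe fallback can be combined in one evaluation function. The sketch below is an illustration, not any provider's actual algorithm: it hashes the flag key and user ID into a stable bucket from 0 to 99 and returns the safe default when the provider lookup fails. `fetch_percentage` is a hypothetical callable standing in for the provider client.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Deterministically map a (flag, user) pair to a bucket in 0..99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(flag_key, user_id, fetch_percentage, default=False):
    """Evaluate a percentage rollout; fall back to the safe default on provider failure.

    fetch_percentage is a hypothetical callable: flag key -> rollout percent (0..100).
    """
    try:
        percentage = fetch_percentage(flag_key)
    except Exception:
        # Provider unreachable: the feature stays off unless a caller opts in.
        return default
    return bucket(flag_key, user_id) < percentage
```

Because the bucket depends only on the flag key and user ID, a user who sees the feature at 25% still sees it at 50%, and including the flag key in the hash keeps different flags from always selecting the same users.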
Check test coverage per flag. For each flag, verify:
Validate flag consistency. Check for common issues:
Check for flag coupling. Identify problematic patterns:
Validate rollout configuration. For active rollouts:
Generate validation report:
Flag Validation: [PASS/WARN/FAIL]
Test coverage: WARN (3 flags missing flag-off tests)
Consistency: PASS (all code references match definitions)
Coupling: WARN (2 flags have interdependencies)
Rollout config: PASS (active rollouts correctly configured)
Issues:
1. payments.stripe-v2 -- no test for flag-off fallback path
2. search.new-algorithm + search.reranking -- coupled flags
3. checkout.express -- missing kill switch test
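
A coupling finding like the one above can be derived from a map of which flags require which others. The sketch below assumes such a map has already been extracted (for example, from provider prerequisite settings or nested flag checks). The `requires` structure and the direction of the dependency in the usage example are assumptions for illustration.

```python
def dependency_chains(requires: dict[str, list[str]]) -> list[list[str]]:
    """Return each dependency chain, starting from flags that no other flag requires.

    Limitation: a pure cycle (every flag required by another) has no starting
    flag and is not reported by this sketch.
    """
    required = {dep for deps in requires.values() for dep in deps}
    roots = [flag for flag, deps in requires.items() if deps and flag not in required]
    chains: list[list[str]] = []

    def walk(flag: str, path: list[str]) -> None:
        # Skip flags already on the path so a cycle cannot recurse forever.
        deps = [d for d in requires.get(flag, []) if d not in path]
        if not deps:
            if len(path) > 1:
                chains.append(path)
            return
        for dep in deps:
            walk(dep, path + [dep])

    for root in roots:
        walk(root, [root])
    return chains


if __name__ == "__main__":
    # Hypothetical input: assumes search.reranking requires search.new-algorithm.
    requires = {
        "search.reranking": ["search.new-algorithm"],
        "search.new-algorithm": [],
        "checkout.express": [],
    }
    for chain in dependency_chains(requires):
        print(" requires ".join(chain))
```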
Identify stale flags. Flag candidates for removal:
Generate cleanup plan. For each stale flag:
Assess technical debt. Calculate flag burden:
Recommend lifecycle policies. Design governance:
Generate lifecycle report:
Flag Lifecycle Report:
Active flags: 23
Stale flags: 6 (candidates for removal)
Technical debt: ~340 lines of dead code across 6 stale flags
Cleanup plan:
1. payments.old-checkout -- at 100% for 45 days, 4 files, ~80 LOC to remove
2. search.v1-algorithm -- experiment ended 60 days ago, 2 files, ~40 LOC
3. onboarding.legacy-flow -- at 100% for 90 days, 6 files, ~120 LOC
4. notifications.email-v1 -- unused for 120 days, 3 files, ~50 LOC
5. dashboard.old-charts -- at 100% for 35 days, 2 files, ~30 LOC
6. auth.password-reset-v1 -- at 100% for 60 days, 1 file, ~20 LOC
Recommendations:
- Adopt 90-day maximum lifetime policy for release flags
- Add flag expiry dates to LaunchDarkly metadata
- Schedule monthly flag cleanup sprint
- Target: reduce active flag count from 23 to 17
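
The stale-flag selection behind a report like this can be expressed as a simple filter. The sketch below assumes each flag's category, rollout percentage, and time at its current state are already known. The `FlagState` record and its field names are hypothetical, and the day threshold is a parameter because the right value depends on the lifetime policy adopted.

```python
from dataclasses import dataclass


@dataclass
class FlagState:
    """Hypothetical record of one flag's current rollout state."""
    key: str
    category: str            # "release", "experiment", "ops", or "permission"
    rollout_percent: int      # 0..100
    days_at_current_state: int


def stale_candidates(flags: list[FlagState], max_days_at_full: int = 30) -> list[str]:
    """Return keys of release flags that have sat at 100% longer than the threshold."""
    return [
        f.key
        for f in flags
        if f.category == "release"
        and f.rollout_percent == 100
        and f.days_at_current_state > max_days_at_full
    ]


if __name__ == "__main__":
    flags = [
        FlagState("payments.old-checkout", "release", 100, 45),
        FlagState("dashboard.old-charts", "release", 100, 35),
        # Illustrative non-stale entry: still mid-rollout.
        FlagState("checkout.express", "release", 25, 10),
    ]
    print(stale_candidates(flags))
```

This covers only fully rolled out release flags. Ended experiments and flags that are no longer evaluated need their own checks.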
- `harness skill run harness-feature-flags` -- Primary invocation for flag analysis and lifecycle management.
- `harness validate` -- Run after flag cleanup to verify project health.
- `harness check-deps` -- Verify flag provider SDK dependencies are installed.
- `emit_interaction` -- Present lifecycle report and gather decisions on cleanup priorities.

| Rationalization | Why It Is Wrong |
|---|---|
| "This release flag has been at 100% for a while, but removing it is risky" | Release flags at 100% for more than 30 days are stale candidates. Every stale flag adds dead code branches and test matrix complexity. |
| "We only need to test the flag-on path since that is the path we ship" | Every flag needs test coverage for both paths. The flag-off path IS the fallback when the flag provider is unreachable. |
| "These two flags depend on each other, but they work fine together" | Coupled flag dependencies are a blocking finding. Flags that require other flags create combinatorial complexity. |
| "Setting the flag default to true makes the rollout easier" | Every flag must default to safe (feature disabled). A default of true means a provider outage enables the feature for everyone. |
| "We do not need a naming convention -- our flag count is small" | Inconsistent naming becomes unmanageable as flag count grows. The skill flags inconsistency as a warning even at small scale. |
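
Testing both paths and the safe default might look like the following. This is a minimal sketch using Python's `unittest`. `shipping_options` is a hypothetical function under test, and flags are passed as a plain dictionary rather than through a real provider client.

```python
import unittest


def shipping_options(flags: dict) -> list[str]:
    """Hypothetical code under test: offers express shipping only when the flag is on."""
    options = ["standard"]
    if flags.get("checkout.express", False):  # a missing flag means the feature is off
        options.append("express")
    return options


class ShippingFlagTest(unittest.TestCase):
    def test_both_flag_paths(self):
        cases = [
            (True, ["standard", "express"]),   # flag-on path
            (False, ["standard"]),             # flag-off fallback path
        ]
        for enabled, expected in cases:
            with self.subTest(enabled=enabled):
                self.assertEqual(
                    shipping_options({"checkout.express": enabled}), expected
                )

    def test_missing_flag_uses_safe_default(self):
        # Simulates a provider outage where no flag values are available.
        self.assertEqual(shipping_options({}), ["standard"])


if __name__ == "__main__":
    unittest.main()
```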
Phase 1: DETECT
Provider: LaunchDarkly (React SDK v3.x)
Flags defined: 15 (in LaunchDarkly dashboard)
Flags in code: 13 (2 defined in dashboard but unused)
Categories: 8 release, 3 experiment, 2 ops
Phase 2: DESIGN
Naming: WARN -- inconsistent (mix of camelCase and kebab-case)
Recommended convention: team.feature.variant (kebab-case)
Evaluation: Client-side (React SDK), 200ms init timeout
Default values: WARN -- 2 flags default to true (should default to false)
Rollout: checkout.express-pay at 25% (targeting premium users)
Phase 3: VALIDATE
Test coverage: WARN -- 4 flags missing useFlags mock in component tests
Consistency: PASS
Coupling: PASS (no interdependent flags)
Kill switch: PASS (all release flags can be instantly disabled)
Phase 4: LIFECYCLE
Stale: 3 flags at 100% for 30+ days
Cleanup estimate: 180 LOC across 8 files
Recommendation: Remove search.v2-results (at 100% for 65 days, 3 components)
Result: WARN -- 3 stale flags, 4 missing tests, 2 unsafe defaults
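
A naming check like the one in this example can be approximated with a regular expression. The sketch below assumes the convention is two or three dot-separated, lowercase kebab-case segments, since the flag names shown here use both `team.feature` and `team.feature.variant` shapes. The exact pattern is an assumption to adapt per project.

```python
import re

# One lowercase kebab-case segment, e.g. "stripe-v2".
SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
# Assumed convention: team.feature or team.feature.variant.
FLAG_NAME = re.compile(rf"{SEGMENT}(?:\.{SEGMENT}){{1,2}}")


def naming_violations(keys: list[str]) -> list[str]:
    """Return the flag keys that do not match the assumed convention."""
    return [key for key in keys if not FLAG_NAME.fullmatch(key)]


if __name__ == "__main__":
    keys = [
        "payments.stripe-v2.enabled",
        "checkout.express-pay",
        "newSearchUI",   # hypothetical camelCase key
        "flag_123",
    ]
    print(naming_violations(keys))
```

The pattern checks shape only. It cannot tell whether a well-formed name is descriptive, so that part of the review stays manual.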
Phase 1: DETECT
Provider: Custom implementation (FeatureFlagService.java)
Flag store: PostgreSQL table (feature_flags)
Flags defined: 28
Flags in code: 24 (4 in database but unreferenced)
Categories: 15 release, 6 experiment, 5 ops, 2 permission
Phase 2: DESIGN
Architecture: Server-side evaluation with 60s cache refresh
Concern: No SDK resilience -- database outage disables all flags
Recommend: Add in-memory cache with stale-while-revalidate pattern
Recommend: Add health check endpoint for flag service status
Rollout: Using percentage field in database -- no segment targeting
Recommend: Add user segment support for targeted rollouts
Phase 3: VALIDATE
Test coverage: WARN -- @FeatureFlag annotation used but no test helper to toggle flags in test context (tests rely on database state)
Consistency: WARN -- 4 database entries have no code references
Coupling: FAIL -- 3 flags form a dependency chain (billing.new-pricing requires billing.tax-calc-v2, which requires billing.currency-v3)
Recommend: Consolidate into single billing.pricing-v3 flag
Phase 4: LIFECYCLE
Stale: 8 flags enabled for all users for 60+ days
Technical debt: ~520 LOC of dead code
Flag count: 28 (above recommended 20 per service)
Recommendation: Immediate cleanup sprint targeting 8 stale flags
Recommendation: Enforce 60-day lifetime policy going forward
Result: FAIL -- flag coupling detected, high stale flag count
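
The in-memory cache recommended in this example could start from something like the sketch below. It is written in Python for brevity, although the service in the example is Java. It is also a simplification: it refreshes synchronously and keeps serving the last known values when a refresh fails (stale-if-error), whereas full stale-while-revalidate would refresh in the background. `load_flags` is a hypothetical callable standing in for the database query.

```python
import time


class FlagCache:
    """Serve flag values from memory, refreshing from the store at most once per TTL."""

    def __init__(self, load_flags, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_flags      # hypothetical: () -> dict of flag key to bool
        self._ttl = ttl_seconds
        self._clock = clock
        self._flags = {}
        self._fetched_at = None      # None until the first successful load

    def _refresh_if_due(self):
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return
        try:
            self._flags = dict(self._load())
            self._fetched_at = now
        except Exception:
            # Store unreachable: keep the last known values. The next call
            # retries; a production version would add backoff here.
            pass

    def is_enabled(self, key, default=False):
        """Return the cached value, or the safe default for unknown flags."""
        self._refresh_if_due()
        return self._flags.get(key, default)
```

With this shape, a database outage no longer disables every flag at once: flags keep their last known values, and flags never loaded fall back to the safe default.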