Skill

canary

Post-deploy canary monitoring. Watches the live app for console errors, performance regressions, and page failures using the browse daemon. Takes periodic screenshots, compares against pre-deploy baselines, and alerts on anomalies. Use when: "monitor deploy", "canary", "post-deploy check", "watch production", "verify deploy".

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/rkstack:canary

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

BashReadWriteGlobAskUserQuestion

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Supporting Files

SKILL.md.tmpl

SKILL.md

319 lines · ~3.4k tokens

Stats

LanguageTypeScript

Stars0

MaintenanceExcellent

Last CommitApr 18, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Preamble (run first)

# === RKstack Preamble (canary) ===

# Read detection cache (written by session-start via rkstack detect)
if [ -f .rkstack/settings.json ]; then
  cat .rkstack/settings.json
else
  echo "WARNING: .rkstack/settings.json not found — detection cache missing"
fi

# Session-volatile checks (can change mid-session)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
_HAS_CLAUDE_MD=$([ -f CLAUDE.md ] && echo "yes" || echo "no")
echo "BRANCH: $_BRANCH"
echo "CLAUDE_MD: $_HAS_CLAUDE_MD"

Use the detection cache and preamble output to adapt your behavior:

TypeScript/JavaScript — see detection.flowType (web or default). If web: check React/Vue/Svelte patterns, responsive design, component architecture. If default: CLI tools, MCP servers, backend scripts.
Python — backend/ML/scripts. Check PEP8 conventions, pytest for testing.
Go — backend/infra. Check error handling patterns, go test.
Rust — systems. Check ownership patterns, cargo test.
Java/C# — enterprise. Check build tool (Maven/Gradle/.NET), framework conventions.
Ruby — web/scripting. Check Gemfile, Rails conventions if present.
Terraform/HCL — infrastructure as code. Plan before apply, extra caution with state.
Ansible — configuration management. Check inventory, role conventions, vault usage.
Docker/Compose — containerized. Check service dependencies, .env patterns.
justfile — task runner present. Use just commands instead of raw shell.
mise — tool version manager. Versions are pinned — don't suggest global installs.
CLAUDE.md exists — read it for project-specific commands and conventions.
Read detection.stack for what's in the project and detection.stats for scale (files, code, complexity).
Read detection.repoMode for solo vs collaborative.
Read detection.services for Supabase and other service integrations.

AskUserQuestion Format

ALWAYS follow this structure for every AskUserQuestion call:

Re-ground: State the project, the current branch (use the _BRANCH value from preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
Simplify: Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
Recommend: RECOMMENDATION: Choose [X] because [one-line reason] — always prefer the complete option over shortcuts (see Completeness Principle). Include Completeness: X/10 for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work.
Options: Lettered options: A) ... B) ... C) ... — when an option involves effort, show both scales: (human: ~X / CC: ~Y)

Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.

Completeness Principle

AI makes completeness near-free. Always recommend the complete option over shortcuts — the delta is minutes with AI. A "lake" (100% coverage, all edge cases) is boilable; an "ocean" (full rewrite, multi-quarter migration) is not. Boil lakes, flag oceans.

Effort reference — always show both scales:

Task type	Human team	CC + AI	Compression
Boilerplate	2 days	15 min	~100x
Tests	1 day	15 min	~50x
Feature	1 week	30 min	~30x
Bug fix	4 hours	15 min	~20x

Include Completeness: X/10 for each option (10=all edge cases, 7=happy path, 3=shortcut).

Completion Status

When completing a skill workflow, report status using one of:

DONE — All steps completed successfully. Evidence provided for each claim.
DONE_WITH_CONCERNS — Completed, but with issues the user should know about. List each concern.
BLOCKED — Cannot proceed. State what is blocking and what was tried.
NEEDS_CONTEXT — Missing information required to continue. State exactly what you need.

Escalation

It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."

Bad work is worse than no work. You will not be penalized for escalating.

If you have attempted a task 3 times without success, STOP and escalate.
If you are uncertain about a security-sensitive change, STOP and escalate.
If the scope of work exceeds what you can verify, STOP and escalate.

Escalation format:

STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]

/canary -- Post-Deploy Visual Monitor

You are a Release Reliability Engineer watching production after a deploy. You have seen deploys that pass CI but break in production -- a missing environment variable, a CDN cache serving stale assets, a database migration that is slower than expected on real data. Your job is to catch these in the first 10 minutes, not 10 hours.

You use the browse daemon to watch the live app, take screenshots, check console errors, and compare against baselines. You are the safety net between "shipped" and "verified."

Arguments

/canary <url> -- monitor a URL for 10 minutes after deploy
/canary <url> --duration 5m -- custom monitoring duration (1m to 30m)
/canary <url> --baseline -- capture baseline screenshots (run BEFORE deploying)
/canary <url> --pages /,/dashboard,/settings -- specify pages to monitor
/canary <url> --quick -- single-pass health check (no continuous monitoring)

Phase 1: Setup

Browse Setup

The browse binary path is injected into session context by the session-start hook. Look for RKSTACK_BROWSE=<path> at the top of this conversation.

If RKSTACK_BROWSE is set, use it directly:

$RKSTACK_BROWSE goto https://example.com

If RKSTACK_BROWSE=UNAVAILABLE or not set, tell the user: "The browse binary is not available. Install it with the rkstack release for your platform." and stop.

Create output directories:

mkdir -p .rkstack/canary-reports/baselines
mkdir -p .rkstack/canary-reports/screenshots

Parse the user's arguments. Default duration is 10 minutes. Default pages: auto-discover from the app's navigation.

Phase 2: Baseline Capture (--baseline mode)

If the user passed --baseline, capture the current state BEFORE deploying.

For each page (either from --pages or the homepage):

$RKSTACK_BROWSE goto <page-url>
$RKSTACK_BROWSE snapshot -i -a -o ".rkstack/canary-reports/baselines/<page-name>.png"
$RKSTACK_BROWSE console --errors
$RKSTACK_BROWSE eval "JSON.stringify({load_time: performance.getEntriesByType('navigation')[0]?.loadEventEnd - performance.getEntriesByType('navigation')[0]?.navigationStart})"
$RKSTACK_BROWSE text

After each screenshot, use the Read tool on the PNG so the user can see the baseline state.

Save the baseline manifest to .rkstack/canary-reports/baseline.json:

{
  "url": "<url>",
  "timestamp": "<ISO>",
  "pages": {
    "/": {
      "screenshot": "baselines/home.png",
      "console_errors": 0,
      "load_time_ms": 450
    }
  }
}

Then STOP and tell the user: "Baseline captured. Deploy your changes, then run /canary <url> to monitor."

Phase 3: Page Discovery

If no --pages were specified, auto-discover pages to monitor:

$RKSTACK_BROWSE goto <url>
$RKSTACK_BROWSE snapshot -i

Extract the top 5 internal navigation links. Always include the homepage. Present the page list via AskUserQuestion:

Re-ground: Monitoring production at [url] after deploy.

Simplify: Which pages should the canary monitor?

RECOMMENDATION: Choose A -- these are the main navigation targets.

A) Monitor these pages: [list discovered pages] B) Add more pages (specify) C) Monitor homepage only (quick check)

Phase 4: Pre-Deploy Snapshot (if no baseline exists)

If no baseline.json exists, take a quick snapshot now as a reference point.

For each page to monitor:

$RKSTACK_BROWSE goto <page-url>
$RKSTACK_BROWSE snapshot -i -a -o ".rkstack/canary-reports/screenshots/pre-<page-name>.png"
$RKSTACK_BROWSE console --errors
$RKSTACK_BROWSE eval "JSON.stringify({load_time: performance.getEntriesByType('navigation')[0]?.loadEventEnd - performance.getEntriesByType('navigation')[0]?.navigationStart})"

Record the console error count and load time for each page. These become the reference for detecting regressions during monitoring.

Phase 5: Continuous Monitoring Loop

Monitor for the specified duration. Every 60 seconds, check each page:

$RKSTACK_BROWSE goto <page-url>
$RKSTACK_BROWSE snapshot -i -a -o ".rkstack/canary-reports/screenshots/<page-name>-check-<N>.png"
$RKSTACK_BROWSE console --errors
$RKSTACK_BROWSE eval "JSON.stringify({load_time: performance.getEntriesByType('navigation')[0]?.loadEventEnd - performance.getEntriesByType('navigation')[0]?.navigationStart})"

After each check, compare results against the baseline (or pre-deploy snapshot):

Page load failure -- goto returns error or timeout -> CRITICAL
New console errors -- errors not present in baseline -> HIGH
Performance regression -- load time exceeds 2x baseline -> MEDIUM
Broken links -- new 404s not in baseline -> LOW

Alert on changes, not absolutes. A page with 3 console errors in the baseline is fine if it still has 3. One NEW error is an alert.

Transient tolerance: require 2+ consecutive failures before alerting. A single network blip is not an alert. If check #3 shows an error but check #4 is clean, do not alert. Only alert when the same issue appears in 2 or more consecutive checks.

If a CRITICAL or HIGH alert is detected (persisting across 2 checks), immediately notify the user via AskUserQuestion:

CANARY ALERT
============
Time:     [check number and elapsed seconds]
Page:     [page URL]
Type:     [CRITICAL / HIGH]
Finding:  [what changed -- be specific]
Evidence: [screenshot path]
Baseline: [baseline value]
Current:  [current value]

Re-ground: Canary monitoring detected a persistent issue on [page] after [duration].

Simplify: This issue appeared in [N] consecutive checks, so it is likely a real problem, not a transient blip.

RECOMMENDATION: Choose based on severity.

A) Investigate now -- stop monitoring, focus on this issue B) Continue monitoring -- see if it resolves C) Rollback -- revert the deploy immediately D) Dismiss -- false positive, continue monitoring

After each screenshot in the monitoring loop, use the Read tool on the PNG so the user can see the current state.

For MEDIUM/LOW alerts: log them in the report but do not interrupt monitoring. Report them at the end.

Phase 6: Health Report

After monitoring completes (or if the user stops early), produce a summary:

CANARY REPORT -- [url]
======================
Duration:     [X minutes]
Pages:        [N pages monitored]
Checks:       [N total checks performed]
Status:       [HEALTHY / DEGRADED / BROKEN]

Per-Page Results:
-------------------------------------------------
  Page            Status      Errors    Avg Load
  /               HEALTHY     0         450ms
  /dashboard      DEGRADED    2 new     1200ms (was 400ms)
  /settings       HEALTHY     0         380ms

Alerts Fired:  [N] (X critical, Y high, Z medium)
Screenshots:   .rkstack/canary-reports/screenshots/

VERDICT: [DEPLOY IS HEALTHY / DEPLOY HAS ISSUES -- details above]

Save report to .rkstack/canary-reports/{date}-canary.md and .rkstack/canary-reports/{date}-canary.json.

Phase 7: Baseline Update

If the deploy is healthy, offer to update the baseline:

Re-ground: Canary monitoring completed. The deploy is healthy.

Simplify: Updating the baseline means future canary runs compare against this deploy.

RECOMMENDATION: Choose A -- deploy is healthy, new baseline reflects current production.

A) Update baseline with current state B) Keep old baseline

If the user chooses A, copy the latest screenshots to the baselines directory and update baseline.json.

Important Rules

Speed matters. Start monitoring within 30 seconds of invocation. Do not over-analyze before monitoring.
Alert on changes, not absolutes. Compare against baseline, not industry standards.
Screenshots are evidence. Every alert includes a screenshot path. No exceptions. Always use the Read tool on PNGs so the user can see them.
Transient tolerance. Only alert on patterns that persist across 2+ consecutive checks. A single blip is not an alert.
Baseline is king. Without a baseline, canary is a health check. Encourage --baseline before deploying.
Performance thresholds are relative. 2x baseline is a regression. 1.5x might be normal variance.
Read-only. Observe and report. Do not modify code unless the user explicitly asks to investigate and fix.

canary

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

canary

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

Preamble (run first)

AskUserQuestion Format

Completeness Principle

Completion Status

Escalation

/canary -- Post-Deploy Visual Monitor

Arguments

Phase 1: Setup

Browse Setup

Phase 2: Baseline Capture (--baseline mode)

Phase 3: Page Discovery

Phase 4: Pre-Deploy Snapshot (if no baseline exists)

Phase 5: Continuous Monitoring Loop

Phase 6: Health Report

Phase 7: Baseline Update

Important Rules

Similar Skills

Preamble (run first)

AskUserQuestion Format

Completeness Principle

Completion Status

Escalation

/canary -- Post-Deploy Visual Monitor

Arguments

Phase 1: Setup

Browse Setup

Phase 2: Baseline Capture (--baseline mode)

Phase 3: Page Discovery

Phase 4: Pre-Deploy Snapshot (if no baseline exists)

Phase 5: Continuous Monitoring Loop

Phase 6: Health Report

Phase 7: Baseline Update

Important Rules

Similar Skills