Gate 2 of the development cycle. VALIDATES that observability was correctly implemented by developers. Does not implement observability code - only validates it.
/plugin marketplace add lerianstudio/ring
/plugin install ring-dev-team@ring
This skill inherits all available tools. When active, it can use any tool Claude has access to.
This skill VALIDATES that observability was correctly implemented by developers:
Developers IMPLEMENT observability. SRE VALIDATES it.
| Who | Responsibility |
|---|---|
| Developers (Gate 0) | IMPLEMENT observability following Ring Standards |
| SRE Agent (Gate 2) | VALIDATE that observability is correctly implemented |
| Implementation Agent | FIX issues found by SRE (if any) |
If observability is missing or incorrect:
<verify_before_proceed>
REQUIRED INPUT (from dev-cycle orchestrator):
- unit_id: [task/subtask being validated]
- language: [go|typescript|python]
- service_type: [api|worker|batch|cli|library]
- implementation_agent: [agent that did Gate 0]
- implementation_files: [list of files from Gate 0]
OPTIONAL INPUT:
- external_dependencies: [HTTP clients, gRPC clients, queues]
- gate0_handoff: [summary from Gate 0]
- gate1_handoff: [summary from Gate 1]
if any REQUIRED input is missing:
→ STOP and report: "Missing required input: [field]"
→ Return to orchestrator with error
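The required-input check above can be sketched as a small validator. This is a minimal TypeScript sketch, not the orchestrator's actual code: `GateInput` and `findMissingInputs` are hypothetical names, and treating an empty `implementation_files` list as missing is an assumption. Only the field names and allowed values come from the list above.

```typescript
// Hypothetical input shape; field names mirror the REQUIRED/OPTIONAL lists above.
type Language = "go" | "typescript" | "python";
type ServiceType = "api" | "worker" | "batch" | "cli" | "library";

interface GateInput {
  unit_id?: string;
  language?: Language;
  service_type?: ServiceType;
  implementation_agent?: string;
  implementation_files?: string[];
  // Optional inputs
  external_dependencies?: string[];
  gate0_handoff?: string;
  gate1_handoff?: string;
}

// Returns one message per missing REQUIRED field. An empty array means proceed.
function findMissingInputs(input: GateInput): string[] {
  const errors: string[] = [];
  if (!input.unit_id) errors.push("Missing required input: unit_id");
  if (!input.language) errors.push("Missing required input: language");
  if (!input.service_type) errors.push("Missing required input: service_type");
  if (!input.implementation_agent) {
    errors.push("Missing required input: implementation_agent");
  }
  // Assumption: an empty file list counts as missing.
  if (!input.implementation_files || input.implementation_files.length === 0) {
    errors.push("Missing required input: implementation_files");
  }
  return errors;
}
```

If the returned array is non-empty, the skill would stop and return the messages to the orchestrator instead of dispatching the SRE agent.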
validation_state = {
iteration: 1,
max_iterations: 3,
sre_result: null,
issues: [],
instrumentation_coverage: null
}
<dispatch_required agent="sre" model="opus"> Validate observability implementation for unit_id. </dispatch_required>
Task:
subagent_type: "ring-dev-team:sre"
model: "opus"
description: "Validate observability for [unit_id]"
prompt: |
⛔ VALIDATE Observability Implementation
## Input Context
- **Unit ID:** [unit_id]
- **Language:** [language]
- **Service Type:** [service_type]
- **Implementation Agent:** [implementation_agent]
- **Files to Validate:** [implementation_files]
- **External Dependencies:** [external_dependencies or "None"]
## Standards Reference
WebFetch: https://raw.githubusercontent.com/LerianStudio/ring/main/dev-team/docs/standards/sre.md
## Your Role
- VALIDATE that observability is implemented correctly
- Do not implement - only verify and report
- Check structured JSON logging
- Check OpenTelemetry instrumentation coverage
- Check context propagation for external calls
## Validation Checklist
### 0. FORBIDDEN Logging Patterns (CRITICAL - Check FIRST)
Any occurrence = CRITICAL severity, automatic FAIL verdict.
<forbidden>
- fmt.Println() in Go code
- fmt.Printf() in Go code
- log.Println() in Go code
- log.Printf() in Go code
- log.Fatal() in Go code
- println() in Go code
- console.log() in TypeScript
- console.error() in TypeScript
- console.warn() in TypeScript
</forbidden>
**MUST search for and report all occurrences of FORBIDDEN patterns:**
| Language | FORBIDDEN Pattern | Search For |
|----------|-------------------|------------|
| Go | `fmt.Println()` | `fmt.Println` in *.go files |
| Go | `fmt.Printf()` | `fmt.Printf` in *.go files |
| Go | `log.Println()` | `log.Println` in *.go files |
| Go | `log.Printf()` | `log.Printf` in *.go files |
| Go | `log.Fatal()` | `log.Fatal` in *.go files |
| Go | `println()` | `println(` in *.go files |
| TypeScript | `console.log()` | `console.log` in *.ts files |
| TypeScript | `console.error()` | `console.error` in *.ts files |
| TypeScript | `console.warn()` | `console.warn` in *.ts files |
**If any FORBIDDEN pattern found:**
- Severity: **CRITICAL**
- Verdict: **FAIL** (automatic, no exceptions)
- Each occurrence MUST be listed with file:line
### 1. Structured Logging (lib-commons)
- [ ] Uses `libCommons.NewTrackingFromContext(ctx)` for logger (Go)
- [ ] Uses `initializeLogger()` from lib-common-js (TypeScript)
- [ ] JSON format with timestamp, level, message, service
- [ ] trace_id correlation in logs
- [ ] **NO FORBIDDEN patterns** (see check 0 above)
### 2. Instrumentation Coverage (90%+ required)
For [language], check these patterns:
**Go (lib-commons):**
```go
logger, tracer, _, _ := libCommons.NewTrackingFromContext(ctx)
ctx, span := tracer.Start(ctx, "layer.operation")
defer span.End()
```
**TypeScript:**
```typescript
const span = tracer.startSpan('layer.operation');
try { /* work */ } finally { span.end(); }
```
Count spans in:
- Handlers: grep "tracer.Start" in *handler*.go or "tracer.startSpan" in *controller*.ts
- Services: grep "tracer.Start" in *service*.go or "tracer.startSpan" in *service*.ts
- Repositories: grep "tracer.Start" in *repo*.go or "tracer.startSpan" in *repository*.ts
### 3. Context Propagation
For external calls, verify:
- HTTP: InjectHTTPContext (Go) or equivalent
- gRPC: InjectGRPCContext (Go) or equivalent
- Queues: PrepareQueueHeaders (Go) or equivalent
## Required Output Format
### Validation Summary
| Check | Status | Evidence |
|-------|--------|----------|
| Structured Logging | ✅/❌ | [file:line or "NOT FOUND"] |
| Tracing Enabled | ✅/❌ | [file:line or "NOT FOUND"] |
| Instrumentation ≥90% | ✅/❌ | [X%] |
| Context Propagation | ✅/❌/N/A | [file:line or "N/A"] |
### Instrumentation Coverage Table
| Layer | Instrumented | Total | Coverage |
|-------|--------------|-------|----------|
| Handlers | X | Y | Z% |
| Services | X | Y | Z% |
| Repositories | X | Y | Z% |
| HTTP Clients | X | Y | Z% |
| gRPC Clients | X | Y | Z% |
| **TOTAL** | X | Y | **Z%** |
### Issues Found (if any)
For each issue:
- **Severity:** CRITICAL/HIGH/MEDIUM/LOW
- **Category:** [Logging|Tracing|Instrumentation|Propagation]
- **Description:** [what's wrong]
- **File:** [path:line]
- **Expected:** [what should exist]
- **Fix Required By:** [implementation_agent]
### Verdict
- **ALL CHECKS PASSED:** ✅ YES / ❌ NO
- **Instrumentation Coverage:** [X%]
- **If NO, blocking issues:** [list]
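The FORBIDDEN-pattern search in check 0 of the prompt above can be sketched as a line-by-line scan. This is a minimal TypeScript sketch under stated assumptions: `SourceFile`, `Finding`, `Rule`, and `scanForbidden` are hypothetical names, file contents are passed in as strings rather than read from disk, and the regexes approximate the "Search For" column. The scan is purely textual, so matches inside comments or string literals are also reported.

```typescript
// Hypothetical types for this sketch; not part of the Ring plugin.
interface SourceFile {
  path: string;
  content: string;
}

interface Finding {
  pattern: string;  // entry from the FORBIDDEN list
  location: string; // "file:line"
  severity: "CRITICAL";
}

interface Rule {
  ext: string;
  pattern: string;
  regex: RegExp;
}

// One rule per row of the FORBIDDEN table. The regexes carry no "g" flag,
// so RegExp.test() keeps no state between lines.
const FORBIDDEN_RULES: Rule[] = [
  { ext: ".go", pattern: "fmt.Println()", regex: /\bfmt\.Println/ },
  { ext: ".go", pattern: "fmt.Printf()", regex: /\bfmt\.Printf/ },
  { ext: ".go", pattern: "log.Println()", regex: /\blog\.Println/ },
  { ext: ".go", pattern: "log.Printf()", regex: /\blog\.Printf/ },
  { ext: ".go", pattern: "log.Fatal()", regex: /\blog\.Fatal/ },
  { ext: ".go", pattern: "println()", regex: /\bprintln\(/ },
  { ext: ".ts", pattern: "console.log()", regex: /\bconsole\.log/ },
  { ext: ".ts", pattern: "console.error()", regex: /\bconsole\.error/ },
  { ext: ".ts", pattern: "console.warn()", regex: /\bconsole\.warn/ },
];

// Returns every occurrence with its file:line location, in file order.
function scanForbidden(files: SourceFile[]): Finding[] {
  const findings: Finding[] = [];
  for (const file of files) {
    const lines = file.content.split("\n");
    for (const rule of FORBIDDEN_RULES) {
      // Apply a rule only to files with the matching extension.
      if (file.path.slice(-rule.ext.length) !== rule.ext) continue;
      for (let i = 0; i < lines.length; i++) {
        if (rule.regex.test(lines[i])) {
          findings.push({
            pattern: rule.pattern,
            location: file.path + ":" + (i + 1),
            severity: "CRITICAL",
          });
        }
      }
    }
  }
  return findings;
}
```

Under the rule stated in check 0, any non-empty result means an automatic FAIL verdict, and each `location` goes into the Issues Found list.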
Parse agent output:
1. Extract Validation Summary table
2. Extract Instrumentation Coverage table
3. Extract Issues Found list
4. Extract Verdict
validation_state.sre_result = {
logging_ok: [true/false],
tracing_ok: [true/false],
instrumentation_coverage: [percentage],
context_propagation_ok: [true/false/na],
issues: [list of issues],
verdict: [PASS/FAIL]
}
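The `instrumentation_coverage` value can be derived from the rows of the Instrumentation Coverage table. This is a minimal TypeScript sketch, assuming the TOTAL row is the sum of instrumented units divided by the sum of all units, rounded down so that 89.9% does not pass as 90%. The source does not state the formula or the rounding, so both are assumptions; `LayerCoverage` and `totalCoverage` are hypothetical names.

```typescript
// One row of the Instrumentation Coverage table (hypothetical type).
interface LayerCoverage {
  layer: string;
  instrumented: number;
  total: number;
}

// Returns the overall coverage as a whole percentage, rounded down.
// Returns null when there is nothing to instrument, leaving that case to the caller.
function totalCoverage(layers: LayerCoverage[]): number | null {
  let instrumented = 0;
  let total = 0;
  for (const row of layers) {
    instrumented += row.instrumented;
    total += row.total;
  }
  if (total === 0) return null;
  return Math.floor((instrumented * 100) / total);
}
```

For example, 9 of 10 handlers, 18 of 20 services, and 10 of 10 repositories give 37 of 40 units, which this sketch reports as 92.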
if validation_state.sre_result.verdict == "PASS"
and validation_state.sre_result.instrumentation_coverage >= 90:
→ Go to Step 8 (Success)
if validation_state.iteration >= validation_state.max_iterations:
→ Go to Step 9 (Escalate)
if validation_state.sre_result.verdict == "FAIL"
or validation_state.sre_result.instrumentation_coverage < 90:
→ Go to Step 6 (Dispatch Fix)
Task:
subagent_type: "[implementation_agent from input]" # e.g., "backend-engineer-golang"
model: "opus"
description: "Fix observability issues for [unit_id]"
prompt: |
⛔ FIX REQUIRED - Observability Issues Found
## Context
- **Unit ID:** [unit_id]
- **Iteration:** [validation_state.iteration] of [validation_state.max_iterations]
- **Your Previous Implementation:** [implementation_files]
## Issues to Fix (from SRE Validation)
[paste issues from validation_state.sre_result.issues]
## Current Instrumentation Coverage
[paste Instrumentation Coverage table from SRE output]
**Required:** ≥90%
**Current:** [validation_state.sre_result.instrumentation_coverage]%
## Standards Reference
For Go: https://raw.githubusercontent.com/LerianStudio/ring/main/dev-team/docs/standards/golang.md
For TS: https://raw.githubusercontent.com/LerianStudio/ring/main/dev-team/docs/standards/typescript.md
Focus on: Telemetry & Observability section
## Required Fixes
### If Logging Issues:
- Replace fmt.Println/console.log with structured logger
- Add trace_id to log context
- Use JSON format
### If Instrumentation Coverage < 90%:
- Add spans to all handlers: `tracer.Start(ctx, "handler.name")`
- Add spans to all services: `tracer.Start(ctx, "service.domain.operation")`
- Add spans to all repositories: `tracer.Start(ctx, "db.operation")`
- Add `defer span.End()` after each span creation
### If Context Propagation Issues:
- Add InjectHTTPContext for outgoing HTTP calls
- Add InjectGRPCContext for outgoing gRPC calls
- Add PrepareQueueHeaders for queue publishing
## Required Output
- Files modified with fixes
- New Instrumentation Coverage calculation
- Confirmation all issues addressed
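For orientation, the log line shape the logging fixes aim for (JSON with timestamp, level, message, service, and trace_id) can be sketched as below. This is an illustration of the target format only, not the lib-commons or lib-common-js logger that the Ring Standards prescribe; `LogFields` and `formatLogLine` are hypothetical names.

```typescript
// Hypothetical input for one log entry.
interface LogFields {
  level: "debug" | "info" | "warn" | "error";
  message: string;
  service: string;
  traceId: string;
  now: Date;
}

// Serializes one entry as a single JSON line carrying the fields named in the checklist.
function formatLogLine(f: LogFields): string {
  return JSON.stringify({
    timestamp: f.now.toISOString(),
    level: f.level,
    message: f.message,
    service: f.service,
    trace_id: f.traceId, // correlates the log line with its trace
  });
}
```

In a real service the structured logger from the Ring libraries produces these lines; the sketch only makes the expected fields concrete for review.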
validation_state.iteration += 1
if validation_state.iteration > validation_state.max_iterations:
→ Go to Step 9 (Escalate)
→ Go back to Step 3 (Dispatch SRE Agent)
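The validate, fix, and re-validate cycle can be sketched as a bounded loop. This is a minimal TypeScript sketch of one reading of the steps, in which the escalation check runs before a fix is dispatched, so with `max_iterations` of 3 the SRE agent validates at most three times and the implementation agent fixes at most twice. `runGate2`, `SreResult`, and `GateOutcome` are hypothetical names, and the two callbacks stand in for the Task dispatches.

```typescript
interface SreResult {
  verdict: "PASS" | "FAIL";
  instrumentation_coverage: number;
  issues: string[];
}

interface GateOutcome {
  status: "PASS" | "FAIL";
  iterations: number;
  escalated: boolean;
  result: SreResult;
}

function runGate2(
  validate: () => SreResult,       // stand-in for dispatching the SRE agent
  fix: (issues: string[]) => void, // stand-in for dispatching the implementation agent
  maxIterations: number = 3
): GateOutcome {
  let iteration = 1;
  while (true) {
    const result = validate();
    // Success requires both a PASS verdict and at least 90% coverage.
    if (result.verdict === "PASS" && result.instrumentation_coverage >= 90) {
      return { status: "PASS", iterations: iteration, escalated: false, result: result };
    }
    // Out of attempts: escalate to the user instead of dispatching another fix.
    if (iteration >= maxIterations) {
      return { status: "FAIL", iterations: iteration, escalated: true, result: result };
    }
    fix(result.issues);
    iteration += 1;
  }
}
```

The loop always terminates, because each pass either returns or increments `iteration` toward `maxIterations`.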
Generate skill output:
## Validation Result
**Status:** PASS
**Iterations:** [validation_state.iteration]
**Instrumentation Coverage:** [validation_state.sre_result.instrumentation_coverage]%
## Instrumentation Coverage
[paste final Instrumentation Coverage table]
## Issues Found
None (all resolved)
## Handoff to Next Gate
- SRE validation: COMPLETE
- Logging: ✅ Structured JSON with trace_id
- Tracing: ✅ OpenTelemetry instrumented
- Instrumentation: ✅ [X]% coverage
- Ready for Gate 3 (Testing): YES
Generate skill output:
## Validation Result
**Status:** FAIL
**Iterations:** [validation_state.iteration] (MAX REACHED)
**Instrumentation Coverage:** [validation_state.sre_result.instrumentation_coverage]%
## Instrumentation Coverage
[paste final Instrumentation Coverage table]
## Issues Found
[list remaining unresolved issues]
## Handoff to Next Gate
- SRE validation: FAILED
- Remaining issues: [count]
- Ready for Gate 3 (Testing): NO
- **Action Required:** User must manually resolve remaining issues
⛔ ESCALATION: Max iterations (3) reached. User intervention required.
| Severity | Scenario | Gate 2 Status | Action |
|---|---|---|---|
| CRITICAL | Missing all observability (no structured logs) | FAIL | ❌ Return to Gate 0 |
| CRITICAL | fmt.Println/echo instead of JSON logs | FAIL | ❌ Return to Gate 0 |
| CRITICAL | Instrumentation coverage < 50% | FAIL | ❌ Return to Gate 0 |
| CRITICAL | "DEFERRED" appears in validation output | FAIL | ❌ Return to Gate 0 |
| HIGH | Instrumentation coverage 50-89% | NEEDS_FIXES | ⚠️ Fix and re-validate |
| MEDIUM | Missing context propagation | NEEDS_FIXES | ⚠️ Fix and re-validate |
| LOW | Minor logging improvements | PASS | ✅ Note for future |
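The severity table above can be read as a function from check results to a Gate 2 status. This is a minimal TypeScript sketch, assuming the rows are evaluated from most to least severe; `Checks` and `gateStatus` are hypothetical names, and LOW findings are treated as notes that do not change the status.

```typescript
type GateStatus = "PASS" | "NEEDS_FIXES" | "FAIL";

// Hypothetical summary of one validation run.
interface Checks {
  hasStructuredLogs: boolean;
  forbiddenPatternCount: number;
  coverage: number;               // overall instrumentation coverage in percent
  propagationOk: boolean | "na";  // "na" when there are no external calls
  validationOutput: string;       // raw SRE agent output
}

function gateStatus(c: Checks): GateStatus {
  // CRITICAL rows: any one of these fails the gate.
  if (!c.hasStructuredLogs) return "FAIL";
  if (c.forbiddenPatternCount > 0) return "FAIL";
  if (c.coverage < 50) return "FAIL";
  if (c.validationOutput.indexOf("DEFERRED") !== -1) return "FAIL";
  // HIGH and MEDIUM rows: fix and re-validate.
  if (c.coverage < 90) return "NEEDS_FIXES";
  if (c.propagationOk === false) return "NEEDS_FIXES";
  return "PASS";
}
```

The CRITICAL checks come first so that a unit with, say, 95% coverage but a stray forbidden logging call still fails.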
<block_condition> If any condition is true, STOP and dispatch fix or escalate to user.
| Decision Type | Examples | Action |
|---|---|---|
| HARD BLOCK | Service lacks JSON structured logs | STOP - Dispatch fix to implementation agent |
| HARD BLOCK | Instrumentation coverage < 50% | STOP - Dispatch fix to implementation agent |
| HARD BLOCK | Max iterations reached | STOP - Escalate to user |
<cannot_skip>
| Requirement | Cannot Be Waived By | Rationale |
|---|---|---|
| Gate 2 execution | CTO, PM, "MVP" arguments | Observability prevents production blindness |
| 90% instrumentation coverage | "We'll add spans later" | Later = never. Instrument during implementation. |
| JSON structured logs | "Plain text is enough" | Plain text is unsearchable in production |
See shared-patterns/shared-pressure-resistance.md for universal pressure scenarios.
| User Says | Your Response |
|---|---|
| "Skip SRE validation" | "Observability is MANDATORY. Dispatching SRE agent now." |
| "90% coverage is too high" | "90% is the Ring Standard minimum. Cannot lower." |
| "Will add instrumentation later" | "Instrumentation is part of implementation. Fix now." |
See shared-patterns/shared-anti-rationalization.md for universal anti-rationalizations.
| Rationalization | Why It's WRONG | Required Action |
|---|---|---|
| "OpenTelemetry library is installed" | Installation ≠ Instrumentation | Verify spans exist in code |
| "Middleware handles tracing" | Middleware = root span only | Add child spans in all layers |
| "Small function doesn't need span" | Size is irrelevant | Add span to every function |
| "Only external calls need tracing" | Internal ops need tracing too | Instrument all layers |
| "Feature complete, observability later" | Observability IS completion | Fix NOW before Gate 3 |
| Type | JSON Logs | Tracing | Instrumentation |
|---|---|---|---|
| API Service | REQUIRED | REQUIRED | 90%+ |
| Background Worker | REQUIRED | REQUIRED | 90%+ |
| CLI Tool | REQUIRED | N/A | N/A |
| Library | N/A | N/A | N/A |
## Validation Result
**Status:** [PASS|FAIL|NEEDS_FIXES]
**Iterations:** [N]
**Duration:** [Xm Ys]
## Instrumentation Coverage
| Layer | Instrumented | Total | Coverage |
|-------|--------------|-------|----------|
| Handlers | X | Y | Z% |
| Services | X | Y | Z% |
| Repositories | X | Y | Z% |
| HTTP Clients | X | Y | Z% |
| gRPC Clients | X | Y | Z% |
| **TOTAL** | X | Y | **Z%** |
**Coverage Status:** [PASS (≥90%) | NEEDS_FIXES (50-89%) | FAIL (<50%)]
## Issues Found
- [List by severity or "None"]
## Handoff to Next Gate
- SRE validation status: [complete|needs_fixes|failed]
- Instrumentation coverage: [X%]
- Ready for testing: [YES|NO]