Adversarial course design audit across 5 dimensions: alignment stress test, evidence verification, cognitive load analysis, learner persona simulation, and prerequisite chain integrity. Produces a confidence score (0-100). Runs in a clean-context sub-agent so synthesis is unbiased by build history. Works standalone or reads from the idstack project manifest.
npx claudepluginhub savvides/idstack
<!-- AUTO-GENERATED from SKILL.md.tmpl -- do not edit directly -->
if [ -n "${CLAUDE_PLUGIN_ROOT:-}" ]; then
_IDSTACK="$CLAUDE_PLUGIN_ROOT"
elif [ -n "${IDSTACK_HOME:-}" ]; then
_IDSTACK="$IDSTACK_HOME"
else
_IDSTACK="$HOME/.claude/plugins/idstack"
fi
_UPD=$("$_IDSTACK/bin/idstack-update-check" 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD"
If the output contains UPDATE_AVAILABLE: tell the user "A newer version of idstack is available. Run cd ${IDSTACK_HOME:-~/.claude/plugins/idstack} && git pull && ./setup to update. (The ./setup step is required — it cleans up legacy symlinks.)" Then continue normally.
Before starting, check for an existing project manifest.
if [ -f ".idstack/project.json" ]; then
echo "MANIFEST_EXISTS"
"$_IDSTACK/bin/idstack-migrate" .idstack/project.json 2>/dev/null || cat .idstack/project.json
else
echo "NO_MANIFEST"
fi
If MANIFEST_EXISTS: the manifest is the primary course source; Step 1 will confirm scope from it.
If NO_MANIFEST: run in standalone mode; Step 1 will gather course inputs from the user.
if [ -f ".idstack/project.json" ] && command -v python3 &>/dev/null; then
python3 -c "
import json
try:
    data = json.load(open('.idstack/project.json'))
    prefs = data.get('preferences', {})
    v = prefs.get('verbosity', 'normal')
    if v != 'normal':
        print(f'VERBOSITY:{v}')
except Exception:
    pass
" 2>/dev/null || true
fi
If VERBOSITY:concise: Keep explanations brief. Skip evidence citations inline (still follow evidence-based recommendations, just don't cite tier codes in output).
If VERBOSITY:detailed: Include full evidence citations, alternative approaches considered, and rationale for each recommendation.
If VERBOSITY:normal or not shown: Default behavior — cite evidence tiers inline, explain key decisions, skip exhaustive alternatives.
_PROFILE="$HOME/.idstack/profile.yaml"
if [ -f "$_PROFILE" ]; then
# Simple YAML parsing for experience_level (no dependency needed)
_EXP=$(grep -E '^experience_level:' "$_PROFILE" 2>/dev/null | sed 's/experience_level:[[:space:]]*//' | tr -d '"' | tr -d "'")
[ -n "$_EXP" ] && echo "EXPERIENCE:$_EXP"
else
echo "NO_PROFILE"
fi
If EXPERIENCE:novice: Provide more context for recommendations. Explain WHY each
step matters, not just what to do. Define jargon on first use. Offer examples.
If EXPERIENCE:intermediate: Standard explanations. Assume familiarity with
instructional design concepts but explain idstack-specific patterns.
If EXPERIENCE:expert: Be concise. Skip basic explanations. Focus on evidence
tiers, edge cases, and advanced considerations. Trust the user's domain knowledge.
If NO_PROFILE: On first run, after the main workflow is underway (not before),
mention: "Tip: create ~/.idstack/profile.yaml with experience_level: novice|intermediate|expert
to adjust how much detail idstack provides."
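For example, a minimal profile (experience_level is the only key this skill reads):

```yaml
# ~/.idstack/profile.yaml
experience_level: intermediate  # one of: novice | intermediate | expert
```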
Check for session history and learnings from prior runs.
# Context recovery: timeline + learnings
_HAS_TIMELINE=0
_HAS_LEARNINGS=0
if [ -f ".idstack/timeline.jsonl" ]; then
_HAS_TIMELINE=1
if command -v python3 &>/dev/null; then
python3 -c "
import json, sys
lines = open('.idstack/timeline.jsonl').readlines()[-200:]
events = []
for line in lines:
    try:
        events.append(json.loads(line))
    except Exception:
        pass
if not events:
    sys.exit(0)
# Quality score trend
scores = [e for e in events if e.get('skill') == 'course-quality-review' and 'score' in e]
if scores:
    trend = ' -> '.join(str(s['score']) for s in scores[-5:])
    print(f'QUALITY_TREND: {trend}')
    last = scores[-1]
    dims = last.get('dimensions', {})
    if dims:
        tp = dims.get('teaching_presence', '?')
        sp = dims.get('social_presence', '?')
        cp = dims.get('cognitive_presence', '?')
        print(f'LAST_PRESENCE: T={tp} S={sp} C={cp}')
# Skills completed
completed = set()
for e in events:
    if e.get('event') == 'completed':
        completed.add(e.get('skill', ''))
joined = ','.join(sorted(completed))
print(f'SKILLS_COMPLETED: {joined}')
# Last skill run
last_completed = [e for e in events if e.get('event') == 'completed']
if last_completed:
    last = last_completed[-1]
    print(f'LAST_SKILL: {last.get(\"skill\", \"?\")} at {last.get(\"ts\", \"?\")}')
# Pipeline progression: suggest the next skill after the furthest completed one
pipeline = [
    ('needs-analysis', 'learning-objectives'),
    ('learning-objectives', 'assessment-design'),
    ('assessment-design', 'course-builder'),
    ('course-builder', 'course-quality-review'),
    ('course-quality-review', 'accessibility-review'),
    ('accessibility-review', 'red-team'),
    ('red-team', 'course-export'),
]
for prev, nxt in pipeline:
    if prev in completed and nxt not in completed:
        print(f'SUGGESTED_NEXT: {nxt}')
        break
" 2>/dev/null || true
else
# No python3: show last 3 skill names only
tail -3 .idstack/timeline.jsonl 2>/dev/null | grep -o '"skill":"[^"]*"' | sed 's/"skill":"//;s/"//' | while read s; do echo "RECENT_SKILL: $s"; done
fi
fi
if [ -f ".idstack/learnings.jsonl" ]; then
_HAS_LEARNINGS=1
_LEARN_COUNT=$(wc -l < .idstack/learnings.jsonl 2>/dev/null | tr -d ' ')
echo "LEARNINGS: $_LEARN_COUNT"
if [ "$_LEARN_COUNT" -gt 0 ] 2>/dev/null; then
"$_IDSTACK/bin/idstack-learnings-search" --limit 3 2>/dev/null || true
fi
fi
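For reference, the event shape the timeline parser above expects. Field names come from the code; the values here are illustrative:

```jsonl
{"ts": "2025-01-10T14:02:00Z", "skill": "course-quality-review", "event": "completed", "score": 68, "dimensions": {"teaching_presence": 6, "social_presence": 4, "cognitive_presence": 7}}
{"ts": "2025-01-11T09:30:00Z", "skill": "learning-objectives", "event": "completed"}
```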
If QUALITY_TREND is shown: Synthesize a welcome-back message. Example: "Welcome back. Quality score trend: 62 -> 68 -> 72 over 3 reviews. Last skill: /learning-objectives." Keep it to 2-3 sentences. If any dimension in LAST_PRESENCE is consistently below 5/10, mention it as a recurring pattern with its evidence citation.
If LAST_SKILL is shown but no QUALITY_TREND: Just mention the last skill run. Example: "Welcome back. Last session you ran /course-import."
If SUGGESTED_NEXT is shown: Mention the suggested next skill naturally. Example: "Based on your progress, /assessment-design is the natural next step."
If LEARNINGS > 0: Mention relevant learnings if they apply to this skill's domain. Example: "Reminder: this Canvas instance uses custom rubric formatting (discovered during import)."
Skill-specific manifest check: If the manifest red_team_audit section already has data,
ask the user: "I see you've already run this skill. Want to update the results or start fresh?"
This skill audits the course adversarially. It assumes the course is broken until proven otherwise.
It is NOT a quality review (/idstack:course-quality-review does that). Quality review asks "does this course meet standards?" Red team asks "prove this course actually works."
Five adversarial dimensions:
1. Alignment stress test
2. Evidence verification
3. Cognitive load analysis
4. Learner persona simulation
5. Prerequisite chain integrity
The output is a confidence score (0-100): "How confident are we this course works?"
If the same Claude session helped build the course, it has sunk-cost bias toward its own design choices. Red team work happens in a freshly-spawned sub-agent that has no prior conversation history — only the manifest and course files, which is the same view a real student gets.
The sub-agent (the orchestrator) runs the full audit, writes a structured report to .idstack/reports/red-team.md, and returns a short executive summary. The parent (this skill) then offers to apply fixes in-context, since the parent already knows the course structure and is good at editing.
The manifest update pulls the red_team_audit section from the report file. No automatic re-verification: if the user wants to confirm fixes hold, they re-run /idstack:red-team.
The preamble above already ran the manifest check. Now confirm scope.
Determine course inputs:
- MANIFEST_EXISTS: the orchestrator will read all sections (needs_analysis, learning_objectives, assessment_design, course_builder, quality_review, accessibility_review).
- NO_MANIFEST: ask the user to provide objectives, assessments, module sequence, and target audience. Capture answers as a brief block to pass to the orchestrator. Standalone mode reduces precision on Dimensions 1 (alignment) and 5 (prerequisites).

Ask one focus question via AskUserQuestion:
"Any specific angle to red-team, or a full sweep?"
Options: Full sweep (default), or one of the five dimensions: Alignment, Evidence, Cognitive load, Personas, Prerequisites.
Save the user's choice as FOCUS for the orchestrator brief.
Use the Agent tool with subagent_type=general-purpose. The prompt is the full contents of the <orchestrator-brief> block below, with these substitutions performed before invoking:
- {{FOCUS}} → the user's choice from Step 1 (or Full sweep by default)
- {{MANIFEST_INFO}} → either "Manifest at .idstack/project.json — read it directly." or, in standalone mode, the captured course information from Step 1
- {{COURSE_FILES_HINT}} → if the manifest has course_builder.output_path, set this to that path; otherwise "Look under ./course/ or ./modules/ for generated course files."

Then call Agent. Block on its return.
You are an adversarial course design auditor. You have NO context from prior sessions. You did not help build this course; you are seeing it fresh. Your job is to find every way it could fail learners — not to validate the design.
This is a stress test, not a quality review. Assume the course is broken until proven otherwise.
Inputs:
- {{MANIFEST_INFO}}
- Course files: {{COURSE_FILES_HINT}}
- Focus: {{FOCUS}}
Manifest integrity: if the manifest JSON is malformed, stop and return an error message naming the parse error. Never silently overwrite.
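A minimal sketch of that integrity gate, assuming python3 is available (the contract is "parse before touching anything"; the exact code is illustrative):

```python
import json
import sys

try:
    with open(".idstack/project.json") as f:
        manifest = json.load(f)
except json.JSONDecodeError as e:
    # Stop and name the parse error; never silently overwrite a corrupt manifest.
    sys.exit(f"Manifest malformed: {e}")
```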
Every challenge cites its evidence tier:
When multiple tiers apply, cite the strongest.
If {{FOCUS}} is Full sweep, audit all 5 dimensions at equal depth.
Otherwise, audit the named dimension at full depth and cover the others at reduced depth (3-5 findings each, no exhaustive matrices).
If you have access to the Agent tool, dispatch the 5 dimensions in parallel as nested sub-agents using the briefs in "Dimension Briefs" below. Wait for all 5 to return, then deduplicate findings.
If you do NOT have Agent access, run the dimensions sequentially using the same briefs.
For every learning objective and assessment pair, challenge the alignment.
Objective → Assessment match:
Activity → Objective match:
Check every evidence citation in the manifest or course design for accuracy.
Tier verification:
Currency check (if WebSearch available):
If WebSearch is unavailable, set mode: limited in the report and note: "currency verification requires internet."

Estimate cognitive load per module using proxy measures.
Limitation: the manifest contains structure, not the actual content learners see. These are proxies based on structural indicators. Note this limitation in the report.
Proxy indicators:
Expertise reversal check:
Simulate 4 learner personas walking through the course.
Limitation: simulation operates on structural/metadata signals, not actual content text. Content-level analysis (e.g., detecting idioms that challenge ESL learners) requires the actual course materials. Note this in the report.
Persona A — Complete Novice (no prior knowledge in domain)
Persona B — Expert Learner (expertise reversal risk)
Persona C — ESL Learner (language complexity, cultural references)
Persona D — Learner with Accessibility Needs
Per-persona checklist (evaluate for every module):
Trace prerequisite dependencies across all modules.
Check for:
After all dimensions return, compute the confidence score:
Severity weights reflect that structural misalignment and cognitive overload are the strongest predictors of learner failure: [Alignment-14] [T1], [CogLoad-6] [T1].
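A sketch of the shape of that computation: start from 100 and subtract severity-weighted penalties. The weights below are illustrative assumptions, not the skill's actual numbers:

```python
# Hypothetical severity weights; the real values are defined by the scoring spec above.
WEIGHTS = {"critical": 15, "warning": 5, "info": 1}

def confidence_score(findings: list[dict]) -> int:
    # findings: [{"id": "alignment-1", "severity": "critical", ...}, ...]
    penalty = sum(WEIGHTS.get(f["severity"], 0) for f in findings)
    return max(0, 100 - penalty)
```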
Contextualize:
The Markdown report follows the canonical structure documented in templates/report-format.md (observation → evidence → why-it-matters → suggestion, with severity and evidence tier on every finding). The structure below is the red-team-specific shape; treat the canonical format as the contract for tone and per-finding fields.
Before writing the report, ensure the directory exists:
mkdir -p .idstack/reports
Then write .idstack/reports/red-team.md with this structure (Markdown):
# Red Team Audit Report
**Date:** <ISO-8601 timestamp>
**Confidence Score:** <0-100>
**Focus:** <{{FOCUS}}>
**Mode:** <full | limited>
## Severity Counts
- Critical: <N>
- Warning: <N>
- Info: <N>
## Critical Findings
For each: dimension, description, affected module/objective/assessment, evidence citation, suggested fix direction.
## Warning Findings
Same structure.
## Info Findings
Same structure.
## Per-Dimension Summary
- Alignment: <pass | warning | critical> — 1-line summary
- Evidence: <pass | warning | critical> — 1-line summary
- Cognitive Load: <pass | warning | critical> — 1-line summary
- Personas: <pass | warning | critical> — 1-line summary
- Prerequisites: <pass | warning | critical> — 1-line summary
## Top 3 Actions
The three changes that would most improve the score.
## Limitations
What this audit could not assess (content-level analysis, actual learner behavior, LMS-specific implementation, etc.).
Each finding must have a stable id of the form <dimension>-<n> (e.g., alignment-1, cogload-3) so the parent can reference findings when applying fixes.
After writing the report, return ONLY a short executive summary (≤200 words) to the parent:
Include the report path: .idstack/reports/red-team.md. Do NOT return the full report inline. The parent will read the file.
After the orchestrator returns:
- Read .idstack/reports/red-team.md (full report).
- Present the executive summary and tell the user: "Full findings are in .idstack/reports/red-team.md — open it for the complete finding list."

Ask one AskUserQuestion:
"Which findings would you like to address?"
Options: Critical only, Critical + Warning, All findings, or Skip.
If the user chooses Skip, jump straight to Step 6.
For each finding in the chosen severity bucket, in order of severity: apply the fix and record it in fixes_applied. If you decide a finding is not actionable in-context (e.g., requires re-running /idstack:assessment-design), record it in fixes_deferred with a one-line reason.

Do not spawn additional sub-agents for fixes. The parent has the relevant context to edit course files directly.
If the user pushes back on any specific fix, mark it deferred and continue.
Save results to .idstack/project.json via bin/idstack-manifest-merge, which replaces only
the red_team_audit section, preserves every other section verbatim, validates JSON, and
atomically updates the top-level updated timestamp. Pull the score and findings from
.idstack/reports/red-team.md (the report is the source of truth — do not re-derive from
the orchestrator's return summary, which is lossy).
"$_IDSTACK/bin/idstack-manifest-merge" --section red_team_audit --payload - <<'PAYLOAD'
{
"updated": "<ISO-8601 timestamp>",
"confidence_score": 0,
"focus": "Full sweep",
"report_path": ".idstack/reports/red-team.md",
"findings_summary": {"critical": 0, "warning": 0, "info": 0},
"dimensions": {
"alignment": {"score": "pass|warning|critical", "findings": []},
"evidence": {"score": "pass|warning|critical", "mode": "full|limited", "findings": []},
"cognitive_load": {"score": "pass|warning|critical", "findings": []},
"personas": {"score": "pass|warning|critical", "findings": []},
"prerequisites": {"score": "pass|warning|critical", "findings": []}
},
"top_actions": [],
"limitations": [],
"fixes_applied": [],
"fixes_deferred": []
}
PAYLOAD
Each finding object: {"id": "alignment-1", "description": "...", "module": "Module 3", "severity": "critical|warning|info"}.
fixes_applied[] — each item: {"id": "alignment-1", "description": "Optional one-line summary of the change applied"}.
fixes_deferred[] — each item: {"id": "alignment-3", "reason": "One-line reason — e.g., requires re-running /idstack:assessment-design"}.
The merge tool exits non-zero (and prints a diagnostic on stderr) if the payload is malformed,
the manifest is corrupt, or the section name is misspelled — never silently overwriting. If
.idstack/project.json doesn't exist yet, run bin/idstack-migrate .idstack/project.json
first (it creates a fresh canonical manifest).
Fallback (if bin/idstack-manifest-merge is unavailable): Read the full manifest, modify
only the red_team_audit section, Write back. Preserve all other sections verbatim. The
canonical schema for reference is in templates/manifest-schema.md.
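A minimal sketch of that fallback in Python (the write-then-rename step mirrors the merge tool's atomic update; new_section is the red_team_audit payload assembled from the report):

```python
import json
import os
from datetime import datetime, timezone

MANIFEST = ".idstack/project.json"

def merge_red_team_audit(new_section: dict) -> None:
    with open(MANIFEST) as f:
        manifest = json.load(f)  # parse first: a corrupt manifest raises here instead of being overwritten
    manifest["red_team_audit"] = new_section  # replace only this section; everything else stays verbatim
    manifest["updated"] = datetime.now(timezone.utc).isoformat()
    tmp = MANIFEST + ".tmp"
    with open(tmp, "w") as f:
        json.dump(manifest, f, indent=2)
    os.replace(tmp, MANIFEST)  # atomic rename
```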
Two sentences:
- The confidence score and the report location: .idstack/reports/red-team.md.
- A suggested next step: if the score is below 60, recommend revisiting /idstack:learning-objectives or /idstack:assessment-design; if 60+, recommend /idstack:course-export.

If the user wants to verify fixes hold, they can re-run /idstack:red-team — that's deliberately manual to avoid token costs of automatic re-verification.
Have feedback or a feature request? Share it here — no GitHub account needed.
After the skill workflow completes successfully, log the session to the timeline:
"$_IDSTACK/bin/idstack-timeline-log" '{"skill":"red-team","event":"completed"}'
Include skill-specific fields where available (confidence_score, focus, fixes_applied count). Log synchronously (no background &).
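For example (field values illustrative):

```bash
"$_IDSTACK/bin/idstack-timeline-log" '{"skill":"red-team","event":"completed","confidence_score":72,"focus":"Full sweep","fixes_applied":3}'
```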
If you discover a non-obvious project-specific quirk during this session (LMS behavior, import format issue, course structure pattern), also log it as a learning:
"$_IDSTACK/bin/idstack-learnings-log" '{"skill":"red-team","type":"operational","key":"SHORT_KEY","insight":"DESCRIPTION","confidence":8,"source":"observed"}'