Skill

map-reduce

Parallelized workload processing with structured chunking, mapper agents, and reducer synthesis. Use for codebase-wide analysis, bulk transformations, and large file audits (20+ files).

Install

npx claudepluginhub wgordon17/personal-claude-marketplace --plugin code-quality

Tool Access

This skill is limited to using the following tools:

Read, Write, Edit, Glob, Grep, Bash, Agent, AskUserQuestion, CronCreate, CronDelete, SendMessage, TaskCreate, TaskUpdate, TaskList, TaskGet

Supporting Assets

references/agent-prompts.md, references/communication-schema.md, references/fidelity-guide.md

SKILL.md

Stats

Parent Repo Stars: 0

Parent Repo Forks: 0

Last Commit: Apr 19, 2026

/map-reduce — Parallelized Workload Processing

Splits large workloads into parallelizable chunks, assigns independent mapper agents to each chunk, then synthesizes results through a single reducer agent with cross-chunk validation. Use for codebase-wide analysis, bulk transformations, and large file audits (20+ files).

Quick Start

/map-reduce "find all files with TODO comments"          # Analysis workload
/map-reduce "apply deprecation fix across all files"     # Implementation workload
/map-reduce --split-strategy by-directory "audit auth/"  # Explicit split strategy

Architecture

LEAD (you)
├── Phase 0: Plan & Split
│   ├── Analyze workload
│   ├── Build cross-reference manifest
│   ├── Split into N chunks (max 8, module-aware)
│   └── Write ChunkAssignments
│
├── Phase 1: Map (parallel)
│   ├── Spawn N mapper agents simultaneously
│   ├── Each mapper processes its chunk independently
│   ├── CronCreate watchdog monitors progress
│   └── Collect ChunkResults
│
├── Phase 2: Reduce
│   ├── Spawn single reducer agent (opus)
│   ├── Cross-chunk validation (4-step protocol)
│   └── Produce ReductionResult
│
└── Phase 3: Deliver
    ├── Fidelity report (>20% invalidation threshold)
    ├── Present or apply results
    └── Write map-reduce-report.md

Workflow Phases

Phase 0: Plan & Split

Analyze the workload — determine the split strategy:
- by-file: each mapper gets an explicit list of files (best for heterogeneous workloads)
- by-directory: each mapper gets a subtree (best for large, tree-structured codebases)
- by-item: each mapper gets items from a list (best for non-file workloads: APIs, symbols, records)
- custom: user-provided split logic (ask via AskUserQuestion if needed)
Module-aware splitting: when splitting by-directory, respect module boundaries — keep tightly coupled files in the same chunk. Use import/directory structure to determine coupling. Never split a single module (e.g., auth/) across chunks. If a module is too large for one chunk, treat it as its own chunk even if that makes the chunk larger than average.
Cross-reference manifest: before splitting, build a lightweight manifest of exported symbols per file (function names, class names, and file paths). For each chunk, the manifest covers all files NOT in that chunk. Include it in every ChunkAssignment so mappers can distinguish "unused in my chunk" from "might be used elsewhere." See references/fidelity-guide.md.
Cap at 8 mappers: if the workload naturally splits into more, merge the smallest chunks. If the user wants more than 8, use AskUserQuestion to confirm; uncapped splitting is a known fidelity risk (see references/fidelity-guide.md).
Create audit trail: generate a run-ID using the convention in code-quality/references/project-memory-reference.md (Run-ID Naming Convention section). Create {memory_dir}/map-reduce/{run-id}/ and a chunks/ subdirectory under it for ChunkResult files.
Create tasks upfront: use TaskCreate with addBlockedBy for the full task graph (one task per chunk + one for reduction + one for delivery) so progress is visible from the start.
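
The module-aware split and the 8-mapper cap described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: it assumes the first path component identifies a module, and the names `split_by_module` and `cap_chunks` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

MAX_MAPPERS = 8  # cap stated in Phase 0


def split_by_module(files: list[str]) -> list[list[str]]:
    """Group files by top-level directory so no module is split across chunks."""
    modules: dict[str, list[str]] = defaultdict(list)
    for path in files:
        parts = PurePosixPath(path).parts
        # Assumption: the first path component identifies the module.
        module = parts[0] if len(parts) > 1 else "."
        modules[module].append(path)
    return [sorted(paths) for _, paths in sorted(modules.items())]


def cap_chunks(chunks: list[list[str]], limit: int = MAX_MAPPERS) -> list[list[str]]:
    """Merge the two smallest chunks repeatedly until at most `limit` remain."""
    chunks = [list(c) for c in chunks]
    while len(chunks) > limit:
        chunks.sort(key=len)
        merged = chunks[0] + chunks[1]
        chunks = [merged] + chunks[2:]
    return chunks
```

Merging whole chunks, rather than moving individual files, keeps each module intact, at the cost of less evenly sized chunks.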

Phase 1: Map (parallel)

Spawn N mapper agents in parallel — all at once, not sequentially. Use general-purpose type with sonnet model. Each mapper receives a ChunkAssignment (see references/communication-schema.md).
Mappers are fully isolated — they do NOT communicate with each other. Each processes only the files/items in its chunk.
Boundary-aware findings: mappers MUST classify every finding with a confidence field:
- verified: file-internal issues (syntax, style, complexity, security) — self-contained, no cross-chunk risk
- chunk-local: findings that depend on cross-chunk context — specifically:
  - "Unused code" where the symbol is exported or public
  - "Missing dependency" where the import is from a path outside the chunk
CronCreate watchdog: after spawning all mappers, create a CronCreate job (60-second interval) that checks TaskList for in_progress tasks with no recent updates. If any mapper has been idle for 2+ consecutive checks, the watchdog pings the lead to investigate. The watchdog reports only — it never intervenes directly. CronDelete the watchdog when Phase 1 completes (also delete in error/escalation paths to avoid orphaned crons).
Failure handling: if a mapper fails or times out, retry once with a fresh agent using the same ChunkAssignment. If the retry also fails, mark the chunk as failed in the audit trail, record it in failed_chunks, and continue to Phase 2 with the remaining results.
Each mapper writes its ChunkResult to {run_dir}/chunks/chunk-{id}.json.
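
The parallel spawn and retry-once failure handling above can be sketched as follows. This is a hedged illustration that uses threads as a stand-in for mapper agents. `run_mapper` is a hypothetical callable representing one agent run, and the `chunk_id` field is an assumed name; the real schema is in references/communication-schema.md.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def run_with_retry(run_mapper: Callable[[dict], dict], assignment: dict) -> Optional[dict]:
    """Run a chunk; on failure retry once with the same assignment."""
    for attempt in (1, 2):
        try:
            return run_mapper(assignment)
        except Exception:
            if attempt == 2:
                return None  # second failure: the chunk is marked failed
    return None


def map_phase(run_mapper: Callable[[dict], dict], assignments: list[dict]):
    """Start all mappers at once and collect results plus failed chunk ids."""
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        outcomes = list(pool.map(lambda a: run_with_retry(run_mapper, a), assignments))
    results = [r for r in outcomes if r is not None]
    # Assumption: each assignment carries a `chunk_id` key.
    failed_chunks = [a["chunk_id"] for a, r in zip(assignments, outcomes) if r is None]
    return results, failed_chunks
```

A failed chunk does not abort the run: the remaining results still flow into Phase 2, and `failed_chunks` feeds the chunk failure list in the final report.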

Phase 2: Reduce

Spawn a single reducer agent — general-purpose type, opus model (judgment-heavy synthesis). The reducer receives a ReductionInput pointing to all ChunkResult files.
Cross-chunk validation (mandatory 4-step protocol):

Step 1 — Unused code cross-check: for every chunk-local "unused code" finding, search other chunks' results for references to that symbol. If found in another chunk → invalidate the finding. If not found anywhere → promote to verified.

Step 2 — Missing dependency cross-check: for every chunk-local "missing dependency" finding, check if the dependency exists in another chunk's file list. If yes → invalidate. If no → promote to verified.

Step 3 — Deduplication: for duplicate findings across chunks (same issue, different chunks found it), merge by evidence + location, keeping the most detailed description. Never silently drop a finding — if two chunks found the same issue with different evidence, keep both evidence strings in the merged finding.

Step 4 — Conflict resolution: for conflicting findings (chunk A says "unused", chunk B references it), always resolve in favor of "used" — false negatives are better than false positives for destructive actions (deletions, removals).
Reducer output: a ReductionResult written to {run_dir}/reduction-result.json. All findings in the ReductionResult have confidence: "verified" — the reducer promotes or invalidates all chunk-local findings before outputting.
For implementation workloads: reducer also checks cross-chunk consistency — are there conflicting changes proposed by different mappers to the same shared interface or file? Flag these as cross_chunk_issues in the ReductionResult.
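
Steps 1, 2, and 4 of the validation protocol can be sketched as follows. In practice the reducer is an agent working from the prompts in references/agent-prompts.md, so this only illustrates the decision logic. The field names (`chunk_id`, `files`, `references`, `findings`, `category`, `symbol`, `path`) and the category strings are assumptions, not the documented schema.

```python
def resolved_elsewhere(finding: dict, other_chunks: list[dict]) -> bool:
    """True when another chunk contradicts a chunk-local finding."""
    if finding["category"] == "unused-code":
        # Step 1: does any other chunk reference the symbol?
        return any(finding["symbol"] in c["references"] for c in other_chunks)
    if finding["category"] == "missing-dependency":
        # Step 2: does the dependency live in another chunk's file list?
        return any(finding["path"] in c["files"] for c in other_chunks)
    raise ValueError(f"unexpected chunk-local category: {finding['category']}")


def cross_chunk_validate(chunk_results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (verified, invalidated); no finding stays chunk-local."""
    verified, invalidated = [], []
    for result in chunk_results:
        others = [c for c in chunk_results if c["chunk_id"] != result["chunk_id"]]
        for finding in result["findings"]:
            if finding["confidence"] == "verified":
                verified.append(finding)
            elif resolved_elsewhere(finding, others):
                # Step 4: any evidence of use wins over "unused".
                invalidated.append(finding)
            else:
                # Not contradicted anywhere: promote to verified.
                verified.append({**finding, "confidence": "verified"})
    return verified, invalidated
```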

Phase 3: Deliver

Fidelity report: check invalidated_findings count in ReductionResult. If more than 20% of total findings were invalidated during cross-chunk validation, warn the user:

"Chunk boundaries may have been poorly chosen — {N} of {total} findings ({pct}%) were invalidated by cross-chunk validation. Consider re-running with different splits or a single-agent analysis."
For analysis workloads: present the synthesized summary and needs-fix findings to the user. Handle needs-input findings as follows:
- Map-reduce has no Finding Verifier, so the Lead applies the de-escalation test from code-quality/references/finding-classification.md inline before presenting to the user. If the finding has a single correct resolution, reclassify to needs-fix and fix it.
- Present each remaining finding individually via AskUserQuestion (one question per finding, batch up to 4 per call), with multiSelect: false.
- Each question includes full context: "[{id}] [{category}] {description}\n\nLocation: {file}:{line}\nDecision needed: {input_needed}\n▸dp:file={file},line={line},cat={category},skill=map-reduce"
- Options come from the verifier's options array (if present) plus "Defer" as the last option, OR the binary [{"label": "Fix"}, {"label": "Defer"}] if options is null.
- Do NOT exit with unresolved needs-input findings. If AskUserQuestion is unavailable, treat all needs-input findings as needs_context in the final report (surface them, don't hide them).
- Offer to write a detailed report or create actionable TODO items.
For implementation workloads:
- Apply all needs-fix changes first, run tests, verify nothing regressed. Roll back on test failure.
- For needs-input findings:
  - Map-reduce has no Finding Verifier, so the Lead applies the de-escalation test from code-quality/references/finding-classification.md inline before presenting to the user. If the finding has a single correct resolution, reclassify to needs-fix and apply it.
  - Present each remaining finding individually via AskUserQuestion (one question per finding, batch up to 4 per call), with multiSelect: false.
  - Each question includes full context: "[{id}] [{category}] {description}\n\nLocation: {file}:{line}\nDecision needed: {input_needed}\n▸dp:file={file},line={line},cat={category},skill=map-reduce"
  - Options come from the verifier's options array (if present) plus "Defer" as the last option, OR the binary [{"label": "Fix"}, {"label": "Defer"}] if options is null.
  - Selected items are applied, then tests re-run. Do NOT apply needs-input changes without user approval.
  - If AskUserQuestion is unavailable, treat all needs-input findings as needs_context in the final report (surface them, don't hide them).
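
The needs-input question format and the batching rule (up to 4 questions per AskUserQuestion call) can be sketched as follows. The helper names are hypothetical, and the finding is assumed to be a plain dict carrying the fields named in the template. How the batches are passed to AskUserQuestion is not shown.

```python
def format_question(finding: dict) -> str:
    """Render the full-context question text for one needs-input finding."""
    return (
        f"[{finding['id']}] [{finding['category']}] {finding['description']}\n\n"
        f"Location: {finding['file']}:{finding['line']}\n"
        f"Decision needed: {finding['input_needed']}\n"
        f"▸dp:file={finding['file']},line={finding['line']},"
        f"cat={finding['category']},skill=map-reduce"
    )


def question_options(finding: dict) -> list[dict]:
    """Use the provided options plus "Defer", or the binary Fix/Defer fallback."""
    options = finding.get("options")
    if options is None:
        return [{"label": "Fix"}, {"label": "Defer"}]
    return list(options) + [{"label": "Defer"}]


def batch(items: list, size: int = 4) -> list[list]:
    """Split questions into groups of at most `size` per call."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```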
Write final report to {run_dir}/map-reduce-report.md with:
- Summary statistics (files analyzed, total findings, deduplicated, invalidated)
- Per-classification breakdown (needs-fix / needs-input)
- Cross-chunk issues (if any)
- Fidelity warnings (if any)
- Chunk failure list (if any chunks failed)
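
The 20% fidelity threshold that feeds the report's fidelity warnings can be sketched as follows. This assumes `total` counts all findings, including the invalidated ones; the function name and the shortened warning text are illustrative.

```python
from typing import Optional


def fidelity_warning(invalidated: int, total: int, threshold_pct: int = 20) -> Optional[str]:
    """Return a warning when more than `threshold_pct` percent were invalidated."""
    if total == 0:
        return None
    # Integer comparison avoids floating-point edge cases at exactly 20%.
    if invalidated * 100 <= total * threshold_pct:
        return None
    pct = round(100 * invalidated / total)
    return (
        f"{invalidated} of {total} findings ({pct}%) were invalidated "
        "by cross-chunk validation."
    )
```

With this comparison, exactly 20% invalidated does not trigger the warning, which matches the "more than 20%" wording.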

Scope Matching

Match the tool to the task scope:

| Scenario | Recommended Approach |
|---|---|
| Analysis across 20+ files | `/map-reduce` |
| Bulk transformation (same fix, many files) | `/map-reduce` |
| Large file audit (50+ files) | `/map-reduce` |
| Architectural analysis (circular deps, data flow) | Single agent (needs full-codebase view) |
| Small workload (<20 files) | Direct parallel agents (less overhead) |
| Tightly coupled codebase (>50% cross-imports) | Single agent (chunking is meaningless) |
| Full implementation task | `/swarm` |
| Codebase cleanup | `/unfuck` |

Work Completion Principle

Never defer, skip, or reduce the scope of work to save tokens or reduce agent count. If the task requires a mapper, spawn the mapper. If a finding needs fixing, fix it. The only valid reasons to skip work are: (1) the user explicitly opted out, (2) the work is genuinely out of scope, or (3) the task shape genuinely doesn't match map-reduce (see table above).

"It would be expensive" is NEVER a valid reason to skip work or reduce mapper count.

Lead Responsibilities

Context Bundle

Every mapper and the reducer receive a context bundle (see references/communication-schema.md). The bundle includes: project name, task description, run_dir, the tool guard reminder, and the cross-reference manifest (embedded in ChunkAssignment for mappers).
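
One way to build the cross-reference manifest embedded in each ChunkAssignment is sketched below. It assumes Python sources and treats top-level `def` and `class` names as exported symbols, which is a simplification. Both function names are hypothetical.

```python
import re

# Assumption: top-level def/class statements approximate a file's exports.
_TOP_LEVEL = re.compile(r"^(?:async\s+def|def|class)\s+([A-Za-z_]\w*)", re.MULTILINE)


def exported_symbols(source: str) -> list[str]:
    """Return names of top-level functions and classes in source order."""
    return _TOP_LEVEL.findall(source)


def build_manifest(exports: dict[str, list[str]], chunk_files: list[str]) -> dict[str, list[str]]:
    """Keep symbols only for files NOT in the chunk, as the mapper needs."""
    inside = set(chunk_files)
    return {path: list(symbols) for path, symbols in exports.items() if path not in inside}
```

A mapper that flags an exported symbol as unused can then check the manifest and mark the finding chunk-local rather than verified.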

Watchdog Management

Create the CronCreate watchdog after all mappers are spawned. Track its job ID. Delete it:

- After all Phase 1 mappers complete (success path)
- In any error or escalation path
- Before returning control to the user

Never leave an orphaned cron job.
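
The watchdog's idle rule (no update across 2 or more consecutive checks) can be sketched as follows. This covers only the decision logic, not the CronCreate or TaskList calls. The task fields `id`, `status`, and `updated_at` are assumed names, and how state persists between cron runs is left open.

```python
class IdleTracker:
    """Flag in_progress tasks whose timestamp has not changed for N checks."""

    def __init__(self, idle_checks: int = 2):
        self.idle_checks = idle_checks
        self._last_seen: dict[str, object] = {}
        self._idle_counts: dict[str, int] = {}

    def check(self, tasks: list[dict]) -> list[str]:
        """Return ids of stalled tasks; report only, never intervene."""
        stalled = []
        for task in tasks:
            task_id = task["id"]
            if task["status"] != "in_progress":
                self._idle_counts.pop(task_id, None)
                continue
            if self._last_seen.get(task_id) == task["updated_at"]:
                self._idle_counts[task_id] = self._idle_counts.get(task_id, 0) + 1
            else:
                self._idle_counts[task_id] = 0
            self._last_seen[task_id] = task["updated_at"]
            if self._idle_counts[task_id] >= self.idle_checks:
                stalled.append(task_id)
        return stalled
```

Any task update resets the count, so a slow but progressing mapper is never reported.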

Audit Trail

hack/map-reduce/
└── {run-id}/                   # e.g. feat-auth-1711388400
    ├── chunks/
    │   ├── chunk-1.json        # ChunkResult from mapper 1
    │   ├── chunk-2.json        # ChunkResult from mapper 2
    │   └── ...
    ├── reduction-result.json   # ReductionResult from reducer
    └── map-reduce-report.md    # Final human-readable report
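
Creating this layout can be sketched with `pathlib`. The run-ID itself comes from the naming convention referenced in Phase 0, which is not reproduced here, so it is passed in as a parameter; the function names are hypothetical.

```python
from pathlib import Path


def create_run_dirs(memory_dir: str, run_id: str) -> Path:
    """Create {memory_dir}/map-reduce/{run-id}/chunks/ and return the run dir."""
    run_dir = Path(memory_dir) / "map-reduce" / run_id
    (run_dir / "chunks").mkdir(parents=True, exist_ok=True)
    return run_dir


def chunk_result_path(run_dir: Path, chunk_id: int) -> Path:
    """Path where a mapper writes its ChunkResult."""
    return run_dir / "chunks" / f"chunk-{chunk_id}.json"
```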

References

| File | Content |
|---|---|
| `references/communication-schema.md` | JSON schemas for ChunkAssignment, ChunkResult, ReductionInput, ReductionResult |
| `references/agent-prompts.md` | Full prompt templates for mapper and reducer agents |
| `references/fidelity-guide.md` | Fidelity risks, mitigations, and when NOT to use map-reduce |