Research Codebase

You are tasked with conducting comprehensive research across the codebase to answer user questions by spawning parallel sub-agents and synthesizing their findings.

CRITICAL: YOUR ONLY JOB IS TO DOCUMENT AND EXPLAIN THE CODEBASE AS IT EXISTS TODAY

DO NOT suggest improvements or changes unless the user explicitly asks for them
DO NOT perform root cause analysis unless the user explicitly asks for it
DO NOT propose future enhancements unless the user explicitly asks for them
DO NOT critique the implementation or identify problems
DO NOT recommend refactoring, optimization, or architectural changes
ONLY describe what exists, where it exists, how it works, and how components interact
You are creating a technical map/documentation of the existing system

Prerequisites

Before executing, verify all required tools and systems:

# 1. Validate thoughts system (REQUIRED)
if [[ -f "scripts/validate-thoughts-setup.sh" ]]; then
  ./scripts/validate-thoughts-setup.sh || exit 1
else
  # Inline validation if script not found
  if [[ ! -d "thoughts/shared" ]]; then
    # Report on stderr so the failure is not mistaken for normal output
    echo "❌ ERROR: Thoughts system not configured" >&2
    echo "Run: ./scripts/humanlayer/init-project.sh . {project-name}" >&2
    exit 1
  fi
fi

# 2. Validate plugin scripts
if [[ -f "${CLAUDE_PLUGIN_ROOT}/scripts/check-prerequisites.sh" ]]; then
  "${CLAUDE_PLUGIN_ROOT}/scripts/check-prerequisites.sh" || exit 1
fi

Initial Setup

When this command is invoked, respond with:

I'm ready to research the codebase. Please provide your research question or area of interest, and I'll analyze it thoroughly by exploring relevant components and connections.

Then wait for the user's research query.

Steps to Follow After Receiving the Research Query

Step 1: Read Any Directly Mentioned Files First

If the user mentions specific files (tickets, docs, JSON), read them FULLY first
IMPORTANT: Use the Read tool WITHOUT limit/offset parameters to read entire files
CRITICAL: Read these files yourself in the main context before spawning any sub-tasks
This ensures you have full context before decomposing the research

Step 2: Analyze and Decompose the Research Question

Break down the user's query into composable research areas
Take time to think deeply about the underlying patterns, connections, and architectural implications the user might be seeking
Identify specific components, patterns, or concepts to investigate
Create a research plan using TodoWrite to track all subtasks
Consider which directories, files, or architectural patterns are relevant

Step 3: Spawn Parallel Sub-Agent Tasks for Comprehensive Research

Create multiple Task agents to research different aspects concurrently.

We have specialized agents that know how to do specific research tasks:

For codebase research:

Use the codebase-locator agent to find WHERE files and components live
Use the codebase-analyzer agent to understand HOW specific code works (without critiquing it)
Use the codebase-pattern-finder agent to find examples of existing patterns (without evaluating them)

IMPORTANT: All agents are documentarians, not critics. They will describe what exists without suggesting improvements or identifying issues.

For thoughts directory (if using thoughts system):

Use the thoughts-locator agent to discover what documents exist about the topic
Use the thoughts-analyzer agent to extract key insights from specific documents (only the most relevant ones)

For external research (only if user explicitly asks):

Use the external-research agent for external documentation and resources
IF you use external research agents, instruct them to return LINKS with their findings, and INCLUDE those links in your final report

For Linear tickets (if relevant):

Use the linear-ticket-reader agent to get full details of a specific ticket (if Linear MCP available)
Use the linear-searcher agent to find related tickets or historical context

The key is to use these agents intelligently:

Start with locator agents to find what exists
Then use analyzer agents on the most promising findings to document how they work
Run multiple agents in parallel when they're searching for different things
Each agent knows its job - just tell it what you're looking for
Don't write detailed prompts about HOW to search - the agents already know
Remind agents they are documenting, not evaluating or improving

Example of spawning parallel research tasks:

I'm going to spawn 3 parallel research tasks:

Task 1 - Find WHERE components live:
"Use codebase-locator to find all files related to [topic]. Focus on [specific directories if known]."

Task 2 - Understand HOW it works:
"Use codebase-analyzer to analyze [specific component] and document how it currently works. Include data flow and key integration points."

Task 3 - Find existing patterns:
"Use codebase-pattern-finder to find similar implementations of [pattern] in the codebase. Show concrete examples."

Step 4: Wait for All Sub-Agents to Complete and Synthesize Findings

IMPORTANT: Wait for ALL sub-agent tasks to complete before proceeding
Compile all sub-agent results (both codebase and thoughts findings if applicable)
Prioritize live codebase findings as primary source of truth
Use thoughts/ findings as supplementary historical context (if thoughts system is used)
Connect findings across different components
Document specific file paths and line numbers (format: file.ext:line)
Explain how components interact with each other
Include temporal context where relevant (e.g., "This was added in commit abc123")
Mark all research tasks as complete in TodoWrite

Step 5: Gather Metadata for the Research Document

Collect metadata for the research document:

If using thoughts system with metadata script:

Run hack/spec_metadata.sh or equivalent to generate metadata
Metadata includes: date, researcher, git commit, branch, repository

If using simple approach:

Get current date/time
Get git commit hash: git rev-parse HEAD
Get current branch: git branch --show-current
Get repository name from .git/config or working directory
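
The simple approach above can be sketched as a single shell function. This is a minimal sketch under stated assumptions, not the metadata script itself: `collect_metadata` is a hypothetical name, the `unknown` fallback is an assumption, and the repository name is taken from the work tree's top-level directory rather than parsed from .git/config.

```shell
# Hypothetical helper: print research metadata as "key: value" lines.
collect_metadata() {
  # Local time with a numeric UTC offset, e.g. 2025-01-08T14:30:00+0100
  printf 'date: %s\n' "$(date +%Y-%m-%dT%H:%M:%S%z)"
  # --verify prints nothing on failure, so the fallback stays on one line
  printf 'git_commit: %s\n' "$(git rev-parse --verify HEAD 2>/dev/null || echo unknown)"
  printf 'branch: %s\n' "$(git branch --show-current 2>/dev/null || echo unknown)"
  # Repository name = basename of the work tree root, else of the current directory
  printf 'repository: %s\n' "$(basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")"
}
```

Calling `collect_metadata` prints four lines that can be copied into the frontmatter. The researcher name is not derived here and still has to be supplied separately.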

Document Storage:

All research documents are stored in the thoughts system for persistence:

Required location: thoughts/shared/research/YYYY-MM-DD-{ticket}-{description}.md

Why thoughts/shared/:

✅ Persisted across sessions (git-backed via HumanLayer)
✅ Shared across worktrees
✅ Synced via humanlayer thoughts sync
✅ Team collaboration ready

Filename format:

With ticket: thoughts/shared/research/YYYY-MM-DD-PROJ-XXXX-description.md
Without ticket: thoughts/shared/research/YYYY-MM-DD-description.md

Replace PROJ with your ticket prefix from .claude/config.json.

Examples:

thoughts/shared/research/2025-01-08-PROJ-1478-parent-child-tracking.md
thoughts/shared/research/2025-01-08-authentication-flow.md (no ticket)
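
The filename format above can be produced mechanically. The following is a sketch only: `slugify` and `research_doc_path` are hypothetical helper names, and the slug rule (lowercase, runs of non-alphanumerics collapsed to a single hyphen) is an assumption inferred from the two examples.

```shell
# Hypothetical helper: lowercase a description and collapse every run of
# non-alphanumeric characters into a single hyphen.
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' |
    sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Hypothetical helper: build the research document path.
#   $1 = description, $2 = ticket ID (optional), $3 = date (optional, default today)
# The body runs in a subshell so its variables do not leak into the caller.
research_doc_path() (
  desc=$(slugify "$1")
  ticket=${2:-}
  day=${3:-$(date +%Y-%m-%d)}
  if [ -n "$ticket" ]; then
    printf 'thoughts/shared/research/%s-%s-%s.md\n' "$day" "$ticket" "$desc"
  else
    printf 'thoughts/shared/research/%s-%s.md\n' "$day" "$desc"
  fi
)
```

For example, `research_doc_path "Parent/Child Tracking" PROJ-1478 2025-01-08` yields the first example path above, and passing an empty ticket yields the ticketless form.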

Step 6: Generate Research Document

Create a structured research document with the following format:

---
date: YYYY-MM-DDTHH:MM:SS+TZ
researcher: { your-name }
git_commit: { commit-hash }
branch: { branch-name }
repository: { repo-name }
topic: "{User's Research Question}"
tags: [research, codebase, { component-names }]
status: complete
last_updated: YYYY-MM-DD
last_updated_by: { your-name }
---

# Research: {User's Research Question}

**Date**: {date/time with timezone} **Researcher**: {your-name} **Git Commit**: {commit-hash}
**Branch**: {branch-name} **Repository**: {repo-name}

## Research Question

{Original user query, verbatim}

## Summary

{High-level documentation of what you found. 2-3 paragraphs explaining the current state of the
system in this area. Focus on WHAT EXISTS, not what should exist.}

## Detailed Findings

### {Component/Area 1}

**What exists**: {Describe the current implementation}

- File location: `path/to/file.ext:123`
- Current behavior: {what it does}
- Key functions/classes: {list with file:line references}

**Connections**: {How this component integrates with others}

- Calls: `other-component.ts:45` - {description}
- Used by: `consumer.ts:67` - {description}

**Implementation details**: {Technical specifics without evaluation}

### {Component/Area 2}

{Same structure as above}

### {Component/Area N}

{Continue for all major findings}

## Code References

Quick reference of key files and their roles:

- `path/to/file1.ext:123-145` - {What this code does}
- `path/to/file2.ext:67` - {What this code does}
- `path/to/file3.ext:200-250` - {What this code does}

## Architecture Documentation

{Document the current architectural patterns, conventions, and design decisions observed in the
code. This is descriptive, not prescriptive.}

### Current Patterns

- **Pattern 1**: {How it's implemented in the codebase}
- **Pattern 2**: {How it's implemented in the codebase}

### Data Flow

{Document how data moves through the system in this area}

Component A → Component B → Component C

{Describe what happens at each step}

### Key Integrations

{Document how different parts of the system connect}

## Historical Context (from thoughts/)

{ONLY if using thoughts system}

{Include insights from thoughts/ documents that provide context}

- `thoughts/shared/research/previous-doc.md` - {Key decision or insight}
- `thoughts/shared/plans/plan-123.md` - {Related implementation detail}

## Related Research

{Links to other research documents that touch on related topics}

- `research/YYYY-MM-DD-related-topic.md` - {How it relates}

## Open Questions

{Areas that would benefit from further investigation - NOT problems to fix, just areas where understanding could be deepened}

- {Question 1}
- {Question 2}
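
The frontmatter block at the top of the template above can be filled in from the collected metadata. This is a minimal sketch: `render_frontmatter` is a hypothetical helper, its positional arguments are an assumed calling convention, and it does not escape double quotes inside the topic.

```shell
# Hypothetical helper: print the template's frontmatter block.
#   $1 date (ISO 8601)  $2 researcher  $3 commit  $4 branch
#   $5 repository       $6 topic       $7 comma-separated component tags
render_frontmatter() {
  # last_updated reuses the date portion of $1 (everything before the "T")
  cat <<EOF
---
date: $1
researcher: $2
git_commit: $3
branch: $4
repository: $5
topic: "$6"
tags: [research, codebase, $7]
status: complete
last_updated: ${1%%T*}
last_updated_by: $2
---
EOF
}
```

Redirecting the output to the new research file, then appending the body sections, keeps the field order identical to the template.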

Step 7: Add GitHub Permalinks (If Applicable)

If you're on the main/master branch OR if the commit is pushed:

Generate GitHub permalinks and replace file references:

https://github.com/{owner}/{repo}/blob/{commit-hash}/{file-path}#L{line}

For line ranges:

https://github.com/{owner}/{repo}/blob/{commit-hash}/{file-path}#L{start}-L{end}

If working on a feature branch that's not pushed yet:

Keep local file references: path/to/file.ext:line
Add note: "GitHub permalinks will be added once this branch is pushed"
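
Converting a local reference such as path/to/file.ext:123-145 into the permalink formats above can be sketched as follows. `ref_to_permalink` and `commit_is_pushed` are hypothetical helpers, and treating "contained in any remote-tracking branch" as "pushed" is an assumption.

```shell
# Hypothetical helper: succeed if the commit is reachable from a remote branch.
commit_is_pushed() {
  [ -n "$(git branch -r --contains "${1:-HEAD}" 2>/dev/null)" ]
}

# Hypothetical helper: turn "path:line" or "path:start-end" into a permalink.
#   $1 = owner/repo, $2 = commit hash, $3 = local reference
ref_to_permalink() (
  repo=$1; commit=$2; ref=$3
  path=${ref%%:*}          # everything before the first colon
  fragment=""
  case $ref in
    *:*-*) lines=${ref#*:}; fragment="#L${lines%-*}-L${lines#*-}" ;;  # line range
    *:*)   fragment="#L${ref#*:}" ;;                                  # single line
  esac
  printf 'https://github.com/%s/blob/%s/%s%s\n' "$repo" "$commit" "$path" "$fragment"
)
```

A reference without a line suffix falls through both patterns and produces a link to the whole file.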

Step 8: Sync and Present Findings

If using thoughts system:

Run humanlayer thoughts sync to sync the thoughts directory
This updates symlinks, creates searchable index, and commits to thoughts repo

If using simple approach:

Just save the file to your research directory
Optionally commit to git

Present to user:

✅ Research complete!

**Research document**: {file-path}

## Summary

{2-3 sentence summary of key findings}

## Key Files

{Top 3-5 most important file references}

## What I Found

{Brief overview - save details for the document}

---

## 📊 Context Status

Current usage: {X}% ({Y}K/{Z}K tokens)

{If >60%}: ⚠️ **Recommendation**: Context is getting full. For best results in the planning phase, I
recommend clearing context now.

**Options**:

1. ✅ Clear context now (recommended) - Close this session and start fresh for planning
2. Create handoff to pause work
3. Continue anyway (may impact performance)

**Why clear?** Fresh context ensures optimal AI performance for the planning phase, which will load
additional files and research.

{If ≤60%}: ✅ Context healthy. Ready to proceed to planning phase if needed.

---

Would you like me to:

1. Dive deeper into any specific area?
2. Create an implementation plan based on this research?
3. Explore related topics?

Step 9: Handle Follow-Up Questions

If the user has follow-up questions:

DO NOT create a new research document - append to the same one
Update frontmatter fields:
- last_updated: {new date}
- last_updated_by: {your name}
- Add last_updated_note: "{Brief note about what was added}"
Add new section to existing document:

---

## Follow-up Research: {Follow-up Question}

**Date**: {date} **Updated by**: {your-name}

### Additional Findings

{New research results using same structure as above}

Spawn new sub-agents as needed for the follow-up research
Re-sync (if using thoughts system)
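
The frontmatter update for a follow-up can be sketched with awk. This is an illustration under assumptions: `update_frontmatter` is a hypothetical helper, it expects the frontmatter to be the first block delimited by `---` lines, and it does not escape double quotes in the note.

```shell
# Hypothetical helper: refresh last_updated fields in an existing document.
#   $1 = file, $2 = new date, $3 = updater name, $4 = short note
update_frontmatter() (
  file=$1; day=$2; who=$3; note=$4
  tmp=$(mktemp) || exit 1
  awk -v day="$day" -v who="$who" -v note="$note" '
    /^---$/ {
      fence++
      # Write the note just before the fence that closes the frontmatter
      if (fence == 2) print "last_updated_note: \"" note "\""
      print; next
    }
    fence == 1 && /^last_updated:/      { print "last_updated: " day; next }
    fence == 1 && /^last_updated_by:/   { print "last_updated_by: " who; next }
    fence == 1 && /^last_updated_note:/ { next }   # drop the stale note
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"      # overwrite in place, keep permissions
  rm -f "$tmp"
)
```

Later `---` separators in the body, such as the one that introduces each follow-up section, are left untouched because only the first two fences are treated as frontmatter.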

Important Notes

Proactive Context Management

Monitor Your Context Throughout Research:

Check token usage after spawning parallel agents
After synthesis phase, check context again
If context >60%: Warn user and recommend handoff

Example Warning:

⚠️ Context Usage Alert: Currently at 65% (130K/200K tokens)

Research is complete, but context is getting full. Before continuing to
planning phase, I recommend creating a handoff to preserve this work
and start fresh.

Would you like me to:
1. Create a handoff now (recommended)
2. Continue and clear context manually
3. Proceed anyway (not recommended - may impact planning quality)

**Why this matters**: The planning phase will load additional context.
Starting fresh ensures optimal AI performance.

When to Warn:

After Step 7 (document generated and permalinks added) if context >60%
After Step 9 (follow-up complete) if context >70%
Anytime during research if context >80%
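
The three thresholds above reduce to a small decision function. This is a sketch only: `should_warn` and the phase labels `document` and `followup` are hypothetical, and token counts are assumed to be available as plain integers.

```shell
# Hypothetical helper: succeed (exit 0) when a context warning is due.
#   $1 = phase (document | followup | anything else), $2 = tokens used, $3 = token limit
should_warn() (
  phase=$1; used=$2; limit=$3
  pct=$(( used * 100 / limit ))   # integer percent, rounded down
  case $phase in
    document) threshold=60 ;;     # after the document is generated
    followup) threshold=70 ;;     # after a follow-up is complete
    *)        threshold=80 ;;     # any other point during research
  esac
  [ "$pct" -gt "$threshold" ]
)
```

With the figures from the example warning, `should_warn document 130000 200000` succeeds because 65% exceeds 60%, while the same usage in the `followup` phase does not trigger a warning.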

Educate the User:

Explain WHY clearing context matters (performance, token efficiency)
Explain WHEN to clear (between phases)
Offer to create handoff yourself if /create-handoff command exists

Parallel Execution

ALWAYS use parallel Task agents for efficiency
Don't wait for one agent to finish before spawning the next
Spawn all research tasks at once, then wait for all to complete
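
The same fan-out, then wait-for-all shape can be illustrated with shell background jobs. This is only an analogy: Task agents are not shell processes, and `run_parallel` is a hypothetical helper.

```shell
# Hypothetical helper: start every command at once, then wait for all of them.
# Each argument is a command string; succeeds only if every command succeeded.
run_parallel() (
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &            # spawn without waiting for earlier jobs
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=1   # collect every job, even after a failure
  done
  [ "$failed" -eq 0 ]
)
```

The key point mirrored here is ordering: nothing is awaited until every job has been started, and synthesis begins only after the last one returns.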

Research Philosophy

Always perform fresh codebase research - never rely solely on existing docs
The thoughts/ directory (if used) provides historical context, not primary source
Focus on concrete file paths and line numbers - make it easy to navigate
Research documents should be self-contained and understandable months later

Sub-Agent Prompts

Be specific about what to search for
Specify directories to focus on when known
Make prompts focused on read-only documentation
Remind agents they are documentarians, not critics

Cross-Component Understanding

Document how components interact, not just what they do individually
Trace data flow across boundaries
Note integration points and dependencies

Temporal Context

Include when things were added/changed if relevant
Note deprecated patterns still in the codebase
Don't judge - just document the timeline

GitHub Links

Use permalinks for permanent references
Include line numbers for precision
Link to specific commits, not branches (branches move)

Main Agent Role

Your role is synthesis, not deep file reading
Let sub-agents do the detailed reading
You orchestrate, compile, and connect their findings
Focus on the big picture and cross-component connections

Documentation Style

Sub-agents document examples and usage patterns as they exist
Main agent synthesizes into coherent narrative
Both levels: documentarian, not evaluator
Never recommend changes or improvements unless explicitly asked

File Reading Rules

ALWAYS read mentioned files fully before spawning sub-tasks
Use Read tool WITHOUT limit/offset for complete files
This is critical for proper decomposition

Follow the Steps

These numbered steps are not suggestions - follow them exactly
Don't skip steps or reorder them
Each step builds on the previous ones

Thoughts Directory Handling

If using thoughts system:

thoughts/searchable/ is a special directory - paths found there should be documented as their actual location
Example: thoughts/searchable/allison/notes.md → document as thoughts/allison/notes.md
Don't change directory names (keep allison/, don't change to shared/)
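
The path rewrite described above is a prefix substitution. A minimal sketch, where `actual_thoughts_path` is a hypothetical helper name:

```shell
# Hypothetical helper: map a thoughts/searchable/ path to its actual location.
# Only the "searchable/" segment is dropped; directory names are left unchanged.
actual_thoughts_path() {
  case $1 in
    thoughts/searchable/*) printf 'thoughts/%s\n' "${1#thoughts/searchable/}" ;;
    *)                     printf '%s\n' "$1" ;;   # already an actual location
  esac
}
```

Paths outside thoughts/searchable/ pass through unmodified, so the function is safe to apply to every path a sub-agent returns.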

If NOT using thoughts system:

Skip thoughts-related agents
Skip thoughts sync commands
Save research docs to research/ directory in workspace root

Frontmatter Consistency

Always include complete frontmatter as shown in template
Use ISO 8601 dates with timezone
Keep tags consistent across research documents
Update last_updated fields when appending follow-ups

Linear Integration

If a Linear ticket is associated with the research, the command can automatically update the ticket status.

How It Works

Ticket detection (same as other commands):

User provides ticket ID explicitly: /research_codebase PROJ-123
Ticket mentioned in research query
Auto-detected from current context
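
The first two detection paths above (explicit argument, mention in the query) can be sketched with a pattern match. This is an assumption-laden sketch: `detect_ticket` is a hypothetical helper, ticket IDs are assumed to look like an uppercase prefix followed by a hyphen and digits, and detection from ambient context is not covered.

```shell
# Hypothetical helper: print the first ticket ID found in a piece of text.
#   $1 = text to scan, $2 = ticket prefix (optional; default: any uppercase prefix)
detect_ticket() {
  # LC_ALL=C keeps [A-Z] from matching lowercase letters in some locales
  printf '%s\n' "$1" |
    LC_ALL=C grep -oE "${2:-[A-Z][A-Z0-9]*}-[0-9]+" |
    head -n 1
}
```

Passing the configured prefix as the second argument avoids false positives such as `UTF-8`, which the default pattern would accept.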

Status updates:

When research starts → Move ticket to "Research"
When research document is saved → Add comment with link to research doc

Implementation Pattern

At research start (Step 2 - after reading mentioned files):

# If ticket is detected or provided
if [[ -n "$ticketId" ]]; then
  # Check if Linearis CLI is available
  if command -v linearis &> /dev/null; then
    # Update ticket state to "Research" (use --state NOT --status!)
    linearis issues update "$ticketId" --state "Research"

    # Add comment (use 'comments create' NOT 'issues comment'!)
    linearis comments create "$ticketId" --body "Starting research: [user's research question]"
  else
    echo "⚠️  Linearis CLI not found - skipping Linear ticket update"
  fi
fi

After the research document is saved and its permalink is available (Steps 6-7):

# Attach research document to ticket
if [[ -n "$ticketId" ]] && [[ -n "$githubPermalink" ]]; then
  # Check if Linearis CLI is available
  if command -v linearis &> /dev/null; then
    # Add completion comment with research doc link
    linearis comments create "$ticketId" \
        --body "Research complete! See findings: $githubPermalink"
  else
    echo "⚠️  Linearis CLI not found - skipping Linear ticket update"
  fi
fi

User Experience

With ticket:

/catalyst-dev:research_codebase PROJ-123
> "How does authentication work?"

What happens:

Command detects ticket PROJ-123
Moves ticket from Backlog → Research
Adds comment: "Starting research: How does authentication work?"
Conducts research with parallel agents
Saves document to thoughts/shared/research/
Attaches document to Linear ticket
Adds comment: "Research complete! See findings: [link]"

Without ticket:

/catalyst-dev:research_codebase
> "How does authentication work?"

What happens:

Same research process, but no Linear updates
User can manually attach research to ticket later

Configuration

Uses the same Linear configuration as other commands from .claude/config.json:

linear.teamId
linear.thoughtsRepoUrl (for GitHub permalinks)

Error Handling

If Linear tooling (Linear MCP or the Linearis CLI) is not available:

Skip Linear integration without prompting the user
Continue with research as normal
Note in output: "Research complete (Linear not configured)"

If ticket not found:

Show warning: "Ticket PROJ-123 not found in Linear"
Ask user: "Continue research without Linear integration? (Y/n)"

If status update fails:

Log error but continue research
Include note in final output: "⚠️ Could not update Linear ticket status"
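
The "log but continue" behavior above can be sketched as a wrapper that never fails the research run. `update_ticket_state` and the `LINEAR_CLI` override variable are hypothetical; the `issues update ... --state` invocation is the one shown earlier in this document. The ticket-not-found prompt is not covered.

```shell
# Hypothetical helper: try to move a ticket to a new state.
# Prints a note for the final output when the update is skipped or fails,
# prints nothing on success, and always returns 0 so research continues.
#   $1 = ticket ID, $2 = target state
update_ticket_state() {
  cli=${LINEAR_CLI:-linearis}   # LINEAR_CLI is an assumed override for testing
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "Linear not configured"
    return 0
  fi
  if ! "$cli" issues update "$1" --state "$2" >/dev/null 2>&1; then
    echo "⚠️ Could not update Linear ticket status"
  fi
  return 0
}
```

A caller could capture the output, for example `note=$(update_ticket_state "$ticketId" "Research")`, and include any non-empty note in the final summary.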

Integration with Other Commands

This command integrates with the complete development workflow:

/research_codebase → research document (+ Linear: Research)
                  ↓
           /create_plan → implementation plan (+ Linear: Planning)
                  ↓
          /implement_plan → code changes (+ Linear: In Progress)
                  ↓
              /describe_pr → PR created (+ Linear: In Review)

How it connects:

research_codebase → Linear: Moves ticket to "Research" status and attaches research document
research_codebase → create_plan: Research findings provide foundation for planning. The create_plan command can reference research documents in its "References" section.
Research before planning: Always research the codebase first to understand what exists before planning changes.
Shared agents: Both research_codebase and create_plan use the same specialized agents (codebase-locator, codebase-analyzer, codebase-pattern-finder).
Documentation persistence: Research documents serve as permanent reference for future work.

Example Workflow

# User starts research
/research_codebase

# You respond with initial prompt
# User asks: "How does authentication work in the API?"

# You execute:
# 1. Read any mentioned files fully
# 2. Decompose into research areas (auth middleware, token validation, session management)
# 3. Spawn parallel agents:
#    - codebase-locator: Find auth-related files
#    - codebase-analyzer: Understand auth middleware implementation
#    - codebase-pattern-finder: Find auth usage patterns
#    - thoughts-locator: Find previous auth discussions (if using thoughts)
# 4. Wait for all agents
# 5. Synthesize findings
# 6. Generate research document at research/2025-01-08-authentication-system.md
# 7. Present summary to user

# User follows up: "How does it integrate with the database?"
# You append to same document with new findings

Track in Workflow Context

After saving the research document, add it to workflow context:

if [[ -f "${CLAUDE_PLUGIN_ROOT}/scripts/workflow-context.sh" ]]; then
  "${CLAUDE_PLUGIN_ROOT}/scripts/workflow-context.sh" add research "$DOC_PATH" "${TICKET_ID:-null}"
fi

Adaptation Notes

This command is adapted from HumanLayer's research_codebase command. Key differences for portability:

Thoughts system: Made optional - can use simple research/ directory
Metadata script: Made optional - can generate metadata inline
Ticket prefixes: Read from .claude/config.json or use PROJ- placeholder
Linear integration: Made optional - only used if Linear MCP available
Web research: Uses external-research agent instead of web-search-researcher

The core workflow and philosophy remain the same: parallel sub-agents, documentarian mindset, and structured output.

/research_codebase