Project Indexing

Overview

Purpose: Shannon's codebase compression system that achieves 94% token reduction (58K → 3K tokens) by generating structured SHANNON_INDEX files. Transforms linear file-by-file exploration into instant structured lookups, enabling fast agent onboarding, efficient multi-agent coordination, and sustainable context window usage.

ROI Proven: 40,000+ tokens saved per project analysis, 12-60x speedup, eliminates redundant file reads in multi-agent scenarios.

When to Use

Use this skill when:

Starting ANY project analysis or implementation (always generate index first)
Onboarding new agents to existing codebase
Launching multi-agent wave execution
Switching between multiple projects/codebases
Context window efficiency is critical
After major codebase changes (regenerate index)

DO NOT use when:

NEVER skip - Even "small" or "focused" questions benefit from indexing (see Anti-Rationalization section)
Index already generated and current (< 24 hours old, no major changes)

Core Competencies

94% Token Compression: Reduces 58K full codebase to 3K structured summary through hierarchical summarization and pattern abstraction
Project Scan: Discovers file counts, languages, LOC, dependencies without loading full content (500 token cost)
Architecture Summarization: Identifies core modules, key patterns, tech stack from directory structure and metadata (1,500 token cost)
Context Enrichment: Adds git recent changes, dependency analysis, testing setup (500 token cost)
Template Population: Generates structured SHANNON_INDEX.md following 7-section template (500 token cost)
Serena Persistence: Stores index in memory for cross-session retrieval and wave agent coordination (100 token cost)
Multi-Agent Optimization: Enables 81-95% token savings in parallel wave execution scenarios

Inputs

Required:

project_path (string): Absolute path to project root directory

Optional:

include_tests (boolean): Include test file statistics (default: true)
git_days (int): Number of days of git history to include (default: 7)
max_dependencies (int): Maximum dependencies to list (default: 10)
custom_sections (array): Additional sections to include beyond standard 7

Anti-Rationalization (From Baseline Testing)

CRITICAL: Agents systematically rationalize skipping project indexing, leading to massive token waste. Below are the 6 most common rationalizations detected in baseline testing, with mandatory counters.

Rationalization 1: "I only need to understand one area"

Example: User asks "Where is authentication logic?" → Agent thinks "Just need auth files, no need for full index"

COUNTER:

❌ NEVER skip indexing for "focused" questions
✅ Finding "one area" without index requires exploring 15+ files (22K tokens wasted)
✅ Index lookup answers in 150 tokens (99% reduction)
✅ "Just this area" questions cascade into related areas (auth → utils → config → tests)
✅ Creating index takes 2 minutes, manual exploration takes 5+ minutes

Rule: Generate index first. Even "focused" questions benefit from structure map.

Rationalization 2: "Context window is large enough"

Example: Agent sees 200K token limit → thinks "Plenty of space, can load files directly"

COUNTER:

❌ NEVER rely on "large context window" as excuse to skip compression
✅ Large codebases consume 58K-100K+ tokens (30-50% of window)
✅ Multi-agent waves multiply consumption (3 agents × 20K = 60K wasted)
✅ Context window is shared resource across entire conversation
✅ Token waste compounds over session (10 questions × 5K waste = 50K gone)
✅ Index generation is investment: 3K tokens unlock unlimited queries

Rule: Context window size doesn't eliminate need for compression. Generate index.

Rationalization 3: "I'll remember the structure"

Example: Agent explores codebase once → thinks "I know where things are now, don't need index"

COUNTER:

❌ NEVER rely on agent "memory" instead of persistent index
✅ Memory doesn't scale across agents (each rebuilds mental model)
✅ New agents joining waves have zero context (16K tokens to onboard without index)
✅ Session ends = structure knowledge lost (next session starts from scratch)
✅ Serena stores index, not unstructured mental models
✅ Index enables handoffs: FRONTEND agent → BACKEND agent (zero token overhead)

Rule: Mental models are session-local and agent-specific. Index is persistent and shareable.

Rationalization 4: "Reading files is fast enough"

Example: Agent thinks "Files are only 200-500 tokens each, reading is cheap"

COUNTER:

❌ NEVER use per-file cost to justify skipping index
✅ 247 files × 235 tokens average = 58,000 tokens total
✅ Agent doesn't know which files to read without exploring (exploration = hidden cost)
✅ Multi-agent scenarios: 3 agents × 20K exploration = 60K tokens wasted
✅ "Fast" per-file compounds into "slow" codebase understanding
✅ Index collapses 247 files into 3K tokens (19x compression)

Rule: Per-file cheapness is illusion. Total cost is massive. Generate index.

Rationalization 5: "This is a small project"

Example: Agent sees 50 files → thinks "Small enough to explore manually, index overkill"

COUNTER:

❌ NEVER skip indexing for "small" projects
✅ 50 files × 400 tokens average = 20,000 tokens (still large!)
✅ Index reduces to ~1,500 tokens (93% reduction even for "small" projects)
✅ "Small" projects have complex interdependencies (utils → components → config)
✅ Index generation takes 60 seconds for small projects (minimal investment)
✅ Even 10-file projects benefit from structured summary (Quick Stats + Core Modules)

Rule: "Small" is relative. All projects benefit from compression. Generate index.

Rationalization 6: "User only needs a quick answer"

Example: User asks simple question → Agent thinks "Quick answer, don't need full index"

COUNTER:

❌ NEVER skip index generation for "quick questions"
✅ "Quick" questions cascade: "Where is X?" → "How does X work?" → "What depends on X?"
✅ Answering without index = 5K-10K tokens per question (compounds over conversation)
✅ Generating index = 3K token investment, 40K+ token savings over session
✅ Index enables instant followup questions (zero additional exploration cost)
✅ "Quick" is often 5-10 related questions, not one

Rule: Quick questions justify index more, not less. Generate index first.

Detection Signal

If you're tempted to:

Explore "just this area" without full index
Rely on context window size as buffer
Build mental model instead of persistent index
Add up per-file costs and conclude "cheap enough"
Skip indexing for "small" projects
Answer "quick" questions without structure map

Then: STOP. Generate SHANNON_INDEX first. Token waste is exponential, not linear.

Workflow

Phase 1: Project Discovery

Count Files by Type
- Action: Use find or Glob to count files per extension
- Tool: find . -type f -name "*.ts" | wc -l for each file type
- Output: File counts (typescript_count, python_count, jsx_count, etc.)
Calculate Total Lines of Code
- Action: Use wc -l on all files by language
- Tool: find . -type f -name "*.ts" -exec wc -l {} + | tail -1
- Output: LOC per language
Identify Tech Stack
- Action: Read dependency files (package.json, requirements.txt, Cargo.toml)
- Tool: Grep for dependencies
- Output: Raw dependency lists

Cost: ~500 tokens

Phase 2: Architecture Summarization

Extract Directory Structure
- Action: Generate 2-level directory tree
- Tool: tree -L 2 -d or recursive directory listing
- Output: Hierarchical directory structure
Identify Core Modules
- Action: Read first 50 lines of README or index files in top-level directories
- Tool: Read (limited lines)
- Output: 1-2 sentence purpose per module
Detect Key Patterns
- Action: Identify architectural patterns from file names and imports
- Patterns: Test runners (Jest/Pytest), state management (Redux/Context), routing (React Router)
- Output: Pattern descriptions

Cost: ~1,500 tokens (97% compression from full file reads)

Phase 3: Context Enrichment

Extract Git Recent Changes
- Action: Get last 7 days of commits
- Tool: git log --since="7 days ago" --pretty=format:"%h - %s" --abbrev-commit
- Output: Recent commit list
Analyze Key Dependencies
- Action: Extract top 10 dependencies by usage frequency
- Tool: Parse package files, count imports via Grep
- Output: Dependency list with versions
Detect Testing Setup
- Action: Identify test framework, file patterns, coverage tools
- Tool: Check for test config files, test directories
- Output: Testing strategy description

Cost: ~500 tokens

Phase 4: Template Population

Generate Quick Stats Section
- Action: Format file counts, languages, LOC, timestamp
- Output: 5-line stats block (100 tokens)
Generate Tech Stack Section
- Action: Format languages, frameworks, build tools
- Output: Tech stack list (200 tokens)
Generate Core Modules Section
- Action: Format directory structure with purposes
- Output: Module descriptions (800 tokens)
Generate Recent Changes Section
- Action: Format git commits
- Output: Commit list (300 tokens)
Generate Dependencies Section
- Action: Format top 10 dependencies with versions
- Output: Dependency table (150 tokens)
Generate Testing Strategy Section
- Action: Format framework, patterns, coverage
- Output: Testing description (150 tokens)
Generate Key Patterns Section
- Action: Format routing, state, auth, API conventions
- Output: Pattern descriptions (400 tokens)

Cost: ~500 tokens

Phase 5: Persistence

Store in Serena Memory
- Action: Write complete index to Serena
- Tool: write_memory("shannon_index_{project_name}", index_content)
- Output: Memory storage confirmation
Write Local Backup
- Action: Save SHANNON_INDEX.md to project root
- Tool: Write
- Output: File creation confirmation

Cost: ~100 tokens

Total Generation Cost: 3,100 tokens

SHANNON_INDEX Generation Algorithm

Step 1: Project Scan (Discovery)

Objective: Discover project structure without loading content

# Count files by type
find . -type f -name "*.ts" | wc -l → typescript_count
find . -type f -name "*.py" | wc -l → python_count
find . -type f -name "*.jsx" | wc -l → jsx_count
# ... repeat for all extensions

# Calculate total lines of code
find . -type f -name "*.ts" -exec wc -l {} + | tail -1 → typescript_lines
# ... repeat for all languages

# Identify tech stack
grep -h "^[^#]" package.json requirements.txt Cargo.toml → dependencies_raw

Token Cost: ~500 tokens (Bash commands + Glob queries)

Output: File counts, language breakdown, total LOC, dependency lists

Step 2: Architecture Summarization (Compression)

Objective: Identify core modules and their purposes without reading full files

Strategy:

Directory structure = high-level organization

tree -L 2 -d > structure.txt  # 2-level directory tree

Core modules = top-level directories with descriptions
- Read first 50 lines of each main directory's README or index file
- Extract purpose from comments, docstrings, or exports
- Limit: 1-2 sentence summary per module
Key patterns = architectural conventions
- Testing: Detect test runners (Jest, Pytest, Vitest)
- State management: Detect patterns (Redux, Context, Zustand)
- Routing: Identify router setup (React Router, Next.js, FastAPI)

Token Cost: ~1,500 tokens (directory tree + module summaries + pattern detection)

Compression Ratio: 58K full files → 1.5K structured summaries = 97% reduction

Step 3: Context Enrichment (Metadata)

Objective: Add temporal and dependency context

Git Recent Changes (Last 7 Days):

git log --since="7 days ago" --pretty=format:"%h - %s" --abbrev-commit

Key Dependencies (Top 10 by usage):

Extract from package.json, requirements.txt, Cargo.toml
Sort by import frequency (grep entire codebase for package names)
Include versions for breaking change awareness

Testing Setup:

Test framework: Jest/Pytest/Vitest
Test file location pattern: **/*.test.* or tests/
Coverage tools: Istanbul, Coverage.py

Token Cost: ~500 tokens (git log + dependency extraction + testing detection)

Step 4: Template Population (Generation)

Objective: Write structured SHANNON_INDEX.md following template

Template Sections:

Quick Stats (5 lines)
- Total files, languages, lines of code, last updated
Tech Stack (10 lines)
- Primary languages, frameworks, build tools
Core Modules (30-40 lines)
- Top-level directories with 1-2 sentence purposes
Recent Changes (15-20 lines)
- Last 7 days git commits (titles only)
Key Dependencies (15 lines)
- Top 10 packages with versions
Testing Strategy (10 lines)
- Framework, file patterns, coverage tools
Key Patterns (20-25 lines)
- Routing, state, auth, API conventions

Token Cost: ~500 tokens (writing template)

Step 5: Persistence (Storage)

Objective: Save for cross-session retrieval

Serena Storage:

# Save to Serena memory
write_memory(
    memory_name="shannon_index_{project_name}",
    content=shannon_index_content
)

Local Backup:

# Write to project root
echo "$shannon_index_content" > SHANNON_INDEX.md

Token Cost: ~100 tokens (write operations)

Total Generation Cost

Phase	Token Cost
Project Scan	500
Architecture Summarization	1,500
Context Enrichment	500
Template Population	500
Persistence	100
TOTAL	3,100 tokens

Token Savings

Scenario	Without Index	With Index	Savings
Initial load	58,000	3,100	54,900 (94%)
Subsequent queries	5,000/query	50/query	4,950 (99%)
Multi-agent (3 agents)	64,000	3,100	60,900 (95%)
Context switching	58,000	3,100	54,900 (94%)

ROI: 3,100 token investment → 54,900+ token savings on first use → 17.7x return

SHANNON_INDEX Template Structure

# Shannon Project Index

## Quick Stats
- **Total Files**: {count}
- **Primary Languages**: {languages_list}
- **Total Lines of Code**: {loc_total}
- **Last Updated**: {timestamp}
- **Dependencies**: {dep_count}

## Tech Stack
- **Languages**: {languages_with_percentages}
- **Frameworks**: {frameworks_list}
- **Build Tools**: {build_tools}
- **Testing**: {test_framework}
- **Package Manager**: {package_manager}

## Core Modules
{directory_structure_with_purposes}

## Recent Changes (Last 7 Days)
{git_log_commits}

## Key Dependencies
{top_10_dependencies_with_versions}

## Testing Strategy
- **Framework**: {test_framework}
- **Test Location**: {test_file_patterns}
- **Coverage Tool**: {coverage_tool}
- **Test Types**: {unit/integration/e2e}

## Key Patterns
- **Routing**: {routing_approach}
- **State Management**: {state_pattern}
- **Authentication**: {auth_pattern}
- **API Design**: {api_conventions}
- **Error Handling**: {error_patterns}

Template Token Count: ~3,000 tokens when populated

94% Token Reduction Methodology

Compression Techniques

1. Hierarchical Summarization

Full file content: 235 tokens average
1-sentence summary: 12 tokens
Compression: 95% per file

2. Structural Deduplication

247 files with similar patterns (components/, utils/, tests/)
Represent as directory with purpose: 50 tokens
Compression: 99.6% (247 files → 1 directory entry)

3. Temporal Relevance Filtering

Full git history: 10,000+ tokens
Last 7 days commits: 300 tokens
Compression: 97%

4. Dependency Aggregation

Full package.json: 2,000 tokens
Top 10 with versions: 150 tokens
Compression: 92.5%

5. Pattern Abstraction

Reading 50 routing files: 11,750 tokens
"Uses React Router v6 with lazy loading": 8 tokens
Compression: 99.9%

Cumulative Effect

Full Codebase (247 files):
├─ File content: 58,000 tokens
└─ Compressed representation:
   ├─ Quick Stats: 100 tokens
   ├─ Tech Stack: 200 tokens
   ├─ Core Modules: 800 tokens
   ├─ Recent Changes: 300 tokens
   ├─ Dependencies: 150 tokens
   ├─ Testing: 150 tokens
   └─ Key Patterns: 400 tokens
   TOTAL: 2,100 tokens

Overhead (formatting, markdown): +900 tokens
FINAL INDEX SIZE: 3,000 tokens

REDUCTION: (58,000 - 3,000) / 58,000 = 94.8% ≈ 94%

Outputs

SHANNON_INDEX.md file containing:

# Shannon Project Index

## Quick Stats
- **Total Files**: 247
- **Primary Languages**: TypeScript (65%), JavaScript (20%), CSS (10%), JSON (5%)
- **Total Lines of Code**: 18,543
- **Last Updated**: 2025-11-03T14:23:00Z
- **Dependencies**: 42

## Tech Stack
- **Languages**: TypeScript 65%, JavaScript 20%, CSS 10%, JSON 5%
- **Frameworks**: React 18.2.0, Next.js 13.4.0
- **Build Tools**: Vite 4.3.0, TypeScript 5.1.0
- **Testing**: Playwright 1.35.0
- **Package Manager**: npm

## Core Modules
- **src/**: Main application source code (React components, hooks, utilities)
- **public/**: Static assets (images, fonts, favicon)
- **tests/**: Playwright functional tests (NO MOCKS)
- **docs/**: Project documentation and guides
- **config/**: Build and deployment configuration

## Recent Changes (Last 7 Days)
- d25b52a - feat(validation): add skill structure validation
- 25b283e - feat(skills): add comprehensive skill template
- f1bf9dc - WIP
- fc93e23 - docs(v4): Add completion SITREP

## Key Dependencies
1. react@18.2.0 - UI framework
2. next@13.4.0 - React framework
3. playwright@1.35.0 - Browser automation
4. typescript@5.1.0 - Type safety
5. vite@4.3.0 - Build tool

## Testing Strategy
- **Framework**: Playwright
- **Test Location**: tests/**/*.spec.ts
- **Coverage Tool**: None
- **Test Types**: E2E functional tests (NO MOCKS)

## Key Patterns
- **Routing**: Next.js file-based routing with App Router
- **State Management**: React Context with custom hooks
- **Authentication**: JWT tokens with NextAuth.js
- **API Design**: REST API with tRPC for type safety
- **Error Handling**: Error boundaries with fallback UI

Serena Memory Storage:

Key: shannon_index_{project_name}
Content: Complete SHANNON_INDEX.md content
Retrievable via: read_memory("shannon_index_{project_name}")

Metrics:

{
  "generation_time_seconds": 120,
  "token_cost": 3100,
  "original_size_tokens": 58000,
  "compressed_size_tokens": 3000,
  "compression_ratio": 0.948,
  "savings_tokens": 54900,
  "roi_multiplier": 17.7
}

Success Criteria

This skill succeeds if:

✅ Index generated in < 5 minutes
- Small projects (< 50 files): 30-60 seconds
- Medium projects (50-150 files): 60-120 seconds
- Large projects (150-300 files): 120-180 seconds
- Extra large (300+ files): 180-300 seconds
✅ Compression ratio >= 90%
- Target: 94% (58K → 3K)
- Acceptable: 90-96%
- Poor: < 90%
✅ All 7 sections present and populated
- Quick Stats (5 lines minimum)
- Tech Stack (5 entries minimum)
- Core Modules (3 modules minimum)
- Recent Changes (1+ commits or "No recent changes")
- Key Dependencies (3+ dependencies)
- Testing Strategy (framework identified)
- Key Patterns (2+ patterns identified)
✅ Token count within target range
- Small projects: 1,500-2,000 tokens
- Medium projects: 2,000-2,500 tokens
- Large projects: 2,500-3,500 tokens
- Extra large: 3,500-4,500 tokens
✅ Stored in Serena memory
- Memory key: shannon_index_{project_name}
- Retrievable via read_memory()
- Content matches local file

Validation:

def validate_shannon_index(index_content, metrics):
    # Verify compression ratio
    compression = 1 - (metrics.compressed_size / metrics.original_size)
    assert compression >= 0.90, f"Compression ratio {compression:.2%} below 90% target"

    # Verify all sections present
    required_sections = [
        "## Quick Stats",
        "## Tech Stack",
        "## Core Modules",
        "## Recent Changes",
        "## Key Dependencies",
        "## Testing Strategy",
        "## Key Patterns"
    ]
    for section in required_sections:
        assert section in index_content, f"Missing required section: {section}"

    # Verify token count
    assert 1500 <= metrics.compressed_size <= 4500, "Token count outside acceptable range"

    # Verify Serena storage
    project_name = extract_project_name(index_content)
    assert serena.memory_exists(f"shannon_index_{project_name}"), "Index not stored in Serena"

    # Verify ROI
    assert metrics.roi_multiplier >= 10, f"ROI {metrics.roi_multiplier}x below 10x minimum"

Examples

Example 1: Small React Project

Input:

project_path: "/Users/dev/my-react-app"
include_tests: true
git_days: 7
max_dependencies: 10

Process:

Project Scan: 47 files, 3,200 LOC, TypeScript + React detected
Architecture: src/ (components), public/ (assets), tests/ identified
Context: 3 commits in last 7 days, 15 dependencies, Jest detected
Template: Generate 7 sections totaling 1,800 tokens
Storage: Save to Serena + local file

Output:

# Shannon Project Index

## Quick Stats
- **Total Files**: 47
- **Primary Languages**: TypeScript (75%), CSS (15%), JSON (10%)
- **Total Lines of Code**: 3,200
- **Last Updated**: 2025-11-03T15:30:00Z
- **Dependencies**: 15

## Tech Stack
- **Languages**: TypeScript 75%, CSS 15%, JSON 10%
- **Frameworks**: React 18.2.0, Vite 4.3.0
- **Testing**: Jest 29.5.0
- **Package Manager**: npm

## Core Modules
- **src/components/**: React UI components (Button, Input, Modal)
- **src/hooks/**: Custom React hooks (useAuth, useData)
- **src/utils/**: Utility functions (formatDate, validateEmail)
- **tests/**: Jest unit tests

## Recent Changes (Last 7 Days)
- abc123 - feat: add login form component
- def456 - fix: resolve validation bug
- ghi789 - test: add component tests

## Key Dependencies
1. react@18.2.0
2. vite@4.3.0
3. jest@29.5.0

## Testing Strategy
- **Framework**: Jest
- **Test Location**: tests/**/*.test.ts
- **Test Types**: Component tests

## Key Patterns
- **Routing**: React Router v6
- **State Management**: React Context

Metrics:

Original: 12,000 tokens (47 files × 255 avg)
Compressed: 1,800 tokens
Savings: 10,200 tokens (85% reduction)
ROI: 5.7x

Example 2: Large Full-Stack Project

Input:

project_path: "/Users/dev/enterprise-app"
include_tests: true
git_days: 7
max_dependencies: 10

Process:

Project Scan: 247 files, 18,543 LOC, TypeScript + React + Node.js detected
Architecture: Multiple directories (frontend/, backend/, database/, tests/)
Context: 15 commits in last 7 days, 42 dependencies, Playwright + Jest detected
Template: Generate 7 sections totaling 3,200 tokens
Storage: Save to Serena + local file

Output:

# Shannon Project Index

## Quick Stats
- **Total Files**: 247
- **Primary Languages**: TypeScript (60%), JavaScript (25%), SQL (10%), CSS (5%)
- **Total Lines of Code**: 18,543
- **Last Updated**: 2025-11-03T15:45:00Z
- **Dependencies**: 42

## Tech Stack
- **Languages**: TypeScript 60%, JavaScript 25%, SQL 10%, CSS 5%
- **Frontend**: React 18.2.0, Next.js 13.4.0
- **Backend**: Express 4.18.0, Node.js 18.x
- **Database**: PostgreSQL 15, Prisma ORM 4.15.0
- **Testing**: Playwright 1.35.0, Jest 29.5.0
- **Build**: Vite 4.3.0, TypeScript 5.1.0

## Core Modules
- **frontend/src/**: Next.js application with React components
- **backend/src/**: Express API server with REST endpoints
- **database/**: Prisma schema and migrations
- **tests/e2e/**: Playwright functional tests (NO MOCKS)
- **tests/unit/**: Jest component tests
- **docs/**: API documentation and architecture guides

## Recent Changes (Last 7 Days)
- d25b52a - feat(auth): add OAuth integration
- 25b283e - fix(api): resolve rate limiting issue
- f1bf9dc - test: add E2E checkout flow test
- fc93e23 - docs: update API documentation
- 68dbbd4 - refactor(db): optimize query performance

## Key Dependencies
1. react@18.2.0 - Frontend framework
2. next@13.4.0 - React framework with SSR
3. express@4.18.0 - Backend API server
4. prisma@4.15.0 - Database ORM
5. playwright@1.35.0 - E2E testing
6. jest@29.5.0 - Unit testing
7. typescript@5.1.0 - Type safety
8. zod@3.21.0 - Runtime validation
9. stripe@12.10.0 - Payment processing
10. next-auth@4.22.0 - Authentication

## Testing Strategy
- **Framework**: Playwright (E2E), Jest (Unit)
- **Test Location**: tests/e2e/**/*.spec.ts, tests/unit/**/*.test.ts
- **Coverage Tool**: Istanbul (c8)
- **Test Types**: E2E functional (NO MOCKS), Unit component tests

## Key Patterns
- **Routing**: Next.js App Router with server components
- **State Management**: React Context + Zustand for global state
- **Authentication**: NextAuth.js with OAuth providers (Google, GitHub)
- **API Design**: REST API with tRPC for type-safe endpoints
- **Database**: Prisma ORM with PostgreSQL, migrations via Prisma Migrate
- **Error Handling**: Error boundaries (frontend), global error middleware (backend)

Metrics:

Original: 58,000 tokens (247 files × 235 avg)
Compressed: 3,200 tokens
Savings: 54,800 tokens (94% reduction)
ROI: 17.1x

Usage Patterns

Pattern 1: Initial Project Analysis

User: "Analyze the Shannon Framework codebase"

Agent (WITH SKILL):
1. Invoke @skill project-indexing
2. Generate SHANNON_INDEX (3,100 tokens)
3. Read SHANNON_INDEX (3,000 tokens)
4. Answer user question (200 tokens)
TOTAL: 6,300 tokens

Agent (WITHOUT SKILL):
1. Glob all files (1,000 tokens)
2. Read 50+ files to understand structure (58,000 tokens)
3. Answer user question (200 tokens)
TOTAL: 59,200 tokens

SAVINGS: 52,900 tokens (89% reduction)

Pattern 2: Multi-Agent Wave Execution

Wave Coordinator: "Launch 3 agents for parallel implementation"

Agents (WITH SKILL):
1. Coordinator generates SHANNON_INDEX once (3,100 tokens)
2. Each agent reads shared index (3,000 tokens × 3 = 9,000 tokens)
TOTAL: 12,100 tokens

Agents (WITHOUT SKILL):
1. Each agent explores independently:
   - Frontend: 18,000 tokens
   - Backend: 20,000 tokens
   - Testing: 26,000 tokens
TOTAL: 64,000 tokens

SAVINGS: 51,900 tokens (81% reduction)

Pattern 3: Agent Onboarding

User: "Bring in SECURITY agent to review authentication"

Security Agent (WITH SKILL):
1. Read SHANNON_INDEX from Serena (3,000 tokens)
2. Locate auth module from Core Modules section (50 tokens)
3. Read auth files identified in index (2,000 tokens)
TOTAL: 5,050 tokens

Security Agent (WITHOUT SKILL):
1. Read package.json (500 tokens)
2. Read README (1,200 tokens)
3. Grep for "auth" (finds 23 files)
4. Read 12 auth files (15,000 tokens)
TOTAL: 16,700 tokens

SAVINGS: 11,650 tokens (70% reduction)

Pattern 4: Context Switching

User: "Compare Project A and Project B architectures"

Agent (WITH SKILL):
1. Read SHANNON_INDEX for Project A (3,000 tokens)
2. Read SHANNON_INDEX for Project B (3,000 tokens)
3. Compare Tech Stack sections (500 tokens)
TOTAL: 6,500 tokens

Agent (WITHOUT SKILL):
1. Load Project A files (19,000 tokens)
2. Load Project B files (18,000 tokens)
3. Compare (500 tokens)
TOTAL: 37,500 tokens

SAVINGS: 31,000 tokens (83% reduction)

Integration Points

With Serena MCP

# Store index for cross-session retrieval
write_memory(
    memory_name=f"shannon_index_{project_name}",
    content=shannon_index_md
)

# Retrieve in future sessions
index = read_memory(f"shannon_index_{project_name}")

Benefit: Zero-cost context restoration across sessions

With spec-analysis Skill

1. User provides spec: "Build authentication system"
2. @skill project-indexing generates SHANNON_INDEX
3. @skill spec-analysis uses index to:
   - Detect existing auth patterns
   - Identify dependencies (JWT, OAuth, etc.)
   - Assess codebase familiarity (0.0-1.0)
4. 8D score adjusted based on codebase structure

Benefit: More accurate complexity scoring with project context

With wave-orchestration Skill

1. @skill wave-orchestration creates wave plan
2. Each wave agent receives SHANNON_INDEX in context
3. Agents use index to:
   - Locate relevant modules
   - Identify dependencies
   - Avoid duplicate exploration
4. Wave completion 3.5x faster with index

Benefit: Efficient multi-agent coordination

Performance Benchmarks

Generation Time

Project Size	Files	LOC	Generation Time
Small	10-50	<5K	30-60 seconds
Medium	50-150	5K-20K	60-120 seconds
Large	150-300	20K-50K	120-180 seconds
Extra Large	300+	50K+	180-300 seconds

Token Savings by Project Size

Project Size	Without Index	With Index	Savings
Small	12,000	1,500	87%
Medium	35,000	2,500	93%
Large	58,000	3,000	94%
Extra Large	100,000+	4,000	96%

Query Response Time

Query Type	Without Index	With Index	Speedup
"Where is X?"	3-5 minutes	5-15 seconds	12-60x
"What changed?"	5-8 minutes	10-20 seconds	15-48x
"How does Y work?"	5-10 minutes	30-60 seconds	10-20x
"Compare A and B"	10-15 minutes	30-60 seconds	20-30x

Average Speedup: 25x faster with index

Validation Checklist

After generating SHANNON_INDEX, verify:

✅ Quick Stats section includes:

Total files count
Primary languages list
Total LOC
Last updated timestamp

✅ Tech Stack section includes:

Language percentages
Framework names and versions
Build tools
Testing framework

✅ Core Modules section includes:

Top-level directories
1-2 sentence purpose per module
Clear organization structure

✅ Recent Changes section includes:

Last 7 days git commits
Commit hashes and titles
Relevant to understanding current state

✅ Key Dependencies section includes:

Top 10 dependencies by usage
Version numbers
Brief purpose notes

✅ Testing Strategy section includes:

Test framework name
Test file location patterns
Coverage tool (if applicable)

✅ Key Patterns section includes:

Routing approach
State management pattern
Authentication method
API conventions

✅ Token count is 2,500-3,500 tokens

✅ Compression ratio is 90-96%

✅ Saved to Serena with key shannon_index_{project_name}

✅ Written to file at {project_root}/SHANNON_INDEX.md

Common Pitfalls

Pitfall 1: Including Too Much Detail

Problem: Trying to summarize every file leads to 10K+ token index

Solution:

Limit Core Modules to top-level directories only
Use 1-2 sentences max per module
Summarize patterns, don't list every implementation

Pitfall 2: Skipping Temporal Context

Problem: Index without Recent Changes becomes stale

Solution:

Always include Last 7 Days commits
Add "Last Updated" timestamp to Quick Stats
Regenerate index after major changes

Pitfall 3: Missing Key Patterns

Problem: Index lacks architectural insights, forcing agent to explore anyway

Solution:

Detect routing approach (React Router, Next.js, etc.)
Identify state management (Redux, Context, Zustand)
Document authentication method (JWT, OAuth, Sessions)

Pitfall 4: Not Persisting to Serena

Problem: Index must be regenerated every session

Solution:

Always call write_memory() with index content
Use consistent naming: shannon_index_{project_name}
Verify storage with read_memory() before completing

Success Metrics

Track these metrics to validate skill effectiveness:

Token Reduction: (tokens_without_index - tokens_with_index) / tokens_without_index
- Target: ≥90%
- Good: 85-90%
- Poor: <85%
Query Response Time: seconds_to_answer_with_index / seconds_without_index
- Target: ≥20x speedup
- Good: 10-20x
- Poor: <10x
Multi-Agent Efficiency: total_tokens_with_shared_index / sum(tokens_per_agent_without_index)
- Target: ≥80% reduction
- Good: 70-80%
- Poor: <70%
Context Switching Cost: tokens_to_switch_with_indexes / tokens_to_reload_full_codebases
- Target: ≥85% reduction
- Good: 75-85%
- Poor: <75%
Generation ROI: tokens_saved_over_session / tokens_spent_generating_index
- Target: ≥15x
- Good: 10-15x
- Poor: <10x

Example Output

See examples/large-project-index.md for a complete SHANNON_INDEX example generated from a 247-file codebase, demonstrating 94% token reduction (58K → 3K tokens).

When to Use This Skill:

Starting any project analysis or implementation
Onboarding new agents to existing codebase
Launching multi-agent wave execution
Switching between multiple projects
When context window efficiency is critical
After major codebase changes (regenerate index)

When to Skip:

Never. Index generation is 3-minute investment for 40K+ token savings. Always generate.

project-indexing

Project Indexing

Overview

When to Use

Core Competencies

Inputs

Anti-Rationalization (From Baseline Testing)

Rationalization 1: "I only need to understand one area"

Rationalization 2: "Context window is large enough"

Rationalization 3: "I'll remember the structure"

Rationalization 4: "Reading files is fast enough"

Rationalization 5: "This is a small project"

Rationalization 6: "User only needs a quick answer"

Detection Signal

Workflow

Phase 1: Project Discovery

Phase 2: Architecture Summarization

Phase 3: Context Enrichment

Phase 4: Template Population

Phase 5: Persistence

SHANNON_INDEX Generation Algorithm

Step 1: Project Scan (Discovery)

Step 2: Architecture Summarization (Compression)

Step 3: Context Enrichment (Metadata)

Step 4: Template Population (Generation)

Step 5: Persistence (Storage)

Total Generation Cost

Token Savings

SHANNON_INDEX Template Structure

94% Token Reduction Methodology

Compression Techniques

Cumulative Effect

Outputs

Success Criteria

Examples

Example 1: Small React Project

Example 2: Large Full-Stack Project

Usage Patterns

Pattern 1: Initial Project Analysis

Pattern 2: Multi-Agent Wave Execution

Pattern 3: Agent Onboarding

Pattern 4: Context Switching

Integration Points

With Serena MCP

With spec-analysis Skill

With wave-orchestration Skill

Performance Benchmarks

Generation Time

Token Savings by Project Size

Query Response Time

Validation Checklist

Common Pitfalls

Pitfall 1: Including Too Much Detail

Pitfall 2: Skipping Temporal Context

Pitfall 3: Missing Key Patterns

Pitfall 4: Not Persisting to Serena

Success Metrics

Example Output

Similar Skills