GitHub Code Search

Fetch real-world code examples from GitHub through intelligent multi-search orchestration.

Prerequisites

Required:

gh CLI installed and authenticated (gh auth login --web)
Node.js + pnpm

Validation:

gh auth status  # Should show authenticated

Usage

cd plugins/knowledge-work/skills/gh-code-search
pnpm install  # First time only
pnpm search "your query here"

When to Use

Invoke this skill when user requests:

Real-world examples ("how do people implement X?")
Library usage patterns ("show me examples of workflow.dev usage")
Implementation approaches ("claude code skills doing github search")
Architectural patterns ("repository pattern in TypeScript")
Best practices for specific frameworks/libraries

Examples requiring nuanced multi-search:

"Find React hooks" → Need queries for useState/useEffect (imports), function use (definitions), const use = (arrow functions)
"Express error handling middleware" → Queries for (req, res, next) => (signatures), app.use (registration), function(err, (error handlers)
"How do people use XState?" → Queries for useMachine, createMachine, interpret (actual function names, not "state machine")
"GitHub Actions for TypeScript projects" → Queries for .yml workflow files + actions/setup-node + tsc or pnpm build (actual commands)

How It Works

Division of responsibilities:

Script Responsibilities (Tool Implementation)

Authenticates with GitHub (gh CLI token)
Executes single search query (Octokit API, pagination, text-match)
Ranks by quality (stars, recency, code structure)
Fetches top 10 files in parallel
Extracts factual data (imports, syntax patterns, metrics)
Returns clean markdown with code + metadata + GitHub URLs for all files

The script is a single-query tool. Claude orchestrates multiple invocations.

Claude's Orchestration Responsibilities

Claude executes a multi-phase workflow:

Query Strategy Generation: Craft 3-5 targeted search queries considering:
- Language context (TypeScript vs JavaScript)
- File type nuances (tsx vs ts for React, config files vs implementations)
- Specificity variants (broad → narrow)
- Related terminology expansion
Sequential Search Execution: Run tool multiple times, adapting queries based on intermediate results
Result Aggregation: Combine all code files, deduplicate by repo+path, preserve GitHub URLs
Pattern Analysis: Extract common imports, architectural styles, implementation patterns
Comprehensive Summary: Synthesize findings with trade-offs, recommendations, key code highlights, and GitHub URLs for ALL meaningful files

Claude's Orchestration Workflow

Step 1: Analyze Request & Generate Queries (BLOCKING)

Analyze user request to determine:

Primary language/framework (TypeScript, Python, Go, etc.)
Specific patterns sought (hooks, middleware, configs, actions)
File type variations needed (tsx vs ts, yaml vs json)

Generate 3-5 targeted queries considering nuances:

Example 1: "Find React hooks"

Query 1: useState useEffect language:typescript extension:tsx (matches import statements, hook usage in components)
Query 2: function use language:typescript extension:ts (matches hook definitions like function useFetch())
Query 3: const use = language:typescript (matches arrow function hooks like const useAuth = () =>)

Example 2: "Express error handling"

Query 1: (req, res, next) => language:javascript (matches middleware function signatures)
Query 2: app.use express (matches Express middleware registration)
Query 3: function(err, req, res, next) language:javascript (matches error handler signatures with 4 params)

Example 3: "XState state machines in React"

Query 1: useMachine language:typescript extension:tsx (matches React hook usage like const [state, send] = useMachine(machine))
Query 2: createMachine language:typescript (matches machine definitions and imports)
Query 3: interpret xstate language:typescript (matches service creation like const service = interpret(machine))

Rationale for each query:

Match actual code patterns (function names, import statements, signatures)
NOT human-friendly search terms or documentation phrases
Different file extensions capture different use cases (tsx for components, ts for utilities)
Specific function/variable names find concrete implementations
Signature patterns match common code structures (arrow functions, function declarations)
Language/extension filters prevent noise from unrelated languages

CHECKPOINT: Verify 3-5 queries generated before proceeding

Step 2: Execute Searches Sequentially

For each query (in order):

cd plugins/knowledge-work/skills/gh-code-search
pnpm search "query text here"

After each search:

Note result count - Too many? Too few?
Check quality indicators - Stars, recency, code structure scores
Identify patterns - Are results converging on specific approaches?
Adapt next query if needed:
- Too many results → Narrow scope (add more filters)
- Too few results → Broaden (remove restrictive filters)
- Found strong pattern → Search for similar variations

Track which queries succeed - Note for summary which strategies were effective

Early stopping: If first 2-3 queries yield 30+ high-quality results, remaining queries optional

Step 3: Aggregate Results

Combine all code files from all searches:

Collect files from each search output with their GitHub URLs
Deduplicate by repository.full_name + file_path
- Keep highest-scored version if duplicates exist
- Note which queries found the same file (indicates strong relevance)
Maintain diversity - Ensure multiple repositories represented (avoid 10 files from one repo)
Track provenance - Remember which query found each file (useful for analysis)
Preserve GitHub URLs - Maintain full GitHub URLs for every file to include in summary

Result: Unified list of 15-30 unique, high-quality code files with GitHub URLs

Step 4: Analyze Patterns

Extract factual patterns across all code files:

Import Analysis:

List most common imports (group by library)
Identify standard library vs third-party dependencies
Note version-specific patterns (e.g., React 18 hooks)

Architectural Patterns:

Functional vs class-based implementations
Custom hooks vs inline logic
Middleware chains vs single-handler
Configuration patterns (env vars, config files, inline)

Code Structure:

Function signatures (parameters, return types)
Error handling approaches (try/catch, error middleware, Result types)
Testing patterns (unit vs integration, mocking strategies)
Documentation styles (JSDoc, inline comments, README)

Language Distribution:

TypeScript vs JavaScript split
Strict mode usage
Type definition patterns

DO NOT editorialize - Extract facts, not opinions. Analysis comes in Step 5.

Step 5: Generate Comprehensive Summary

Output format (exact structure):

# GitHub Code Search: [User Query Topic]

## Search Strategy Executed

Ran [N] targeted queries:
1. `query text` - [brief rationale] → [X results]
2. `query text` - [brief rationale] → [Y results]
3. ...

**Total unique files analyzed:** [N]

---

## Pattern Analysis

### Common Imports
- `library-name` - Used in [N/total] files
- `another-lib` - Used in [M/total] files

### Architectural Styles
- **Functional Programming** - [N] files use pure functions, immutable patterns
- **Object-Oriented** - [M] files use classes, inheritance
- **Hybrid** - [K] files mix both approaches

### Implementation Patterns
- **Pattern 1 Name**: [Description with prevalence]
- **Pattern 2 Name**: [Description with prevalence]

---

## Approaches Found

### Approach 1: [Name]
**Repos:** [repo1], [repo2]
**Characteristics:**
- [Key trait 1]
- [Key trait 2]

**Example:** [repo/file_path:line_number](github_url)
```language
[relevant code snippet - NOT just first 40 lines]

Approach 2: [Name]

[Similar structure]

Trade-offs

Approach	Pros	Cons	Best For
Approach 1	[pros]	[cons]	[use case]
Approach 2	[pros]	[cons]	[use case]

Recommendations

For your use case ([infer from user's context]):

Primary recommendation: [Approach name]
- Why: [Rationale based on analysis]
- Implementation: [High-level steps]
- References: repo/file_path:line_number
Alternative: [Another approach]
- When to use: [Scenarios where this is better]
- References: repo/file_path:line_number

Key Code Sections

[Concept 1]

Source: repo/file_path:line_range

[Specific relevant code - imports, key function, not arbitrary truncation]

Why this matters: [Brief explanation]

[Concept 2]

Source: repo/file_path:line_range

[Specific relevant code]

Why this matters: [Brief explanation]

All GitHub Files Analyzed

IMPORTANT: Include GitHub URLs for ALL meaningful files found across all searches.

List every unique file analyzed (15-30 files), grouped by repository:

[repo-owner/repo-name] ([stars]⭐)

file_path - [language] - [brief description of what this file demonstrates]
file_path - [language] - [brief description]

[repo-owner/repo-name2] ([stars]⭐)

file_path - [language] - [brief description]

Format: Direct GitHub blob URLs with line numbers where relevant (e.g., https://github.com/owner/repo/blob/main/path/file.ts#L10-L50)


**Summary characteristics:**
- **Factual** - Based on extracted data, not assumptions
- **Actionable** - Clear recommendations with reasoning
- **Contextualized** - References specific code locations with line numbers
- **Balanced** - Shows trade-offs, not just "best practice"
- **Comprehensive** - Covers patterns, approaches, trade-offs, recommendations
- **Accessible** - GitHub URLs for ALL meaningful files so users can explore full source code

---

### Step 6: Save Results to File

**After generating comprehensive summary, persist for future reference:**

1. **Generate timestamp:**
   - Invoke `timestamp` skill to get deterministic YYYYMMDDHHMMSS format
   - Example: `20250110143052`

2. **Sanitize query for filename:**
   - Convert user's original query to kebab-case slug
   - Rules: lowercase, spaces → hyphens, remove special chars, max 50 chars
   - Example: "React hooks useState" → "react-hooks-usestate"

3. **Construct file path:**
   - Directory: `docs/research/github/`
   - Format: `<timestamp>-<sanitized-query>.md`
   - Full path: `docs/research/github/20250110143052-react-hooks-usestate.md`

4. **Save using Write tool:**
   - Content: Full comprehensive summary from Step 5
   - Ensures persistence across sessions
   - User can reference past research

5. **Log saved location:**
   - Inform user where file was saved
   - Example: "Research saved to docs/research/github/20250110143052-react-hooks-usestate.md"

**Why save:**
- Comprehensive summaries represent significant analysis work (30-150s of API calls)
- Users may want to reference patterns/trade-offs later
- Builds searchable knowledge base of GitHub research
- Avoids re-running expensive queries for same topics

---

## Error Handling

**Common issues:**

- **Auth errors:** Prompt user to run `gh auth login --web`
- **Rate limits:** Show remaining quota, reset time. If hit during multi-search, stop gracefully with partial results
- **Network failures:** Continue with partial results, note which queries failed
- **No results for query:** Note in summary, adjust subsequent queries to be broader
- **All queries return same files:** Note low diversity, recommend broader initial queries

**Graceful degradation:** Partial results are acceptable. Complete summary based on available data.

---

## Limitations

- GitHub API rate limit: 5,000 req/hr authenticated (multi-search uses more quota)
- Each tool invocation fetches top 10 results only
- Skips files >100KB
- Sequential execution takes longer than single query (10-30s per query)
- Provides factual data, not conclusions (Claude interprets patterns)
- Deduplication assumes exact path matches (renamed files treated as unique)

---

## Typical Execution Time

**Per query:** 10-30 seconds depending on:
- Number of results (100 max)
- File sizes
- Network latency
- API rate limits

**Full workflow (3-5 queries):** 30-150 seconds

**Optimization:** If first queries yield sufficient results, skip remaining queries

---

## Integration Notes

**Example: User asks "find Claude Code skills doing github search"**

```javascript
// 1. Claude analyzes: Skills use SKILL.md with frontmatter, likely in .claude/skills/
// 2. Claude generates queries:
//    - "filename:SKILL.md github search" (matches SKILL.md files with "github search" text)
//    - "octokit.rest.search language:typescript" (matches actual Octokit API usage)
//    - "gh api language:typescript path:skills" (matches gh CLI usage in skills)
// 3. Claude executes sequentially:
cd plugins/knowledge-work/skills/gh-code-search
pnpm search "filename:SKILL.md github search"
// [analyze results]
pnpm search "octokit.rest.search language:typescript"
// [analyze results]
pnpm search "gh api language:typescript path:skills"
// 4. Claude aggregates, deduplicates, analyzes patterns
// 5. Claude generates comprehensive summary with trade-offs and recommendations

Testing

Validate workflow with diverse queries:

Simple: "React hooks" (should generate 3+ queries, combine results)
Complex: "GitHub Actions TypeScript project setup" (requires config + implementation queries)
Ambiguous: "state management" (should generate queries for Redux, Zustand, XState, Context)

Check quality:

Deduplication works (no repeated files in summary)
Diverse repositories (not all from one repo)
Summary includes trade-offs and recommendations
Code snippets are relevant (not arbitrary truncations)

gh-code-search