From fellow
AST-enhanced extractor for data models, entities, and relationships with 80%+ token reduction
npx claudepluginhub joshuarweaver/cascade-code-general-misc-3 --plugin jingnanzhou-fellow
fellow:agents/factual-knowledge-extractor-v2-ast (model: sonnet)
Analyze the target codebase and extract data/object models to understand WHAT exists in this project.
NEW: Uses AST (Abstract Syntax Tree) extraction to cut token usage by 80-90% during structural analysis, then applies Claude's semantic analysis for domain understanding.
Use the ast_extractor.py tool to extract structural information:
# Extract entity signatures from target project
python3 ${CLAUDE_PLUGIN_ROOT}/tools/ast_extractor.py ${TARGET_PROJECT} > /tmp/entity_structures.txt
What This Provides:
Example Output:
## File: src/models/user.py
class User(BaseModel):
Location: src/models/user.py:10
Doc: Represents a user account in the system
Attributes:
- id: int
- email: str
- created_at: datetime
Methods:
- validate_email(email: str) -> bool
Doc: Validates email format
- get_by_id(user_id: int) -> Optional[User]
Doc: Retrieves user by ID
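For orientation, here is a minimal sketch of how Python's built-in `ast` module can recover the facts in the example output above: class name and bases, line number, first docstring line, annotated fields, and method signatures. `summarize_classes` is a hypothetical name; this is not the plugin's `ast_extractor.py`. It assumes Python 3.9+ (for `ast.unparse`) and only renders positional parameters.

```python
import ast


def summarize_classes(source: str, filename: str = "<memory>") -> str:
    # Hypothetical sketch, not the plugin's ast_extractor.py: it only shows how
    # the stdlib ast module recovers the facts in the example output above.
    tree = ast.parse(source, filename=filename)
    lines = [f"## File: {filename}"]
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        bases = ", ".join(ast.unparse(b) for b in node.bases)
        lines.append(f"class {node.name}({bases}):")
        lines.append(f"  Location: {filename}:{node.lineno}")
        doc = ast.get_docstring(node)
        if doc:
            lines.append(f"  Doc: {doc.splitlines()[0]}")
        # Class-level annotated fields such as `email: str`
        fields = [s for s in node.body
                  if isinstance(s, ast.AnnAssign) and isinstance(s.target, ast.Name)]
        if fields:
            lines.append("  Attributes:")
            lines.extend(f"    - {f.target.id}: {ast.unparse(f.annotation)}" for f in fields)
        methods = [s for s in node.body
                   if isinstance(s, (ast.FunctionDef, ast.AsyncFunctionDef))]
        if methods:
            lines.append("  Methods:")
        for m in methods:
            # Positional parameters only; self/cls are dropped for readability
            params = [a for a in m.args.args if a.arg not in ("self", "cls")]
            sig = ", ".join(f"{a.arg}: {ast.unparse(a.annotation)}" if a.annotation else a.arg
                            for a in params)
            ret = f" -> {ast.unparse(m.returns)}" if m.returns else ""
            lines.append(f"    - {m.name}({sig}){ret}")
            mdoc = ast.get_docstring(m)
            if mdoc:
                lines.append(f"      Doc: {mdoc.splitlines()[0]}")
    return "\n".join(lines)


sample = '''
class User(BaseModel):
    """Represents a user account in the system"""
    id: int
    email: str
    created_at: datetime

    def validate_email(self, email: str) -> bool:
        """Validates email format"""
        return "@" in email
'''
out = summarize_classes(sample, "src/models/user.py")
print(out)
```

Note that `ast.parse` never imports or executes the code, so undefined names like `BaseModel` are fine; that is what lets this step run without the target project's dependencies.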
Read the AST structure and extract semantic meaning:
For Each Entity, Extract:
Purpose: What does this entity represent in the domain?
Domain Meaning: How does it fit into the business domain?
Relationships:
Constraints & Invariants:
Design Patterns Used:
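Some constraint and pattern hints can be pre-seeded mechanically before Claude's semantic pass, using the heuristics this agent lists later (base classes such as BaseModel, Entity, ValueObject; a create() factory method; validate_* methods; created_at likely required, updated_at optional). The sketch below is an assumption-laden illustration: `structural_hints`, the `PATTERN_BY_BASE` table, and the "no default and not Optional means likely required" rule are hypothetical, and Claude should confirm or reject every hint.

```python
import ast

# Hypothetical base-class -> pattern table (BaseModel -> Entity mirrors the
# Order example in this document); adjust to the target project's conventions.
PATTERN_BY_BASE = {"BaseModel": "Entity", "Entity": "Entity", "ValueObject": "Value Object"}


def _is_optional(annotation: str) -> bool:
    # Treat Optional[X] and X | None as optional annotations.
    parts = [p.strip() for p in annotation.split("|")]
    return annotation.startswith("Optional[") or "None" in parts


def structural_hints(class_node: ast.ClassDef) -> dict:
    """Collect pattern and constraint hints for Claude to confirm or reject."""
    bases = [ast.unparse(b) for b in class_node.bases]
    patterns = [PATTERN_BY_BASE[b] for b in bases if b in PATTERN_BY_BASE]
    methods = [n.name for n in class_node.body
               if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    if "create" in methods:
        patterns.append("Factory Method")  # a create() method suggests a factory
    required, optional = [], []
    for stmt in class_node.body:
        if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
            ann = ast.unparse(stmt.annotation)
            # No default and a non-Optional type: the field is *likely* required.
            if stmt.value is None and not _is_optional(ann):
                required.append(stmt.target.id)
            else:
                optional.append(stmt.target.id)
    return {
        "patterns": list(dict.fromkeys(patterns)),  # de-duplicate, keep order
        "validation_methods": [m for m in methods if m.startswith("validate_")],
        "likely_required": required,
        "likely_optional": optional,
    }


source = '''
class User(BaseModel):
    created_at: datetime
    updated_at: Optional[datetime] = None

    @classmethod
    def create(cls, email: str) -> "User": ...

    def validate_email(self, email: str) -> bool: ...
'''
hints = structural_hints(ast.parse(source).body[0])
print(hints)
```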
Input (from AST):
class Order(BaseModel):
Location: src/models/order.py:15
Methods:
- calculate_total() -> Decimal
- add_item(product: Product, quantity: int) -> None
- can_ship() -> bool
Output (Semantic JSON):
{
"name": "Order",
"type": "class",
"purpose": "Represents a customer order with items and total calculation",
"domain_meaning": "Core entity in e-commerce domain, manages order lifecycle",
"attributes": [
{
"name": "items",
"type": "List[OrderItem]",
"purpose": "Collection of items in the order",
"constraints": ["must have at least 1 item to ship"]
},
{
"name": "total",
"type": "Decimal",
"purpose": "Calculated total price of all items",
"constraints": ["must be >= 0", "calculated from items"]
}
],
"relationships": [
{
"type": "has-many",
"target": "OrderItem",
"description": "Order contains multiple items"
},
{
"type": "references",
"target": "Product",
"description": "Each item references a Product"
}
],
"invariants": [
"Order must have at least 1 item",
"Total must equal sum of item prices",
"Cannot ship if total is 0"
],
"patterns": ["Entity", "Aggregate Root"],
"grounding": {
"file": "src/models/order.py",
"line_start": 15,
"line_end": 45
}
}
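Before saving, each semantic record can be sanity-checked against the shape of the example above. This is a minimal sketch under stated assumptions: the required keys and the allowed relationship types are taken only from the examples in this document (`has-many`, `references`, `belongs-to`), and `entity_problems` is a hypothetical helper, not part of the plugin.

```python
# Keys and relationship types taken from the examples in this document;
# the plugin's real schema may require more or fewer.
REQUIRED_KEYS = {"name", "type", "purpose", "domain_meaning", "attributes",
                 "relationships", "invariants", "patterns", "grounding"}
KNOWN_RELATIONSHIP_TYPES = {"has-many", "references", "belongs-to"}


def entity_problems(entity: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record looks sound."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(entity))]
    grounding = entity.get("grounding", {})
    start, end = grounding.get("line_start"), grounding.get("line_end")
    if not grounding.get("file"):
        problems.append("grounding.file is empty")
    # Grounding must point at a real, non-inverted line range
    if not (isinstance(start, int) and isinstance(end, int) and 1 <= start <= end):
        problems.append("grounding lines must satisfy 1 <= line_start <= line_end")
    for rel in entity.get("relationships", []):
        if rel.get("type") not in KNOWN_RELATIONSHIP_TYPES:
            problems.append(f"unknown relationship type: {rel.get('type')}")
    return problems


order = {
    "name": "Order", "type": "class",
    "purpose": "Represents a customer order with items and total calculation",
    "domain_meaning": "Core entity in e-commerce domain, manages order lifecycle",
    "attributes": [], "invariants": [], "patterns": ["Entity", "Aggregate Root"],
    "relationships": [{"type": "has-many", "target": "OrderItem"}],
    "grounding": {"file": "src/models/order.py", "line_start": 15, "line_end": 45},
}
print(entity_problems(order))
print(entity_problems({"name": "X"}))
```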
IMPORTANT: Use the shared filtering utilities to skip non-production code.
# Check if files should be analyzed
python3 ${CLAUDE_PLUGIN_ROOT}/tools/should_analyze.py src/app.py node_modules/lib.js
Or import in Python:
from file_filters import should_exclude_path

if should_exclude_path("node_modules/foo/bar.js"):
    # Skip this file
    pass
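The filtering rule can be pictured with a small stand-in. This is a hypothetical sketch, not the real `file_filters.should_exclude_path`: it assumes forward-slash paths, skips any path with a directory component in dist, build, node_modules, venv, .git, test, spec, or `__tests__` (plus `tests`, an added assumption), and treats common test-file names (test_x.py, x_test.py, x.test.js, x.spec.ts) as excluded.

```python
from pathlib import PurePosixPath

# Hypothetical lists mirroring the examples in this document, not the real module
EXCLUDED_DIRS = {"dist", "build", "node_modules", "venv", ".git"}
TEST_DIRS = {"test", "tests", "spec", "__tests__"}


def should_exclude_path_sketch(path: str) -> bool:
    parts = PurePosixPath(path).parts
    # Any excluded directory anywhere above the file excludes it
    if any(d in EXCLUDED_DIRS or d in TEST_DIRS for d in parts[:-1]):
        return True
    name = parts[-1] if parts else ""
    # Common test-file naming conventions: test_x.py, x_test.py, x.test.js, x.spec.ts
    return (name.startswith("test_") or name.endswith("_test.py")
            or ".test." in name or ".spec." in name)


for p in ["src/app.py", "node_modules/lib.js", "src/__tests__/app.test.js"]:
    print(p, should_exclude_path_sketch(p))
```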
The filters skip dist, build, node_modules, venv, .git, etc., as well as test code (test, spec, __tests__, etc.).
# Create todo list
# - Extract entity structures via AST
# - Analyze semantic meaning with Claude
# - Build entity relationships
# - Identify constraints and invariants
# - Generate final JSON output
# Run AST extraction
python3 ${CLAUDE_PLUGIN_ROOT}/tools/ast_extractor.py ${TARGET_PROJECT} > /tmp/structures.txt
# Read the structured output (much smaller than full files!)
cat /tmp/structures.txt
Now analyze the structured output to extract:
Entity Purpose: For each class/entity, determine:
Relationships: Infer from:
- add_user(user: User) → uses User
- get_orders() -> List[Order] → has Orders
Constraints: Infer from:
- validate_* implies validation rules
- created_at is likely required, updated_at optional
Patterns: Identify from:
- BaseModel, Entity, ValueObject
- create() factory method
Analyze cross-entity relationships:
# Example: Finding relationships
# If Order has method: add_item(product: Product, quantity: int)
# → Order has-many OrderItem (inferred)
# → OrderItem references Product
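The mechanical half of these relationship rules can be sketched directly on the AST: a parameter typed with a known entity suggests "uses", and a `List[Entity]` return suggests "has-many". This is a minimal sketch; `infer_relationships` and the `evidence` field are hypothetical, and subtler inferences (such as the implied OrderItem above) are deliberately left to Claude's semantic pass. It assumes Python 3.9+.

```python
import ast


def infer_relationships(class_node: ast.ClassDef, known_entities: set[str]) -> list[dict]:
    """Signature-based relationship hints; Claude still decides the final type."""
    found = []
    for fn in class_node.body:
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Rule 1: a parameter annotated with another entity -> "uses"
        for arg in fn.args.args:
            if arg.annotation is not None:
                name = ast.unparse(arg.annotation)
                if name in known_entities and name != class_node.name:
                    found.append({"source": class_node.name, "target": name, "type": "uses",
                                  "evidence": f"{fn.name}({arg.arg}: {name})"})
        # Rule 2: a List[Entity] / list[Entity] return type -> "has-many"
        ret = fn.returns
        if (isinstance(ret, ast.Subscript) and ast.unparse(ret.value) in {"List", "list"}
                and ast.unparse(ret.slice) in known_entities):
            found.append({"source": class_node.name, "target": ast.unparse(ret.slice),
                          "type": "has-many", "evidence": f"{fn.name}() -> {ast.unparse(ret)}"})
    return found


source = '''
class Customer(BaseModel):
    def add_user(self, user: User) -> None: ...
    def get_orders(self) -> List[Order]: ...
    def rename(self, name: str) -> None: ...
'''
customer = ast.parse(source).body[0]
rels = infer_relationships(customer, {"User", "Order", "Customer"})
for rel in rels:
    print(rel)
```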
Save the results incrementally to a JSON file using the save_json tool:
# CRITICAL: Use incremental saving to handle large knowledge bases
python3 ${CLAUDE_PLUGIN_ROOT}/tools/save_json.py \
--output "${TARGET_PROJECT}/.fellow-data/semantic/factual_knowledge.json" \
--mode start \
--type factual
# Add each entity incrementally
python3 ${CLAUDE_PLUGIN_ROOT}/tools/save_json.py \
--output "${TARGET_PROJECT}/.fellow-data/semantic/factual_knowledge.json" \
--mode add \
--section entities \
--data '{"name": "User", "type": "class", ...}'
# Finalize when done
python3 ${CLAUDE_PLUGIN_ROOT}/tools/save_json.py \
--output "${TARGET_PROJECT}/.fellow-data/semantic/factual_knowledge.json" \
--mode finalize
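Passing `--data` as a single-quoted shell string breaks as soon as a value contains an apostrophe. When the calls are scripted, building the argument list in Python sidesteps shell quoting entirely. This sketch uses only the flags shown above (`--output`, `--mode`, `--type`, `--section`, `--data`); `save_json_argv`, `run_save_json`, and the example paths are hypothetical.

```python
import json
import os
import subprocess
from typing import Optional


def save_json_argv(plugin_root: str, output: str, mode: str, *,
                   kind: Optional[str] = None, section: Optional[str] = None,
                   data: Optional[dict] = None) -> list[str]:
    """Build an argument list for save_json.py using the flags shown above."""
    argv = ["python3", os.path.join(plugin_root, "tools", "save_json.py"),
            "--output", output, "--mode", mode]
    if kind is not None:
        argv += ["--type", kind]
    if section is not None:
        argv += ["--section", section]
    if data is not None:
        # json.dumps yields valid JSON even when values contain quotes or apostrophes
        argv += ["--data", json.dumps(data)]
    return argv


def run_save_json(argv: list[str]) -> None:
    # A list argv with no shell means no quoting rules apply at all
    subprocess.run(argv, check=True)


argv = save_json_argv("/opt/fellow", "/repo/.fellow-data/semantic/factual_knowledge.json",
                      "add", section="entities",
                      data={"name": "User", "purpose": "A user's account"})
print(argv)
# run_save_json(argv)  # only inside a real Fellow installation
```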
{
"metadata": {
"extraction_version": "2.0-ast",
"timestamp": "2026-01-13T10:30:00Z",
"target_project": "/path/to/project",
"extractor": "factual-knowledge-extractor-v2-ast",
"method": "ast_extraction + semantic_analysis"
},
"entities": [
{
"name": "User",
"type": "class",
"purpose": "Represents a user account...",
"domain_meaning": "Core entity in authentication...",
"attributes": [...],
"methods": [...],
"relationships": [...],
"invariants": [...],
"patterns": [...],
"grounding": {
"file": "...",
"line_start": 10,
"line_end": 50
}
}
],
"entity_relationships": [
{
"source": "Order",
"target": "User",
"type": "belongs-to",
"description": "Each order belongs to a user",
"multiplicity": "many-to-one"
}
],
"summary": {
"total_entities": 25,
"total_relationships": 42,
"patterns_used": ["Entity", "Value Object", "Repository"],
"key_entities": ["User", "Order", "Product"]
}
}
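The `summary` block can be derived from the other two sections rather than written by hand, which keeps the counts consistent with what was actually saved. This is a hedged sketch: `build_summary` is hypothetical, it assumes `total_relationships` counts only `entity_relationships`, and it assumes "key entities" means those that take part in the most cross-entity relationships.

```python
from collections import Counter


def build_summary(knowledge: dict, top_n: int = 3) -> dict:
    entities = knowledge.get("entities", [])
    relationships = knowledge.get("entity_relationships", [])
    # Union of every pattern label attached to any entity, sorted for stable output
    patterns = sorted({p for e in entities for p in e.get("patterns", [])})
    # Assumption: "key" entities are those in the most cross-entity relationships
    degree = Counter()
    for rel in relationships:
        degree[rel["source"]] += 1
        degree[rel["target"]] += 1
    return {
        "total_entities": len(entities),
        "total_relationships": len(relationships),
        "patterns_used": patterns,
        "key_entities": [name for name, _ in degree.most_common(top_n)],
    }


knowledge = {
    "entities": [
        {"name": "User", "patterns": ["Entity"]},
        {"name": "Order", "patterns": ["Entity", "Aggregate Root"]},
        {"name": "Product", "patterns": ["Entity"]},
    ],
    "entity_relationships": [
        {"source": "Order", "target": "User", "type": "belongs-to"},
        {"source": "Order", "target": "Product", "type": "references"},
    ],
}
summary = build_summary(knowledge)
print(summary)
```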
Python Stdlib Only: Uses built-in ast module (no dependencies)
Language Support:
- Python (via the built-in ast)
Incremental Updates: AST extraction works with Fellow's incremental update system
Error Handling: If AST parsing fails (syntax errors), fall back to Claude reading the file
Grounding: All entities include precise source locations from AST
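The error-handling note can be made concrete: parse every file, and collect the ones `ast` rejects so Claude can read them directly instead. A minimal sketch, assuming Python 3.9+; `parse_or_flag` is a hypothetical name. It catches `ValueError` as well, because older Python versions raise it for source containing null bytes (and `UnicodeDecodeError` is a `ValueError` subclass).

```python
import ast
import tempfile
from pathlib import Path


def parse_or_flag(root: str) -> tuple[dict, list]:
    """Parse every .py file under root; collect files that need a manual read."""
    parsed, needs_manual_read = {}, []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            parsed[str(path)] = ast.parse(source, filename=str(path))
        except (SyntaxError, ValueError):
            # Fallback: hand this file to Claude to read as plain text
            needs_manual_read.append(str(path))
    return parsed, needs_manual_read


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ok.py").write_text("class User:\n    id: int\n", encoding="utf-8")
    Path(tmp, "broken.py").write_text("class Broken(:\n", encoding="utf-8")
    parsed, fallback = parse_or_flag(tmp)
    print(len(parsed), [Path(p).name for p in fallback])
```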
| Aspect | V1 (Traditional) | V2 (AST-Enhanced) |
|---|---|---|
| Token Usage | ~17,000 tokens | ~2,200 tokens |
| Reduction | Baseline | 87% fewer |
| Speed | Baseline | 7-8x faster |
| Accuracy | Good (Claude reads code) | Excellent (AST + Claude) |
| Structure | Inferred by Claude | 100% accurate from AST |
| Semantics | Good | Same (Claude analysis) |
| Dependencies | None | None (stdlib only) |
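The reduction row follows directly from the two token counts. The speed row cannot be derived from the table alone, but if runtime scaled roughly with tokens, the token ratio would land in the stated 7-8x range:

$$
1 - \frac{2{,}200}{17{,}000} \approx 1 - 0.129 = 0.871 \approx 87\%,
\qquad
\frac{17{,}000}{2{,}200} \approx 7.7
$$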
# Extract factual knowledge with AST enhancement
cd ${TARGET_PROJECT}
# Step 1: AST extraction (fast, token-efficient)
python3 ${CLAUDE_PLUGIN_ROOT}/tools/ast_extractor.py . > /tmp/entities.txt
# Step 2: Claude semantic analysis
# [Agent reads /tmp/entities.txt and extracts domain meaning]
# Step 3: Save to JSON
# [Agent uses save_json.py for incremental output]
# Result: factual_knowledge.json with complete entity information
- ✅ All classes, functions, and data structures identified
- ✅ Entity purposes and domain meanings documented
- ✅ Relationships accurately mapped
- ✅ Constraints and invariants extracted
- ✅ Design patterns identified
- ✅ All entities grounded to source locations
- ✅ 80%+ token reduction achieved
- ✅ JSON output validates against schema
After factual extraction completes: