AST-enhanced extractor for data models, entities, and relationships with 80%+ token reduction
Install:

```bash
npx claudepluginhub joshuarweaver/cascade-code-general-misc-3 --plugin jingnanzhou-fellowsonnet
```
Analyze the target codebase and extract **data/object models** to understand WHAT exists in this project.

**NEW**: Uses AST (Abstract Syntax Tree) extraction for 80-90% token reduction during structural analysis, then Claude semantic analysis for domain understanding.

---

Use the `ast_extractor.py` tool to extract structural information:
```bash
# Extract entity signatures from target project
python3 ${CLAUDE_PLUGIN_ROOT}/tools/ast_extractor.py ${TARGET_PROJECT} > /tmp/entity_structures.txt
```
**What This Provides:** Class and function signatures, docstrings, attributes, and method signatures, each with a precise source location.

**Example Output:**
```
## File: src/models/user.py

class User(BaseModel):
  Location: src/models/user.py:10
  Doc: Represents a user account in the system
  Attributes:
    - id: int
    - email: str
    - created_at: datetime
  Methods:
    - validate_email(email: str) -> bool
      Doc: Validates email format
    - get_by_id(user_id: int) -> Optional[User]
      Doc: Retrieves user by ID
```
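For orientation, here is a minimal sketch of how this kind of signature extraction can work with Python's built-in `ast` module. It is illustrative only; the actual `ast_extractor.py` shipped with the plugin may structure its traversal and output differently.

```python
# Illustrative sketch of AST-based signature extraction (not the plugin's
# actual ast_extractor.py implementation).
import ast
import sys

def summarize(path: str) -> None:
    """Print class and method signatures without reading full bodies."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(b) for b in node.bases)
            print(f"class {node.name}({bases}):  # line {node.lineno}")
            doc = ast.get_docstring(node)
            if doc:
                print(f"  Doc: {doc.splitlines()[0]}")
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    args = ", ".join(a.arg for a in item.args.args)
                    ret = f" -> {ast.unparse(item.returns)}" if item.returns else ""
                    print(f"  - {item.name}({args}){ret}")

if __name__ == "__main__":
    summarize(sys.argv[1])
```

Because only signatures and docstrings are emitted, the agent never needs to read method bodies during this phase, which is where the token reduction comes from.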
Read the AST structure and extract semantic meaning:
**For Each Entity, Extract:**

- **Purpose**: What does this entity represent in the domain?
- **Domain Meaning**: How does it fit into the business domain?
- **Relationships**: How it connects to other entities (has-many, belongs-to, references)
- **Constraints & Invariants**: Rules that must always hold for the entity
- **Design Patterns Used**: e.g. Entity, Value Object, Aggregate Root, Repository
**Input (from AST):**

```
class Order(BaseModel):
  Location: src/models/order.py:15
  Methods:
    - calculate_total() -> Decimal
    - add_item(product: Product, quantity: int) -> None
    - can_ship() -> bool
```
**Output (Semantic JSON):**

```json
{
  "name": "Order",
  "type": "class",
  "purpose": "Represents a customer order with items and total calculation",
  "domain_meaning": "Core entity in e-commerce domain, manages order lifecycle",
  "attributes": [
    {
      "name": "items",
      "type": "List[OrderItem]",
      "purpose": "Collection of items in the order",
      "constraints": ["must have at least 1 item to ship"]
    },
    {
      "name": "total",
      "type": "Decimal",
      "purpose": "Calculated total price of all items",
      "constraints": ["must be >= 0", "calculated from items"]
    }
  ],
  "relationships": [
    {
      "type": "has-many",
      "target": "OrderItem",
      "description": "Order contains multiple items"
    },
    {
      "type": "references",
      "target": "Product",
      "description": "Each item references a Product"
    }
  ],
  "invariants": [
    "Order must have at least 1 item",
    "Total must equal sum of item prices",
    "Cannot ship if total is 0"
  ],
  "patterns": ["Entity", "Aggregate Root"],
  "grounding": {
    "file": "src/models/order.py",
    "line_start": 15,
    "line_end": 45
  }
}
```
**IMPORTANT**: Use the shared filtering utilities to skip non-production code.

```bash
# Check if files should be analyzed
python3 ${CLAUDE_PLUGIN_ROOT}/tools/should_analyze.py src/app.py node_modules/lib.js
```

Or import in Python:

```python
from file_filters import should_exclude_path

if should_exclude_path("node_modules/foo/bar.js"):
    # Skip this file
    pass
```
- **Excluded directories**: `dist`, `build`, `node_modules`, `venv`, `.git`, etc.
- **Excluded test files**: `test`, `spec`, `__tests__`, etc.
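If the shared utility is unavailable, an exclusion check along these lines captures the same intent. The function name `looks_excluded` and the pattern sets are illustrative assumptions; the real logic lives in the plugin's `file_filters` module and may differ.

```python
# Hypothetical stand-in for file_filters.should_exclude_path; the plugin's
# real implementation may use different patterns and matching rules.
from pathlib import PurePath

EXCLUDED_DIRS = {"dist", "build", "node_modules", "venv", ".git"}
TEST_MARKERS = ("test", "spec", "__tests__")

def looks_excluded(path: str) -> bool:
    parts = PurePath(path).parts
    if any(part in EXCLUDED_DIRS for part in parts):
        return True
    name = PurePath(path).name.lower()
    return any(marker in name for marker in TEST_MARKERS)

print(looks_excluded("node_modules/foo/bar.js"))  # True
print(looks_excluded("src/app.py"))               # False
```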
```bash
# Create todo list
# - Extract entity structures via AST
# - Analyze semantic meaning with Claude
# - Build entity relationships
# - Identify constraints and invariants
# - Generate final JSON output

# Run AST extraction
python3 ${CLAUDE_PLUGIN_ROOT}/tools/ast_extractor.py ${TARGET_PROJECT} > /tmp/structures.txt

# Read the structured output (much smaller than full files!)
cat /tmp/structures.txt
```
Now analyze the structured output to extract:

- **Entity Purpose**: For each class/entity, determine its purpose and domain meaning
- **Relationships**: Infer from:
  - Method parameters (`add_user(user: User)` → uses User)
  - Return types (`get_orders() -> List[Order]` → has Orders)
- **Constraints**: Infer from:
  - Method names (`validate_*` implies validation rules)
  - Attribute patterns (`created_at` is likely required, `updated_at` optional)
- **Patterns**: Identify from:
  - Base classes (`BaseModel`, `Entity`, `ValueObject`)
  - Factory methods (`create()` factory method)

Analyze cross-entity relationships:
```
# Example: Finding relationships
# If Order has method: add_item(product: Product, quantity: int)
#   → Order has-many OrderItem (inferred)
#   → OrderItem references Product
```
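The same inference can be sketched mechanically: parameter and return annotations suggest candidate relationships, which the agent then confirms semantically. The sample source and the builtin-name filter below are illustrative assumptions, not part of the plugin.

```python
# Heuristic sketch: derive candidate relationships from method signatures.
# The agent performs the real inference semantically; this only illustrates it.
import ast

SAMPLE = '''
class Order:
    def add_item(self, product: Product, quantity: int) -> None: ...
    def get_user(self) -> User: ...
'''

BUILTINS = {"int", "str", "float", "bool", "None"}

tree = ast.parse(SAMPLE)
for cls in ast.walk(tree):
    if not isinstance(cls, ast.ClassDef):
        continue
    for fn in (n for n in cls.body if isinstance(n, ast.FunctionDef)):
        # Parameter annotations suggest "uses"/"references" relationships
        for arg in fn.args.args[1:]:  # skip self
            if arg.annotation is not None:
                target = ast.unparse(arg.annotation)
                if target not in BUILTINS:
                    print(f"{cls.name} -> uses {target}")
        # Return annotations suggest "has"/"returns" relationships
        if fn.returns is not None:
            target = ast.unparse(fn.returns)
            if target not in BUILTINS:
                print(f"{cls.name} -> returns {target}")
```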
Save to an incremental JSON file using the `save_json.py` tool:
```bash
# CRITICAL: Use incremental saving to handle large knowledge bases
python3 ${CLAUDE_PLUGIN_ROOT}/tools/save_json.py \
  --output "${TARGET_PROJECT}/.fellow-data/semantic/factual_knowledge.json" \
  --mode start \
  --type factual

# Add each entity incrementally
python3 ${CLAUDE_PLUGIN_ROOT}/tools/save_json.py \
  --output "${TARGET_PROJECT}/.fellow-data/semantic/factual_knowledge.json" \
  --mode add \
  --section entities \
  --data '{"name": "User", "type": "class", ...}'

# Finalize when done
python3 ${CLAUDE_PLUGIN_ROOT}/tools/save_json.py \
  --output "${TARGET_PROJECT}/.fellow-data/semantic/factual_knowledge.json" \
  --mode finalize
```
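When many entities are involved, the same start/add/finalize flow can be driven from a small Python loop instead of separate shell commands. This is a convenience sketch using only the flags shown above; the paths and the example entity list are placeholder assumptions.

```python
# Sketch: drive save_json.py (start / add / finalize) from Python.
# Flags mirror the shell example above; entity data here is placeholder.
import json
import os
import subprocess

TOOL = os.path.expandvars("${CLAUDE_PLUGIN_ROOT}/tools/save_json.py")
OUTPUT = os.path.expandvars(
    "${TARGET_PROJECT}/.fellow-data/semantic/factual_knowledge.json"
)

def save_json(*args: str) -> None:
    subprocess.run(["python3", TOOL, "--output", OUTPUT, *args], check=True)

entities = [
    {"name": "User", "type": "class"},
    {"name": "Order", "type": "class"},
]

save_json("--mode", "start", "--type", "factual")
for entity in entities:
    save_json("--mode", "add", "--section", "entities", "--data", json.dumps(entity))
save_json("--mode", "finalize")
```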
The finalized `factual_knowledge.json` has this shape:

```json
{
  "metadata": {
    "extraction_version": "2.0-ast",
    "timestamp": "2026-01-13T10:30:00Z",
    "target_project": "/path/to/project",
    "extractor": "factual-knowledge-extractor-v2-ast",
    "method": "ast_extraction + semantic_analysis"
  },
  "entities": [
    {
      "name": "User",
      "type": "class",
      "purpose": "Represents a user account...",
      "domain_meaning": "Core entity in authentication...",
      "attributes": [...],
      "methods": [...],
      "relationships": [...],
      "invariants": [...],
      "patterns": [...],
      "grounding": {
        "file": "...",
        "line_start": 10,
        "line_end": 50
      }
    }
  ],
  "entity_relationships": [
    {
      "source": "Order",
      "target": "User",
      "type": "belongs-to",
      "description": "Each order belongs to a user",
      "multiplicity": "many-to-one"
    }
  ],
  "summary": {
    "total_entities": 25,
    "total_relationships": 42,
    "patterns_used": ["Entity", "Value Object", "Repository"],
    "key_entities": ["User", "Order", "Product"]
  }
}
```
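A quick way to sanity-check the generated file against this shape is a short script like the one below. It is a convenience check, not the plugin's own schema validation, and the path assumes the default output location.

```python
# Convenience check of factual_knowledge.json against the shape shown above.
import json

PATH = ".fellow-data/semantic/factual_knowledge.json"  # assumed default location

with open(PATH, encoding="utf-8") as f:
    kb = json.load(f)

for key in ("metadata", "entities", "entity_relationships", "summary"):
    assert key in kb, f"missing top-level key: {key}"

for entity in kb["entities"]:
    assert "grounding" in entity, f"{entity.get('name')} lacks source grounding"

print(f"{len(kb['entities'])} entities, {len(kb['entity_relationships'])} relationships")
```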
- **Python Stdlib Only**: Uses the built-in `ast` module (no dependencies)
- **Language Support**: Python (via the built-in `ast` module)
- **Incremental Updates**: AST extraction works with Fellow's incremental update system
- **Error Handling**: If AST parsing fails (syntax errors), fall back to Claude reading the file; see the sketch after this list
- **Grounding**: All entities include precise source locations from the AST
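The fallback described above amounts to a small gate in front of the extractor: attempt to parse, and hand the file to Claude for a full read if parsing fails. This is a sketch of that behaviour, not the tool's actual code.

```python
# Sketch of the AST-or-fallback decision, not the plugin's actual code.
import ast

def extraction_mode(path: str) -> str:
    """Return "ast" if structural extraction is possible, else "full-read"."""
    try:
        with open(path, encoding="utf-8") as f:
            ast.parse(f.read(), filename=path)
        return "ast"        # structural extraction via AST
    except SyntaxError:
        return "full-read"  # fall back to Claude reading the whole file
```

The table below compares V1 (traditional, full-file reading) with V2 (AST-enhanced):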
| Aspect | V1 (Traditional) | V2 (AST-Enhanced) |
|---|---|---|
| Token Usage | ~17,000 tokens | ~2,200 tokens |
| Reduction | Baseline | 87% fewer |
| Speed | Baseline | 7-8x faster |
| Accuracy | Good (Claude reads code) | Excellent (AST + Claude) |
| Structure | Inferred by Claude | 100% accurate from AST |
| Semantics | Good | Same (Claude analysis) |
| Dependencies | None | None (stdlib only) |
```bash
# Extract factual knowledge with AST enhancement
cd ${TARGET_PROJECT}

# Step 1: AST extraction (fast, token-efficient)
python3 ${CLAUDE_PLUGIN_ROOT}/tools/ast_extractor.py . > /tmp/entities.txt

# Step 2: Claude semantic analysis
# [Agent reads /tmp/entities.txt and extracts domain meaning]

# Step 3: Save to JSON
# [Agent uses save_json.py for incremental output]

# Result: factual_knowledge.json with complete entity information
```
- ✅ All classes, functions, and data structures identified
- ✅ Entity purposes and domain meanings documented
- ✅ Relationships accurately mapped
- ✅ Constraints and invariants extracted
- ✅ Design patterns identified
- ✅ All entities grounded to source locations
- ✅ 80%+ token reduction achieved
- ✅ JSON output validates against schema
After factual extraction completes: