Guides adding dependency extractors for languages/grammars in CocoSearch via pre-checks, implementation, module resolvers, tests, and registration. Enables deps tree, impact analysis, and enriched search.
`npx claudepluginhub violetcranberry/coco-search --plugin cocosearch`

This skill uses the workspace's default tool permissions.
A structured workflow for adding a dependency extractor to CocoSearch. Dependency extractors parse files to build a graph of what-depends-on-what, enabling `deps tree`, `deps impact`, `get_file_dependencies`/`get_file_impact` MCP tools, and `include_deps=True` in search results.
Philosophy: Extractors are autodiscovered and lightweight -- they parse one file at a time and emit edges. Module resolvers are optional and translate raw import strings into file paths. This skill guides you through both, with pre-checks that prevent wasted work.
Reference: The existing 8 extractors in src/cocosearch/deps/extractors/ and 4 resolvers in src/cocosearch/deps/resolver.py serve as patterns.
Before writing any code, verify the extractor is viable.
```
search_code(
    query="dependency extractor LANGUAGES",
    symbol_type="class",
    use_hybrid_search=True,
    smart_context=True
)
```
Or check the registry directly:
```bash
uv run python -c "from cocosearch.deps.registry import get_all_extractor_language_ids; print(sorted(get_all_extractor_language_ids()))"
```
Currently registered: cjs, cts, docker-compose, github-actions, go, helm-template, helm-values, js, jsx, mjs, mts, py, terraform, ts, tsx.
If the target language_id is already listed, stop: the extractor already exists. Inform the user.
The extractor's LANGUAGES set must match the language_id assigned during indexing:
| Source | language_id comes from | Example |
|---|---|---|
| Standard language | File extension without dot | .py -> py, .go -> go |
| Language handler | handler.SEPARATOR_SPEC.language_name | hcl, dockerfile, bash |
| Grammar handler | handler.GRAMMAR_NAME | docker-compose, github-actions, terraform |
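The following minimal sketch makes the table concrete. `language_id_for` is a hypothetical helper, not a CocoSearch function. The `handler_ids` dict is an assumed stand-in for the ids that language and grammar handlers declare. The precedence shown (handler ids before the bare extension) is also an assumption.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def language_id_for(path: str, handler_ids: dict[str, str] | None = None) -> str:
    """Hypothetical helper: derive a language_id following the table above.

    `handler_ids` is an assumed lookup (file name or extension -> id) standing in
    for what handlers declare via SEPARATOR_SPEC.language_name or GRAMMAR_NAME;
    real grammar matching in CocoSearch may be more involved.
    """
    p = PurePosixPath(path)
    handler_ids = handler_ids or {}
    # Assumed precedence: handler/grammar ids by exact file name, then by extension.
    for key in (p.name, p.suffix):
        if key and key in handler_ids:
            return handler_ids[key]
    # Standard language: the file extension without the leading dot.
    return p.suffix.lstrip(".")
```

For example, `language_id_for("cmd/server/main.go")` gives `"go"`. `language_id_for("Dockerfile", {"Dockerfile": "dockerfile"})` gives `"dockerfile"`.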
Verify the language_id exists in the system:
```
search_code(
    query="LANGUAGE_EXTENSIONS EXTENSIONS language_name GRAMMAR_NAME",
    use_hybrid_search=True,
    smart_context=True
)
```
If the language_id doesn't exist yet, the user needs to add language/grammar support first. Suggest using /cocosearch:cocosearch-add-language or /cocosearch:cocosearch-add-grammar.
Determine what dependency patterns the language has:
| Pattern Type | Edge Type | Examples |
|---|---|---|
| Import statements | DepType.IMPORT | Python import, JS require(), Go import |
| Symbol calls | DepType.CALL | Direct function calls across files |
| Reference patterns | DepType.REFERENCE | Docker Compose image:, GitHub Actions uses:, Terraform source |
If the language has no recognizable import or reference patterns, stop. Not all languages benefit from dependency extraction.
Resolvers translate raw import strings (e.g., cocosearch.deps.models) into file paths (e.g., src/cocosearch/deps/models.py). They run after extraction to resolve target_file from metadata["module"].
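The following sketch shows the order of operations under stated assumptions:

- `Edge` is a hypothetical stand-in for `DependencyEdge`, reduced to the fields used here.
- Resolvers are modeled as plain callables keyed by language_id.
- `language_of` (source file to language_id) is an assumed input.
- `resolve_edges` is a hypothetical name, not CocoSearch's actual pipeline code.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Edge:
    # Hypothetical stand-in for DependencyEdge, reduced to the fields used here.
    source_file: str
    target_file: Optional[str]
    metadata: dict = field(default_factory=dict)


# Assumed shape: a resolver takes an edge and returns a file path or None.
Resolver = Callable[[Edge], Optional[str]]


def resolve_edges(
    edges: list[Edge],
    language_of: dict[str, str],
    resolvers: dict[str, Resolver],
) -> list[Edge]:
    """Second pass: fill in target_file for edges the extractor left as None."""
    resolved = []
    for edge in edges:
        resolver = resolvers.get(language_of.get(edge.source_file, ""))
        if edge.target_file is None and resolver is not None:
            # Typically the resolver reads edge.metadata["module"] (the raw import).
            edge = replace(edge, target_file=resolver(edge))
        resolved.append(edge)
    return resolved
```

Returning new edges via `dataclasses.replace` leaves the extraction output untouched. An import the resolver cannot place (for example a third-party package) simply keeps `target_file=None`.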
| Language Type | Resolver Needed? | Why |
|---|---|---|
| Code with imports (Python, JS, Go) | Yes | Import strings need path resolution |
| Config with external refs (Docker Compose, GitHub Actions) | No | References point to external resources, not local files |
| IaC with local modules (Terraform) | Yes | Local source = "./modules/..." needs resolution |
| Templates with includes (Helm) | No | Template includes are path-relative |
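For the Terraform row, resolving a local module is mostly path arithmetic: a `source` such as `./modules/vpc` is relative to the directory of the file that declares it. A minimal sketch, assuming POSIX-style paths relative to the project root. `resolve_local_source` is a hypothetical helper. CocoSearch's TerraformResolver may map to a different target, such as a specific `.tf` file rather than the module directory.

```python
import posixpath
from typing import Optional


def resolve_local_source(source_file: str, source: str) -> Optional[str]:
    """Hypothetical helper: map a local module `source` to a project-relative dir."""
    # Only local paths are resolvable; registry and git sources are external.
    if not source.startswith(("./", "../")):
        return None
    base = posixpath.dirname(source_file)
    target = posixpath.normpath(posixpath.join(base, source))
    # A path that climbs above the project root cannot be an indexed file.
    if target == ".." or target.startswith("../"):
        return None
    return target
```

For example, `resolve_local_source("infra/env/prod/main.tf", "../../modules/vpc")` returns `"infra/modules/vpc"`.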
Currently implemented resolvers: PythonResolver, JavaScriptResolver, GoResolver, TerraformResolver.
Confirm with user: "I'll add a dependency extractor for [language] (language_id: [id]) extracting [pattern types]. [Will/Won't] add a module resolver. Ready to proceed?"
Choose an analog extractor based on the extraction technique:
| Extraction Technique | Analog Extractor | Key Pattern |
|---|---|---|
| Tree-sitter AST parsing | go.py | Parse with get_parser(), walk nodes |
| Regex on source code | python.py | Line-by-line regex matching |
| Multi-style imports (ES6 + CommonJS) | javascript.py | Multiple regex patterns + re-exports |
| YAML parsing with refs | docker_compose.py | yaml.safe_load(), walk dict structure |
| Regex + YAML hybrid | helm.py | Different parse strategy per language_id |
| HCL block parsing | terraform.py | Tree-sitter for module blocks |
Search for and read the analog:
```
search_code(
    query="<analog> dependency extractor extract LANGUAGES",
    symbol_type="class",
    use_hybrid_search=True,
    smart_context=True
)
```
Read the analog extractor fully before proceeding.
Copy src/cocosearch/deps/extractors/_template.py to src/cocosearch/deps/extractors/<language>.py:
```python
from cocosearch.deps.models import DependencyEdge, DepType


class <Language>Extractor:
    """Extractor for <language> dependency edges."""

    LANGUAGES: set[str] = {"<language_id>"}

    def extract(self, file_path: str, content: str) -> list[DependencyEdge]:
        edges: list[DependencyEdge] = []
        # ... parse content, emit edges ...
        return edges
```
Key implementation rules:
- `LANGUAGES` must contain the exact language_id(s) verified in the pre-checks (see the language_id table above).
- `file_path` is the relative path within the project; use it as `source_file`.
- `target_file` should be `None` for unresolved imports (the resolver fills it in later) and for external dependencies.
- `metadata["module"]` must contain the raw import string (resolvers use this).
- `metadata["line"]` should contain the 1-indexed line number, for diagnostics.
- Use `DepType.IMPORT` for code imports and `DepType.REFERENCE` for grammar-level references. For references, put specifics such as `"image"`, `"action"`, or `"module_source"` in `metadata["kind"]`.

For tree-sitter languages (preferred when available):
```python
from tree_sitter_language_pack import get_parser

parser = get_parser("<language>")
tree = parser.parse(content.encode())
# Walk tree.root_node to find import nodes
```
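Walking `tree.root_node` usually means collecting every node of the import-related types that the AST dump reveals. The following is a minimal sketch of an iterative, stack-based walk, which avoids Python's recursion limit on deeply nested files. It relies only on the `.type`, `.children`, and `.start_point` attributes that tree-sitter nodes expose, so it is demonstrated with a hypothetical stand-in `Node` class. The node type names are assumptions; use whatever the dump shows for your grammar.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Node:
    # Stand-in exposing the same attribute names as a tree-sitter Node.
    type: str
    start_point: tuple[int, int]
    children: list[Node] = field(default_factory=list)


def find_nodes(root: Any, types: set[str]) -> Iterator[Any]:
    """Yield nodes whose .type is in `types`, in document (pre-)order."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.type in types:
            yield node
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(node.children))
```

With a real parse tree the call would be `find_nodes(tree.root_node, {...})`. Because `start_point` rows are 0-indexed, add 1 to get `metadata["line"]`.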
Explore the AST first:
```bash
uv run python -c "
from tree_sitter_language_pack import get_parser

parser = get_parser('<language>')
tree = parser.parse(b'''<sample-code-with-imports>''')

def show(node, indent=0):
    print(' ' * indent + f'{node.type} [{node.start_point[0]}:{node.start_point[1]}]')
    for child in node.children:
        show(child, indent + 2)

show(tree.root_node)
"
```
For regex-based extraction (simpler languages):
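```python
import re

# Module-level pattern: one capture group holding the raw import string.
_IMPORT_RE = re.compile(r'^import\s+(.+)$', re.MULTILINE)

# Inside extract(): `content`, `file_path`, and `edges` come from the template.
for match in _IMPORT_RE.finditer(content):
    # 1-indexed line number of the match.
    line = content[:match.start()].count('\n') + 1
    edges.append(DependencyEdge(
        source_file=file_path,
        source_symbol=None,
        target_file=None,
        target_symbol=None,
        dep_type=DepType.IMPORT,
        metadata={"module": match.group(1), "line": line},
    ))
```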
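The snippet above recounts newlines from the start of the file for every match. That is quadratic on large files with many imports. One alternative, using only the standard library, is to precompute line-start offsets once and find each match's 1-indexed line with `bisect`. `make_line_lookup` is a hypothetical helper name.

```python
import bisect
import re
from typing import Callable


def make_line_lookup(content: str) -> Callable[[int], int]:
    """Return a function mapping a character offset to a 1-indexed line number."""
    # Offset at which each line begins; line 1 begins at offset 0.
    starts = [0] + [m.end() for m in re.finditer("\n", content)]

    def line_of(offset: int) -> int:
        # Count of line starts at or before `offset` equals the 1-indexed line.
        return bisect.bisect_right(starts, offset)

    return line_of
```

Inside the loop, `line = line_of(match.start())` then replaces the `count('\n')` expression and gives the same result.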
For YAML-based grammars:
```python
import yaml

# Inside extract(): malformed YAML yields no edges instead of raising.
try:
    data = yaml.safe_load(content)
except yaml.YAMLError:
    return []
# Walk data structure to find references
```
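After `yaml.safe_load`, the walk is ordinary dict/list traversal. The following is a minimal sketch for Docker Compose-style `image:` references. It takes an already-parsed dict, so it runs without PyYAML. The `services -> <name> -> image` shape follows the Compose file format. The `(service, image)` output is a hypothetical simplification: following the rules above, a real extractor would emit a `DepType.REFERENCE` edge with `metadata["kind"] = "image"`. Also note that `safe_load` returns plain Python objects without line positions, so `metadata["line"]` needs a separate lookup.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_compose_images(data: Any) -> Iterator[tuple[str, str]]:
    """Yield (service_name, image) pairs from a parsed Compose document."""
    # safe_load returns None for an empty file and may return a non-dict.
    if not isinstance(data, dict):
        return
    services = data.get("services")
    if not isinstance(services, dict):
        return
    for name, service in services.items():
        # Services without `image:` (for example build-only) produce no reference.
        if isinstance(service, dict) and isinstance(service.get("image"), str):
            yield str(name), service["image"]
```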
The extractor is autodiscovered at import time -- no registration code needed.
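Putting the template and the key implementation rules together, a complete extractor for a made-up language is small. This is a hypothetical sketch. The `include "path"` syntax, the `ToyExtractor` class, and the `"toy"` language_id are all invented for illustration. The `DepType`/`DependencyEdge` definitions are stand-ins that mirror the field names used in this guide, so the block runs without CocoSearch installed. A real extractor imports them from `cocosearch.deps.models` instead.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from enum import Enum


class DepType(Enum):
    # Stand-in for cocosearch.deps.models.DepType (values are illustrative).
    IMPORT = "import"
    CALL = "call"
    REFERENCE = "reference"


@dataclass
class DependencyEdge:
    # Stand-in with the field names used throughout this guide.
    source_file: str
    source_symbol: str | None
    target_file: str | None
    target_symbol: str | None
    dep_type: DepType
    metadata: dict = field(default_factory=dict)


class ToyExtractor:
    """Extractor for a made-up language whose imports look like: include "path"."""

    LANGUAGES: set[str] = {"toy"}
    _INCLUDE_RE = re.compile(r'^\s*include\s+"([^"]+)"')

    def extract(self, file_path: str, content: str) -> list[DependencyEdge]:
        edges: list[DependencyEdge] = []
        for lineno, line in enumerate(content.splitlines(), start=1):
            # Skip full-line comments so commented-out includes emit nothing.
            if line.lstrip().startswith("#"):
                continue
            match = self._INCLUDE_RE.match(line)
            if match is None:
                continue
            edges.append(DependencyEdge(
                source_file=file_path,   # relative path, as passed in
                source_symbol=None,
                target_file=None,        # left for a resolver to fill in
                target_symbol=None,
                dep_type=DepType.IMPORT,
                metadata={"module": match.group(1), "line": lineno},  # 1-indexed
            ))
        return edges
```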
Skip the module resolver steps below if the resolver pre-check (the "Resolver Needed?" table above) determined that no resolver is needed.
```
search_code(
    query="ModuleResolver protocol build_index resolve",
    symbol_type="class",
    use_hybrid_search=True,
    smart_context=True
)
```
Read src/cocosearch/deps/resolver.py fully.
Add a new resolver class in src/cocosearch/deps/resolver.py:

```python
class <Language>Resolver:
    """Resolve <language> import paths to file paths."""

    # build_index() filters on this set, so the resolver must define it.
    LANGUAGES: set[str] = {"<language_id>"}

    def build_index(self, indexed_files: list[tuple[str, str]]) -> dict[str, str]:
        """Build module-name-to-file-path mapping."""
        index: dict[str, str] = {}
        for rel_path, lang_id in indexed_files:
            if lang_id not in self.LANGUAGES:
                continue
            # Map module identifier -> relative path
            # ...
        return index

    def resolve(self, edge: DependencyEdge, module_index: dict[str, str]) -> str | None:
        """Resolve an import to a file path."""
        module = edge.metadata.get("module", "")
        return module_index.get(module)
```
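As one concrete instance of the template, here is a sketch of a resolver for dotted module names laid out as files and packages. It matches the `cocosearch.deps.models` to `src/cocosearch/deps/models.py` example from the pre-checks. Assumptions:

- A leading `src/` layout directory is stripped.
- `__init__.py` maps to its package name.
- Edges are a minimal stand-in carrying only `metadata`.
- `DottedModuleResolver` is a hypothetical name.

CocoSearch's own PythonResolver may handle more cases, such as relative imports or other source roots.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass, field


@dataclass
class Edge:
    # Minimal stand-in: this resolver only reads metadata["module"].
    metadata: dict = field(default_factory=dict)


class DottedModuleResolver:
    """Hypothetical resolver mapping dotted module names to .py files."""

    LANGUAGES: set[str] = {"py"}

    def build_index(self, indexed_files: list[tuple[str, str]]) -> dict[str, str]:
        index: dict[str, str] = {}
        for rel_path, lang_id in indexed_files:
            if lang_id not in self.LANGUAGES:
                continue
            parts = posixpath.splitext(rel_path)[0].split("/")
            if parts and parts[0] == "src":  # assumed src/ layout
                parts = parts[1:]
            if parts and parts[-1] == "__init__":  # package file -> package name
                parts = parts[:-1]
            if parts:
                index[".".join(parts)] = rel_path
        return index

    def resolve(self, edge: Edge, module_index: dict[str, str]) -> str | None:
        module = edge.metadata.get("module", "")
        # Try the full name, then each parent: `a.b.name` may be a symbol in a/b.py.
        while module:
            if module in module_index:
                return module_index[module]
            module = module.rpartition(".")[0]
        return None
```

The parent-fallback loop is a design choice. It lets `pkg.mod.Symbol` resolve to `pkg/mod.py`, but it can also attribute an unknown submodule to its parent package. Drop the loop if your extractor always records exact module names.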
Add the resolver to _build_resolver_registry() in resolver.py:

```python
<lang> = <Language>Resolver()
registry["<language_id>"] = <lang>
# If multiple language_ids share the same resolver:
# for lang_id in ("<id1>", "<id2>"):
#     registry[lang_id] = <lang>
```
Create tests/unit/deps/extractors/test_<language>.py.
Follow the pattern from existing tests (e.g., test_go.py, test_python.py):

```python
from cocosearch.deps.extractors.<language> import <Language>Extractor
from cocosearch.deps.models import DepType


def _extract(code: str, file_path: str = "<default/path>"):
    """Helper to extract edges from <language> code."""
    extractor = <Language>Extractor()
    return extractor.extract(file_path, code)
```
Test categories:
| Category | What to Test |
|---|---|
| Basic imports | Single import, multiple imports, aliased imports |
| Import variations | Language-specific import styles (e.g., ES6 vs CommonJS, relative vs absolute) |
| Edge metadata | module, line, alias, kind fields are correct |
| Edge types | Correct dep_type (IMPORT, REFERENCE, CALL) |
| Source file | source_file matches the file_path argument |
| Edge cases | Empty file, no imports, comments containing imports, malformed imports |
| LANGUAGES set | assert <Language>Extractor.LANGUAGES == {"<expected_ids>"} |
Add tests to tests/unit/deps/test_resolver.py:

```python
# Add to the file's existing imports if not already present.
from cocosearch.deps.models import DependencyEdge, DepType
from cocosearch.deps.resolver import <Language>Resolver


class Test<Language>Resolver:
    def test_build_index_maps_files(self):
        resolver = <Language>Resolver()
        index = resolver.build_index([
            ("src/module.ext", "<lang_id>"),
            ("lib/other.ext", "<lang_id>"),
        ])
        assert "expected_key" in index

    def test_resolve_internal_import(self):
        resolver = <Language>Resolver()
        index = {"module_name": "src/module.ext"}
        edge = DependencyEdge(
            source_file="src/main.ext",
            source_symbol=None,
            target_file=None,
            target_symbol=None,
            dep_type=DepType.IMPORT,
            metadata={"module": "module_name"},
        )
        assert resolver.resolve(edge, index) == "src/module.ext"

    def test_resolve_external_returns_none(self):
        resolver = <Language>Resolver()
        edge = DependencyEdge(
            source_file="src/main.ext",
            source_symbol=None,
            target_file=None,
            target_symbol=None,
            dep_type=DepType.IMPORT,
            metadata={"module": "external_package"},
        )
        assert resolver.resolve(edge, {}) is None
```
Checkpoint with user: "Extractor and tests created. [N] test classes, [N] tests. Ready for documentation updates?"
Update the deps/ module description:
```
search_code(
    query="deps dependency extractor autodiscovery registry",
    use_hybrid_search=True,
    smart_context=True
)
```
Update:

- CLAUDE.md: the deps/ description (extractor count, language list, resolver list).
- README.md: the languages and grammars tables. The "Deps" column should show a checkmark for the new language/grammar.
If the extractor introduces a novel pattern, update docs/architecture.md or docs/how-it-works.md.
```bash
# Extractor tests
uv run pytest tests/unit/deps/extractors/test_<language>.py -v

# Resolver tests (if added)
uv run pytest tests/unit/deps/test_resolver.py -v

# Registry smoke test (ensure autodiscovery works)
uv run python -c "from cocosearch.deps.registry import get_all_extractor_language_ids; print(sorted(get_all_extractor_language_ids()))"

# CLI shows Deps column correctly
uv run cocosearch languages --json | python -c "import json,sys; langs=json.load(sys.stdin); [print(f'{l[\"name\"]}: deps={l[\"deps\"]}') for l in langs if l['deps']]"

uv run ruff check src/cocosearch/deps/ tests/unit/deps/
uv run ruff format --check src/cocosearch/deps/ tests/unit/deps/
```
```
Dependency extractor added for [language]!

Extractor: src/cocosearch/deps/extractors/<language>.py
  - Language IDs: <ids>
  - Edge types: <import/reference/call>
  - Parsing: <tree-sitter/regex/yaml>
  - Resolver: <yes, in resolver.py / no, not needed>

Registration points:
  [x] Extractor file created (autodiscovered)
  [x] LANGUAGES matches language_id(s) from indexer
  [x] Module resolver added (if applicable)
  [x] Resolver registered in _build_resolver_registry() (if applicable)
  [x] Tests created
  [x] CLAUDE.md updated (extractor count, language list)
  [x] README.md Deps column updated

Tests: PASS
Lint: PASS
```
To try it out:

```bash
uv run cocosearch index . --deps       # Index + extract dependencies
uv run cocosearch deps show <file>     # Check dependencies for a file
uv run cocosearch deps tree <file>     # Forward dependency tree
uv run cocosearch deps impact <file>   # Reverse impact tree
```
Complete checklist of all registration points:

Extractor:

- [ ] src/cocosearch/deps/extractors/<language>.py created
- [ ] LANGUAGES set matches language_id(s) from handler/grammar/extension
- [ ] extract() returns edges with correct dep_type and metadata
- [ ] tests/unit/deps/extractors/test_<language>.py created

Module Resolver (if needed):

- [ ] Resolver class added to src/cocosearch/deps/resolver.py
- [ ] Resolver registered in _build_resolver_registry()
- [ ] Resolver tests added to tests/unit/deps/test_resolver.py

Documentation:

- [ ] CLAUDE.md: extractor count, language list, resolver list updated
- [ ] README.md: Deps column in languages/grammars table updated

For language handler support, use /cocosearch:cocosearch-add-language.
For grammar handler support, use /cocosearch:cocosearch-add-grammar.