Guides adding support for new programming languages or config formats to CocoSearch via workflows for handlers, symbol extraction, context expansion, and registration checklists.
A structured workflow for adding language support to CocoSearch. Navigates up to 6 independent paths (handler, symbol extraction, grammar, context expansion, dependency extraction, documentation) and ensures every registration point is covered.
Philosophy: The most common failure when adding language support is missing a registration step. This skill makes that impossible by tracking every step explicitly.
Reference: docs/adding-languages.md is the authoritative technical guide. This skill wraps it in an interactive workflow.
Parse the user's request to determine what's being added.
Extract from the request:
- The language or format name
- The file extensions (e.g., .kt, .kts)

Confirm with user: "I'll add support for [language] with extensions [list]. Let me determine which paths apply."
Check two things to decide which of the 6 paths (A-F) apply:
CocoIndex's SplitRecursively has built-in Tree-sitter chunking for ~28 languages. Search for the language mapping:
search_code(
query="LANGUAGE_EXTENSIONS supported languages",
use_hybrid_search=True,
smart_context=True
)
Also check the CocoIndex docs: if the language is in the built-in list, chunking works automatically -- no handler (Path A) needed.
If the user wants symbol extraction (Path B) or context expansion (Path E), the language must be in tree-sitter-language-pack:
uv run python -c "from tree_sitter_language_pack import SupportedLanguage; print(sorted(SupportedLanguage.__args__))"
Verify a specific language:
uv run python -c "from tree_sitter_language_pack import get_parser; p = get_parser('<language>'); print(p)"
Present the applicable paths:
| Path | When to Use | Applies? |
|---|---|---|
| A: Language Handler | Language NOT in CocoIndex's built-in list -- needs custom chunking | ? |
| B: Symbol Extraction | Language IS in tree-sitter-language-pack -- enables --symbol-type/--symbol-name filtering | ? |
| C: Both A + B | Not built-in for chunking but has tree-sitter support | ? |
| D: Grammar Handler | Domain-specific schema sharing a base language (e.g., Ansible = YAML) | ? |
| E: Context Expansion | Language IS in tree-sitter-language-pack -- enables smart_context=True boundary expansion | ? |
| F: Dependency Extractor | Language has import/require/reference patterns -- enables deps tree, deps impact, and get_file_dependencies/get_file_impact MCP tools. Use /cocosearch:cocosearch-add-extractor for dedicated guidance. | ? |
Present to user: "Based on my checks, here are the paths that apply: [list]. Ready to proceed?"
Skip this step if the language is in CocoIndex's built-in list (no custom chunking needed).
Choose the closest existing handler based on language type:
| Language Type | Analog Handler | Why |
|---|---|---|
| Config format (key-value, blocks) | hcl.py | Block-based structure with labels |
| Template language | gotmpl.py | Template directives + content |
| Script / shell language | bash.py | Function definitions + commands |
| Containerization / CI | dockerfile.py | Directive-based, sequential |
| JVM / compiled language | scala.py or groovy.py | OOP with classes, methods, imports |
Search for the analog:
search_code(
query="<analog-language> handler EXTENSIONS SEPARATOR_SPEC",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Read the analog handler fully before proceeding.
Copy from the template and implement:
Copy from the template and implement:
- src/cocosearch/handlers/<language>.py (copy _template.py)
- Set EXTENSIONS to all file extensions (with leading dot)
- Define SEPARATOR_SPEC with CustomLanguageConfig -- hierarchical regex separators from coarsest to finest
- Implement extract_metadata() returning block_type, hierarchy, and language_id

The handler is autodiscovered at import time; no registration code needed.
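To make the shape concrete, here is a stand-alone sketch for a hypothetical TOML-like format. The field names (EXTENSIONS, SEPARATOR_SPEC, extract_metadata) mirror the list above, but the real handler inherits CocoSearch's base class and wraps the separators in CustomLanguageConfig, neither of which is reproduced here -- plain regexes stand in for the real spec:

```python
import re

# Hypothetical stand-alone sketch of a handler for a TOML-like format.
# Field names mirror the steps above; the real handler inherits the
# CocoSearch base class and uses CustomLanguageConfig (not shown here).
EXTENSIONS = [".toml"]

# Hierarchical separators, coarsest to finest: table headers, then blank lines.
SEPARATOR_SPEC = [
    r"\n(?=\[)",  # split before [table] headers
    r"\n\n+",     # then on blank lines
]

def extract_metadata(chunk: str) -> dict:
    """Return block_type, hierarchy, and language_id for one chunk."""
    header = re.match(r"\[([^\]]+)\]", chunk.strip())
    return {
        "block_type": "table" if header else "top_level",
        "hierarchy": header.group(1).split(".") if header else [],
        "language_id": "toml",
    }

sample = '[tool.demo]\nname = "x"\n\n[other]\nk = 1\n'
chunks = re.split(SEPARATOR_SPEC[0], sample)
print([extract_metadata(c)["hierarchy"] for c in chunks])  # [['tool', 'demo'], ['other']]
```

The coarse-to-fine ordering matters: the chunker falls through to finer separators only when a chunk is still too large.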
Add the language_id to _SKIP_PARSE_EXTENSIONS in src/cocosearch/indexer/parse_tracking.py if the language has no tree-sitter grammar. This prevents false no_grammar reports in parse tracking stats.

Extensions are auto-derived from the handler's EXTENSIONS attribute via _default_include_patterns() in src/cocosearch/indexer/config.py. No manual config.py edits are needed.
- The EXTENSIONS list you set in step 3b (e.g., [".hcl", ".tf"]) is automatically converted to glob patterns (e.g., "*.hcl", "*.tf") and merged into include_patterns
- For files matched by name rather than extension (e.g., Dockerfile, Containerfile), define an INCLUDE_PATTERNS class attribute on the handler (e.g., INCLUDE_PATTERNS = ["Dockerfile", "Dockerfile.*", "Containerfile"]) -- these are also picked up automatically

Check if the language name needs a display override in cli.py:
search_code(
query="display_names languages_command",
symbol_name="languages_command",
use_hybrid_search=True,
smart_context=True
)
Add to the display_names dict only if .title() casing is wrong (e.g., "hcl": "HCL", "gotmpl": "Go Template").
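As a sketch of the intent -- the "hcl" and "gotmpl" overrides come from the step above, while the lookup helper is illustrative, not CocoSearch's actual code:

```python
# Illustrative sketch: override display casing only where .title() is wrong.
display_names = {"hcl": "HCL", "gotmpl": "Go Template"}

def display_name(language_id: str) -> str:
    # Fall back to .title() for languages that don't need an override.
    return display_names.get(language_id, language_id.title())

print(display_name("hcl"))     # override applies
print(display_name("python"))  # .title() is already correct
```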
Find the analog's test file for the pattern:
search_code(
query="test <analog-language> handler EXTENSIONS SEPARATOR_SPEC",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Create tests/unit/handlers/test_<language>.py covering:
Checkpoint with user: "Handler created at src/cocosearch/handlers/<language>.py with [N] extensions and [N] separator levels. Tests pass. Ready for the next path?"
Skip this step if the language is NOT in tree-sitter-language-pack.
Choose based on language similarity:
| Language Type | Analog Query | Why |
|---|---|---|
| Python-like (indent-based) | python.scm | function/class definitions |
| C-like (braces) | go.scm or java.scm | declaration patterns |
| Config (blocks with labels) | hcl.scm | block-based structures |
| Functional | rust.scm | items, traits, impls |
Search for the analog:
search_code(
query="<analog-language> tree-sitter query definition function class",
use_hybrid_search=True,
smart_context=True
)
Read the analog .scm file to understand the capture patterns.
Before writing the query, explore the language's tree-sitter AST to find the correct node types:
uv run python -c "
from tree_sitter_language_pack import get_parser
parser = get_parser('<language>')
tree = parser.parse(b'''<sample-code>''')
def show(node, indent=0):
print(' ' * indent + f'{node.type} [{node.start_point[0]}:{node.start_point[1]}-{node.end_point[0]}:{node.end_point[1]}]')
for child in node.children:
show(child, indent + 2)
show(tree.root_node)
"
Identify the node types for functions, classes, methods, interfaces, etc.
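For orientation, a hypothetical query file for a C-like language might look like the following. The node and field names (function_declaration, class_declaration, identifier) vary per grammar -- confirm them against the AST dump above before using them:

```scheme
; Hypothetical queries/<language>.scm sketch for a C-like language.
; Replace node/field names with the ones the AST exploration revealed.
(function_declaration
  name: (identifier) @name) @definition.function

(class_declaration
  name: (identifier) @name) @definition.class
```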
Create src/cocosearch/indexer/queries/<language>.scm with S-expression patterns:
- @definition.<type> captures for symbol types (function, class, method, interface)
- @name for symbol name captures

Add extension-to-language mappings:
search_code(
query="LANGUAGE_MAP extension mapping",
symbol_name="LANGUAGE_MAP",
use_hybrid_search=True,
smart_context=True
)
Add entries to LANGUAGE_MAP in src/cocosearch/indexer/symbols.py:
"ext": "language_name",
Add the language to the symbol-aware set:
search_code(
query="SYMBOL_AWARE_LANGUAGES",
use_hybrid_search=True,
smart_context=True
)
Add the language name to SYMBOL_AWARE_LANGUAGES in src/cocosearch/search/query.py.
Check if the language introduces new AST node types that need mapping to standard types:
search_code(
query="_map_symbol_type node type mapping",
symbol_name="_map_symbol_type",
use_hybrid_search=True,
smart_context=True
)
Add mappings in _map_symbol_type if the language uses non-standard node names for standard concepts (e.g., HCL uses "block" for what maps to "class").
Check if the language needs special qualified name logic:
search_code(
query="_build_qualified_name qualified name",
symbol_name="_build_qualified_name",
use_hybrid_search=True,
smart_context=True
)
Add language-specific logic to _build_qualified_name in symbols.py if the language has special naming patterns (e.g., Go receiver methods, HCL block labels).
Find the analog's test file:
search_code(
query="test <analog-language> symbol extraction",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Create tests/unit/indexer/symbols/test_<language>.py covering:
Checkpoint with user: "Symbol extraction configured for [language] with [N] query patterns. Tests pass. Ready for the next path?"
Skip this step unless the language is a domain-specific schema sharing a base language extension.
For grammar handler implementation, use the dedicated skill:
Invoke: /cocosearch:cocosearch-add-grammar
This skill provides in-depth guidance for matches() design, separator spec, metadata extraction, conflict avoidance, and grammar-specific testing.
After completing the grammar skill, return here for Step 7 (count assertions) and Step 8 (documentation).
Skip this step unless the language is in tree-sitter-language-pack AND context expansion is desired.
Explore the AST (same technique as Step 4b) to find which node types represent function/class definitions.
search_code(
query="DEFINITION_NODE_TYPES context expansion node types",
use_hybrid_search=True,
smart_context=True
)
Add the language entry to DEFINITION_NODE_TYPES in src/cocosearch/search/context_expander.py:
"<language>": {"<function_node_type>", "<class_node_type>"},
Add file extension mappings to EXTENSION_TO_LANGUAGE in the same file:
".<ext>": "<language>",
CONTEXT_EXPANSION_LANGUAGES updates automatically -- it's derived from DEFINITION_NODE_TYPES.keys().
The CONTEXT_EXPANSION_LANGUAGES set is exported and referenced in search docs. Update any docs listing supported context expansion languages.
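A miniature of how the three tables relate -- the python and go entries are examples only; real node types come from the Step 4b AST exploration:

```python
# Illustrative sketch of the three tables in context_expander.py.
# Entries are examples, not the module's actual contents.
DEFINITION_NODE_TYPES = {
    "python": {"function_definition", "class_definition"},
    "go": {"function_declaration", "method_declaration"},
}
EXTENSION_TO_LANGUAGE = {".py": "python", ".go": "go"}

# Derived, never hand-edited: adding a DEFINITION_NODE_TYPES entry is enough.
CONTEXT_EXPANSION_LANGUAGES = set(DEFINITION_NODE_TYPES)

print(sorted(CONTEXT_EXPANSION_LANGUAGES))  # ['go', 'python']
```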
Checkpoint with user: "Context expansion added for [language]. smart_context=True will now expand to [node types] boundaries."
Skip this step unless the language has import/require/reference patterns that can be extracted for dependency analysis.
For dependency extractor implementation, use the dedicated skill:
Invoke: /cocosearch:cocosearch-add-extractor
This skill provides in-depth guidance for pre-checks, analog selection, extractor implementation, optional module resolver, tests, and registration.
After completing the extractor skill, return here for Step 7 (count assertions) and Step 8 (documentation).
Checkpoint with user: "Dependency extractor added for [language] with [N] import patterns. Tests pass. Ready for count assertions?"
This is the most commonly missed step. Do not skip.
search_code(
query="test registry handler count _HANDLER_REGISTRY",
use_hybrid_search=True,
smart_context=True
)
Update in tests/unit/handlers/test_registry.py:
- len(_HANDLER_REGISTRY) >= N -- increment by number of new extensions
- len(specs) == N -- increment by 1 (one CustomLanguageConfig per handler)

search_code(
query="test grammar registry count _GRAMMAR_REGISTRY",
use_hybrid_search=True,
smart_context=True
)
Update in tests/unit/handlers/test_grammar_registry.py:
- len(_GRAMMAR_REGISTRY) == N -- increment by 1
- len(grammars) == N -- increment by 1

Both test_registry.py and test_grammar_registry.py assert len(specs) == N from get_all_custom_language_specs(). This is the combined total of all language handler specs plus grammar handler specs. Increment by 1 for each new handler or grammar added.
Update module descriptions and counts:
- search/ module description

search_code(
query="Supported Languages README badges",
use_hybrid_search=True,
smart_context=True
)
Update:
- README badges
- The Supported Languages section in README.md
If the new language introduces a new pattern worth documenting, add it as a worked example (like the HCL example in Path C).
# Handler tests (if Path A)
uv run pytest tests/unit/handlers/test_<language>.py -v
# Symbol extraction tests (if Path B)
uv run pytest tests/unit/indexer/symbols/test_<language>.py -v
# Grammar tests (if Path D)
uv run pytest tests/unit/handlers/grammars/test_<grammar>.py -v
# Dependency extractor tests (if Path F)
uv run pytest tests/unit/deps/extractors/test_<language>.py -v
uv run pytest tests/unit/deps/test_resolver.py -v
# Registry count assertions
uv run pytest tests/unit/handlers/test_registry.py -v
uv run pytest tests/unit/handlers/test_grammar_registry.py -v
# Full handler test suite
uv run pytest tests/unit/handlers/ -v
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
Language support added for [language]!
Paths completed:
[x] Path A: Language Handler -- src/cocosearch/handlers/<language>.py
[x] Path B: Symbol Extraction -- src/cocosearch/indexer/queries/<language>.scm
[ ] Path D: Grammar Handler -- not applicable
[x] Path E: Context Expansion -- added to context_expander.py
[x] Path F: Dependency Extractor -- src/cocosearch/deps/extractors/<language>.py
Registration points:
[x] Handler file created (autodiscovered)
[x] EXTENSIONS auto-derived into include patterns
[x] LANGUAGE_MAP entries (symbols.py)
[x] Query file created (queries/<language>.scm)
[x] SYMBOL_AWARE_LANGUAGES updated (query.py)
[x] DEFINITION_NODE_TYPES updated (context_expander.py)
[x] EXTENSION_TO_LANGUAGE updated (context_expander.py)
[x] Test count assertions updated
[x] Documentation updated
Tests: PASS
Lint: PASS
To try it out:
uv run cocosearch languages # Verify language appears
uv run cocosearch index . # Reindex with new language support
uv run cocosearch search "query" --language <language>
Complete checklist of all registration points. Check off each one as you complete it:
Language Handler (Path A):
- [ ] src/cocosearch/handlers/<language>.py created
- [ ] EXTENSIONS attribute defined (auto-derived into include patterns)
- [ ] INCLUDE_PATTERNS attribute defined (if non-extension patterns needed, e.g., Dockerfile)
- [ ] _SKIP_PARSE_EXTENSIONS updated in src/cocosearch/indexer/parse_tracking.py (if no tree-sitter grammar)
- [ ] Display name override added to cli.py languages_command (if .title() casing is wrong)
- [ ] tests/unit/handlers/test_<language>.py created

Symbol Extraction (Path B):
- [ ] src/cocosearch/indexer/queries/<language>.scm created
- [ ] LANGUAGE_MAP entries added in src/cocosearch/indexer/symbols.py
- [ ] SYMBOL_AWARE_LANGUAGES updated in src/cocosearch/search/query.py
- [ ] _map_symbol_type updated (if new AST node types need mapping)
- [ ] _build_qualified_name updated (if special naming logic needed)
- [ ] tests/unit/indexer/symbols/test_<language>.py created

Grammar Handler (Path D):
- [ ] src/cocosearch/handlers/grammars/<grammar>.py created
- [ ] tests/unit/handlers/grammars/test_<grammar>.py created

Context Expansion (Path E):
- [ ] DEFINITION_NODE_TYPES updated in src/cocosearch/search/context_expander.py
- [ ] EXTENSION_TO_LANGUAGE updated in src/cocosearch/search/context_expander.py

Dependency Extractor (Path F):
- [ ] src/cocosearch/deps/extractors/<language>.py created (autodiscovered)
- [ ] LANGUAGES set matches the language IDs from handler/grammar
- [ ] Resolver added to src/cocosearch/deps/resolver.py (if import resolution needed)
- [ ] Resolver registered in _RESOLVERS dict (if added)
- [ ] tests/unit/deps/extractors/test_<language>.py created
- [ ] tests/unit/deps/test_resolver.py updated (if resolver added)

Count Assertions:
- [ ] tests/unit/handlers/test_registry.py -- handler count and spec count updated
- [ ] tests/unit/handlers/test_grammar_registry.py -- grammar count and spec count updated

Documentation:
- [ ] CLAUDE.md -- module descriptions and counts updated
- [ ] README.md -- supported languages section updated
- [ ] docs/adding-languages.md -- new example added (if novel pattern)

For common search tips (hybrid search, smart_context, symbol filtering), see skills/README.md.
For installation instructions, see skills/README.md.