Guides adding support for new programming languages or config formats to CocoSearch via workflows for handlers, symbol extraction, context expansion, and registration checklists.
A structured workflow for adding language support to CocoSearch. Navigates up to 6 independent paths (handler, symbol extraction, grammar, context expansion, dependency extraction, documentation) and ensures every registration point is covered.
Philosophy: The most common failure when adding language support is missing a registration step. This skill makes that impossible by tracking every step explicitly.
Reference: docs/adding-languages.md is the authoritative technical guide. This skill wraps it in an interactive workflow.
Parse the user's request to determine what's being added.
Extract from the request:
- The language or format name
- The file extensions (e.g., .kt, .kts)

Confirm with user: "I'll add support for [language] with extensions [list]. Let me determine which paths apply."
Check two things to decide which of the 6 paths (A-F) apply:
CocoIndex's SplitRecursively has built-in Tree-sitter chunking for ~28 languages. Search for the language mapping:
search_code(
query="LANGUAGE_EXTENSIONS supported languages",
use_hybrid_search=True,
smart_context=True
)
Also check the CocoIndex docs: if the language is in the built-in list, chunking works automatically -- no handler (Path A) needed.
If the user wants symbol extraction (Path B) or context expansion (Path E), the language must be in tree-sitter-language-pack:
uv run python -c "from tree_sitter_language_pack import SupportedLanguage; print(sorted(SupportedLanguage.__args__))"
Verify a specific language:
uv run python -c "from tree_sitter_language_pack import get_parser; p = get_parser('<language>'); print(p)"
Present the applicable paths:
| Path | When to Use | Applies? |
|---|---|---|
| A: Language Handler | Language NOT in CocoIndex's built-in list -- needs custom chunking | ? |
| B: Symbol Extraction | Language IS in tree-sitter-language-pack -- enables --symbol-type/--symbol-name filtering | ? |
| C: Both A + B | Not built-in for chunking but has tree-sitter support | ? |
| D: Grammar Handler | Domain-specific schema sharing a base language (e.g., Ansible = YAML) | ? |
| E: Context Expansion | Language IS in tree-sitter-language-pack -- enables smart_context=True boundary expansion | ? |
| F: Dependency Extractor | Language has import/require/reference patterns -- enables deps tree, deps impact, and get_file_dependencies/get_file_impact MCP tools. Use /cocosearch:cocosearch-add-extractor for dedicated guidance. | ? |
Present to user: "Based on my checks, here are the paths that apply: [list]. Ready to proceed?"
Skip this step if the language is in CocoIndex's built-in list (no custom chunking needed).
Choose the closest existing handler based on language type:
| Language Type | Analog Handler | Why |
|---|---|---|
| Config format (key-value, blocks) | hcl.py | Block-based structure with labels |
| Template language | gotmpl.py | Template directives + content |
| Script / shell language | bash.py | Function definitions + commands |
| Containerization / CI | dockerfile.py | Directive-based, sequential |
| JVM / compiled language | scala.py or groovy.py | OOP with classes, methods, imports |
Search for the analog:
search_code(
query="<analog-language> handler EXTENSIONS SEPARATOR_SPEC",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Read the analog handler fully before proceeding.
Copy from the template and implement:
Copy from the template and implement:
- src/cocosearch/handlers/<language>.py (copy _template.py)
- Set EXTENSIONS to all file extensions (with leading dot)
- Define SEPARATOR_SPEC with CustomLanguageConfig -- hierarchical regex separators from coarsest to finest
- Implement extract_metadata() returning block_type, hierarchy, and language_id

The handler is autodiscovered at import time; no registration code needed.
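To make the shape concrete, here is a stand-alone sketch for a hypothetical TOML-like format. The field names (EXTENSIONS, SEPARATOR_SPEC, extract_metadata) mirror the list above, but the real handler inherits CocoSearch's base class and wraps the separators in CustomLanguageConfig, neither of which is reproduced here -- plain regexes stand in for the real spec:

```python
import re

# Hypothetical stand-alone sketch of a handler for a TOML-like format.
# Field names mirror the steps above; the real handler inherits the
# CocoSearch base class and uses CustomLanguageConfig (not shown here).
EXTENSIONS = [".toml"]

# Hierarchical separators, coarsest to finest: table headers, then blank lines.
SEPARATOR_SPEC = [
    r"\n(?=\[)",  # split before [table] headers
    r"\n\n+",     # then on blank lines
]

def extract_metadata(chunk: str) -> dict:
    """Return block_type, hierarchy, and language_id for one chunk."""
    header = re.match(r"\[([^\]]+)\]", chunk.strip())
    return {
        "block_type": "table" if header else "top_level",
        "hierarchy": header.group(1).split(".") if header else [],
        "language_id": "toml",
    }

sample = '[tool.demo]\nname = "x"\n\n[other]\nk = 1\n'
chunks = re.split(SEPARATOR_SPEC[0], sample)
print([extract_metadata(c)["hierarchy"] for c in chunks])  # [['tool', 'demo'], ['other']]
```

The coarse-to-fine ordering matters: the chunker falls through to finer separators only when a chunk is still too large.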
Add the language_id to _SKIP_PARSE_EXTENSIONS in src/cocosearch/indexer/parse_tracking.py if the language has no tree-sitter grammar. This prevents false no_grammar reports in parse tracking stats.

Extensions are auto-derived from the handler's EXTENSIONS attribute via _default_include_patterns() in src/cocosearch/indexer/config.py. No manual config.py edits are needed.
- The EXTENSIONS list you set in step 3b (e.g., [".hcl", ".tf"]) is automatically converted to glob patterns (e.g., "*.hcl", "*.tf") and merged into include_patterns
- For files matched by name rather than extension (e.g., Dockerfile, Containerfile), define an INCLUDE_PATTERNS class attribute on the handler (e.g., INCLUDE_PATTERNS = ["Dockerfile", "Dockerfile.*", "Containerfile"]) -- these are also picked up automatically

Check if the language name needs a display override in cli.py:
search_code(
query="display_names languages_command",
symbol_name="languages_command",
use_hybrid_search=True,
smart_context=True
)
Add to the display_names dict only if .title() casing is wrong (e.g., "hcl": "HCL", "gotmpl": "Go Template").
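As a sketch of the intent -- the "hcl" and "gotmpl" overrides come from the step above, while the lookup helper is illustrative, not CocoSearch's actual code:

```python
# Illustrative sketch: override display casing only where .title() is wrong.
display_names = {"hcl": "HCL", "gotmpl": "Go Template"}

def display_name(language_id: str) -> str:
    # Fall back to .title() for languages that don't need an override.
    return display_names.get(language_id, language_id.title())

print(display_name("hcl"))     # override applies
print(display_name("python"))  # .title() is already correct
```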
Find the analog's test file for the pattern:
search_code(
query="test <analog-language> handler EXTENSIONS SEPARATOR_SPEC",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Create tests/unit/handlers/test_<language>.py covering:
Checkpoint with user: "Handler created at src/cocosearch/handlers/<language>.py with [N] extensions and [N] separator levels. Tests pass. Ready for the next path?"
Skip this step if the language is NOT in tree-sitter-language-pack.
Choose based on language similarity:
| Language Type | Analog Query | Why |
|---|---|---|
| Python-like (indent-based) | python.scm | function/class definitions |
| C-like (braces) | go.scm or java.scm | declaration patterns |
| Config (blocks with labels) | hcl.scm | block-based structures |
| Functional | rust.scm | items, traits, impls |
Search for the analog:
search_code(
query="<analog-language> tree-sitter query definition function class",
use_hybrid_search=True,
smart_context=True
)
Read the analog .scm file to understand the capture patterns.
Before writing the query, explore the language's tree-sitter AST to find the correct node types:
uv run python -c "
from tree_sitter_language_pack import get_parser
parser = get_parser('<language>')
tree = parser.parse(b'''<sample-code>''')
def show(node, indent=0):
print(' ' * indent + f'{node.type} [{node.start_point[0]}:{node.start_point[1]}-{node.end_point[0]}:{node.end_point[1]}]')
for child in node.children:
show(child, indent + 2)
show(tree.root_node)
"
Identify the node types for functions, classes, methods, interfaces, etc.
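For orientation, a hypothetical query file for a C-like language might look like the following. The node and field names (function_declaration, class_declaration, identifier) vary per grammar -- confirm them against the AST dump above before using them:

```scheme
; Hypothetical queries/<language>.scm sketch for a C-like language.
; Replace node/field names with the ones the AST exploration revealed.
(function_declaration
  name: (identifier) @name) @definition.function

(class_declaration
  name: (identifier) @name) @definition.class
```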
Create src/cocosearch/indexer/queries/<language>.scm with S-expression patterns:
- @definition.<type> captures for symbol types (function, class, method, interface)
- @name for symbol name captures

Add extension-to-language mappings:
search_code(
query="LANGUAGE_MAP extension mapping",
symbol_name="LANGUAGE_MAP",
use_hybrid_search=True,
smart_context=True
)
Add entries to LANGUAGE_MAP in src/cocosearch/indexer/symbols.py:
"ext": "language_name",
Add the language to the symbol-aware set:
search_code(
query="SYMBOL_AWARE_LANGUAGES",
use_hybrid_search=True,
smart_context=True
)
Add the language name to SYMBOL_AWARE_LANGUAGES in src/cocosearch/search/query.py.
Check if the language introduces new AST node types that need mapping to standard types:
search_code(
query="_map_symbol_type node type mapping",
symbol_name="_map_symbol_type",
use_hybrid_search=True,
smart_context=True
)
Add mappings in _map_symbol_type if the language uses non-standard node names for standard concepts (e.g., HCL uses "block" for what maps to "class").
Check if the language needs special qualified name logic:
search_code(
query="_build_qualified_name qualified name",
symbol_name="_build_qualified_name",
use_hybrid_search=True,
smart_context=True
)
Add language-specific logic to _build_qualified_name in symbols.py if the language has special naming patterns (e.g., Go receiver methods, HCL block labels).
Find the analog's test file:
search_code(
query="test <analog-language> symbol extraction",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Create tests/unit/indexer/symbols/test_<language>.py covering:
Checkpoint with user: "Symbol extraction configured for [language] with [N] query patterns. Tests pass. Ready for the next path?"
Skip this step unless the language is a domain-specific schema sharing a base language extension.
For grammar handler implementation, use the dedicated skill:
Invoke: /cocosearch:cocosearch-add-grammar
This skill provides in-depth guidance for matches() design, separator spec, metadata extraction, conflict avoidance, and grammar-specific testing.
After completing the grammar skill, return here for Step 7 (count assertions) and Step 8 (documentation).
Skip this step unless the language is in tree-sitter-language-pack AND context expansion is desired.
Explore the AST (same technique as Step 4b) to find which node types represent function/class definitions.
search_code(
query="DEFINITION_NODE_TYPES context expansion node types",
use_hybrid_search=True,
smart_context=True
)
Add the language entry to DEFINITION_NODE_TYPES in src/cocosearch/search/context_expander.py:
"<language>": {"<function_node_type>", "<class_node_type>"},
Add file extension mappings to EXTENSION_TO_LANGUAGE in the same file:
".<ext>": "<language>",
CONTEXT_EXPANSION_LANGUAGES updates automatically -- it's derived from DEFINITION_NODE_TYPES.keys().
The CONTEXT_EXPANSION_LANGUAGES set is exported and referenced in search docs. Update any docs listing supported context expansion languages.
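A miniature of how the three tables relate -- the python and go entries are examples only; real node types come from the Step 4b AST exploration:

```python
# Illustrative sketch of the three tables in context_expander.py.
# Entries are examples, not the module's actual contents.
DEFINITION_NODE_TYPES = {
    "python": {"function_definition", "class_definition"},
    "go": {"function_declaration", "method_declaration"},
}
EXTENSION_TO_LANGUAGE = {".py": "python", ".go": "go"}

# Derived, never hand-edited: adding a DEFINITION_NODE_TYPES entry is enough.
CONTEXT_EXPANSION_LANGUAGES = set(DEFINITION_NODE_TYPES)

print(sorted(CONTEXT_EXPANSION_LANGUAGES))  # ['go', 'python']
```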
Checkpoint with user: "Context expansion added for [language]. smart_context=True will now expand to [node types] boundaries."
Skip this step unless the language has import/require/reference patterns that can be extracted for dependency analysis.
For dependency extractor implementation, use the dedicated skill:
Invoke: /cocosearch:cocosearch-add-extractor
This skill provides in-depth guidance for pre-checks, analog selection, extractor implementation, optional module resolver, tests, and registration.
After completing the extractor skill, return here for Step 7 (count assertions) and Step 8 (documentation).
Checkpoint with user: "Dependency extractor added for [language] with [N] import patterns. Tests pass. Ready for count assertions?"
This is the most commonly missed step. Do not skip.
search_code(
query="test registry handler count _HANDLER_REGISTRY",
use_hybrid_search=True,
smart_context=True
)
Update in tests/unit/handlers/test_registry.py:
- len(_HANDLER_REGISTRY) >= N -- increment by number of new extensions
- len(specs) == N -- increment by 1 (one CustomLanguageConfig per handler)

search_code(
query="test grammar registry count _GRAMMAR_REGISTRY",
use_hybrid_search=True,
smart_context=True
)
Update in tests/unit/handlers/test_grammar_registry.py:
- len(_GRAMMAR_REGISTRY) == N -- increment by 1
- len(grammars) == N -- increment by 1

Both test_registry.py and test_grammar_registry.py assert len(specs) == N from get_all_custom_language_specs(). This is the combined total of all language handler specs plus grammar handler specs. Increment by 1 for each new handler or grammar added.
Update module descriptions and counts:
- search/ module description

search_code(
query="Supported Languages README badges",
use_hybrid_search=True,
smart_context=True
)
Update:
- README badges
- The Supported Languages section in README.md
If the new language introduces a new pattern worth documenting, add it as a worked example (like the HCL example in Path C).
# Handler tests (if Path A)
uv run pytest tests/unit/handlers/test_<language>.py -v
# Symbol extraction tests (if Path B)
uv run pytest tests/unit/indexer/symbols/test_<language>.py -v
# Grammar tests (if Path D)
uv run pytest tests/unit/handlers/grammars/test_<grammar>.py -v
# Dependency extractor tests (if Path F)
uv run pytest tests/unit/deps/extractors/test_<language>.py -v
uv run pytest tests/unit/deps/test_resolver.py -v
# Registry count assertions
uv run pytest tests/unit/handlers/test_registry.py -v
uv run pytest tests/unit/handlers/test_grammar_registry.py -v
# Full handler test suite
uv run pytest tests/unit/handlers/ -v
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
Language support added for [language]!
Paths completed:
[x] Path A: Language Handler -- src/cocosearch/handlers/<language>.py
[x] Path B: Symbol Extraction -- src/cocosearch/indexer/queries/<language>.scm
[ ] Path D: Grammar Handler -- not applicable
[x] Path E: Context Expansion -- added to context_expander.py
[x] Path F: Dependency Extractor -- src/cocosearch/deps/extractors/<language>.py
Registration points:
[x] Handler file created (autodiscovered)
[x] EXTENSIONS auto-derived into include patterns
[x] LANGUAGE_MAP entries (symbols.py)
[x] Query file created (queries/<language>.scm)
[x] SYMBOL_AWARE_LANGUAGES updated (query.py)
[x] DEFINITION_NODE_TYPES updated (context_expander.py)
[x] EXTENSION_TO_LANGUAGE updated (context_expander.py)
[x] Test count assertions updated
[x] Documentation updated
Tests: PASS
Lint: PASS
To try it out:
uv run cocosearch languages # Verify language appears
uv run cocosearch index . # Reindex with new language support
uv run cocosearch search "query" --language <language>
Complete checklist of all registration points. Check off each one as you complete it:
Language Handler (Path A):
- [ ] src/cocosearch/handlers/<language>.py created
- [ ] EXTENSIONS attribute defined (auto-derived into include patterns)
- [ ] INCLUDE_PATTERNS attribute defined (if non-extension patterns needed, e.g., Dockerfile)
- [ ] _SKIP_PARSE_EXTENSIONS updated in src/cocosearch/indexer/parse_tracking.py (if no tree-sitter grammar)
- [ ] Display name override added to cli.py languages_command (if .title() casing is wrong)
- [ ] tests/unit/handlers/test_<language>.py created

Symbol Extraction (Path B):
- [ ] src/cocosearch/indexer/queries/<language>.scm created
- [ ] LANGUAGE_MAP entries added in src/cocosearch/indexer/symbols.py
- [ ] SYMBOL_AWARE_LANGUAGES updated in src/cocosearch/search/query.py
- [ ] _map_symbol_type updated (if new AST node types need mapping)
- [ ] _build_qualified_name updated (if special naming logic needed)
- [ ] tests/unit/indexer/symbols/test_<language>.py created

Grammar Handler (Path D):
- [ ] src/cocosearch/handlers/grammars/<grammar>.py created
- [ ] tests/unit/handlers/grammars/test_<grammar>.py created

Context Expansion (Path E):
- [ ] DEFINITION_NODE_TYPES updated in src/cocosearch/search/context_expander.py
- [ ] EXTENSION_TO_LANGUAGE updated in src/cocosearch/search/context_expander.py

Dependency Extractor (Path F):
- [ ] src/cocosearch/deps/extractors/<language>.py created (autodiscovered)
- [ ] LANGUAGES set matches the language IDs from handler/grammar
- [ ] Resolver added to src/cocosearch/deps/resolver.py (if import resolution needed)
- [ ] Resolver registered in _RESOLVERS dict (if added)
- [ ] tests/unit/deps/extractors/test_<language>.py created
- [ ] tests/unit/deps/test_resolver.py updated (if resolver added)

Count Assertions:
- [ ] tests/unit/handlers/test_registry.py -- handler count and spec count updated
- [ ] tests/unit/handlers/test_grammar_registry.py -- grammar count and spec count updated

Documentation:
- [ ] CLAUDE.md -- module descriptions and counts updated
- [ ] README.md -- supported languages section updated
- [ ] docs/adding-languages.md -- new example added (if novel pattern)

For common search tips (hybrid search, smart_context, symbol filtering), see skills/README.md.
For installation instructions, see skills/README.md.