Guides adding grammar handlers to CocoSearch for domain-specific formats like GitHub Actions in YAML, covering YamlGrammarBase inheritance, content validation, metadata extraction, tests, and registration.
A structured workflow for adding a grammar handler to CocoSearch. Grammar handlers provide domain-specific chunking and metadata for files that share a base language extension but have distinct structure (e.g., GitHub Actions workflows are YAML files with a specific schema).
Philosophy: YamlGrammarBase handles the heavy lifting — matches(), extract_metadata(), comment stripping, and the fallback metadata chain are all inherited. Subclasses only implement _has_content_markers(content) for content validation and _extract_grammar_metadata(stripped, text) for grammar-specific metadata. Override matches() only for broad-pattern grammars that need mandatory content checks (like Kubernetes). This skill guides you through designing content validation and metadata extraction.
Reference: docs/adding-languages.md covers grammar handlers alongside language handlers. This skill is the dedicated deep-dive for grammars.
Parse the user's request to determine what's being added.
Extract from the request:
- Grammar name (e.g., `github-actions`, `docker-compose`, `ansible-playbook`)
- Base language (e.g., `yaml`, `json`, `toml`)
- Path patterns (e.g., `.github/workflows/*.yml`, `docker-compose*.yml`)
- Content markers (e.g., `on:` + `jobs:` for GitHub Actions, `apiVersion:` + `kind:` for Kubernetes)

Confirm with user: "I'll add a grammar handler for [grammar] (base: [language]) matching [patterns]. Let me find the best analog."
Choose the closest existing grammar handler based on the grammar type:
| Grammar Type | Analog | Why |
|---|---|---|
| CI/CD pipeline (YAML) | github_actions.py or gitlab_ci.py | Jobs/stages/steps structure |
| Container orchestration (YAML) | docker_compose.py or kubernetes.py | Services/resources structure |
| Template (YAML/gotmpl) | helm_template.py or helm_values.py | Template directives + values |
| Kubernetes manifest (YAML) | kubernetes.py | Content-heavy matching with exclusions |
| Config values (YAML) | helm_values.py | Comment-based section markers |
Non-YAML grammars: The pattern applies equally to JSON, TOML, or XML base languages — adapt `PATH_PATTERNS` and content markers accordingly. All 7 existing grammars inherit from `YamlGrammarBase`, which provides shared comment stripping, regex patterns (`_TOP_KEY_RE`, `_ITEM_RE`, etc.), `matches()` with path + content delegation, and metadata orchestration with a fallback chain. The handler structure is language-agnostic.
Search for and read the analog handler:
```python
search_code(
    query="<analog-grammar> grammar handler GRAMMAR_NAME matches",
    symbol_type="class",
    use_hybrid_search=True,
    smart_context=True
)
```
Read the analog handler fully before proceeding.
Most grammars inherit matches() from YamlGrammarBase — it handles path matching via fnmatch with nested path support (*/pattern) and delegates content checks to your _has_content_markers(content). You only override matches() for broad-pattern grammars that need mandatory content checks (rare — Kubernetes is the only current example).
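A quick demonstration of the nested-path behavior, using only the standard library's fnmatch (each `PATH_PATTERN` is tried as-is, then again prefixed with `*/`):

```python
import fnmatch

pattern = ".github/workflows/*.yml"

# Direct match at the repository root
print(fnmatch.fnmatch(".github/workflows/ci.yml", pattern))              # True
# Without the "*/" prefix, nested paths do not match
print(fnmatch.fnmatch("repo/.github/workflows/ci.yml", pattern))         # False
# With the prefix, files inside subdirectories match too
print(fnmatch.fnmatch("repo/.github/workflows/ci.yml", f"*/{pattern}"))  # True
```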
Use the decision tree to determine what you need to implement:
```text
Are your PATH_PATTERNS specific to this grammar?
(e.g., ".github/workflows/*.yml", ".gitlab-ci.yml", "docker-compose*.yml")

YES --> Implement _has_content_markers() only
    - Inherited matches() handles path matching and delegates to your method
    - _has_content_markers() confirms content when available (optional validation)
    - Returns True when path matches and content is None
    - Example: GitHubActionsHandler -- ".github/workflows/*.yml" is specific enough

NO --> Broad patterns that match many files
    (e.g., "*.yaml", "*.yml")
    - Override matches() entirely (rare case)
    - Content check is MANDATORY -- return False when content is None
    - Must check for positive markers (e.g., "apiVersion:" + "kind:")
    - Must check for negative markers (exclude competing grammars)
    - Example: KubernetesHandler -- "*.yaml" matches everything, so content is required
```
Inherited from YamlGrammarBase — shown for reference, don't implement this:
```python
# This is in YamlGrammarBase — you get it for free
def matches(self, filepath: str, content: str | None = None) -> bool:
    for pattern in self.PATH_PATTERNS:
        if fnmatch.fnmatch(filepath, pattern) or fnmatch.fnmatch(
            filepath, f"*/{pattern}"
        ):
            if content is not None:
                return self._has_content_markers(content)
            return True
    return False
```
What you implement — _has_content_markers() for content validation:
```python
def _has_content_markers(self, content: str) -> bool:
    # Return True if content has grammar-specific markers
    return "marker_a:" in content and "marker_b:" in content
```
Override `matches()` only when `PATH_PATTERNS` are too broad (e.g., `*.yaml`) and mandatory content checks are needed. Kubernetes is the only current example:
```python
def matches(self, filepath: str, content: str | None = None) -> bool:
    basename = filepath.rsplit("/", 1)[-1] if "/" in filepath else filepath
    for pattern in self.PATH_PATTERNS:
        if fnmatch.fnmatch(basename, pattern):
            if content is None:
                return False  # Can't distinguish without content
            # Positive markers
            if "required_key:" not in content:
                return False
            # Negative markers (exclude competing grammars)
            if any(marker in content for marker in _COMPETING_MARKERS):
                return False
            return True
    return False
```
When multiple grammars share broad patterns (*.yaml, *.yml), check for markers from competing grammars:
- Helm: import `_HELM_MARKERS` from `helm_template.py`
- Docker Compose: rely on its specific path patterns (`docker-compose*.yml`) to avoid overlap
- GitLab CI: rely on its specific path pattern (`.gitlab-ci.yml`) to avoid overlap

Search for potential conflicts:
```python
search_code(
    query="PATH_PATTERNS matches grammar handler yaml yml",
    use_hybrid_search=True,
    smart_context=True
)
```
Review all existing grammar PATH_PATTERNS and matches() logic to ensure your new grammar won't claim files that belong to another grammar.
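The positive-plus-negative marker check that keeps broad grammars apart can be sketched as a plain function. Only the `apiVersion:`/`kind:` markers come from this guide; the competing-marker values below are assumed for illustration:

```python
# Assumed Helm-style markers; the real values live in helm_template.py
_COMPETING_MARKERS = ("{{", "helm.sh/")

def looks_like_kubernetes(content: str) -> bool:
    # Positive markers: both top-level keys must be present
    if "apiVersion:" not in content or "kind:" not in content:
        return False
    # Negative markers: yield to competing grammars (e.g., Helm templates)
    return not any(marker in content for marker in _COMPETING_MARKERS)

print(looks_like_kubernetes("apiVersion: v1\nkind: Pod\n"))      # True
print(looks_like_kubernetes("apiVersion: {{ .Values.api }}\n"))  # False
```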
- Copy `src/cocosearch/handlers/grammars/_template.py` to `<grammar>.py`
- Name the class `<Grammar>Handler` (e.g., `AnsiblePlaybookHandler`)
- Inherit from `YamlGrammarBase` (imported from `cocosearch.handlers.grammars._base`)
- Set `GRAMMAR_NAME` -- unique lowercase hyphenated identifier (e.g., `ansible-playbook`)
- Set `PATH_PATTERNS` -- glob patterns matching file paths
- Override `matches()` only for broad patterns (rare — see Step 3 decision tree)
- Define `SEPARATOR_SPEC` with `CustomLanguageConfig` -- hierarchical regex separators from coarsest to finest
- Implement `_has_content_markers(content)` -- return True if content has grammar-specific markers
- Implement `_extract_grammar_metadata(stripped, text)` -- return a metadata dict, or None for the fallback chain

Separator constraints: Use standard regex only -- no lookaheads/lookbehinds (CocoIndex uses Rust regex).
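Under that constraint, a hypothetical separator hierarchy for a jobs/steps-style grammar might look like the list below. The level regexes are illustrative, and the real `SEPARATOR_SPEC` wraps levels like these in `CustomLanguageConfig`:

```python
import re

# Hypothetical separator levels, ordered coarsest to finest
SEPARATORS = [
    r"\n[A-Za-z_][\w-]*:",    # level 1: top-level keys (on:, jobs:)
    r"\n  [A-Za-z_][\w-]*:",  # level 2: second-level keys (job names)
    r"\n    - ",              # level 3: list items (steps)
    r"\n",                    # level 4: line boundaries
]

# CocoIndex compiles separators with the Rust regex crate, which has no
# lookaheads/lookbehinds -- enforce that constraint up front.
for sep in SEPARATORS:
    assert not any(tok in sep for tok in ("(?=", "(?!", "(?<=", "(?<!"))
    re.compile(sep)  # sanity check: also valid in Python's re
print("all separator levels valid")
```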
Autodiscovery: The grammar is autodiscovered at import time. Any handlers/grammars/*.py file (not prefixed with _) implementing the grammar handler protocol is auto-registered. No registration code needed.
- `_strip_comments()` — called automatically before `_extract_grammar_metadata()`
- Use `self._make_result(block_type, hierarchy)` for result construction (sets `language_id` to `GRAMMAR_NAME`)
- Return `None` from `_extract_grammar_metadata()` to trigger the fallback chain (document → value → empty)
- Reuse the shared regex patterns: `_TOP_KEY_RE`, `_ITEM_RE`, `_NESTED_KEY_RE`, `_LIST_ITEM_KEY_RE`
- Build `hierarchy` as a structured path (e.g., `"job:build"`, `"service:web"`, `"kind:Deployment"`)

Create `tests/unit/handlers/grammars/test_<grammar>.py` following the 4-class test pattern from existing grammars:
- Matching: nested path matching (e.g., `project/.github/workflows/ci.yml`); `content=None` behavior: returns True for path-specific grammars, False for broad-pattern grammars
- Separator spec: `language_name` matches `GRAMMAR_NAME`; `separators_regex` is non-empty; no lookarounds (`(?=`, `(?!`, `(?<=`, `(?<!` not in separators)
- Metadata: `block_type` and `hierarchy` extraction; comment stripping (`_strip_comments()` inherited from `YamlGrammarBase`); `language_id` always equals `GRAMMAR_NAME` (set by inherited `_make_result()`)
- Class attributes: `GRAMMAR_NAME` is set and non-empty; `BASE_LANGUAGE` is set and non-empty; `PATH_PATTERNS` is a non-empty list

Find the analog's test file for the exact pattern:
```python
search_code(
    query="test <analog-grammar> grammar matching separator metadata",
    symbol_type="class",
    use_hybrid_search=True,
    smart_context=True
)
```
Checkpoint with user: "Grammar handler created at src/cocosearch/handlers/grammars/<grammar>.py with [path-specific/broad] matching. Tests pass. Ready for count assertions and documentation?"
Skip this step unless the grammar has reference patterns worth extracting (e.g., image refs, action refs, module sources, template includes).
For dependency extractor implementation, use the dedicated skill:
Invoke: /cocosearch:cocosearch-add-extractor
This skill provides in-depth guidance for pre-checks, analog selection, extractor implementation, optional module resolver, tests, and registration. Grammar extractors typically set LANGUAGES = {"<grammar-name>"} and use DepType.REFERENCE with metadata["kind"] for specifics.
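Before invoking the skill, it may help to see the general shape. The sketch below extracts GitHub Actions `uses:` references with a regex and returns plain dicts standing in for the real `DepType.REFERENCE` records; the record shape and regex are assumptions, not the project's actual extractor API:

```python
import re

# Matches step entries like "- uses: actions/checkout@v4"
_USES_RE = re.compile(r"^\s*(?:-\s+)?uses:\s*([\w./-]+@[\w.-]+)", re.MULTILINE)

def extract_references(content: str) -> list[dict]:
    """Return one reference record per action ref, with metadata['kind']
    distinguishing the reference type."""
    return [
        {"target": m.group(1), "dep_type": "reference", "metadata": {"kind": "action"}}
        for m in _USES_RE.finditer(content)
    ]

workflow = """\
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
"""
for ref in extract_references(workflow):
    print(ref["target"], ref["metadata"]["kind"])
```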
After completing the extractor skill, return here for Step 6 (count assertions) and Step 7 (documentation).
Checkpoint with user: "Dependency extractor added for [grammar] with [N] reference types. Tests pass. Ready for count assertions?"
This is the most commonly missed step. Do not skip.
```python
search_code(
    query="test grammar registry count _GRAMMAR_REGISTRY",
    use_hybrid_search=True,
    smart_context=True
)
```
Update in tests/unit/handlers/test_grammar_registry.py:
- `len(_GRAMMAR_REGISTRY) == N` -- increment by 1
- `len(grammars) == N` from `get_registered_grammars()` -- increment by 1

```python
search_code(
    query="test_returns_twelve_specs get_all_custom_language_specs",
    use_hybrid_search=True,
    smart_context=True
)
```
Update in tests/unit/handlers/test_registry.py:
- `len(specs) == N` from `get_all_custom_language_specs()` -- increment by 1 (this is the combined total of all language handler specs + grammar handler specs)

Update module descriptions and counts:

- `handlers/` module description

```python
search_code(
    query="Supported Grammars grammar table badges",
    use_hybrid_search=True,
    smart_context=True
)
```
Update:

- README.md -- grammar table and badges
If the new grammar introduces a novel matching pattern (e.g., first non-YAML grammar, first content-only match without path patterns), add it as a worked example.
```shell
# Grammar tests
uv run pytest tests/unit/handlers/grammars/test_<grammar>.py -v

# Dependency extractor tests (if added)
uv run pytest tests/unit/deps/extractors/test_<grammar>.py -v

# Registry count assertions
uv run pytest tests/unit/handlers/test_registry.py -v
uv run pytest tests/unit/handlers/test_grammar_registry.py -v

# Full handler test suite
uv run pytest tests/unit/handlers/ -v

uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
```
Grammar handler added for [grammar]!
Handler: src/cocosearch/handlers/grammars/<grammar>.py
- Grammar name: <grammar-name>
- Base language: <base-language>
- Path patterns: <patterns>
- Matching strategy: <path-specific/broad with content validation>
- Separator levels: <N>
Registration points:
[x] Grammar handler file created (autodiscovered)
[x] Tests created (tests/unit/handlers/grammars/test_<grammar>.py)
[x] test_grammar_registry.py counts updated
[x] test_registry.py combined spec count updated
[x] CLAUDE.md updated
[x] README.md updated
Tests: PASS
Lint: PASS
To try it out:

```shell
uv run cocosearch grammars   # Verify grammar appears
uv run cocosearch index .    # Reindex to pick up grammar-matched files
uv run cocosearch search "query" --language <grammar-name>
```
Complete checklist of all registration points. Check off each one as you complete it:
- [ ] `src/cocosearch/handlers/grammars/<grammar>.py` created
- [ ] `tests/unit/handlers/grammars/test_<grammar>.py` created
- [ ] `tests/unit/handlers/test_grammar_registry.py` -- grammar count and name set updated
- [ ] `tests/unit/handlers/test_registry.py` -- combined spec count updated
- [ ] `src/cocosearch/deps/extractors/<grammar>.py` created (if grammar has reference patterns)
- [ ] `tests/unit/deps/extractors/test_<grammar>.py` created (if extractor added)
- [ ] `CLAUDE.md` -- grammar handler list, extractor count, and dependency descriptions updated
- [ ] `README.md` -- grammar table and badges updated

For common search tips (hybrid search, smart_context, symbol filtering), see skills/README.md.
For the full language support workflow (handlers, symbols, context expansion), use /cocosearch:cocosearch-add-language.