From oracle
> Add a new testing framework to Oracle's registry end-to-end. Confirms
npx claudepluginhub bri-stevenski/oracle-test-ai-agent --plugin oracle
Slash command
/oracle:oracle-add-framework
Add a new testing framework to Oracle's registry end-to-end. Confirms the classifier↔registry contract, authors the registry entry, validates the execution command, and ensures every classifier test_type still resolves to a non-null framework.
test_type exists but has no framework backing it
(registry gap)
execution_command template). If you can't write
a one-line shell command that runs a single test file, it doesn't fit
Oracle's model.
category value
(e2e_ui, python_unit, api, performance, frontend_unit,
etc.). New categories require a coordinated classifier update; see
Phase 2.
preferred framework in
this category? If yes, decide intentionally: is the new framework a
replacement (demote the old to legacy), a peer (add as
supported), or the new preference (demote the old, mark new as
preferred)?
Find the classifier rule that emits this test_type. Open
agent/core/classifier.py and locate the heuristic block that
produces the target category.
If no classifier rule exists for the category: Add one before
the registry entry. A framework with no routing path is a dead entry.
Add a heuristic that maps natural-language signals (keywords,
phrases) to the new test_type.
If the category exists but the classifier never emits it confidently: Strengthen the rule's heuristics. Confidence below 0.7 should trigger a clarification, not a generation.
Run the contract tests:
pytest tests/unit/test_orchestrator.py
Every test_type the classifier can emit must resolve to ≥1
framework via get_by_category — the orchestrator tests assert
non-null framework resolution per category. This is the gate.
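The gate above amounts to a set-difference check: every emitted `test_type` minus every category that has a registry entry must be empty. The sketch below is a minimal illustration, not the project's test code. The in-memory `FRAMEWORKS` list and the `find_orphans` helper are hypothetical, and `get_by_category` here is a stand-in with an assumed signature rather than the loader's real method.

```python
from typing import Iterable

# Hypothetical in-memory stand-in for agent/frameworks/registry.json.
FRAMEWORKS = [
    {"name": "pytest", "category": "api"},
    {"name": "playwright", "category": "e2e_ui"},
]


def get_by_category(frameworks: list[dict], category: str) -> list[dict]:
    """Stand-in lookup (assumed signature): entries whose category matches."""
    return [f for f in frameworks if f.get("category") == category]


def find_orphans(test_types: Iterable[str], frameworks: list[dict]) -> list[str]:
    """Return the test_types that resolve to no framework at all."""
    return sorted(t for t in test_types if not get_by_category(frameworks, t))


# "performance" is emitted by the classifier but has no registry entry here.
orphans = find_orphans({"api", "e2e_ui", "performance"}, FRAMEWORKS)
print(orphans)  # ['performance']
```

An empty list means the contract holds; any name in the list is a routing path that ends in a null framework.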
Open agent/frameworks/registry.json. Add a new object to
frameworks[]. Required fields:
name (unique slug)
display_name
category (must match the classifier emission)
languages
file_extensions
execution_command (with {file} placeholder)
status (preferred / supported / legacy)
Add recommended metadata. maturity, community_size,
recommended_for, strengths, avoid_when, ecosystems. These
show up in the recommender's reasoning string — sparse entries
produce sparse recommendations.
Validate the execution_command. It must run a single test file
when {file} is substituted. Test manually:
echo '<minimal test>' > /tmp/sample.<ext>
<execution_command with /tmp/sample.<ext> substituted>
Confirm exit code 0 (or expected non-zero) and no missing-config errors.
Confirm the {file} placeholder is the only substitution. The
executor tokenizes the template with shlex.split before
substituting {file}, so the file path stays a single argv element;
paths with spaces must remain intact.
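To see why the order matters, here is a small sketch of split-then-substitute next to the naive substitute-then-split. This is an illustration under assumptions, not Oracle's executor code: `build_argv` and `build_argv_naive` are hypothetical names, and the real executor may differ in detail. Only `shlex.split` is taken from the text above.

```python
import shlex


def build_argv(template: str, file_path: str) -> list[str]:
    """Tokenize the template first, then substitute {file} inside each token."""
    tokens = shlex.split(template)
    return [token.replace("{file}", file_path) for token in tokens]


def build_argv_naive(template: str, file_path: str) -> list[str]:
    """Substitute first, then tokenize: a path with spaces gets split apart."""
    return shlex.split(template.replace("{file}", file_path))


template = "npx --yes cypress run --spec {file}"
path = "/tmp/my tests/login.cy.ts"  # note the space in the directory name

print(build_argv(template, path)[-1])         # /tmp/my tests/login.cy.ts
print(build_argv_naive(template, path)[-2:])  # ['/tmp/my', 'tests/login.cy.ts']
```

With the first ordering the path arrives as one argv element regardless of its contents, which is the property to confirm when testing a new template.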
Generate a test that should route to the new framework:
python -m agent.cli generate "<prompt that should hit the new test_type>"
Check the trace. The classification should match the expected category, the recommendation should pick the new framework (or the existing preferred one; confirm this matches your intent), and the output file should have the right extension.
Execute with --execute. Confirm the framework's CLI actually
runs the generated file. A passing execution validates the
execution_command template under the real Oracle invocation path.
Don't promote. This is a registry-validation run, not a real test
promotion. Leave the artifact in tests/generated/.
docs/guides/framework-registry.md if you added a new
category, changed the contract surface, or introduced a new optional
field schema.
docs/ORACLE_STATE.md. One line: framework name,
category, status, date added. This is the project ledger; future
maintainers grep here first.
feat(registry): add <framework-name> support.
Body should include the validation steps you ran and the
generated/executed sample.
agent/frameworks/registry.json: Source of truth for entries.
agent/core/framework_registry.py: Loader and lookup methods.
Don't add lookup helpers here unless the existing five
(get_all_frameworks, get_by_category, get_preferred_by_category,
find_by_name, match_by_language) genuinely don't fit.
agent/core/classifier.py: Routing rules. Coordinate changes
with registry entries.
tests/unit/test_orchestrator.py: Contract enforcement. Non-null
resolution per test_type (api → pytest, e2e_ui → playwright,
performance → k6, etc.) is asserted here. tests/unit/test_factory.py
separately enforces the LLM provider matrix.
test_type
in classifier, no null framework resolution in registry)
python -m agent.cli generate "<routing prompt>" resolves to the new
framework
--execute runs the generated file successfully via the new
execution_command
framework-registry.md documents any new category or schema field
ORACLE_STATE.md ledger entry is present

| Rationalization | Why It Is Wrong |
|---|---|
| "I'll add the registry entry now and update the classifier later" | Orphaned registry entries are routing traps. Either both land together or neither does. |
| "Marking it preferred is fine, no need to demote the existing entry" | Multiple preferred entries in one category means get_preferred_by_category picks by registry order, not by merit. Demote the loser explicitly. |
| "I tested the command with my path — Oracle's substitution will work too" | Test the substituted command with a path that contains spaces. The executor's shlex.split-then-substitute order exists specifically to handle this case; a broken template will silently corrupt argv. |
| "It only needs to work for the happy-path prompt I tried" | If the classifier emits this test_type with confidence ≥0.7 for any phrasing, the framework must handle the resulting generated file. Test at least two distinct prompts that route to the new entry. |
| "Sparse metadata is fine, we can fill it in later" | The recommender's reasoning string is read by humans and agents. Sparse metadata produces unhelpful recommendations — fill it in now while you have the context. |
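The "multiple preferred entries" row in the table above can be checked mechanically before committing. The following is a rough sketch, assuming registry entries are plain dicts with `name`, `category`, and `status` keys as described earlier; `preferred_conflicts` is a hypothetical helper, not one of the loader's existing lookup methods.

```python
from collections import defaultdict


def preferred_conflicts(frameworks: list[dict]) -> dict[str, list[str]]:
    """Map each category with more than one preferred entry to the entry names."""
    by_category: dict[str, list[str]] = defaultdict(list)
    for entry in frameworks:
        if entry.get("status") == "preferred":
            by_category[entry["category"]].append(entry["name"])
    return {cat: names for cat, names in by_category.items() if len(names) > 1}


# Hypothetical registry state where Cypress was marked preferred by mistake.
frameworks = [
    {"name": "playwright", "category": "e2e_ui", "status": "preferred"},
    {"name": "cypress", "category": "e2e_ui", "status": "preferred"},
    {"name": "k6", "category": "performance", "status": "preferred"},
]

print(preferred_conflicts(frameworks))  # {'e2e_ui': ['playwright', 'cypress']}
```

An empty dict means each category has at most one preferred framework.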
Scope: Cypress already has community demand and a CI image. Category
e2e_ui already has Playwright as preferred. Decision: add Cypress as
supported, keep Playwright as preferred.
Classifier: Already emits test_type=e2e_ui on UI-test prompts. No
classifier change needed.
Entry (abridged):
{
"name": "cypress",
"display_name": "Cypress",
"category": "e2e_ui",
"languages": ["javascript", "typescript"],
"file_extensions": ["cy.ts", "cy.js"],
"execution_command": "npx --yes cypress run --spec {file}",
"status": "supported",
"strengths": ["Time-travel debugging", "Strong DX for UI tests"],
"avoid_when": ["Cross-browser parity required (Playwright is stronger)"]
}
Validation: Generated a UI test prompt, confirmed recommender still
picked Playwright (correct — preferred wins). Verified Cypress is
accessible by routing a prompt with explicit Cypress mention through a
future explicit-framework override.
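Before running the generation step, an entry like the Cypress one above can be checked against the required fields listed in Phase 3. This is a minimal sketch: `validate_entry` is a hypothetical helper, and the rules it applies (required keys present, unique name, known status, `{file}` present in the command) are one reading of the checklist rather than a schema the project publishes.

```python
REQUIRED_FIELDS = (
    "name", "display_name", "category", "languages",
    "file_extensions", "execution_command", "status",
)
VALID_STATUSES = {"preferred", "supported", "legacy"}


def validate_entry(entry: dict, existing_names: frozenset = frozenset()) -> list[str]:
    """Return a list of human-readable problems; an empty list means the entry passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if entry.get("name") in existing_names:
        problems.append(f"duplicate name: {entry['name']}")
    status = entry.get("status")
    if status is not None and status not in VALID_STATUSES:
        problems.append(f"unknown status: {status}")
    command = entry.get("execution_command")
    if command is not None and "{file}" not in command:
        problems.append("execution_command lacks the {file} placeholder")
    return problems


cypress = {
    "name": "cypress",
    "display_name": "Cypress",
    "category": "e2e_ui",
    "languages": ["javascript", "typescript"],
    "file_extensions": ["cy.ts", "cy.js"],
    "execution_command": "npx --yes cypress run --spec {file}",
    "status": "supported",
}

print(validate_entry(cypress, existing_names=frozenset({"playwright"})))  # []
```

The optional metadata (strengths, avoid_when, and so on) is deliberately not checked here, since the text treats it as recommended rather than required.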
Scope: Performance category doesn't exist in the registry yet.
Classifier emits test_type=performance on load/stress prompts, but the
registry has no matching entry — this is the orphan case.
Phase 2 first: Confirmed agent/core/classifier.py already has the
performance rule (if "performance" in p or "load test" in p ...). No
classifier change needed.
Phase 3:
{
"name": "k6",
"display_name": "k6",
"category": "performance",
"languages": ["javascript"],
"file_extensions": ["js"],
"execution_command": "k6 run {file}",
"status": "preferred",
"recommended_for": ["HTTP load testing", "Stress and spike tests"],
"avoid_when": ["Browser-level performance (use Playwright tracing)"]
}
Validation: python -m agent.cli generate "Load test /v1/search at 200 RPS for 5 minutes" --execute. Confirmed classifier emits
test_type=performance at 0.95 confidence, recommender picks k6, k6
CLI runs the generated file without error.
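The classifier side of this example, together with the 0.7 confidence threshold from Phase 2, can be sketched as a keyword heuristic with a clarification gate. Everything below is illustrative: the `RULES` table, the two confidence tiers, and the `classify` and `next_action` names are assumptions, and the real rules in agent/core/classifier.py are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

CLARIFY_BELOW = 0.7  # threshold stated in Phase 2

# Hypothetical signal table; the confidence tiers (0.95 / 0.5) are illustrative.
RULES = {
    "performance": {"strong": ("load test", "stress test"), "weak": ("performance",)},
    "e2e_ui": {"strong": ("click", "browser"), "weak": ("page",)},
}


@dataclass
class Classification:
    test_type: Optional[str]
    confidence: float


def classify(prompt: str) -> Classification:
    """Pick the category with the strongest keyword signal in the prompt."""
    p = prompt.lower()
    best = Classification(None, 0.0)
    for test_type, signals in RULES.items():
        if any(s in p for s in signals["strong"]):
            confidence = 0.95
        elif any(s in p for s in signals["weak"]):
            confidence = 0.5
        else:
            continue
        if confidence > best.confidence:
            best = Classification(test_type, confidence)
    return best


def next_action(result: Classification) -> str:
    """Low-confidence or unmatched prompts ask for clarification instead of generating."""
    if result.test_type is None or result.confidence < CLARIFY_BELOW:
        return "clarify"
    return "generate"


result = classify("Load test /v1/search at 200 RPS for 5 minutes")
print(result.test_type, result.confidence, next_action(result))  # performance 0.95 generate
print(next_action(classify("Check the page performance")))       # clarify
```

The second prompt matches only weak signals, so it stays under the threshold and is routed to a clarification rather than to k6.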
playwright.config.ts): Document the setup in avoid_when or in
recommended_for. If Oracle is expected to generate the config too,
that's a scaffolder change, not just a registry change.
preferred for
different sub-cases: Split the category. A single category should
have a single preferred. If the split isn't clean, surface this to
the user as a design decision rather than fudging the registry.
test_type:
Don't add the registry entry yet. Strengthen the classifier first; an
unreachable entry is dead weight.