From ai-model-research
Use when the user wants a deep evaluation of a single OpenRouter model that goes beyond the OR catalog — model card, paper, benchmarks, license, known limitations. Triggers on phrases like "evaluate <model> for <task>", "deep dive on <OR model>", "research <model> beyond OpenRouter", "is <model> good for <use case>", "tell me everything about <OR model>", "model card for <model>".
npx claudepluginhub danielrosehill/claude-code-plugins --plugin ai-model-research

This skill uses the workspace's default tool permissions.
Conduct a thorough evaluation of a single model the user is considering. Combine OpenRouter catalog data with external research — Hugging Face model card, original paper, license, benchmark coverage, community feedback — to give the user a confident go/no-go answer.
The user has shortlisted a model (often from or-recommend-model or or-compare-models) and wants to understand it deeply before committing — for a real workflow, a production deployment, or a comparison against incumbents.
Fetch the OpenRouter catalog and extract the target model's full record:
curl -s https://openrouter.ai/api/v1/models -H "Accept: application/json"
Capture: id, context_length, modalities, pricing, supported_parameters, top_provider info, description, created date.
Go beyond the OR catalog. Use the available research tools (WebFetch, web search, Hugging Face MCP if available) to gather:
Hugging Face model card at huggingface.co/&lt;org&gt;/&lt;repo&gt;. Look for: training data, training compute, licence, intended use, limitations, evaluation results.

If the user mentioned a specific workflow (e.g. "I want to use this for legal document summarization"), explicitly evaluate fitness for that task:
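Once the model card's README markdown has been fetched (via WebFetch or similar), the sections of interest can be pulled out mechanically. A minimal sketch, assuming the card uses conventional `##` headings; the keyword list is a heuristic, since heading text varies from card to card.

```python
import re

# Headings we expect in a Hugging Face model card README.
# Exact wording varies per card -- treat these as heuristics.
SECTIONS = ("license", "licence", "intended use", "limitations",
            "training data", "evaluation")


def extract_sections(markdown: str) -> dict[str, str]:
    """Split model-card markdown on '## Heading' lines and keep
    the sections whose heading matches a keyword above."""
    found: dict[str, str] = {}
    parts = re.split(r"^##+\s+(.+)$", markdown, flags=re.MULTILINE)
    # parts = [preamble, heading1, body1, heading2, body2, ...]
    for heading, body in zip(parts[1::2], parts[2::2]):
        key = heading.strip().lower()
        if any(k in key for k in SECTIONS):
            found[key] = body.strip()
    return found
```

Anything the heuristic misses (e.g. licence stated only in the card's YAML front matter) still needs a manual read of the card.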
Output a structured evaluation report:
# Evaluation: <Model ID>
## OpenRouter Catalog Snapshot
- Context: ...
- Pricing: $... / 1M prompt tokens, $... / 1M completion tokens
- Modalities: ...
- Supported parameters: ...
## Background
- Provider: ...
- Released: ...
- Architecture / scale (if known): ...
- Paper: <link if found>
## Capabilities
- ...
## Limitations & Known Issues
- ...
## Licence
- ...
- Commercial use: yes / no / conditional
## Benchmarks (if publicly reported)
- ...
## Fit for <user's stated use case>
- Verdict: strong / moderate / weak fit
- Reasoning: ...
## Recommendation
- Use this if: ...
- Avoid this if: ...
- Consider alternatives: <list 1–2 from OR catalog>
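When filling in the pricing and recommendation sections, a concrete monthly cost estimate is often more persuasive than per-token figures. A small sketch, assuming the OpenRouter `pricing` object holds per-token USD prices as strings (its current convention); the token volumes are illustrative placeholders, not data from any real workload.

```python
def estimate_monthly_cost(pricing: dict,
                          prompt_tokens: int,
                          completion_tokens: int) -> float:
    """Estimate monthly USD spend from OpenRouter per-token price strings
    (e.g. {'prompt': '0.000001', 'completion': '0.000002'})."""
    return (float(pricing["prompt"]) * prompt_tokens
            + float(pricing["completion"]) * completion_tokens)


# Hypothetical workload: 30M prompt + 5M completion tokens per month.
cost = estimate_monthly_cost(
    {"prompt": "0.000001", "completion": "0.000002"},
    prompt_tokens=30_000_000,
    completion_tokens=5_000_000,
)
```

Multiplying by expected monthly volume lets the report's "Use this if / Avoid this if" verdict cite a dollar figure instead of abstract per-million rates.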