Evaluate and compare LLMs, ML APIs, and fine-tuned models for product fit across quality, latency, cost, compliance, and vendor risk dimensions. Use when selecting an AI model or vendor, comparing foundation model options, or making build-vs-API decisions for a product use case.
From pm-ai-product-managementnpx claudepluginhub tarunccet/pm-skills --plugin pm-ai-product-managementThis skill uses the workspace's default tool permissions.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.
Help PMs systematically evaluate AI models (LLMs, ML APIs, fine-tuned models) for product fit using a structured framework.
You are helping evaluate AI models or vendors for $ARGUMENTS.
Clarify the use case:
Identify candidate models / vendors:
Score each candidate on the evaluation matrix (see table below):
Assess quality benchmarks:
Model operational requirements:
Cost modelling:
requests/month × avg_tokens × price_per_tokenFine-tuning and customisation:
Data privacy and compliance:
Vendor risk:
Decision framework — build vs. API vs. fine-tune:
Produce evaluation report:
| Dimension | Weight | Candidate A | Candidate B | Candidate C |
|---|---|---|---|---|
| Task alignment / output quality | 25% | /5 | /5 | /5 |
| Latency (p95) | 15% | /5 | /5 | /5 |
| Cost at scale | 15% | /5 | /5 | /5 |
| Context window | 10% | /5 | /5 | /5 |
| Fine-tuning capability | 10% | /5 | /5 | /5 |
| Data privacy / compliance | 15% | /5 | /5 | /5 |
| Vendor lock-in risk | 5% | /5 | /5 | /5 |
| Deprecation / stability risk | 5% | /5 | /5 | /5 |
| Weighted total | 100% |
Think step by step. Save as markdown.