Skill

model-theft

Detects inference endpoints that lack authentication or rate limiting, which lets attackers steal a model through systematic queries. Use when building or auditing LLM-serving infrastructure.

OpenAI

Anthropic

security

ai-ml

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/soundcheck:model-theft

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Prevents unauthorized replication of proprietary models through API abuse. Unauthenticated

SKILL.md

61 lines · ~814 tokens

Stats

Language: Python

Stars: 18

Maintenance: Excellent

Last Commit: Jun 17, 2026


Model Theft (OWASP LLM10:2025 Unbounded Consumption)

What this checks

Prevents unauthorized replication of proprietary models through API abuse. Unauthenticated or unthrottled inference endpoints let attackers systematically query a model to reconstruct its weights or distill a clone — stealing the commercial and IP value of the deployment.

Vulnerable patterns

Inference endpoint has no authentication — any client can query freely
Rate limiting applied per IP only, trivially bypassed with rotating proxies
Response includes raw logprobs or full embedding vectors, enabling extraction
No monitoring for systematic/grid-search query patterns that signal extraction attempts

Fix immediately

Flag the vulnerable code and explain the risk. Then suggest a fix that establishes these properties:

Every inference endpoint requires authentication: API key, bearer token, or mTLS. An unauthenticated endpoint hands free training data to anyone who wants to clone the model.
Rate limits are keyed on the authenticated principal, not the IP. IP-only throttles are defeated by rotating proxies and residential IP pools; a per-user or per-key quota follows the attacker even as IPs churn.
Extraction-signal fields are stripped from responses. Log-probabilities (per-token probabilities) and full embedding vectors are the primary signals distillation attacks use to reconstruct a model. If a caller does not strictly need them, do not return them.
Query patterns are monitored for extraction signatures — high-volume, low-entropy, systematic grid-search probes. Alerts fire on anomalies; the handler records user identity, timestamp, and prompt (or a content fingerprint) for after-the-fact investigation.
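
As an illustration of the third property, the sketch below removes extraction-signal fields from a response payload before it leaves the handler. It is a minimal example under stated assumptions: the field names in EXTRACTION_SIGNAL_FIELDS and the function strip_extraction_signals are hypothetical, and a real serving stack will use its own response schema.

```python
from typing import Any

# Hypothetical field names; substitute the keys your serving stack actually emits.
EXTRACTION_SIGNAL_FIELDS = frozenset(
    {"logprobs", "top_logprobs", "token_probs", "embedding"}
)


def strip_extraction_signals(payload: Any, allowed: frozenset = frozenset()) -> Any:
    """Return a copy of payload with extraction-signal keys removed at any depth.

    Keys listed in `allowed` are kept, for callers that strictly need them.
    """
    if isinstance(payload, dict):
        return {
            key: strip_extraction_signals(value, allowed)
            for key, value in payload.items()
            if key not in EXTRACTION_SIGNAL_FIELDS or key in allowed
        }
    if isinstance(payload, list):
        return [strip_extraction_signals(item, allowed) for item in payload]
    # Scalars (str, int, float, None, ...) pass through unchanged.
    return payload
```

Stripping by default and allowing fields per caller keeps the safe behavior as the fallback: a new endpoint returns no probabilities or embeddings unless someone opts in explicitly.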

Translate each principle to the serving framework, auth provider, and rate-limiter of the audited file. Use the framework's documented authentication and throttling middleware — do not roll your own.
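
To make the first two properties concrete without tying them to one framework, the following sketch authenticates an API key and then applies a sliding-window limit keyed on the resulting principal. It only illustrates the keying and is not a substitute for the framework middleware recommended above. All names here (API_KEY_DIGESTS, authenticate, PrincipalRateLimiter, handle_request) and the demo key are hypothetical.

```python
import hashlib
import hmac
import time
from collections import defaultdict, deque
from typing import Callable, Optional

# Hypothetical key store: SHA-256 digest of each API key -> principal id.
API_KEY_DIGESTS = {hashlib.sha256(b"demo-key-alice").hexdigest(): "alice"}


def authenticate(api_key: Optional[str]) -> Optional[str]:
    """Return the principal for a valid key, or None if the key is missing or unknown."""
    if not api_key:
        return None
    digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    for known_digest, principal in API_KEY_DIGESTS.items():
        # Constant-time comparison avoids leaking digest prefixes through timing.
        if hmac.compare_digest(digest, known_digest):
            return principal
    return None


class PrincipalRateLimiter:
    """Sliding-window limiter keyed on the authenticated principal, not the IP."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # principal -> timestamps of recent requests

    def allow(self, principal: str) -> bool:
        now = self._clock()
        hits = self._hits[principal]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True


def handle_request(api_key: Optional[str], client_ip: str, limiter: PrincipalRateLimiter) -> int:
    """Return the HTTP status the endpoint would send."""
    principal = authenticate(api_key)
    if principal is None:
        return 401  # unauthenticated callers never reach the model
    # client_ip is deliberately ignored: the quota follows the key across IPs.
    if not limiter.allow(principal):
        return 429
    return 200
```

Because the quota is stored under the principal, three requests with the same key from three different IP addresses count against one budget, which is the behavior an IP-only throttle cannot provide.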

Verification

Every inference endpoint requires a valid API key, bearer token, or mTLS client certificate
Rate limits are enforced per authenticated principal (user or API key), not per IP address
Log-probabilities, raw embeddings, and weight data are excluded from API responses
Query logs include user identity, timestamp, and either the prompt itself or a stable fingerprint (hash, embedding, or normalized form) sufficient to detect content-pattern anomalies. Logging only metadata (length, token count, request id) without any reconstructable prompt signal does not satisfy this. Choice between raw prompt and fingerprint is a privacy tradeoff — document the decision.
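
The logging requirement above can be sketched as follows, assuming the fingerprint option is chosen over storing raw prompts. The normalization rules (lowercasing, collapsing whitespace, replacing digit runs with a placeholder) and the names prompt_fingerprint, QueryLogRecord, make_record, and dominant_template_share are hypothetical choices for illustration, not part of the skill.

```python
import hashlib
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


def prompt_fingerprint(prompt: str) -> str:
    """Hash a normalized form of the prompt so templated probes collide."""
    normalized = re.sub(r"\d+", "<num>", prompt.lower())
    normalized = " ".join(normalized.split())  # collapse runs of whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class QueryLogRecord:
    user_id: str
    timestamp: str  # ISO 8601, UTC
    fingerprint: str


def make_record(user_id: str, prompt: str) -> QueryLogRecord:
    return QueryLogRecord(
        user_id=user_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        fingerprint=prompt_fingerprint(prompt),
    )


def dominant_template_share(records: List[QueryLogRecord], user_id: str) -> float:
    """Fraction of a user's queries that share their most common fingerprint."""
    prints = [r.fingerprint for r in records if r.user_id == user_id]
    if not prints:
        return 0.0
    return Counter(prints).most_common(1)[0][1] / len(prints)
```

A user whose queries are mostly one template with a varying number produces a share close to 1.0, which is one simple signal an alert could threshold on. Note that an unsalted hash of a short prompt can be guessed by brute force, so the fingerprint reduces exposure but does not make the log anonymous.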

References

CWE-285 (Improper Authorization)
CWE-307 (Improper Restriction of Excessive Authentication Attempts)
OWASP LLM10:2025 Unbounded Consumption (covers model extraction; listed as LLM10 Model Theft in the 2023 edition)
