From bee
Evaluates codebases for LLM-assisted development ergonomics including context window friendliness, explicitness, module boundaries, test-as-spec, and naming criteria.
npx claudepluginhub incubyte/ai-plugins --plugin bee
This skill uses the workspace's default tool permissions.
How comfortable is this codebase for an LLM to navigate, understand, and generate correct code in? Code that's ergonomic for AI is also better for humans — but the emphasis here is on what specifically helps or hinders LLM tools like Claude Code, Copilot, and Cursor.
LLMs work with a finite context window. Every file they read consumes tokens. Large files are expensive, and when a file exceeds the context budget, the LLM works with a partial view — leading to hallucinations and missed dependencies.
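A rough way to see which files strain the budget is to estimate their token cost. The sketch below is a minimal illustration under stated assumptions: it uses the common heuristic of about 4 characters per token (real tokenizers vary by model and language), a hypothetical budget of 8,000 tokens per file, and reads file contents from an in-memory map so it runs without filesystem access.

```typescript
// Heuristic only: ~4 characters per token. Real tokenizers differ by model and language.
const CHARS_PER_TOKEN = 4;

// Hypothetical per-file budget; tune it for the model and the task.
const FILE_TOKEN_BUDGET = 8_000;

interface FileReport {
  path: string;
  lines: number;
  estimatedTokens: number;
  overBudget: boolean;
}

function estimateTokens(source: string): number {
  return Math.ceil(source.length / CHARS_PER_TOKEN);
}

// Takes path -> contents in memory so the sketch runs without filesystem access.
function auditFiles(
  files: Record<string, string>,
  budget: number = FILE_TOKEN_BUDGET,
): FileReport[] {
  return Object.entries(files)
    .map(([path, source]) => {
      const estimatedTokens = estimateTokens(source);
      return {
        path,
        lines: source.split("\n").length,
        estimatedTokens,
        overBudget: estimatedTokens > budget,
      };
    })
    // Largest first: these are the files an LLM is most likely to see only partially.
    .sort((a, b) => b.estimatedTokens - a.estimatedTokens);
}

// Usage with synthetic contents: a 6,000-character file and a 60,000-character one.
const report = auditFiles({
  "src/order-pricing.ts": "y".repeat(6_000),
  "src/order-service.ts": "x".repeat(60_000),
});
for (const r of report) {
  const flag = r.overBudget ? " (over budget)" : "";
  console.log(`${r.path}: ~${r.estimatedTokens} tokens${flag}`);
}
```

Sorting largest first puts the best split candidates at the top of the report.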
What to check:
What to recommend:
- Split large files by responsibility: OrderService could become OrderCreation, OrderPricing, and OrderFulfillment, each small enough to fit in context and with a single clear purpose.
LLMs generate better code when they can see the shape of data and contracts. Implicit conventions, patterns that exist only in developers' heads, are invisible to AI and cause hallucinations.
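A minimal TypeScript sketch of what explicit contracts look like, reusing the names from the checks that follow. The Order and Money shapes, the 10% bulk discount rule, and the "brand" technique used for ValidatedOrder are illustrative assumptions, not code from any particular project.

```typescript
// A named constant set instead of the bare string "active" repeated across files.
const Status = {
  ACTIVE: "active",
  CANCELLED: "cancelled",
} as const;
type Status = (typeof Status)[keyof typeof Status];

interface Money {
  amountCents: number;
  currency: "USD" | "EUR";
}

interface Order {
  id: string;
  status: Status;
  subtotal: Money;
  itemCount: number;
}

// The signature states the contract: an Order in, Money out (compare process(data: any)).
// Hypothetical rule: 10% off when the order has 10 or more items.
function calculateDiscount(order: Order): Money {
  const rate = order.itemCount >= 10 ? 0.1 : 0;
  return {
    amountCents: Math.round(order.subtotal.amountCents * rate),
    currency: order.subtotal.currency,
  };
}

// "Call validate() before save()" moved into the type system: only validate() is meant to
// add this brand, so passing a plain Order to save() is a compile-time error.
declare const validatedBrand: unique symbol;
type ValidatedOrder = Order & { readonly [validatedBrand]: true };

function validate(order: Order): ValidatedOrder {
  if (order.itemCount <= 0) {
    throw new Error(`Order ${order.id} has no items`);
  }
  return order as ValidatedOrder;
}

function save(order: ValidatedOrder): string {
  // Persistence is out of scope for the sketch; return a confirmation string instead.
  return `saved ${order.id}`;
}

const order: Order = {
  id: "o-1",
  status: Status.ACTIVE,
  subtotal: { amountCents: 25_000, currency: "USD" },
  itemCount: 12,
};
console.log(calculateDiscount(order).amountCents); // 2500
console.log(save(validate(order))); // saved o-1
// save(order); // would not compile: Order lacks the ValidatedOrder brand
```

The brand exists only at compile time, so the ordering rule costs nothing at runtime but is visible to any tool that reads the types.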
What to check:
- A signature like function process(data: any) gives the LLM nothing to work with; function calculateDiscount(order: Order): Money tells it exactly what to generate.
- Is the string "active" repeated in 12 places instead of a Status.ACTIVE constant? LLMs will guess the wrong string.
- Are there implicit rules such as "always call validate() before save()"? Unless a rule is captured in a type system, a base class, or documentation, the LLM won't know it exists.
What to recommend:
- Encode such rules in the code itself (e.g., a ValidatedOrder type that can only be created by calling validate()).
When an LLM needs to work on a module, it should be able to understand the module's purpose and interface without loading the entire codebase. Clear boundaries let the LLM work on isolated pieces, which means fewer files in context and faster, more accurate results.
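As an illustration of a focused module, here is a hypothetical utils/money.ts: one concept, a few exported functions, and an unexported helper. The whole-cents representation and the function names are assumptions made for the sketch.

```typescript
// utils/money.ts (hypothetical): one concept per module and a small exported surface.

export interface Money {
  readonly cents: number;
  readonly currency: string;
}

// Not exported: callers, and an LLM working on them, never need to read this.
function assertSameCurrency(a: Money, b: Money): void {
  if (a.currency !== b.currency) {
    throw new Error(`Currency mismatch: ${a.currency} vs ${b.currency}`);
  }
}

export function money(cents: number, currency: string): Money {
  if (!Number.isInteger(cents)) {
    throw new Error(`Money must be whole cents, got ${cents}`);
  }
  return { cents, currency };
}

export function addMoney(a: Money, b: Money): Money {
  assertSameCurrency(a, b);
  return money(a.cents + b.cents, a.currency);
}

export function formatMoney(m: Money): string {
  return `${(m.cents / 100).toFixed(2)} ${m.currency}`;
}
```

An LLM editing code that handles money only needs the Money type and these three signatures in context, not the rest of the codebase.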
What to check:
- Is there a catch-all directory? A src/utils/ with 30 unrelated functions is an anti-pattern for AI: the LLM has to read all 30 to find the one it needs.
What to recommend:
- Split utils/ into focused modules (utils/dates.ts, utils/money.ts).
Tests are the best specification an LLM can read. A well-named test suite tells the LLM what the code is supposed to do, so it can generate implementations that match. Missing or poorly named tests leave the LLM guessing at behavior.
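A small sketch of tests as specification. To stay self-contained, it defines a minimal stand-in for a test runner's test(name, fn) (in a real project this would come from Jest, Vitest, or a similar framework), and getUser is a hypothetical handler. Read top to bottom, the test names alone describe what getUser must do.

```typescript
// Minimal stand-in for a test runner's test(name, fn) so the sketch is self-contained;
// in a real project this would be test() from Jest, Vitest, or a similar framework.
const results: { name: string; passed: boolean }[] = [];

function test(name: string, fn: () => void): void {
  try {
    fn();
    results.push({ name, passed: true });
  } catch {
    results.push({ name, passed: false });
  }
}

function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`Expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Hypothetical handler under test.
interface User {
  id: string;
  name: string;
}
const users = new Map<string, User>([["u1", { id: "u1", name: "Ada" }]]);

function getUser(id: string): { status: number; body?: User } {
  const user = users.get(id);
  return user ? { status: 200, body: user } : { status: 404 };
}

// Each name states one behavior, so the list of names reads as a spec for getUser.
test("should return 200 with the user when the user exists", () => {
  const res = getUser("u1");
  expectEqual(res.status, 200);
  expectEqual(res.body?.name, "Ada");
});

test("should return 404 when user not found", () => {
  const res = getUser("missing");
  expectEqual(res.status, 404);
  expectEqual(res.body, undefined);
});

for (const r of results) {
  console.log(`${r.passed ? "PASS" : "FAIL"} ${r.name}`);
}
```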
What to check:
- Do test names describe behavior? test('should return 404 when user not found') is a spec; test('test1') or test('getUser') tells the LLM nothing.
What to recommend:
CLAUDE.md is the LLM's instruction manual for the project. If it's missing, outdated, or too vague, the LLM operates without project context and falls back to generic patterns that may not fit.
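CLAUDE.md is prose, but some of its staleness can be checked mechanically. The sketch below is a hypothetical heuristic, not a Claude Code feature: it reports a missing file, a very short file (the 200-character threshold for "too vague" is an arbitrary assumption), and any "npm run <script>" command mentioned in CLAUDE.md that no longer exists in package.json's scripts.

```typescript
// Hypothetical CLAUDE.md audit; every heuristic here is an assumption, not a standard.
interface PackageJson {
  scripts?: Record<string, string>;
}

// Arbitrary cutoff below which CLAUDE.md is probably too vague to guide an LLM.
const MIN_USEFUL_LENGTH = 200;

function auditClaudeMd(claudeMd: string | undefined, pkg: PackageJson): string[] {
  if (claudeMd === undefined) {
    return ["CLAUDE.md is missing"];
  }
  const problems: string[] = [];
  if (claudeMd.trim().length < MIN_USEFUL_LENGTH) {
    problems.push("CLAUDE.md is very short and may be too vague");
  }
  // Outdated check: each "npm run <name>" in CLAUDE.md should still be a package.json script.
  const scripts = pkg.scripts ?? {};
  const seen = new Set<string>();
  const pattern = /npm run ([\w:-]+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(claudeMd)) !== null) {
    const name = match[1];
    if (seen.has(name)) continue;
    seen.add(name);
    if (!Object.prototype.hasOwnProperty.call(scripts, name)) {
      problems.push(`CLAUDE.md mentions "npm run ${name}" but package.json has no such script`);
    }
  }
  return problems;
}

const claudeMd = [
  "# Project notes",
  "Run the tests with npm run test and fix lint errors with npm run lint:fix.",
].join("\n");
console.log(auditClaudeMd(claudeMd, { scripts: { test: "vitest", lint: "eslint ." } }));
```

A check like this only catches mechanical drift; whether the conventions described in CLAUDE.md still match the code needs a human or LLM review.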
What to check:
What to recommend:
LLMs rely heavily on names to understand code without reading every line. A function called process() forces the LLM to read the entire implementation; a function called applyBulkDiscountToOrder() tells it what the function does before it reads the body.
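The difference is easy to see side by side. Both functions below implement the same hypothetical rule (5% off orders with at least 10 units); only the second can be understood from its name and the names of its helpers. The Order shape and the discount rule are assumptions for illustration, and the opaque version is called handle() rather than process() to avoid shadowing Node's global process object.

```typescript
// Hypothetical order shape for the comparison.
interface LineItem {
  unitPriceCents: number;
  quantity: number;
}
interface Order {
  items: LineItem[];
  discountCents: number;
}

// Opaque: nothing in the names says what happens, so an LLM must read every line.
function handle(o: Order): Order {
  const n = o.items.reduce((s, i) => s + i.quantity, 0);
  const t = o.items.reduce((s, i) => s + i.unitPriceCents * i.quantity, 0);
  return n >= 10 ? { ...o, discountCents: Math.round(t * 0.05) } : o;
}

// Descriptive: the same (hypothetical) rule, readable from names alone.
const BULK_DISCOUNT_MIN_UNITS = 10;
const BULK_DISCOUNT_RATE = 0.05;

function totalUnits(order: Order): number {
  return order.items.reduce((sum, item) => sum + item.quantity, 0);
}

function subtotalCents(order: Order): number {
  return order.items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}

function applyBulkDiscountToOrder(order: Order): Order {
  if (totalUnits(order) < BULK_DISCOUNT_MIN_UNITS) {
    return order;
  }
  const discountCents = Math.round(subtotalCents(order) * BULK_DISCOUNT_RATE);
  return { ...order, discountCents };
}

const bulkOrder: Order = { items: [{ unitPriceCents: 1_000, quantity: 12 }], discountCents: 0 };
console.log(applyBulkDiscountToOrder(bulkOrder).discountCents); // 600
```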
What to check:
- Function names: handle() vs handlePaymentWebhook().
- Variable names: x vs remainingRetryAttempts.
- File names: helpers.ts vs order-pricing.ts.
- Boolean names: active vs isOrderActive.
What to recommend:
When evaluating a codebase for AI ergonomics, prioritize findings by impact:
Every finding should answer: "If this were fixed, what would the LLM be able to do better?" That's the WHY.
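One way to hold findings to that standard is to make the WHY a required field. The sketch below is hypothetical: the Finding shape and the high/medium/low impact scale are assumptions (this section does not define them), the impact assigned to each example finding is illustrative only, and the sort simply puts higher-impact findings first while rejecting any finding without a WHY.

```typescript
// Hypothetical finding shape; the three-level impact scale is an assumption.
type Impact = "high" | "medium" | "low";

interface Finding {
  issue: string;
  impact: Impact;
  // The WHY: what the LLM could do better if this were fixed.
  why: string;
}

const IMPACT_RANK: Record<Impact, number> = { high: 0, medium: 1, low: 2 };

function prioritize(findings: Finding[]): Finding[] {
  for (const f of findings) {
    if (f.why.trim() === "") {
      throw new Error(`Finding "${f.issue}" is missing its WHY`);
    }
  }
  // Copy before sorting so the caller's array is not mutated; ties keep their input order.
  return [...findings].sort((a, b) => IMPACT_RANK[a.impact] - IMPACT_RANK[b.impact]);
}

const ranked = prioritize([
  {
    issue: "test('test1') and test('getUser') say nothing about behavior",
    impact: "medium",
    why: "With behavior-named tests, the LLM could generate implementations that match the spec",
  },
  {
    issue: "OrderService is too large to fit in context",
    impact: "high",
    why: "Split by responsibility, each piece fits in context, so the LLM would stop missing dependencies",
  },
]);
for (const f of ranked) {
  console.log(`[${f.impact}] ${f.issue}: ${f.why}`);
}
```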