agentic-usability

Name: agentic-usability
Author: pspdfkit-labs

Run, evaluate, and analyze AI agent benchmark suites for SDK usability. Generates test cases from source code, executes them in sandboxed VMs, scores solutions via LLM judge, and surfaces failure patterns or API design gaps.

testing

automation

api-development

npx claudepluginhub pspdfkit-labs/agentic-usability --plugin agentic-usability

Popularity

Stars

Top 25%

Med: 0·Avg: 362

Installs

Med: 0·Avg: 2

What's Inside

Skills (10)

eval

/eval

Run the full evaluation pipeline (execute, judge, report) for an SDK usability benchmark. Use when running a complete benchmark end-to-end, resuming an interrupted pipeline, or checking pipeline status.

execute

/execute

Execute benchmark test cases in sandboxed environments with AI agents. Spins up microsandbox containers for each test case and extracts solutions.

export

/export

Export a benchmark pipeline as a zip file for sharing or archiving. Excludes cache and large snapshots.

generate

/generate

Generate SDK usability test cases by exploring source code. Use when creating benchmark test suites, generating test cases for an SDK, or when the user wants to create evaluation scenarios.

init

/init

Initialize a new agentic-usability benchmark pipeline project. Use when setting up a new SDK benchmark, creating a config.json, or starting a new evaluation project.

Stats

Version: 0.1.2

Released: May 14, 2026

Language: TypeScript

Stars: 15

Maintenance: Excellent

License: Apache-2.0

Last Commit: Jun 11, 2026

Added: May 15, 2026

Available In

agentic-usability-marketplace (15)

README

Agentic Usability

A CLI tool that measures how well AI coding agents (Claude Code, Codex, Gemini CLI, etc.) can use your SDK. It generates programming problems from your SDK source, runs agents in sandboxed environments to solve them, then scores the results using an LLM judge that compares generated solutions against reference implementations.

stateDiagram-v2
    generate: Test Suite Generation Agent
    executionSandbox: Sandbox Pool
    state executionSandbox {
        execution: Test Solver Agent
        publicInfo: Public Documentation
    }
    judgeSandbox: Sandbox Pool
    state judgeSandbox {
        judge: Test Judge Agent
        publicInfo2: Public Documentation
        privateInfo: Private Source Code
    }
    insight: Analyzer Agent
    generate --> executionSandbox: Test Cases
    executionSandbox --> judgeSandbox: Test Solutions
    judgeSandbox --> insight: Test Scores

Prerequisites

Node.js >= 20
Linux with KVM or macOS with Apple Silicon (required by microsandbox microVMs)
An AI agent CLI installed locally for test generation and judging (e.g. Claude Code, Codex, Gemini CLI)
API keys for the agent(s) you plan to use
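
The Node.js and platform requirements above can be checked up front. The following is a minimal sketch and not part of the tool: `HostInfo` and `meetsPrerequisites` are hypothetical names, and treating `arm64` on `darwin` as "Apple Silicon" and a boolean `hasKvm` as "KVM is available" are assumptions about how one might detect these conditions.

```typescript
// Hypothetical helper for illustration; agentic-usability does not export this.
interface HostInfo {
  nodeVersion: string; // e.g. the value of process.versions.node, such as "20.11.1"
  platform: string;    // e.g. the value of process.platform: "linux" or "darwin"
  arch: string;        // e.g. the value of process.arch: "x64" or "arm64"
  hasKvm: boolean;     // assumption: the caller has already probed for KVM support
}

// Returns a list of unmet prerequisites; an empty list means the host looks suitable.
function meetsPrerequisites(host: HostInfo): string[] {
  const problems: string[] = [];

  // parseInt stops at the first non-digit, so "20.11.1" yields the major version 20.
  const major = Number.parseInt(host.nodeVersion, 10);
  if (!(major >= 20)) {
    problems.push("Node.js >= 20 is required");
  }

  const linuxOk = host.platform === "linux" && host.hasKvm;
  const macOk = host.platform === "darwin" && host.arch === "arm64";
  if (!linuxOk && !macOk) {
    problems.push("Linux with KVM or macOS with Apple Silicon is required");
  }

  return problems;
}
```

In a real script the fields could be filled from `process.versions.node`, `process.platform`, and `process.arch`; how KVM availability is probed is left to the caller.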

Installation

npm install -g @pspdfkit-labs/agentic-usability

Then run commands directly:

agentic-usability init -p pipelines/my-sdk-eval

Install from source

git clone https://github.com/PSPDFKit-labs/agentic-usability.git
cd agentic-usability
npm install
npm run build

Then run commands via npx:

npx agentic-usability init -p pipelines/my-sdk-eval

Claude Code Plugin

This package includes a Claude Code plugin with skills for every CLI command. Once installed, you can run pipeline stages directly from Claude Code (e.g. /agentic-usability:eval).

Install the plugin

From within Claude Code:

/plugin marketplace add PSPDFKit-labs/agentic-usability
/plugin install agentic-usability@agentic-usability-marketplace
/reload-plugins

Available skills

Skill	Description
`/agentic-usability:init`	Create a new pipeline project
`/agentic-usability:generate`	Generate test suite from SDK source
`/agentic-usability:execute`	Run agents in sandboxes
`/agentic-usability:judge`	LLM judge scoring
`/agentic-usability:report`	Display scorecard
`/agentic-usability:eval`	Full pipeline (execute → judge → report)
`/agentic-usability:inspect`	Open web UI
`/agentic-usability:insights`	AI analysis of results
`/agentic-usability:export`	Export pipeline as zip
`/agentic-usability:sandbox`	Debug shell inside a sandbox

Quick Start

1. Initialize a project

agentic-usability init -p pipelines/my-sdk-eval

The interactive wizard walks you through configuring:

Private info — where your SDK source code lives (local path, git repo, or URL). This is provided to the generator and judge but not the executor.
Public info — package name, docs URLs, install command. This is what the executor agent sees.
Agents — which AI CLI to use for each pipeline stage (claude, codex, gemini, or custom)
Targets — Docker image + timeout for sandbox execution
Sandbox — resource limits, secrets, environment variables

The wizard explains each field and provides sensible defaults. You can also cd into a directory and run agentic-usability init without -p.
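
The split between private and public info decides what each agent is allowed to see. As a sketch of that rule only (the `Stage` and `InfoKind` names and the `visibleInfo` function are hypothetical, and this is not the tool's config schema):

```typescript
// Hypothetical names for illustration; this does not mirror config.json.
type Stage = "generate" | "execute" | "judge";
type InfoKind = "private" | "public";

// Encodes the visibility rule described above:
// - private info is provided to the generator and the judge, never the executor
// - public info is what the executor sees
// The judge is shown with both, matching the README's pipeline diagram.
// Whether the generator also receives public info is not stated, so it is omitted here.
function visibleInfo(stage: Stage): InfoKind[] {
  switch (stage) {
    case "generate":
      return ["private"];
    case "execute":
      return ["public"];
    case "judge":
      return ["private", "public"];
  }
}
```

Keeping the executor limited to public info is what makes the benchmark meaningful: the solving agent works from the same material an outside SDK user would have.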

2. Run the pipeline

agentic-usability eval -p pipelines/my-sdk-eval

This runs the evaluation pipeline: execute → judge → report.

Or run stages individually:

agentic-usability generate -p pipelines/my-sdk-eval
agentic-usability execute  -p pipelines/my-sdk-eval
agentic-usability judge    -p pipelines/my-sdk-eval
agentic-usability report   -p pipelines/my-sdk-eval
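
The `eval` order can be pictured as a fixed list of stages run in sequence. The sketch below is illustrative and not the tool's implementation: `remainingStages` is a hypothetical helper, and modeling a resumed run as "skip stages already recorded as completed" is an assumption.

```typescript
// The three stages that eval runs, in order (generate is a separate command).
const EVAL_STAGES = ["execute", "judge", "report"] as const;
type EvalStage = (typeof EVAL_STAGES)[number];

// Hypothetical helper: given the stages that already finished, return the
// first unfinished stage and everything after it, preserving pipeline order.
function remainingStages(completed: readonly EvalStage[]): EvalStage[] {
  const firstPending = EVAL_STAGES.findIndex(
    (stage) => completed.indexOf(stage) === -1
  );
  return firstPending === -1 ? [] : EVAL_STAGES.slice(firstPending);
}
```

Under this model, a run interrupted after `execute` would continue with `judge` and then `report`, since each stage consumes the previous stage's output.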

Use --tests to run specific test cases (comma-separated):

agentic-usability execute -p pipelines/my-sdk-eval --tests TC-001,TC-003
agentic-usability judge   -p pipelines/my-sdk-eval --tests TC-001,TC-003
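
A comma-separated `--tests` value maps naturally onto a filter over the test suite. The following is a sketch under stated assumptions, not the CLI's actual parsing code: `parseTestIds` and `selectTests` are hypothetical names, the `id` field on a test case is assumed, and trimming whitespace and ignoring empty entries are choices made for this example.

```typescript
// Hypothetical: split a --tests value such as "TC-001,TC-003" into IDs.
function parseTestIds(value: string): string[] {
  return value
    .split(",")
    .map((id) => id.trim())          // assumption: surrounding spaces are ignored
    .filter((id) => id.length > 0);  // assumption: empty entries are dropped
}

// Hypothetical: keep only the requested test cases; an empty ID list selects all.
function selectTests<T extends { id: string }>(all: T[], ids: string[]): T[] {
  if (ids.length === 0) {
    return all;
  }
  const wanted = new Set(ids);
  return all.filter((testCase) => wanted.has(testCase.id));
}
```

Passing the same IDs to `execute` and `judge`, as in the commands above, keeps both stages working on the same subset.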

Project Directory Layout

Each pipeline project is a self-contained directory. Without -p, the CLI treats the current working directory as the project directory.

Similar Plugins

evalview

agent-validator

evaluate-agent

agent-sdk-dev

agent-lint

trustabl

More by PSPDFKit-labs

nutrient-dws

pdf-to-markdown
