Hawk
Run Inspect AI evaluations at scale on Kubernetes.
Define your tasks, agents, and models in a YAML config. Hawk runs every combination on isolated Kubernetes pods, streams logs to your terminal, imports results into a PostgreSQL warehouse, and gives you a web UI to explore everything.
Why Hawk
- 📋 One YAML, full grid. Define tasks, agents, and models. Hawk runs the Cartesian product.
- ☸️ Kubernetes-native. Each eval gets its own pod and fresh virtualenv. Sandboxes run in separate pods with Cilium network policies for multi-tenant isolation.
- 🔑 Built-in LLM proxy. Managed proxy for OpenAI, Anthropic, and Google Vertex with automatic token refresh. No API keys to juggle (or bring your own).
- 📡 Live monitoring. hawk logs -f streams logs in real time, hawk status gives you a structured JSON report, and every job gets a Datadog dashboard URL on submission.
- 🖥️ Web UI. Browse eval sets, filter samples by score range and full-text search, compare across eval sets, export to CSV. Filter state lives in the URL for sharing.
- 🔍 Scout scanning. Run scanners over transcripts from previous evals. Filter transcripts by status, score, model, metadata with a rich query DSL.
- 🗄️ Data warehouse. Results land in PostgreSQL with trigram search, covering indexes, and computed status columns.
- 🔒 Access control. Model group permissions gate who can run models, view logs, and scan eval sets. S3 Object Lambda enforces permissions per-object.
- ✏️ Sample editing. Batch edit scores, invalidate or un-invalidate samples. Full audit trail.
- 💻 Local mode. hawk local eval-set runs the same config on your machine; --direct skips the venv so you can attach a debugger.
- 🔄 Resumable scans. Configs save to S3, and hawk scan resume picks up where you left off.
Get Started
uv pip install "hawk[cli] @ git+https://github.com/METR/inspect-action"
hawk login
hawk eval-set examples/simple.eval-set.yaml
hawk logs -f # watch it run
hawk web # open results in browser
Prerequisites
Before using Hawk, ensure you have:
- Python 3.11 or later
- uv for dependency management
- Access to a Hawk deployment, for which you'll need:
  - The Hawk API server URL
  - Authentication credentials (OAuth2)
- For deploying Hawk itself: a Kubernetes cluster, an AWS account, and Terraform 1.10+
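To sanity-check the local tooling, run (substitute python3 for python if that's how Python is installed on your system):
python --version   # should report 3.11 or later
uv --version       # confirms uv is on your PATH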
Installation
Install the Hawk CLI:
uv pip install "hawk[cli] @ git+https://github.com/METR/inspect-action"
Or install from source:
git clone https://github.com/METR/inspect-action.git
cd inspect-action
uv pip install -e ".[cli]"
Quick Start
1. Authenticate
First, log in to your Hawk server:
hawk login
This will open a browser for OAuth2 authentication.
2. Run Your First Evaluation
Create a simple eval config file or use an example:
hawk eval-set examples/simple.eval-set.yaml
3. View Results
Open the evaluation in your browser:
hawk web
Or view logs and results in the configured log viewer.
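While an eval set is running, you can also follow it from the terminal with the monitoring commands shown above:
hawk status   # structured JSON report on the submitted eval set
hawk logs -f  # stream logs until the run finishes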
Configuration
Required Environment Variables
Set these before using the Hawk CLI:
| Variable | Required | Description | Example |
|---|---|---|---|
| HAWK_API_URL | Yes | URL of your Hawk API server | https://hawk.example.com |
| INSPECT_LOG_ROOT_DIR | Yes | S3 bucket for eval logs | s3://my-bucket/evals |
| LOG_VIEWER_BASE_URL | No | URL for web log viewer | https://viewer.example.com |
You can set these in a .env file in your project directory or export them in your shell:
export HAWK_API_URL=https://hawk.example.com
export INSPECT_LOG_ROOT_DIR=s3://my-bucket/evals
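Or put the same values in a .env file (dotenv syntax, no export keyword):
HAWK_API_URL=https://hawk.example.com
INSPECT_LOG_ROOT_DIR=s3://my-bucket/evals
# LOG_VIEWER_BASE_URL is optional
LOG_VIEWER_BASE_URL=https://viewer.example.com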
Authentication Variables
For API server and CLI authentication:
- INSPECT_ACTION_API_MODEL_ACCESS_TOKEN_AUDIENCE
- INSPECT_ACTION_API_MODEL_ACCESS_TOKEN_ISSUER
- INSPECT_ACTION_API_MODEL_ACCESS_TOKEN_JWKS_PATH
For log viewer authentication (these can differ from the API server settings):
- VITE_API_BASE_URL (should match HAWK_API_URL)
- VITE_OIDC_ISSUER
- VITE_OIDC_CLIENT_ID
- VITE_OIDC_TOKEN_PATH
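As a purely illustrative sketch in the same .env style (every value below is a placeholder; your identity provider and deployment define the real issuer, audience, client ID, and paths):
# API server / CLI token validation (placeholder values)
INSPECT_ACTION_API_MODEL_ACCESS_TOKEN_ISSUER=https://auth.example.com/
INSPECT_ACTION_API_MODEL_ACCESS_TOKEN_AUDIENCE=https://hawk.example.com
INSPECT_ACTION_API_MODEL_ACCESS_TOKEN_JWKS_PATH=.well-known/jwks.json

# Log viewer (placeholder values); VITE_API_BASE_URL should match HAWK_API_URL
VITE_API_BASE_URL=https://hawk.example.com
VITE_OIDC_ISSUER=https://auth.example.com/
VITE_OIDC_CLIENT_ID=hawk-log-viewer
VITE_OIDC_TOKEN_PATH=/oauth/token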
Running Eval Sets
hawk eval-set examples/simple.eval-set.yaml
The Eval Set Config File
The eval set config file is a YAML file that defines a grid of tasks, solvers/agents, and models to evaluate.
See examples/simple.eval-set.yaml for a minimal working example.
Required Fields
tasks:
- package: git+https://github.com/UKGovernmentBEIS/inspect_evals
name: inspect_evals
items:
- name: mbpp
sample_ids: [1, 2, 3] # Optional: test specific samples
models:
- package: openai
name: openai
items:
- name: gpt-4o-mini
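Listing more than one item under tasks or models expands the grid: Hawk runs every task-model combination. As a sketch (same schema as above; the task and model names are only examples), this config would launch four evals:
tasks:
  - package: git+https://github.com/UKGovernmentBEIS/inspect_evals
    name: inspect_evals
    items:
      - name: mbpp
      - name: gsm8k
models:
  - package: openai
    name: openai
    items:
      - name: gpt-4o-mini
      - name: gpt-4o
# 2 tasks x 2 models = 4 evals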
Optional Fields