Skill

eaa-hypothesis-verification

Use when verifying claims through Docker experimentation. Applies the TBV (To Be Verified) principle: test claims before relying on them. Trigger with experiment setup or claim verification.

npx claudepluginhub emasoft/emasoft-plugins --plugin emasoft-architect-agent

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

README.md
references/docker-experimentation.md
references/experiment-scenarios.md
references/multiplicity-rule.md
references/op-archive-prototype.md
references/op-classify-result.md
references/op-cleanup-containers.md
references/op-design-multiplicity-experiment.md
references/op-document-findings.md
references/op-execute-experiment.md
references/op-setup-docker-experiment.md
references/output-templates.md
references/researcher-vs-experimenter.md

SKILL.md

Last commit: Feb 8, 2026

Hypothesis Verification Skill

Overview

Patterns for personally verifying claims through controlled Docker experimentation. Use this skill when you need to test whether a claim (from docs, researchers, or developers) is actually true.

TBV Principle: Everything is "To Be Verified" until you personally test it. Claims from any source require experimental confirmation before relying on them for decisions.

Prerequisites

Docker installed and running
Write access to experiment output directories
Understanding of the claim to be verified
Isolation environment for safe experimentation

Instructions

1. Identify the claim to be verified (mark as TBV)
2. Set up Docker container for isolated testing
3. Design experiment with multiple approaches (Multiplicity Rule: 3+)
4. Execute experiments and collect measurements
5. Document findings in experimentation report
6. Classify result: VERIFIED, UNVERIFIED, or PARTIALLY VERIFIED
7. Clean up containers and archive prototype if valuable
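The container setup step can be sketched as a small helper that assembles a `docker run` command line. This is a minimal sketch, not the skill's own tooling: the function name, the `/experiment/data` mount point, and the container name are assumptions made for illustration. Only the `docker run` flags (`--rm`, `--name`, `--network`, `-v`) are standard Docker CLI options.

```python
import shlex
from pathlib import Path


def build_docker_run(image, command, output_dir, name, network="none"):
    """Return the argv for a throwaway container that runs one experiment.

    All parameter names and the mount point are illustrative assumptions.
    """
    host_dir = Path(output_dir).resolve()
    return [
        "docker", "run",
        "--rm",                # remove the container when it exits
        "--name", name,        # a predictable name makes cleanup easier
        "--network", network,  # "none" blocks network access
        "-v", f"{host_dir}:/experiment/data",  # measurements land on the host
        image,
        *command,
    ]


if __name__ == "__main__":
    argv = build_docker_run(
        image="python:3.12-slim",
        command=["python", "-c", "print('hello from the experiment')"],
        output_dir="experiments/example-claim/data",
        name="tbv-example-claim",
    )
    # Printed rather than executed so the sketch is safe to run anywhere;
    # pass argv to subprocess.run(argv, check=True) to start the container.
    print(shlex.join(argv))
```

Building the command as a list rather than a shell string avoids quoting problems when the experiment command contains spaces.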

Checklist

Copy this checklist and track your progress:

Output

Artifact	Location	Purpose
Experimentation Report	`experiments/<claim-name>/REPORT.md`	Documents hypothesis, approaches tested, measurements, and classification
Status Classification	Report header	VERIFIED / UNVERIFIED / PARTIALLY VERIFIED / TBV
Measurement Data	`experiments/<claim-name>/data/`	Raw metrics, logs, benchmark results
Prototype Archive (if valuable)	`prototypes/<claim-name>/`	Working code with README explaining findings
Docker Cleanup Log	Terminal output	Confirms containers removed after experiment
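The artifact locations in the table above could be scaffolded with a few lines of Python. This is a sketch under assumptions: the report section headings below are placeholders chosen for illustration (the actual template lives in output-templates.md and is not reproduced here), and `scaffold_experiment` is a hypothetical helper name.

```python
from pathlib import Path

# Placeholder skeleton; the real template is defined in output-templates.md.
REPORT_SKELETON = """\
Experimentation Report: {claim}

Status: TBV

Hypothesis

Approaches Tested

Measurements

Classification
"""


def scaffold_experiment(root, claim_name):
    """Create experiments/<claim-name>/ with a data/ folder and REPORT.md."""
    exp_dir = Path(root) / "experiments" / claim_name
    (exp_dir / "data").mkdir(parents=True, exist_ok=True)
    report = exp_dir / "REPORT.md"
    if not report.exists():  # never overwrite an existing report
        report.write_text(REPORT_SKELETON.format(claim=claim_name), encoding="utf-8")
    return report


if __name__ == "__main__":
    print(scaffold_experiment(".", "example-claim"))
```

The report starts with status TBV so that an abandoned experiment is never mistaken for a verified one.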

Docker Experimentation

For Docker container setup and experiment infrastructure, see docker-experimentation.md:

1. Why Docker is Required
2. Container Structure Template
3. docker-compose.yml Template
4. Container Cleanup Procedure

Researcher vs Experimenter

For understanding the critical distinction between roles, see researcher-vs-experimenter.md:

1. The Researcher (What OTHERS say is true)
2. The Experimenter (What I can PROVE is true)
3. The TBV Principle (To Be Verified)
4. Workflow Integration: Researcher → Experimenter

Experiment Scenarios

For when to invoke the experimenter, see experiment-scenarios.md:

1. Case 1: Post-Research Validation
2. Case 2: Issue Reproduction in Isolation
3. Case 3: Architectural Bug Investigation
4. Case 4: New API/Tool Evaluation
5. Case 5: Fact-Checking Claims (Quick Verification)

Multiplicity Rule

For the evidence-based selection process, see multiplicity-rule.md:

1. The Multiplicity Process
2. Example: Implementing a Paper Algorithm
3. Iterative Selection Workflow
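As a rough illustration of evidence-based selection, the sketch below times several candidate implementations and ranks them by median runtime. It is not the process described in multiplicity-rule.md, whose contents are not shown here. The function name, the use of the median, and the toy candidates are all assumptions. The only detail taken from this document is the minimum of three approaches.

```python
import statistics
import time


def compare_approaches(approaches, iterations=1000, minimum=3):
    """Time each callable; return (name, median_seconds) pairs, fastest first."""
    if len(approaches) < minimum:
        raise ValueError(
            f"Multiplicity Rule: need {minimum}+ approaches, got {len(approaches)}"
        )
    results = {}
    for name, func in approaches.items():
        samples = []
        for _ in range(iterations):
            start = time.perf_counter()
            func()
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return sorted(results.items(), key=lambda item: item[1])


if __name__ == "__main__":
    n = 1000

    def loop_sum():
        total = 0
        for i in range(n):
            total += i
        return total

    # Three toy candidates that compute the same value in different ways.
    candidates = {
        "loop": loop_sum,
        "builtin": lambda: sum(range(n)),
        "formula": lambda: n * (n - 1) // 2,
    }
    for name, median in compare_approaches(candidates, iterations=200):
        print(f"{name}: {median * 1e6:.2f} us")
```

The median is used instead of the mean so that a few slow outliers (for example a garbage-collection pause) do not dominate the ranking.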

Output Templates

For experiment documentation and prototype archiving, see output-templates.md:

1. Experiment Directory Structure
2. Experimentation Report Template
3. Prototype Archive Policy
4. Archive README Template

Quick Reference

Status Classifications

Status	Meaning	Safe to Rely On?
VERIFIED	Experimentally confirmed	YES
UNVERIFIED	Tested but failed to match claim	NO (dangerous)
PARTIALLY VERIFIED	True under specific conditions	YES (with conditions)
TBV	Not yet tested	NO (unknown risk)
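The table above maps naturally onto an enum plus a single decision function. This is an illustrative sketch: the `Status` class and `safe_to_rely_on` helper are hypothetical names, and treating "YES (with conditions)" as a boolean `conditions_met` flag is a simplifying assumption.

```python
from enum import Enum


class Status(Enum):
    VERIFIED = "VERIFIED"
    UNVERIFIED = "UNVERIFIED"
    PARTIALLY_VERIFIED = "PARTIALLY VERIFIED"
    TBV = "TBV"


def safe_to_rely_on(status, conditions_met=False):
    """Mirror the 'Safe to Rely On?' column of the status table."""
    if status is Status.VERIFIED:
        return True
    if status is Status.PARTIALLY_VERIFIED:
        return conditions_met  # only under the conditions that were tested
    return False  # UNVERIFIED and TBV are never safe


if __name__ == "__main__":
    for status in Status:
        print(status.value, safe_to_rely_on(status))
```

Defaulting to `False` for anything that is not explicitly verified keeps the helper consistent with the "TBV by default" rule.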

Implementation vs Experimental Code

Implementation Code	Experimental Code
Permanent (committed)	Ephemeral (deleted after)
Production-ready	Throwaway testbed
Follows specifications	Generates specifications
One chosen solution	Multiple solutions compared
Part of delivery	Part of decision-making

Workflow Integration Points

Workflow	Trigger	Experimenter Action
BUILD	Architecture decision needs validation	Validates with testbeds
DEBUG	Root cause unclear or fix uncertain	Reproduces in isolation, tests fixes
REVIEW	Performance concerns or architectural questions	Benchmarks alternatives

IRON RULES Summary

Multiplicity: Always test 3+ approaches
Ephemeral code: Delete after findings documented
Evidence-based: Conclusions backed by measurements
Docker isolation: ALL experiments in containers
Documentation: the report is 50% of the output
TBV by default: Everything unverified until tested

Examples

Example 1: Verify API Performance Claim

Claim: "Redis caches API responses 10x faster than in-memory dict"
Status: TBV

1. Create Docker container with Redis and Python
2. Implement three approaches:
   - Approach A: In-memory dict cache
   - Approach B: Redis cache
   - Approach C: Redis with connection pooling
3. Run 1000 iterations, measure latency
4. Results:
   - Dict: 0.001ms avg
   - Redis: 0.15ms avg
   - Redis pooled: 0.08ms avg
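5. Classification: UNVERIFIED (in this single-host test, Redis was slower than the in-memory dict)
6. Conditions: Redis may still be the better choice when the cache must be shared across processes or hosts; that scenario was not tested here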
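The arithmetic behind the classification in Example 1 can be made explicit. Using the latencies listed there, the dict takes 0.001 ms and Redis 0.15 ms, so Redis is about 150 times slower, not 10 times faster. With pooling (0.08 ms) it is about 80 times slower. The helper names below are hypothetical, and the rule "VERIFIED only if the measured speedup reaches the claimed factor" is one possible reading of the claim, not a rule stated by the skill.

```python
def observed_speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline (>1 = faster)."""
    return baseline_ms / candidate_ms


def classify_speed_claim(claimed_factor, baseline_ms, candidate_ms):
    """Return 'VERIFIED' if the measured speedup reaches the claimed factor."""
    speedup = observed_speedup(baseline_ms, candidate_ms)
    return "VERIFIED" if speedup >= claimed_factor else "UNVERIFIED"


if __name__ == "__main__":
    dict_ms = 0.001  # latencies as listed in Example 1
    for label, redis_ms in [("Redis", 0.15), ("Redis pooled", 0.08)]:
        slowdown = round(redis_ms / dict_ms)
        verdict = classify_speed_claim(10, dict_ms, redis_ms)
        print(f"{label}: about {slowdown}x slower than dict -> {verdict}")
```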

Example 2: Verify Library Compatibility

Claim: "Library X works with Python 3.12"
Status: TBV

1. Docker container with Python 3.12
2. Install library X
3. Run test suite
4. Result: Import error on async module
5. Classification: UNVERIFIED
6. Action: Use Python 3.11 or wait for library update
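A quick first check for a compatibility claim like this one is to import each of the library's modules inside the target container and record any failure. The sketch below is one way to do that. `check_imports` is a hypothetical helper, and the module names are stand-ins from the standard library because "Library X" is not named in the example.

```python
import importlib
import sys


def check_imports(module_names):
    """Try to import each module; return {name: error message or None}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, plus errors raised at import time
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Stand-in names; replace with the modules of the library under test.
    report = check_imports(["json", "asyncio"])
    print(f"python {sys.version_info.major}.{sys.version_info.minor}")
    for name, error in report.items():
        print(f"{name}: {'OK' if error is None else error}")
```

A clean import is only weak evidence of compatibility. Running the library's test suite, as in step 3 above, remains the stronger check.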

Error Handling

Error	Cause	Solution
Docker not available	Docker daemon not running	Start Docker Desktop or docker service
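Container cleanup failed	Orphaned containers	Run `docker container prune` to remove stopped containers, or `docker system prune` to also remove unused networks, dangling images, and build cache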
Experiment inconclusive	Insufficient test iterations	Increase sample size, reduce variables
Conflicting results	Environment differences	Standardize container configuration
Resource exhaustion	Too many containers	Clean up between experiments
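The first row of the table ("Docker not available") can be detected before an experiment starts. The following is a minimal preflight sketch. It assumes that `docker info` exits with a non-zero status when the daemon is unreachable, and the `docker_available` name and its injectable `which`/`run` parameters are illustrative choices that keep the function testable without Docker installed.

```python
import shutil
import subprocess


def docker_available(which=shutil.which, run=subprocess.run):
    """Return (ok, reason). `which` and `run` can be replaced in tests."""
    if which("docker") is None:
        return False, "docker CLI not found on PATH"
    try:
        proc = run(["docker", "info"], capture_output=True, text=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"could not run docker info: {exc}"
    if proc.returncode != 0:
        # Assumption: a non-zero exit means the daemon is not reachable.
        return False, "docker daemon not reachable"
    return True, "ok"


if __name__ == "__main__":
    ok, reason = docker_available()
    print("Docker ready" if ok else f"Docker not available: {reason}")
```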

Resources

docker-experimentation.md - Container setup and templates
researcher-vs-experimenter.md - Role distinction
experiment-scenarios.md - When to invoke experimenter
multiplicity-rule.md - Evidence-based selection process
output-templates.md - Report and archive templates
