DeepWork — Make AI Agents Trustworthy at Complex Tasks
AI agents are powerful, but they're unreliable. They go off-script, skip steps, hallucinate details, and don't check their own work. The more complex the task, the worse it gets.
DeepWork fixes this with complementary systems: Workflows that force agents to follow a structured process step by step, Reviews that automatically verify every change against your rules, and DeepSchemas that catch malformed files the moment they're written. Together, they make agents trustworthy enough to run autonomously on real work.
Quickstart
Claude Code (Terminal)
claude plugin marketplace add Unsupervisedcom/deepwork
claude plugin install deepwork@deepwork-plugins
claude
Then start a new session. First, do the task you want to automate — just ask Claude to do it, and work with Claude to refine it as you go:
Research our top 3 competitors and write a SWOT analysis for each one.
Iterate with feedback as you go: "Don't include feature X from competitor Y, they're sunsetting it," "Always include the pricing approach in the analysis," and so on.
Once you're happy with the result, turn it into a reusable workflow:
/deepwork Create a job called "competitive_research" with a workflow called "update_swot" that does what we just did.
It will ask a few clarifying questions to tune the workflow to your needs, then generate a repeatable flow.
From then on you can run it anytime with /deepwork update_swot, and it will execute the same process reliably every time.
For bonus points, run /deepwork learn after your workflow completes, and watch it auto-tune itself.
Claude Desktop
- Enter Cowork mode (toggle at the top of the screen).
- Select Customize with plugins at the bottom of the page.
- Select Personal, click the +, and select Add marketplace from GitHub.
- Set the URL to Unsupervisedcom/deepwork and press Sync. (Adding a marketplace currently fails on Windows.)
- Select the deepwork plugin and click Install.
- In Cowork mode, select 'Start a deepwork workflow'.
The Problem
When you ask an agent to do something simple — write a function, answer a question — it works great. But when you ask it to execute a multi-step process — research a topic across multiple sources, audit a codebase, produce a structured report — things fall apart:
- It skips steps or invents shortcuts
- It forgets context between steps
- It doesn't check its work against your standards
- It produces inconsistent results every time you run it
You end up babysitting the agent, which defeats the purpose of automation.
How DeepWork Solves This
DeepWork gives you three complementary systems that work together to make agents trustworthy:
Workflows: Structured Execution with Quality Gates
Workflows force the agent to follow a strict, step-by-step process. Each step has defined inputs, outputs, and quality checks. The agent can't skip ahead or go off-script — it must complete each step and pass its quality gates before moving on.
The fastest way to create a workflow: do the task once with Claude, then turn it into a workflow. Claude already has the full context of what worked, so it can generate a hardened, repeatable process from what it just did.
Write a tutorial for the new export feature we just launched.
Claude does the work. You review the result, give feedback, iterate. Once you're happy:
/deepwork Create a job called "tutorial_writer" with a workflow called "write_tutorial" that does what we just did.
DeepWork asks you a few questions (~10 minutes), then generates the steps. After that, you can run it whenever:
/deepwork tutorial_writer
The agent follows the workflow step by step, every time, the same way. You build the skill once and reuse it forever.
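Conceptually, each generated step bundles its instructions with inputs, outputs, and a quality gate. The sketch below is illustrative only; the field names and YAML layout are assumptions chosen to show the idea, not DeepWork's actual on-disk format:

```yaml
# Illustrative sketch of a generated workflow (NOT DeepWork's real schema).
job: tutorial_writer
workflow: write_tutorial
steps:
  - name: gather_context
    instructions: Read the export feature's docs and recent changelog entries.
    outputs:
      - notes/export_feature_context.md
    quality_gates:
      - Every claim in the notes cites a source file or doc section.
  - name: draft_tutorial
    inputs:
      - notes/export_feature_context.md
    outputs:
      - docs/tutorials/export.md
    quality_gates:
      - Tutorial includes prerequisites, steps, and a troubleshooting section.
```

The key property is the gating: the agent cannot advance from gather_context to draft_tutorial until the earlier step's outputs exist and its quality gates pass.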
Reviews: Automated Verification of Every Change
Reviews are the second layer of trust. You define review policies in .deepreview config files — what files to watch, what to check for, how to group them — and they run automatically against every change.
/review
One command. Every rule you've defined runs in parallel, each review agent scoped to exactly the files and instructions it needs.
Reviews catch what workflows can't: style regressions, documentation falling out of sync, security issues in code the workflow didn't directly touch, requirements that lost their test coverage.
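A .deepreview policy might look something like the sketch below. The exact keys and layout are assumptions for illustration; check the generated files in your repo for the real syntax:

```yaml
# Illustrative .deepreview sketch (field names are assumptions, not the real syntax).
reviews:
  - name: docs_in_sync
    watch:
      - src/export/**
    instructions: >
      If export behavior changed, verify docs/tutorials/export.md
      still matches the actual CLI flags and output format.
  - name: no_secrets
    watch:
      - "**/*.env"
      - "**/*.yaml"
    instructions: Flag any hardcoded credentials or API keys.
```

Running /review would then spin up one scoped review agent per policy, in parallel, each seeing only the files its watch patterns match.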
DeepSchemas: Shared Contracts Between You and the Agent
DeepSchemas are file-level schemas that define what a file should look like — its structure, requirements, and validation rules. They act as contracts: when an agent writes or edits a file, applicable schemas are checked immediately. Failures are caught at write time, before they ever reach review.
/deepschema
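As an illustration, a schema for a changelog file might declare required structure like this. Again, the format shown is a hypothetical sketch, not DeepWork's actual schema syntax:

```yaml
# Hypothetical deepschema sketch for CHANGELOG.md (illustrative only).
file: CHANGELOG.md
structure:
  - heading: Unreleased        # newest entries go at the top
  - repeated: version_section  # one section per released version
rules:
  - Every version section starts with a date in YYYY-MM-DD format.
  - Entries are grouped under Added, Changed, or Fixed.
validation: on_write           # failures surface immediately, before review
```

Because the check runs at write time, a malformed entry is rejected the moment the agent saves the file rather than surfacing later in review.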