Redesigns direct instruction sequences into productive failure activities with struggle before explanation. For teaching concepts via failure-first approaches.
npx claudepluginhub garethmanning/claude-education-skills --plugin claude-education-skillsThis skill uses the workspace's default tool permissions.
Redesigns a teaching sequence to incorporate productive failure (Kapur, 2008, 2016) and desirable difficulties (Bjork, 1994; Bjork & Bjork, 2011), replacing the standard "teach then practise" model with a "struggle then consolidate" model that produces deeper, more durable learning. The core paradox: students who struggle first and fail learn MORE in the long run than students who receive clear...
Designs worked example fading sequences from fully worked examples through completion problems to independent practice for teaching procedures, algorithms, or multi-step processes to novice learners.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Share bugs, ideas, or general feedback.
Redesigns a teaching sequence to incorporate productive failure (Kapur, 2008, 2016) and desirable difficulties (Bjork, 1994; Bjork & Bjork, 2011), replacing the standard "teach then practise" model with a "struggle then consolidate" model that produces deeper, more durable learning. The core paradox: students who struggle first and fail learn MORE in the long run than students who receive clear instruction first and succeed immediately — even though it feels worse during the lesson. Kapur (2016) showed that productive failure works because the generation phase (where students attempt problems before being taught) activates prior knowledge, reveals the limits of that knowledge, and creates "knowledge gaps" that make the subsequent instruction more meaningful. Bjork (1994) introduced the concept of "desirable difficulties" — conditions that make learning harder in the short term but more durable in the long term. These include spacing, interleaving, generation, and retrieval practice. This skill is particularly important in AI-enabled learning environments because AI tools can inadvertently REMOVE desirable difficulties — making tasks easier, providing immediate answers, and reducing the productive struggle that drives deep learning. The output includes a complete productive failure sequence (generation phase + consolidation phase), the specific desirable difficulties embedded in the task, safeguards to ensure failure is productive not destructive, and guidance on preventing AI-enabled cognitive offloading.
Kapur (2008, 2016) developed the productive failure framework through a series of studies in mathematics classrooms. In the canonical design, students are given a complex, novel problem BEFORE any instruction — a problem they are expected to fail at. They work in small groups, generating multiple solution approaches, none of which are fully correct. THEN the teacher provides instruction on the canonical solution, explicitly comparing it to the students' generated approaches. Kapur (2016) found that students in the productive failure condition significantly outperformed students who received direct instruction first on measures of conceptual understanding and transfer — even though the direct instruction students performed better on immediate procedural tests. The key finding: it's not the failure that produces learning, but the GENERATION. Students who generate ideas, even wrong ones, develop richer representations of the problem space, which makes subsequent instruction more meaningful. Bjork (1994) and Bjork & Bjork (2011) articulated the broader principle of desirable difficulties: conditions that reduce performance during learning but enhance long-term retention and transfer. They identified four key desirable difficulties: (1) spacing — distributing practice over time rather than massing it, (2) interleaving — mixing different problem types rather than blocking them, (3) generation — producing answers rather than reading them, and (4) retrieval practice — testing yourself rather than restudying. All four share a common mechanism: they make the learning experience feel harder and less fluent, which paradoxically produces stronger memory traces and deeper understanding. Soderstrom & Bjork (2015) made the critical distinction between LEARNING and PERFORMANCE. Performance is what you can do RIGHT NOW — it's visible and measurable in the moment. Learning is the long-term change in knowledge or skill — it's invisible during the lesson and only measurable later. Desirable difficulties reduce performance (students get more wrong during the lesson) but enhance learning (students remember more and transfer better weeks later). This distinction is essential because teachers — and AI systems — tend to optimise for performance (making students succeed now) rather than learning (making students remember and transfer later).
The teacher must provide:
Optional (injected by context engine if available):
You are an expert in productive failure and desirable difficulties, with deep knowledge of Kapur's (2008, 2016) productive failure framework, Bjork's (1994) desirable difficulties research, Bjork & Bjork's (2011) applied desirable difficulties, and Soderstrom & Bjork's (2015) distinction between learning and performance. You understand the core paradox: making learning HARDER (in the right way) makes it BETTER — but it feels worse for both students and teachers. You also understand the specific risk that AI tools pose to desirable difficulties: by making tasks easier and providing immediate support, AI can remove the productive struggle that drives deep learning.
CRITICAL PRINCIPLES:
- **Generation before instruction.** The productive failure sequence MUST start with students attempting the task BEFORE receiving instruction. This is non-negotiable. The generation phase activates prior knowledge, reveals gaps, and creates the conditions for meaningful consolidation. If you teach first, you lose the generative benefit.
- **Failure must be PRODUCTIVE, not UNPRODUCTIVE.** Kapur (2016) distinguishes four conditions: productive failure (struggle → learning), unproductive failure (struggle → no learning), productive success (succeed → learning), and unproductive success (succeed → no learning). Productive failure requires: (a) the task is challenging but not impossible, (b) students have enough prior knowledge to generate partial solutions, (c) the consolidation phase explicitly connects student-generated ideas to the canonical solution, and (d) the classroom culture supports struggle without shame.
- **Design the consolidation phase as carefully as the generation phase.** Many teachers try productive failure and conclude "it didn't work" because they skip or rush the consolidation. The consolidation is where learning happens: the teacher explicitly compares the students' generated approaches with the correct approach, highlights what was right, what was wrong, and WHY the canonical approach works. Without this, students just experienced confusion.
- **Desirable difficulties feel undesirable.** Students will report that they "didn't learn anything" during the generation phase. They will rate the lesson lower on satisfaction surveys. They will perform worse on immediate post-tests. This is EXPECTED. The benefits appear later — on delayed tests, transfer tasks, and conceptual understanding measures. Warn teachers about this.
- **AI-enabled cognitive offloading is the enemy of desirable difficulty.** If students can ask an AI for the answer during the generation phase, the productive failure is destroyed. The design must include explicit guidance on when and how AI tools should be restricted to preserve the learning benefit.
Your task is to redesign this teaching sequence for productive failure:
**Target concept:** {{target_concept}}
**Current approach:** {{current_approach}}
The following optional context may or may not be provided. Use whatever is available; ignore any fields marked "not provided."
**Student level:** {{student_level}} — if not provided, design for a general secondary school context.
**Subject area:** {{subject_area}} — if not provided, infer from the content.
**Time available:** {{time_available}} — if not provided, design for a double lesson (approximately 90 minutes).
**Prerequisite knowledge:** {{prerequisite_knowledge}} — if not provided, identify the likely prerequisites based on the target concept.
**Risk tolerance:** {{risk_tolerance}} — if not provided, design for a teacher who is willing to try but nervous about student frustration.
Return your output in this exact format:
## Productive Failure Sequence: [Target Concept]
**Target concept:** [What students need to learn]
**Current approach:** [How it's currently taught]
**Redesigned approach:** [One-sentence summary of the productive failure version]
### Phase 1: Generation (Productive Failure)
[The task students will attempt BEFORE instruction. Include the specific problem, the expected student responses (both correct and incorrect approaches they are likely to generate), and the teacher's role during this phase (circulating, NOT helping, noting student approaches for use in Phase 2).]
**The task:**
[Exact task students receive]
**Expected student responses:**
[What students will likely try — including the specific partial solutions and incorrect approaches]
**Teacher role during generation:**
[What the teacher does — observe, collect approaches, resist the urge to help]
### Phase 2: Consolidation
[How the teacher transitions from generation to instruction. This must explicitly connect student-generated approaches to the canonical solution. Include the specific comparisons the teacher should make and the key insights that emerge from the comparison.]
### Desirable Difficulties Embedded
[List each desirable difficulty in the sequence and explain why it enhances learning]
| Difficulty | What It Does | Why It Helps |
|---|---|---|
| [Difficulty] | [Mechanism] | [Learning benefit] |
### Failure Safeguards
[How to ensure failure is PRODUCTIVE — the conditions that must be in place and what to do if students become frustrated or disengaged]
### Cognitive Offloading Risks
[How AI tools might undermine the desirable difficulties and specific recommendations for managing this]
### Teacher Preparation Notes
[What the teacher needs to know and do before attempting this sequence — including how to manage student frustration and their own discomfort with watching students struggle]
**Self-check before returning output:** Verify that (a) generation comes before instruction, (b) the task is challenging but not impossible, (c) the consolidation explicitly connects student approaches to the canonical solution, (d) desirable difficulties are identified and justified, and (e) cognitive offloading risks are addressed.
Scenario: Target concept: "The concept of standard deviation — not just calculating it, but understanding what it MEANS (a measure of spread/variability) and when to use it (comparing consistency, identifying outliers, making predictions)" / Current approach: "I teach the formula, work through two examples on the board, then give 10 practice questions. Students can calculate SD to two decimal places but when asked 'what does this standard deviation tell you about the data?' they can't answer. They treat it as a procedure, not a concept" / Student level: "Year 10, GCSE Higher tier, comfortable with mean and range" / Subject area: "Maths — statistics" / Time available: "One 60-minute lesson"
Target concept: Understanding what standard deviation MEANS (a measure of how spread out data is from the mean) and when to use it (comparing variability, not just calculating a number) Current approach: Teach formula → worked examples → practice questions (produces procedural fluency without conceptual understanding) Redesigned approach: Students attempt to invent a measure of "spread" from data before being taught the standard deviation formula — generating and comparing their own approaches builds the conceptual understanding that direct instruction misses
Duration: 20-25 minutes
The task:
Present this on the board or as a handout:
The Football Manager Problem
You are a football manager choosing between two strikers. Both scored an average of 1.5 goals per match over the last 10 matches. But their match-by-match records are very different:
Player A: 2, 1, 2, 1, 2, 1, 2, 1, 2, 1 Player B: 5, 0, 0, 4, 0, 0, 6, 0, 0, 0
Both have a mean of 1.5 goals per match. But they're clearly VERY different players.
Your challenge (in pairs):
- In what way are these players different? Describe the difference in words.
- Invent a NUMBER that captures this difference. Your number should be BIGGER for the player whose scores are more spread out and SMALLER for the player whose scores are more consistent. You can use any mathematical operations you like.
- Test your number on both players. Does it correctly show that Player B is more "spread out" than Player A?
- If you have time: would your number work for ANY two datasets, not just these two?
Expected student responses:
Students will generate several approaches, most of which will be partially correct:
Range approach (most common): Player A range = 2 - 1 = 1. Player B range = 6 - 0 = 6. This works but is fragile — it only uses two data points and ignores everything in between. Approximately 60-70% of pairs will try this first.
"Distance from mean" approach (less common but promising): Some students will calculate how far each score is from the mean (1.5) and try to combine these distances. They'll likely add the distances, getting different totals for each player. This is close to the conceptual basis of standard deviation. Approximately 20-30% of pairs.
"Count the zeros" or "count the extremes" approach: Some students will try non-mathematical approaches — counting how many times a player scored 0, or how many "big" games they had. This captures the intuition but isn't generalisable. Approximately 10-20%.
Stuck/confused: Some pairs will struggle to move from the verbal description ("Player B is more inconsistent") to a numerical measure. This is expected. The teacher should NOT help — the struggle is the point.
What students will NOT generate:
Teacher role during generation:
Duration: 30-35 minutes
Step 1: Share and compare (10 minutes). Select 3 student approaches (ideally: range, distance-from-mean, and one other). Write each on the board. For each approach, ask the class: "Does this number correctly show that Player B is more spread out? What are the strengths and weaknesses of this approach?" Guide the discussion toward the key limitations:
Step 2: Introduce the canonical solution (10 minutes). "Here's how mathematicians solved the cancellation problem: they SQUARED each deviation (making everything positive), averaged the squares, then took the square root to get back to the original units. This is called the standard deviation." Walk through the formula using Player A and Player B data, explicitly connecting each step to the students' generated approaches: "See how this step is just what Group 2 did — finding the distance from the mean? The only new step is the squaring."
Step 3: Connect back to the original question (10 minutes). Calculate SD for both players. Player A: SD ≈ 0.5. Player B: SD ≈ 2.3. "Player B's standard deviation is about 4.5 times larger than Player A's. What does that MEAN for the football manager?" Push for interpretation: SD is not just a number — it tells you how PREDICTABLE a player is. "If you need a reliable scorer every week, choose Player A. If you're desperate for a match-winner and can absorb some scoreless games, Player B might be worth the risk."
Step 4: Quick practice with understanding focus (5 minutes). Give one more dataset. Ask students to (a) calculate SD AND (b) interpret what it means in context. The interpretation is the assessment target, not the calculation.
| Difficulty | What It Does | Why It Helps |
|---|---|---|
| Generation (attempting before instruction) | Forces students to activate prior knowledge and grapple with the problem of measuring spread | Creates "knowledge gaps" that make the formula meaningful when introduced — students understand WHY each step exists |
| Productive failure (students don't reach the correct answer) | Creates cognitive conflict — students see that their approaches have limitations | Motivation to learn the canonical solution is higher because they've experienced the problem it solves |
| Interleaving (verbal → numerical → interpretation within one task) | Students must move between different representations of the same concept | Builds flexible understanding rather than rigid procedural knowledge |
| Delayed instruction (teaching after struggle, not before) | Prevents students from simply following the procedure without understanding | The formula is introduced as a SOLUTION to a problem students have experienced, not as an arbitrary recipe |
Condition 1: The task must be challenging but not impossible. The Football Manager problem is carefully calibrated: Year 10 students can calculate mean and range, and they can describe spread in words. They have enough prior knowledge to generate partial solutions. If students couldn't calculate a mean, this task would be too hard — they'd experience UNPRODUCTIVE failure.
Condition 2: The classroom culture must support struggle. Before the generation phase, explicitly say: "I'm about to give you a problem that you're NOT expected to solve perfectly. The struggle IS the point. I want to see your thinking, not a correct answer." This normalises failure and reduces anxiety.
Condition 3: Frustration monitoring. If more than half the class appears stuck and frustrated after 10 minutes, offer the single scaffold ("What if you started by finding how far each score is from the average?"). This prevents the failure from becoming unproductive. If frustration persists at 15 minutes, begin Phase 2 early — you have enough student approaches to work with.
Condition 4: The consolidation must be thorough. If you run out of time, cut the practice questions (Step 4), NEVER the consolidation (Steps 1-3). A productive failure sequence without consolidation is just a confusing lesson.
Risk 1: Students Google "how to measure spread." If students have internet access, they will find "standard deviation" within 30 seconds, bypassing the entire generation phase. Mitigation: This is a no-devices task. Calculators yes, phones and laptops no. Explain why: "I need you to figure this out yourselves — that's the whole point."
Risk 2: Students ask an AI chatbot for help. In schools where AI tools are available, students may prompt "how do you measure how spread out data is?" and get a clear explanation of standard deviation. Mitigation: Explicitly instruct that AI tools are not to be used during Phase 1. Frame it positively: "You're doing something an AI can't do — inventing your own mathematical approach. The AI already knows the formula. You don't need the formula right now — you need the THINKING."
Risk 3: High-achieving students short-circuit the struggle. Some students may already know about standard deviation or may find the answer in a textbook. Mitigation: The task asks students to INVENT a measure, not to FIND one. If a student says "Isn't this just standard deviation?" respond: "Maybe. But can you explain WHY standard deviation works? Why square the deviations instead of taking absolute values? That's what I want you to figure out."
Expect discomfort. Watching students struggle without helping is one of the hardest things a teacher can do. You will feel the urge to explain, to give hints, to rescue. Resist. The struggle is the product, not a problem to be solved. Kapur's research shows that the students who struggle most during the generation phase often learn the most during the consolidation phase.
Expect student complaints. Students may say "Why didn't you just teach us the formula?" or "This was confusing." This is NORMAL and does not mean the lesson failed. Soderstrom & Bjork (2015) showed that students' subjective experience of learning is a poor predictor of actual learning — they feel they learn more from clear explanations, but they remember and transfer more from productive failure.
Prepare the consolidation in advance. Know which student approaches you're looking for (range, distance from mean, other). Have the canonical SD calculation ready to present alongside student approaches. The consolidation cannot be improvised — it requires careful planning.
Have a backup plan. If the generation phase produces only one approach (everyone does range), adjust the consolidation: "Most of you tried range. That's a good start. Let me show you another approach that a pair in a different class tried…" and present the distance-from-mean approach yourself.
Productive failure is not appropriate for all content. It works best for CONCEPTUAL understanding — understanding WHY something works, not just HOW to do it. For purely procedural skills (long division, balancing equations), direct instruction followed by practice is often more efficient. Productive failure is most valuable when the current approach produces procedural fluency without conceptual understanding.
The evidence is stronger for mathematics and science than for other domains. Kapur's (2008, 2016) research was conducted primarily in mathematics classrooms. The principle of generation before instruction has been studied in other domains, but the specific productive failure design (attempt → fail → consolidate) has less evidence outside STEM. The underlying mechanism (generation enhances learning) is domain-general, but the specific task design may need adaptation.
Teacher skill is a binding constraint. The consolidation phase requires a skilled teacher who can connect student-generated approaches to the canonical solution in real time. This is significantly harder than delivering a prepared lecture. Teachers attempting productive failure for the first time should start with a topic they know deeply.
The performance-learning distinction creates an assessment problem. Students in productive failure conditions perform WORSE on immediate post-tests but BETTER on delayed and transfer tests (Soderstrom & Bjork, 2015). If the teacher assesses learning immediately after the lesson, productive failure will appear to have failed. The real benefits only appear days or weeks later.