Provides structured UI design critiques using Like/Wish/Wonder and What/Why/Improve frameworks, separating observations from judgments for PR reviews, mockups, prototypes, and handoffs.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude

This skill uses the workspace's default tool permissions.
Structured feedback — critique frameworks (like/wish/wonder, what/why/improve), separating subjective preference from objective assessment, avoiding "I don't like it"
Separate observation from judgment before speaking. Every piece of design feedback must begin with what you see, not what you feel: "the error text is 11px light gray below the fold" is an observation; "the error handling feels weak" is a judgment.
Apply the Like/Wish/Wonder framework for generative feedback. Structure every critique response in three parts: what works and should be kept ("I like..."), what you would change ("I wish..."), and open questions or directions to explore ("I wonder...").
Apply the What/Why/Improve framework for evaluative feedback. When the goal is to assess quality rather than generate ideas, state what you observe, why it matters for the user or the goal, and how to improve it.
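A minimal sketch of how these two frameworks could be captured by a review tool, assuming simple in-memory shapes; the field names and helper functions are illustrative, not an existing API.

```typescript
// Illustrative sketch: structuring critique notes around the two frameworks.
// The types and helper names are assumptions for this example.

interface LikeWishWonder {
  like: string[];   // what works and should be kept
  wish: string[];   // what you would change
  wonder: string[]; // open questions or directions to explore
}

interface WhatWhyImprove {
  what: string;    // the observation, stated neutrally
  why: string;     // why it matters for the user or the goal
  improve: string; // a concrete suggestion
}

function formatLikeWishWonder(fb: LikeWishWonder): string {
  return [
    ...fb.like.map((s) => `I like ${s}`),
    ...fb.wish.map((s) => `I wish ${s}`),
    ...fb.wonder.map((s) => `I wonder ${s}`),
  ].join("\n");
}

function formatWhatWhyImprove(fb: WhatWhyImprove): string {
  return `What: ${fb.what}\nWhy: ${fb.why}\nImprove: ${fb.improve}`;
}

// Example usage
console.log(
  formatWhatWhyImprove({
    what: "the error message is 12px light gray below the fold",
    why: "users who miss it resubmit the form and lose their input",
    improve: "move it inline next to the failing field at body size",
  })
);
```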
Classify feedback into three weight categories. Not all feedback carries equal weight:
Use the severity/confidence matrix to prioritize feedback. Every critique item should be tagged:
| Severity | Confidence | Example | Action |
|---|---|---|---|
| Critical | High | "Submit button is invisible on mobile — tested iPhone 14, Safari 17" | Blocks approval |
| Major | High | "Form has no error recovery — users must re-enter all fields" | Must fix this iteration |
| Minor | Medium | "Icon weight could be 1.5px instead of 2px for consistency" | Backlog for polish |
| Suggestion | Low | "Wonder if a tooltip would help here" | Consider in future |
This matrix prevents a 90-minute review from being derailed by icon weight discussions while a checkout-blocking bug goes unmentioned.
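A sketch of how the matrix could drive a review agenda, assuming items are tagged with the severity and confidence levels above; the `CritiqueItem` shape and ordering rules are illustrative.

```typescript
// Illustrative sketch: tagging critique items and ordering them for discussion.
// Severity and confidence levels come from the matrix above; everything else is assumed.

type Severity = "critical" | "major" | "minor" | "suggestion";
type Confidence = "high" | "medium" | "low";

interface CritiqueItem {
  note: string;
  severity: Severity;
  confidence: Confidence;
}

const severityRank: Record<Severity, number> = {
  critical: 0, major: 1, minor: 2, suggestion: 3,
};
const confidenceRank: Record<Confidence, number> = {
  high: 0, medium: 1, low: 2,
};

// Highest-severity, highest-confidence items are discussed first.
function reviewAgenda(items: CritiqueItem[]): CritiqueItem[] {
  return [...items].sort(
    (a, b) =>
      severityRank[a.severity] - severityRank[b.severity] ||
      confidenceRank[a.confidence] - confidenceRank[b.confidence]
  );
}

// A high-confidence critical finding blocks approval.
function blocksApproval(item: CritiqueItem): boolean {
  return item.severity === "critical" && item.confidence === "high";
}
```

Ordering this way keeps a checkout-blocking finding ahead of icon-weight polish, regardless of who raised it first.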
Critique the design, not the designer. Frame all feedback in terms of the artifact, never the person: "this layout buries the primary action" rather than "you buried the primary action."
Anchor critique in user goals, not aesthetic preference. The ultimate test of a design is whether it helps the user accomplish their goal: "users cannot find the checkout" carries more weight than "I would prefer a warmer palette."
Close every critique session with a prioritized action list. Synthesize all feedback into a ranked list, ordered by the severity/confidence matrix above, so the designer leaves knowing what blocks approval, what must be fixed this iteration, and what can wait.
Design critique operates at five distinct levels of depth. Skilled reviewers consciously choose which level is appropriate:
Level 1: Visceral reaction (0-3 seconds). First impression, gut feeling. "This feels premium" or "this feels dated" captures the immediate emotional response. Useful as a data point but never sufficient as feedback. Record it, but do not act on it until you can articulate why.
Level 2: Descriptive observation (what exists). Catalog what you see without judgment. "The header uses a sans-serif font at 32px bold. The primary color is #2563EB. There are four navigation items. The hero image is full-bleed." This establishes a shared factual baseline before opinions enter the conversation.
Level 3: Analytical assessment (how it works). Apply design principles to the observations. "The 32px heading with 16px body text creates a 2:1 type scale ratio — this is below the 2.5:1 minimum recommended for clear hierarchy. The four navigation items use identical visual weight, offering no signal about which is the current page." This is where the majority of useful critique lives; a mechanical version of the ratio check is sketched after Level 5.
Level 4: Interpretive evaluation (why it matters). Connect the analysis to user outcomes. "The weak type hierarchy means users scanning the page cannot distinguish section headings from body text at a glance, increasing time-to-task by an estimated 2-4 seconds per page. For a dashboard viewed 20+ times daily, this compounds into meaningful friction."
Level 5: Strategic questioning (what else is possible). Zoom out from the specific solution to question the problem framing. "The dashboard assumes users need to see all metrics simultaneously, but analytics show 80% of sessions focus on a single metric. Would a single-metric focus view with drill-down better serve the primary use case?" This level can redirect entire product directions.
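The Level 3 example turns on a small calculation that can be made mechanical. A sketch, using the 2.5:1 threshold cited above; the function and constant names are illustrative.

```typescript
// Illustrative sketch: turning a Level 3 observation into a repeatable check.
// The 2.5:1 threshold is the one cited in the Level 3 example; the helper is an assumption.

const MIN_HEADING_TO_BODY_RATIO = 2.5;

function checkTypeHierarchy(headingPx: number, bodyPx: number): string {
  const ratio = headingPx / bodyPx;
  return ratio >= MIN_HEADING_TO_BODY_RATIO
    ? `OK: ${headingPx}px / ${bodyPx}px = ${ratio.toFixed(1)}:1`
    : `Weak hierarchy: ${headingPx}px / ${bodyPx}px = ${ratio.toFixed(1)}:1 ` +
      `(below ${MIN_HEADING_TO_BODY_RATIO}:1)`;
}

console.log(checkTypeHierarchy(32, 16)); // "Weak hierarchy: 32px / 16px = 2.0:1 (below 2.5:1)"
```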
A structured critique session follows a fixed protocol: the designer presents the work and its goals, reviewers ask clarifying questions, feedback proceeds in structured rounds ("I notice...", "I wonder..."), and the session closes with prioritized action items.
Not all critique happens in a room. Pull requests, Figma comments, and Slack threads require adapted techniques:
Written feedback format. Every asynchronous critique comment should name the specific element and what you observe, why it matters, and what you suggest changing, tagged with a severity.
This structure prevents the two failure modes of async critique: comments that are too vague to act on and comments that are so long nobody reads them.
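A minimal sketch of how that structure could be generated by tooling, assuming the What/Why/Improve fields and severity tags described earlier; the comment shape and `renderComment` helper are illustrative, not part of any real review tool.

```typescript
// Illustrative sketch: an async critique comment built from the fields above.
// The shape and builder are assumptions for this example.

type Severity = "critical" | "major" | "minor" | "suggestion";

interface AsyncCritiqueComment {
  what: string;     // the specific element and what you observe
  why: string;      // the user or business impact
  improve: string;  // a concrete, actionable suggestion
  severity: Severity;
}

function renderComment(c: AsyncCritiqueComment): string {
  return [
    `**[${c.severity.toUpperCase()}]** ${c.what}`,
    `Why it matters: ${c.why}`,
    `Suggestion: ${c.improve}`,
  ].join("\n");
}

console.log(
  renderComment({
    what: "The checkout button sits below the fold on 375px-wide viewports.",
    why: "Users on small phones cannot complete checkout without scrolling past unrelated content.",
    improve: "Pin the primary action above the fold or make the summary section collapsible.",
    severity: "critical",
  })
);
```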
Figma annotation conventions. Pin comments to specific layers, not arbitrary canvas positions. Use color-coded labels: red for critical, orange for major, blue for minor, gray for suggestion. Resolve comments when addressed rather than leaving a thread of "fixed" replies.
Pull request design review. When reviewing a PR that changes UI, critique the rendered result (screenshots or a preview build) rather than inferring it from the diff, and tag findings with the same severity labels used in live critique.
Teams drift in critique quality over time. Calibrate regularly: have reviewers critique the same design independently, compare the feedback, and discuss where severity judgments diverge.
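One way to make that drift visible, assuming each reviewer tags the same set of issues with the severity levels above; the data shapes are illustrative.

```typescript
// Illustrative sketch: comparing two reviewers' severity tags on the same issues
// to surface where the team's standards have drifted.

type Severity = "critical" | "major" | "minor" | "suggestion";

// Each reviewer rates the same named issues on the same design.
type Ratings = Record<string, Severity>;

function calibrationGaps(a: Ratings, b: Ratings): string[] {
  const gaps: string[] = [];
  for (const issue of Object.keys(a)) {
    if (issue in b && a[issue] !== b[issue]) {
      gaps.push(`${issue}: reviewer A says ${a[issue]}, reviewer B says ${b[issue]}`);
    }
  }
  return gaps;
}

// Disagreements like "major vs. suggestion" are the ones worth a calibration discussion.
console.log(
  calibrationGaps(
    { "form error recovery": "major", "icon weight": "minor" },
    { "form error recovery": "suggestion", "icon weight": "minor" }
  )
);
```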
The Drive-By "I Don't Like It." Feedback with no specificity, no reasoning, and no suggested alternative. "I don't like the spacing" tells the designer nothing about what spacing, why it is problematic, or what to change. The fix: require every "I don't like" to be followed by "specifically [element], because [reason], and I suggest [alternative]." If the reviewer cannot complete the sentence, the feedback is not ready to share.
The Pixel Police. Focusing exclusively on implementation details (2px misalignment, slightly wrong shade of gray) while ignoring structural issues (navigation model is confusing, information hierarchy is inverted, user flow has a dead end). A review that spends 40 minutes on padding inconsistencies while the core user flow is broken has inverted its priorities. The fix: enforce critique levels — structure first, details second.
The Solution Stampede. Jumping directly to solutions without understanding the problem. "Just add a sidebar" or "make it more like Notion" or "use tabs instead of a dropdown." Solution-first feedback skips the analytical step and assumes the reviewer's first instinct is correct. The fix: require feedback to follow the What/Why/Improve sequence — the "what" and "why" must be validated before any "improve."
The Frankenstein Review. Multiple reviewers each contribute one change, none coordinated, resulting in an incoherent patchwork. Reviewer A wants more whitespace, B wants more density, C wants larger type, D wants smaller type. The designer implements all four and the result is worse than the original. The fix: the design owner synthesizes feedback into a coherent revision rather than applying each suggestion independently.
The Compliment Sandwich Trap. Framing critical feedback between two hollow compliments: "The colors are nice. The entire navigation model is broken. But great font choice!" This trains people to ignore the compliments and softens the impact of the critique. The fix: give genuine, specific positive and critical feedback separately — do not manufacture fake positives as wrapping paper.
The Stakeholder Override. A senior leader's personal preference overrides well-reasoned design decisions: "The CEO wants it blue." The fix: acknowledge the constraint but document the trade-off with measurable impact: "Changing from orange (#F97316) to blue (#2563EB) reduces the CTA's contrast against the blue navigation bar from 8.2:1 to 1.3:1."
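Putting numbers on a trade-off like that is straightforward with the standard WCAG 2.x contrast formula; a sketch, where the helper names are assumptions but the formula is the published definition.

```typescript
// Illustrative sketch: computing a WCAG 2.x contrast ratio so a color-change
// trade-off can be documented with numbers instead of preference.

function channelLuminance(c: number): number {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = channelLuminance((n >> 16) & 0xff);
  const g = channelLuminance((n >> 8) & 0xff);
  const b = channelLuminance(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrastRatio(fg: string, bg: string): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  const [lighter, darker] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (lighter + 0.05) / (darker + 0.05);
}

// e.g. contrastRatio("#F97316", "#2563EB") puts a number on the CTA-vs-navigation trade-off.
```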
Apple Design Reviews. Apple's industrial design team under Jony Ive held daily critique sessions with physical prototypes. The protocol: place the object on the table, and everyone critiques the object — never the person who designed it. "The radius on this corner creates a shadow line that interrupts the continuous surface" is the level of specificity expected. This culture produced the unibody MacBook, the iPhone's edge-to-edge glass, and the AirPods case.
Google's Design Sprint Critique. In Google Ventures' design sprint methodology, critique happens on Day 3 ("Decide"). The team silently reviews printed concepts on a wall, places dot-vote stickers on effective elements, and discusses the clusters. Silent voting eliminates groupthink. A concept with 15 dots clustered on its navigation but zero dots on its data visualization tells a clear story without discussion.
Stripe's Design Review Protocol. Stripe runs formal reviews for any customer-facing change. Three explicit questions: (1) Does this design handle all edge cases in the state matrix? (2) Does it maintain consistency with the existing design language? (3) Does it degrade gracefully on slow connections and small screens? A "P0" finding (checkout button below the fold on iPhone SE) blocks release. A "P2" finding (reduce font weights from 4 to 2) is logged for the next iteration.
Airbnb's Design Language System Reviews. Airbnb's DLS team runs weekly sessions where any team can bring a component variation. Critique is anchored against DLS principles: unified, iconic, conversational. "This card variant introduces a fifth corner radius (6px) when the system uses only 4px, 8px, 12px, and 16px" is grounded in system consistency. Components that pass review carry a "reviewed" badge in the component library.