Write Rubric
Create an analytic rubric that enables consistent, criterion-based evaluation of complex student work and generates actionable feedback.
Why This Is Best Practice
Adopted by: College Board AP program, IB Diploma, most research university writing programs, and K-12 standards-based grading systems
Impact: Brookhart (2013) reports that rubrics improve inter-rater reliability from ~60% to ~90% agreement; student performance on rubric-assessed tasks also improves when the rubric is shared before the task begins
Rubrics make implicit quality standards explicit, both for the assessor and for the learner. When students receive a rubric before working, they can self-assess against the criteria and produce stronger work. When a rubric is shared only after grading, it explains the grade rather than developing learning.
Steps
- Choose rubric type — Analytic rubric: separate rows for each criterion, separate scores per criterion. Best for complex tasks needing diagnostic feedback. Holistic rubric: single score for overall quality. Use only when the whole matters more than the parts (e.g., creative writing voice).
- List the criteria — Identify 3–6 dimensions that distinguish high-quality from low-quality work. Criteria should be independent (not overlapping) and correspond to the learning objectives. Example for a research paper: thesis clarity, evidence quality, argument structure, citation format, writing mechanics.
- Define 3–4 performance levels — Name levels descriptively, not pejoratively. Common scales: "Exemplary / Proficient / Developing / Beginning" or "4 / 3 / 2 / 1." Avoid "Excellent / Good / Fair / Poor" — they are evaluative, not descriptive.
- Write behavioral descriptors — For each criterion × level cell, describe what the work looks like at that level in observable, specific terms. Instead of "good use of evidence," write "uses three or more sources, with direct quotations that support each claim."
- Anchor with examples — Attach one sample of work per performance level for each key criterion. Anchors often calibrate raters more effectively than written descriptors alone.
- Pilot and calibrate — Have two raters independently score five samples, then compare scores. Rewrite any descriptor where raters disagree by more than one level. Repeat until inter-rater agreement is at least 85%.
- Share with learners before the task — Release the rubric when the assignment is given. Allow learners to self-assess a draft against it before submission.
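The pilot-and-calibrate step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the scores are invented, the function names are hypothetical, and "agreement" is assumed to mean exact agreement (both raters assign the same level), which the steps above do not specify.

```python
from typing import List

# Hypothetical pilot data: two raters score the same five samples on one
# criterion using a 1-4 scale. The values are invented for illustration.
rater_a: List[int] = [4, 3, 2, 3, 1]
rater_b: List[int] = [4, 3, 3, 3, 3]


def exact_agreement(a: List[int], b: List[int]) -> float:
    """Fraction of samples on which both raters gave the same level."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and equal in length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def large_disagreements(a: List[int], b: List[int], max_gap: int = 1) -> List[int]:
    """Indices of samples where the raters differ by more than max_gap levels."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > max_gap]


agreement = exact_agreement(rater_a, rater_b)
flagged = large_disagreements(rater_a, rater_b)

print(f"exact agreement: {agreement:.0%}")   # exact agreement: 60%
print(f"samples to revisit: {flagged}")      # samples to revisit: [4]
```

Here agreement is below the 85% target, and sample 4 (a two-level gap) points to the descriptor that most needs rewriting before the next calibration round.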
Rules
- Descriptors must describe the work, not the student ("the argument lacks evidence" not "the student did not try").
- Each performance level must be distinguishable from adjacent levels — if raters cannot tell 3 from 4, merge them.
- Criteria must align directly to learning objectives — if a criterion does not map to an objective, remove it.
- Do not give presentation/formatting criteria more than 10–15% of the total score on content-focused tasks.
- Do not mechanically average rubric scores into a letter grade; teacher judgment applies within the rubric's guidance.
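The weighting rule can be checked before a rubric is used. The sketch below assumes integer percentage weights, treats "citation format" and "writing mechanics" as the presentation criteria, and uses 15% as the cap; the weights, scores, and function names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical weights (in percent) for the research-paper criteria.
weights_pct: Dict[str, int] = {
    "thesis clarity": 30,
    "evidence quality": 30,
    "argument structure": 25,
    "citation format": 10,
    "writing mechanics": 5,
}

# Assumption: these two criteria count as presentation/formatting.
presentation: Set[str] = {"citation format", "writing mechanics"}

# Hypothetical scores for one paper on a 1-4 scale.
scores: Dict[str, int] = {
    "thesis clarity": 3,
    "evidence quality": 4,
    "argument structure": 3,
    "citation format": 2,
    "writing mechanics": 4,
}


def check_weights(weights: Dict[str, int], presentation: Set[str],
                  cap_pct: int = 15) -> List[str]:
    """Return a list of problems; an empty list means the weights pass."""
    problems: List[str] = []
    total = sum(weights.values())
    if total != 100:
        problems.append(f"weights sum to {total}%, expected 100%")
    share = sum(weights[c] for c in presentation)
    if share > cap_pct:
        problems.append(
            f"presentation criteria carry {share}%, above the {cap_pct}% cap"
        )
    return problems


def weighted_level(scores: Dict[str, int], weights: Dict[str, int]) -> float:
    """Weighted mean level; a starting point for judgment, not a final grade."""
    return sum(weights[c] * scores[c] for c in weights) / 100


print(check_weights(weights_pct, presentation))  # []
print(weighted_level(scores, weights_pct))       # 3.25
```

The weighted level is deliberately left as a number rather than mapped to a letter grade, since the rules above reserve that step for teacher judgment.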
Examples
Criterion: Evidence Quality (in a persuasive essay)
- 4 — Exemplary: Every claim is supported by at least one source; sources are credible and directly relevant; quotations are properly integrated and analyzed.
- 3 — Proficient: Most claims are supported; sources are credible and relevant with at most one exception; quotations are present but analysis is surface-level.
- 2 — Developing: Evidence is present but frequently irrelevant or unsupported by citation; quotation integration disrupts argument flow.
- 1 — Beginning: Claims are unsupported or rely on personal opinion only; no citations present.
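One way to turn a criterion like this into actionable feedback is to pair the descriptor for the assigned level with the descriptor for the next level up. The sketch below is an illustration under assumptions: the descriptors are abbreviated from the example above, and the data layout and the function name `feedback` are hypothetical.

```python
from typing import Dict, Tuple

# Level -> (name, abbreviated descriptor) for the Evidence Quality criterion.
EVIDENCE_QUALITY: Dict[int, Tuple[str, str]] = {
    4: ("Exemplary", "Every claim is supported by a credible, relevant source; "
                     "quotations are integrated and analyzed."),
    3: ("Proficient", "Most claims are supported; analysis of quotations is "
                      "surface-level."),
    2: ("Developing", "Evidence is present but frequently irrelevant or uncited."),
    1: ("Beginning", "Claims are unsupported or rely on personal opinion only; "
                     "no citations present."),
}


def feedback(levels: Dict[int, Tuple[str, str]], score: int) -> str:
    """Describe the current level and, if one exists, the next level to aim for."""
    if score not in levels:
        raise ValueError(f"score {score} is not a defined level")
    name, descriptor = levels[score]
    lines = [f"Current level: {score} ({name}). {descriptor}"]
    next_score = score + 1
    if next_score in levels:
        next_name, next_descriptor = levels[next_score]
        lines.append(f"To reach {next_score} ({next_name}): {next_descriptor}")
    return "\n".join(lines)


print(feedback(EVIDENCE_QUALITY, 2))
```

Because the feedback text is drawn directly from the descriptors, it is only as specific as the descriptors themselves.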
Common Mistakes
- Vague descriptors — "Good evidence" vs. "Poor evidence" without behavioral specifics produces low inter-rater reliability; raters revert to holistic judgment.
- Too many criteria — A 10-criterion rubric overwhelms both rater and learner; reduce to the 3–6 most important dimensions.
- Rubric revealed after grading — Sharing the rubric only as a graded feedback tool misses most of its developmental value; share it before the task begins.
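Two of these mistakes can be caught with a rough automated check on a draft rubric. The sketch below is a heuristic only: the word list, the six-criterion limit, and the function name `lint_rubric` are assumptions, and a flagged word is a prompt for review rather than proof that a descriptor is vague.

```python
import re
from typing import Dict, List

# Assumption: these evaluative words signal a descriptor that may lack
# observable specifics. The list is illustrative, not exhaustive.
VAGUE_WORDS = ("excellent", "good", "fair", "poor")
_VAGUE = re.compile(r"\b(" + "|".join(VAGUE_WORDS) + r")\b", re.IGNORECASE)


def lint_rubric(rubric: Dict[str, Dict[int, str]],
                max_criteria: int = 6) -> List[str]:
    """Return warnings for too many criteria and for vague descriptor wording."""
    warnings: List[str] = []
    if len(rubric) > max_criteria:
        warnings.append(
            f"{len(rubric)} criteria; consider reducing to {max_criteria} or fewer"
        )
    for criterion, levels in rubric.items():
        for level, descriptor in sorted(levels.items(), reverse=True):
            found = sorted({m.lower() for m in _VAGUE.findall(descriptor)})
            if found:
                warnings.append(
                    f"{criterion} / level {level}: vague term(s) {', '.join(found)}"
                )
    return warnings


# Hypothetical draft rubric with one vague and one specific criterion.
draft = {
    "Evidence": {4: "Good evidence", 1: "Poor evidence"},
    "Thesis": {4: "States a debatable claim in one sentence",
               1: "No identifiable claim"},
}

for warning in lint_rubric(draft):
    print(warning)
# Evidence / level 4: vague term(s) good
# Evidence / level 1: vague term(s) poor
```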
When NOT to Use
- When evaluating creative or expressive work whose primary goal is originality, because a predetermined criterion grid would constrain the range of valid approaches and penalize legitimate artistic risk.
- When a single holistic judgment is both faster and adequate. Brief, low-stakes formative checks do not benefit from analytic rubrics, and the overhead of writing one exceeds its instructional value.
- When the task is being assessed for the first time with no prior examples of student work, because behavioral descriptors written without anchor samples will be too vague to produce consistent inter-rater agreement.