Short answer. Item discrimination is a statistical measure of how well a quiz question distinguishes between students who know the material and students who don't. A good question is one that strong students get right and weak students get wrong; a bad question is one where the pattern is reversed or random.
How it's calculated (simplified)
For each question, you compute:
- The proportion of **high-scoring students** (typically the top 27% on the overall quiz) who got the question right
- The proportion of **low-scoring students** (the bottom 27%) who got it right
- The difference between the two, which is the **discrimination index** (D)

D ranges from -1 to +1:

- +1: All strong students right, all weak students wrong. Ideal.
- +0.30 or higher: Good question; keep it.
- +0.20 to +0.29: Marginal; consider revising.
- 0 to +0.19: Weak question; revise.
- Negative: Weak students outperformed strong students. Broken question; throw it out.

Why it matters
An easy question (one that most students answer correctly) can still be useful if it discriminates well. A hard question with poor discrimination is worse than useless: it makes the quiz feel rigorous without testing anything.
The classic "broken question" pattern: question has poor discrimination because one of the distractors is *actually defensible*. Strong students see the ambiguity and pick the "wrong" answer; weak students don't notice and pick the keyed answer. Fix: rewrite the distractor.
How to find low-discrimination questions
Most LMSs and quiz tools surface this:
- Canvas, Blackboard, Moodle: quiz analytics often show a discrimination index per item.
- SimpleQuizMaker: per-question analytics show the miss rate, with a high-scorer vs. low-scorer split available.
- Manual: for small classes, sort students by total score, then check how each question's correct answers are distributed between the top and bottom of the list.

After a quiz, glance at the discrimination indices. The 3-5 items with the lowest discrimination are candidates for rewriting before the next administration.
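For the manual route, here is a minimal Python sketch of the upper-lower index. It assumes responses are scored 0/1 in a list of lists, one row per student and one column per question (an assumed layout, not any particular LMS export format); the function names are hypothetical. Students tied at the cut are assigned by their input order, so borderline group membership can shift slightly.

```python
def discrimination_indices(responses, fraction=0.27):
    """Upper-lower discrimination index D for every question.

    responses: one list per student, one entry per question (1 = right, 0 = wrong).
    fraction: share of students in each comparison group (27% by default).
    """
    n_students = len(responses)
    # Group size, e.g. 20 students * 0.27 -> 5; never fewer than one student.
    k = max(1, round(n_students * fraction))
    # Rank students by total score, highest first (ties keep their input order).
    ranked = sorted(responses, key=sum, reverse=True)
    top, bottom = ranked[:k], ranked[-k:]
    indices = []
    for q in range(len(responses[0])):
        p_high = sum(student[q] for student in top) / k
        p_low = sum(student[q] for student in bottom) / k
        indices.append(p_high - p_low)
    return indices


def lowest_items(responses, count=3, fraction=0.27):
    """Return (question number, D) for the `count` least-discriminating questions."""
    d = discrimination_indices(responses, fraction)
    worst_first = sorted(range(len(d)), key=lambda q: d[q])
    return [(q + 1, round(d[q], 2)) for q in worst_first[:count]]


# Tiny example: 4 students, 4 questions, comparing the top and bottom halves.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
print(lowest_items(responses, count=2, fraction=0.5))  # [(4, -0.5), (1, 0.5)]
```

Question 4 comes out negative because the only student who got it right is in the bottom group, which is exactly the kind of item to reread first.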
Related measures:

- Difficulty: the proportion of students who got the item right (separate from discrimination; a higher value means an easier item)
- Point-biserial correlation: a more rigorous version of discrimination
- Cronbach's alpha: the reliability of the quiz overall

A worked example
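20 students take a quiz. To keep the numbers round, this example compares the top and bottom quartiles (5 students each) instead of the top and bottom 27%. On Question 7: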
- Top quartile (5 students): 4 right, 1 wrong → P_high = 0.80
- Bottom quartile (5 students): 1 right, 4 wrong → P_low = 0.20
- Discrimination index = 0.80 − 0.20 = **+0.60** → excellent question, keep.

On Question 8:

- Top quartile: 3 right, 2 wrong → P_high = 0.60
- Bottom quartile: 3 right, 2 wrong → P_low = 0.60
- Discrimination index = 0.60 − 0.60 = **0.00** → not separating strong from weak; revise or replace.

On Question 9:

- Top quartile: 2 right, 3 wrong → P_high = 0.40
- Bottom quartile: 4 right, 1 wrong → P_low = 0.80
- Discrimination index = 0.40 − 0.80 = **−0.40** → broken question. Strong students are reading it as ambiguous; weak students aren't. Almost always a flaw in the distractor or the keyed answer.

Sample sizes for reliable discrimination
The discrimination index is noisy on small classes. Rough guidance:
- 10 students: results are suggestive, not reliable. Use to flag, not to discard.
- 30 students: reasonable signal; trust patterns across multiple questions.
- 100+ students: reliable enough that single-question discrimination values mean something.

For high-stakes item banks (AP, MCAT, SAT), each item is piloted across thousands of students before going live.
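One way to see this noise in your own data is to bootstrap: resample students with replacement many times and watch how much an item's D moves. The guidance above doesn't prescribe this; it is a standard resampling technique, shown here as a minimal Python sketch. It assumes you have each student's 0/1 score on the item and their total score. The function names, the 2,000 resamples, and the 5th-to-95th-percentile band are illustrative choices, not fixed rules.

```python
import random


def upper_lower_d(item, totals, fraction=0.27):
    """D for one question: proportion right in the top group minus the bottom group."""
    k = max(1, round(len(item) * fraction))
    # Student positions ranked by total score, highest first.
    order = sorted(range(len(item)), key=lambda s: totals[s], reverse=True)
    p_high = sum(item[s] for s in order[:k]) / k
    p_low = sum(item[s] for s in order[-k:]) / k
    return p_high - p_low


def bootstrap_d(item, totals, reps=2000, seed=0):
    """Rough 90% band (5th and 95th percentiles) for D under resampling of students."""
    rng = random.Random(seed)
    n = len(item)
    samples = []
    for _ in range(reps):
        # Draw n students with replacement and recompute D on that resample.
        picks = [rng.randrange(n) for _ in range(n)]
        samples.append(upper_lower_d([item[i] for i in picks], [totals[i] for i in picks]))
    samples.sort()
    return samples[int(0.05 * reps)], samples[int(0.95 * reps) - 1]


totals = list(range(10, 0, -1))        # 10 students with total scores 10 down to 1
item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]  # each student's 0/1 score on one question
print(upper_lower_d(item, totals))     # 1.0
print(bootstrap_d(item, totals))               # band with 10 students
print(bootstrap_d(item * 10, totals * 10))     # same pattern, 100 students
```

The same answer pattern repeated across 100 students gives a much narrower band than it does with 10, which is the practical meaning of "suggestive, not reliable" for small classes.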
- [How to Write Good Quiz Questions](/blog/how-to-write-good-quiz-questions)
- [Multiple Choice Distractor Design](/blog/multiple-choice-distractor-design)
- [Quiz Analytics — Teacher Guide](/blog/quiz-analytics-teacher-guide)
- [How to Write Hard Quiz Questions](/blog/how-to-write-hard-quiz-questions)
- [What Is Item Difficulty?](/blog/what-is-item-difficulty)

How to compute item discrimination
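Item discrimination is often reported as the point-biserial correlation between getting this item right and total exam score. The simpler upper-lower index (D), used throughout this article, is computed like this: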
1. Calculate each student's total exam score.
2. Rank students from highest to lowest total score.
3. Take the top 27% and bottom 27% (the classic cut points).
4. Calculate the proportion correct in the top group, minus the proportion correct in the bottom group.
5. That difference is your discrimination index (D).

Worked example: 100 students take a 50-question quiz. About 85% of the top 27 students answer item 12 correctly, versus about 35% of the bottom 27. D for item 12 = 0.85 - 0.35 = 0.50. Strong discrimination.
Most assessment platforms compute this automatically. If yours doesn't, export to a spreadsheet.
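The point-biserial correlation is the ordinary Pearson correlation between a 0/1 item score and the total score. If your platform doesn't report it, here is a minimal Python sketch, assuming you have exported two equal-length lists (each student's 0/1 score on the item and their total score); the function name is hypothetical.

```python
import math


def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item score and each student's total score."""
    n = len(item)
    mean_item = sum(item) / n
    mean_total = sum(totals) / n
    cov = sum((x - mean_item) * (t - mean_total) for x, t in zip(item, totals))
    ss_item = sum((x - mean_item) ** 2 for x in item)
    ss_total = sum((t - mean_total) ** 2 for t in totals)
    if ss_item == 0 or ss_total == 0:
        # Everyone got the item right (or wrong), or all totals are equal: undefined.
        return float("nan")
    return cov / math.sqrt(ss_item * ss_total)


item = [1, 1, 0, 0]
totals = [4, 3, 2, 1]
print(round(point_biserial(item, totals), 3))  # 0.894

# "Corrected" variant: correlate with the rest of the quiz, excluding this item.
rest = [t - x for x, t in zip(item, totals)]
print(round(point_biserial(item, rest), 3))    # 0.707
```

Subtracting the item from the total (the corrected, or item-rest, version) keeps the item from inflating its own correlation, which matters most on short quizzes where one question is a large share of the total.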
What discrimination scores mean
- 0.40 or above: Excellent. The item strongly distinguishes high from low scorers.
- 0.30 to 0.39: Good. Acceptable for most exams.
- 0.20 to 0.29: Marginal. Consider revising for the next iteration.
- Below 0.20: Poor. The item probably isn't measuring the same thing the rest of the exam is.
- Negative discrimination: Broken. Top students are getting it wrong while bottom students get it right. This almost always indicates a flawed item.

Why items have low or negative discrimination
Common causes:
- Ambiguous stem. Students who think carefully see multiple interpretations; students who skim pick one quickly.
- Defensible "wrong" answer. A distractor that's also technically correct under a different reading.
- Mistyped answer key. The grading rubric marks the wrong option as correct.
- Item tests content the rest of the exam doesn't. Possibly a stray question that doesn't belong in the bank.
- Trick wording. The item punishes careful reading instead of rewarding knowledge.
- Off-topic. The item is about something other than the exam's main focus; high scorers haven't necessarily studied it.

A negative-discrimination item should be removed or rewritten before being used again.
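To narrow down which of these causes applies, a common follow-up is a distractor analysis: count which option the top and bottom groups chose on the problem question. Below is a minimal Python sketch, assuming you have each student's chosen option letter and total score; the function names and flag rules are illustrative, not a standard. A wrong option that draws more top scorers than bottom scorers points toward a defensible distractor or a miskeyed answer, and so does a keyed answer that the bottom group picks more often than the top group.

```python
from collections import Counter


def option_counts(choices, totals, fraction=0.27):
    """Count how often the top and bottom groups picked each option on one question."""
    k = max(1, round(len(choices) * fraction))
    order = sorted(range(len(choices)), key=lambda s: totals[s], reverse=True)
    top = Counter(choices[s] for s in order[:k])
    bottom = Counter(choices[s] for s in order[-k:])
    return top, bottom


def diagnose(choices, totals, key, fraction=0.27):
    """Flag the option patterns behind low or negative discrimination on one question."""
    top, bottom = option_counts(choices, totals, fraction)
    return {
        # Keyed answer chosen more by weak students than by strong ones.
        "key_favors_bottom": bottom[key] > top[key],
        # Wrong options that attract more strong students than weak ones.
        "distractors_favoring_top": sorted(
            opt for opt in set(choices) if opt != key and top[opt] > bottom[opt]
        ),
    }


choices = ["B", "B", "C", "A", "A", "A", "D", "A"]  # option each student picked
totals = [8, 7, 6, 5, 4, 3, 2, 1]                   # their total quiz scores
print(diagnose(choices, totals, key="A", fraction=0.5))
# {'key_favors_bottom': True, 'distractors_favoring_top': ['B', 'C']}
```

The flags are prompts to reread the stem, the key, and the flagged options, not proof of a particular cause.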
Discrimination vs. difficulty
These two metrics together identify item health:
- High difficulty index (easy item) + low discrimination: Nearly everyone gets it right. No signal; consider removing.
- High difficulty index (easy) + good discrimination: A useful warm-up item that still separates the bottom of the class.
- Medium difficulty + good discrimination: The sweet spot. Keep these.
- Low difficulty index (hard item) + good discrimination: Useful for spotting top scorers.
- Low difficulty index (hard) + low or negative discrimination: Broken or misaligned. Investigate.

The strongest exams are built mostly from medium-difficulty items with strong discrimination, plus a few hard items that still discriminate well to spread out the top of the score distribution.
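The difficulty/discrimination grid can be written as a small lookup. This Python sketch takes the difficulty index p (proportion correct, so higher means easier) and D. The 0.30 cutoff for good discrimination follows the score bands given earlier in this article; the 0.80 and 0.30 cutoffs for "easy" and "hard" are assumptions, so adjust them to your own quizzes. The grid doesn't label medium-difficulty items with low discrimination; the sketch calls those "review" as a further assumption. The function name and labels are hypothetical.

```python
def item_health(p, d, easy=0.80, hard=0.30, good_d=0.30):
    """Label an item by difficulty index p (proportion correct) and discrimination D.

    Cutoffs are illustrative defaults, not fixed standards.
    """
    if d < 0:
        return "negative discrimination: broken, investigate"
    good = d >= good_d
    if p >= easy:
        return "easy, good discrimination: useful warm-up" if good else "easy, low discrimination: consider removing"
    if p <= hard:
        return "hard, good discrimination: spots top scorers" if good else "hard, low discrimination: investigate"
    return "medium, good discrimination: keep" if good else "medium, low discrimination: review"


for p, d in [(0.95, 0.05), (0.55, 0.45), (0.20, 0.40), (0.60, -0.40)]:
    print(p, d, "->", item_health(p, d))
```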
When discrimination doesn't apply
A few situations where item discrimination isn't the right metric:
- Mastery exams. When you want everyone to eventually score 100%, discrimination tends toward zero on every item by design. Use pass/fail by item instead.
- Single-attempt drug-knowledge or safety tests. Discrimination is less informative when one wrong answer means failure regardless. Use criterion-referenced scoring instead.
- Very small samples. Discrimination statistics from fewer than 30 students are noisy. Wait for more data.

Using discrimination to improve your bank over time
The point of computing discrimination isn't to give items a grade; it's to maintain a high-quality item bank:
- After each administration, flag low-D items. Revise or remove them before the next iteration.
- Track items over multiple administrations. An item that scores well consistently is bank-worthy; one that swings wildly is unreliable.
- Pair with student feedback. When students complain about a specific item, check its discrimination first; often the data confirms their critique.

See per-question analytics on your next quiz.