What Is Item Difficulty? The Quiz Quality Metric Most Teachers Ignore

May 30, 2026 | 4 min | Sarah Mitchell

Short answer. Item difficulty (denoted *p*) is the proportion of students who answered a given quiz question correctly. A question that 80% of students got right has *p* = 0.80. The metric is counterintuitively named: a *higher* difficulty value means an *easier* question. Even so, it is a foundational statistic for evaluating quiz quality.
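As a minimal sketch of the definition above, the calculation is just "correct answers divided by total answers". The function name `item_difficulty` and the sample data are made up for illustration and do not come from any particular quiz tool.

```python
def item_difficulty(responses):
    """Return p: the share of responses marked correct (True).

    `responses` is a hypothetical list with one True/False entry per student.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(1 for r in responses if r) / len(responses)


# Illustrative data: 10 students, 8 of whom answered correctly.
q1 = [True] * 8 + [False] * 2
print(f"p = {item_difficulty(q1):.2f}")  # p = 0.80
```

Note that the value goes up as the question gets easier, which is exactly the naming trap described above.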

What the values mean

  • p = 1.0: Every student got it right. Either trivially easy, or a freebie. Not useful for distinguishing students.
  • p = 0.80-0.90: Easy. Good for confidence-building, foundational checks.
  • p = 0.50-0.70: Medium. Most useful for assessment, because it separates students who know the material from those who don't.
  • p = 0.20-0.40: Hard. Good for stretch questions; use sparingly.
  • p < 0.20: Either too hard for the cohort, or a broken question.
  • p = 0: No one got it right. Almost certainly a broken question.

How to use it

For a typical quiz:

  • Aim for an average difficulty of 0.5-0.7 across all items
  • Avoid making every item easy (p > 0.85): such a quiz doesn't differentiate between students
  • Avoid making every item hard (p < 0.4): such a quiz frustrates students without measuring what they know
  • Mix difficulty levels so the quiz separates students across the whole range of ability
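
The checks in this list can be sketched as a small report over a quiz's per-item *p* values. The function name `quiz_difficulty_report`, its parameter names, and the sample values are assumptions for illustration. The cutoffs (0.5-0.7, 0.85, 0.4) are the ones given in the list.

```python
def quiz_difficulty_report(p_values, target=(0.5, 0.7),
                           easy_cutoff=0.85, hard_cutoff=0.40):
    """Summarize a quiz from its per-item difficulty values.

    The default thresholds mirror the rules of thumb in the text;
    they are guidelines, not fixed standards.
    """
    if not p_values:
        raise ValueError("need at least one item")
    mean_p = sum(p_values) / len(p_values)
    return {
        "mean_p": mean_p,
        "mean_in_target": target[0] <= mean_p <= target[1],
        "all_easy": all(p > easy_cutoff for p in p_values),
        "all_hard": all(p < hard_cutoff for p in p_values),
    }


# Illustrative quiz with a mix of easy, medium, and hard items.
report = quiz_difficulty_report([0.90, 0.75, 0.60, 0.55, 0.30])
print(f"mean p = {report['mean_p']:.2f}")            # mean p = 0.62
print("in target range:", report["mean_in_target"])  # in target range: True
print("all easy:", report["all_easy"])               # all easy: False
print("all hard:", report["all_hard"])               # all hard: False
```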

Item difficulty + item discrimination

Item difficulty alone is incomplete. A question with p = 0.50 might be:

  • A great question (top half of class gets it right, bottom half doesn't) — high [item discrimination](/blog/what-is-item-discrimination)
  • A coin-flip question (every student has 50/50 odds) — low item discrimination
  • A *reverse-discrimination* broken question (bottom half gets it right, top half misses) — negative discrimination

Use both metrics together. A good quiz has medium-difficulty items with positive discrimination.
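
To make the three cases concrete, here is a rough sketch that computes *p* alongside a simple discrimination index: the proportion correct in the top half of the class minus the proportion correct in the bottom half, with students ranked by total quiz score. This top-half/bottom-half split follows the description above. Other definitions of discrimination exist (for example, comparing smaller top and bottom groups, or using a correlation), so treat this as one assumed variant. The function name and data are hypothetical.

```python
def difficulty_and_discrimination(total_scores, item_correct):
    """Return (p, D) for one item.

    total_scores: each student's total quiz score.
    item_correct: True/False per student for this item, in the same order.
    D = p(top half) - p(bottom half). With an odd class size the middle
    student is left out of both halves. Tied scores keep their input order.
    """
    n = len(total_scores)
    if n != len(item_correct) or n < 2:
        raise ValueError("need matching lists with at least two students")
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    half = n // 2
    top, bottom = ranked[:half], ranked[-half:]
    p = sum(item_correct) / n
    p_top = sum(item_correct[i] for i in top) / half
    p_bottom = sum(item_correct[i] for i in bottom) / half
    return p, p_top - p_bottom


# Illustrative class of 8, listed from highest to lowest total score.
scores = [10, 9, 8, 7, 4, 3, 2, 1]
cases = {
    "great":     [True, True, True, True, False, False, False, False],
    "coin-flip": [True, False, True, False, True, False, True, False],
    "reversed":  [False, False, False, False, True, True, True, True],
}
for name, answers in cases.items():
    p, d = difficulty_and_discrimination(scores, answers)
    print(f"{name}: p = {p:.2f}, D = {d:+.2f}")
# great: p = 0.50, D = +1.00
# coin-flip: p = 0.50, D = +0.00
# reversed: p = 0.50, D = -1.00
```

All three items have the same difficulty, and only the discrimination value tells them apart.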

How modern quiz tools surface this

Most LMSs and quiz platforms show item difficulty after each quiz administration:

  • Canvas, Blackboard, Moodle: quiz statistics page
  • SimpleQuizMaker: per-question analytics across all submissions

After every quiz, scan the difficulty distribution. The items with p > 0.95 (nearly everyone right) and p < 0.15 (almost no one right) are usually the ones to review or rewrite.
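
If your platform lets you export per-question difficulty, the scan can be sketched in a few lines. The export format (a plain mapping from question label to *p*) and the name `items_to_review` are assumptions for illustration, not the API of any tool listed above. The 0.95 and 0.15 cutoffs are the ones from the paragraph above.

```python
def items_to_review(p_by_item, high=0.95, low=0.15):
    """Return the items whose difficulty falls outside the (low, high) band."""
    return {name: p for name, p in p_by_item.items() if p > high or p < low}


# Illustrative per-question difficulty values.
quiz = {"Q1": 0.98, "Q2": 0.62, "Q3": 0.10, "Q4": 0.95}
print(items_to_review(quiz))  # {'Q1': 0.98, 'Q3': 0.1}
```

A flagged item is a prompt to reread the question, not an automatic verdict that it is broken.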

Common mistakes

  • Confusing the name. Higher value = easier. Not intuitive. Many teachers reverse this in their heads.
  • Ignoring difficulty data when grading. The "everyone missed Q5" pattern usually means Q5 is broken, not that everyone failed to learn.
  • Treating low p as a teaching problem when it's a question problem. Verify the question is well-written before concluding students don't know the material.

Related reading

  • [What Is Item Discrimination?](/blog/what-is-item-discrimination)
  • [How to Write Good Quiz Questions](/blog/how-to-write-good-quiz-questions)
  • [How to Write Hard Quiz Questions](/blog/how-to-write-hard-quiz-questions)
  • [Quiz Analytics — Teacher Guide](/blog/quiz-analytics-teacher-guide)

    Sarah Mitchell

    Curriculum Designer & Former High School Teacher
