For Teachers

The Honest Quiz: Designing Assessments AI Can't Cheat

May 2, 20268 minJames Okafor · EdTech Researcher & Instructional Designer

In this article

1.The new reality
2.What AI is good at (and what to avoid)
3.What AI is genuinely bad at
4.A redesigned summative assessment
5.What you don't need to do
6.Using AI quiz generators alongside this
7.The longer-term shift

TL;DR. AI can ace generic factual questions, summary requests, and standard 5-paragraph essays. It struggles with personal reasoning tied to specific class context, oral defense, real-time problem-solving, and assessments that draw on lived classroom experience. Design for those, and AI becomes a study tool instead of an exam-day shortcut.

The new reality

A 9th-grader on her phone in 30 seconds can have ChatGPT write her DBQ essay, solve her algebra problem, and outline her science lab report. She can do this on her bus ride to school. The lockdown-browser approach is a finger in a leaky dam.

Banning AI in classrooms is not the answer either, for two reasons. First, students will be working alongside AI for the rest of their careers; teaching them to do so well is part of our job. Second, the kinds of assessments AI struggles with are also the kinds that produce better learning. Designing AI-resistant assessments is, mostly, designing better assessments.

What AI is good at (and what to avoid)

A 2024–2025 generation of students using GPT-4-class models can complete the following with high quality and zero original effort:

Multiple-choice tests on factual recall (history dates, vocabulary definitions, formulas)

Definitional short-answer questions ("What is photosynthesis?")

5-paragraph essays on common topics (causes of WWI, themes in *The Great Gatsby*)

Standard math problem sets (anything you'd find in a textbook end-of-chapter)

Code that solves canonical CS problems

If your assessment is one of these formats and the topic is well-covered online, you should expect high baseline AI performance. Your assessment is no longer measuring student knowledge — it's measuring willingness to type.

This isn't a reason to drop these formats entirely (they're still useful for in-class formative assessment). It's a reason not to use them as your primary high-stakes evaluation.

What AI is genuinely bad at

1. Specific class context

A question that references material covered uniquely in your class — a particular discussion, a specific experiment your students designed together, a guest speaker — has no online corpus for the AI to draw from. It will produce confident-sounding but factually wrong answers.

Example. Bad: *"Explain the causes of the French Revolution."* (AI nails this)

Good: *"Last Thursday's class discussion identified three causes of the French Revolution. Which did you find most persuasive, and why? Reference at least two specific points made by classmates."*

2. Real-time, low-stakes oral defense

Five minutes after a written submission, ask the student to verbally explain one specific paragraph of what they wrote. Stumbling, contradicting their own paper, or being unable to clarify a term they used — these are the signals.

This works at scale: build a 2–3 minute oral check into every major paper. You don't need to do it for every student every paper; random sampling produces enough signal to make AI use feel risky.

3. Process artifacts

Require students to show their thinking process, not just the final output:

Hand-drawn outlines of essays before drafting

Annotated source notes with personal reactions

Photo evidence of math work in progress

Drafts with comments explaining choices

AI can produce a final essay; it can't (yet) backfill a process trail that matches the student's actual cognitive style. Teachers who know their students can tell when the artifacts feel inconsistent with the student's prior work.

4. Personal reflection on classroom experience

Questions tied to the student's own learning journey: "What did you find most confusing about today's topic? What was the moment it clicked?" AI doesn't know what was confusing for *this* student — and a generic answer reads as obviously generic.

5. In-class application of recent material

Take last Tuesday's lesson on iambic pentameter. Today, give the class a poem they've never seen and 15 minutes to scan it. The combination of *not announced in advance* + *requires applying a specific recent skill* + *time-boxed in class* makes AI use impractical.

This is just well-designed in-class formative assessment, dressed up with new language. It's also some of the best-evidenced teaching practice in the cognitive science literature.

A redesigned summative assessment

Take a traditional unit assessment in U.S. history — chapter on the Civil Rights Movement. Old version:

> 25 multiple choice questions

> 5 short-answer questions

> One 5-paragraph essay: "Discuss the major events of the Civil Rights Movement and their significance."

AI scores 95+%. The student spends 10 minutes copy-pasting. You can't tell from the artifact alone.

Redesigned version:

10 multiple choice (in class, no devices) — fast factual baseline

One short answer (3 minutes, in class): "Three Civil Rights leaders we discussed had different theories of change. In one paragraph, explain which we discussed in class and which most resembled MLK's approach. Reference the specific examples from our class discussion on Tuesday."

One take-home essay (open AI use allowed): "Choose one event from the Civil Rights Movement and write a 600-word analysis of its long-term impact. Include: (1) a paragraph on what AI tools (if you used any) got wrong about this event, with citations to corrected sources; (2) a personal connection — how does this event relate to your own community or family history?"

Total time: ~75 minutes. The first part tests recall in conditions AI can't reach. The second tests integration with class-specific content. The third explicitly invites AI use *with critical evaluation* — a skill students will need professionally.

You learn more about each student's understanding from this assessment than from the old version, even though it's the same total time and effort.

What you don't need to do

You don't need to:

Become a forensic detective on every paper

Memorize AI-detector tool outputs (they have ~30% false-positive rates and are unreliable evidence)

Lecture students about academic integrity (some will, most won't, and the lecture doesn't change behavior)

Ban devices entirely (it's exhausting and pushes the problem elsewhere)

You do need to:

Redesign 1–2 assessments per year toward AI-resistance

Build oral defense and in-class application into your unit pacing

Have honest conversations about *when* AI use is appropriate and what citations look like

Using AI quiz generators alongside this

Worth being explicit: AI quiz generators (including SimpleQuizMaker) are tools for *teachers*, not tools to give students for exams. Their value is in the time savings of generating practice quizzes, exit tickets, and re-teach materials — content the teacher then administers in a controlled environment.

The same student who would use ChatGPT to write their essay will absolutely use a teacher-generated practice quiz to study legitimately. The asymmetry is by design.

The longer-term shift

The argument for AI-resistant assessment isn't just about catching cheaters. It's that the alternative — assessing what AI can do — was never a particularly useful measurement of student learning.

A history class that tests "explain the causes of WWI" was, even before AI, mostly testing whether students could recall the textbook. A history class that tests "examine three primary sources we read this week and identify which one most challenges our textbook's narrative — defend your choice in writing and orally" tests historical thinking.

AI made the bad assessments obviously bad. The good assessments were always good. The teacher work, increasingly, is shifting our practices toward the latter.

Related reading: [Critical Thinking Quiz Design](/blog/critical-thinking-quiz-design) · [Higher-Order Thinking Questions](/blog/higher-order-thinking-questions) · [Formative vs Summative Assessment](/blog/formative-vs-summative-assessment)

Get weekly study & quiz tips

Join teachers and students who get practical tips on quizzing, active recall, and AI-powered learning.

Share:X LinkedIn

James Okafor

EdTech Researcher & Instructional Designer

Practice with AI-generated quizzes

🏛️ History Quiz Generator 📐 Math Quiz Generator 🔬 Science Quiz Generator 📝 SAT Prep Quiz Generator