Universities Revive Blue Books as AI Detectors Fail

Theo Baker wrote in The New York Times that he watched a Stanford freshman sign a declaration saying he had not used ChatGPT while ChatGPT sat open in the next window. The scene unfolded on the deck of a yacht, at a party financed by venture capitalists, and the freshman belonged to the class that had arrived on campus two months before OpenAI released the chatbot. Four years later, Baker wrote, those students were back in blue books and proctored rooms, and a friend summed up the new campus norm: "Now it's just normal."

Baker's reporting points past honor-code enforcement to the harder question of what now counts as evidence that a student learned. Stanford has expanded an exam-proctoring pilot from seven courses to more than 50, Princeton has approved mandatory exam proctoring for the first time in its 133-year honor-code history, and detector vendors themselves say their scores cannot anchor a misconduct case. The shared assumption across these responses is that a finished essay can no longer serve, on its own, as evidence of student learning.

Key Takeaways

Stanford and Princeton are reviving proctored exams because AI has weakened finished assignments as proof of learning.
Student AI use is mainstream, but survey data separates general coursework use from assignment substitution.
Turnitin, Vanderbilt and UNF warn detector scores cannot anchor misconduct cases on their own.
Universities are moving toward process evidence: drafts, oral checks, revision memos and supervised work.

AI-generated summary, reviewed by an editor. More on our AI guidelines.

Survey data shows student use outpaced faculty response

In a French fiction class, a student scribbled on a Hudson River Trading notepad; Baker noted that fresh graduates at the trading firm can earn upward of $600,000 a year. Stanford sits close to AI money and status, down to $4,000 graphics cards autographed by Jensen Huang that students keep as dorm-room trophies.

Inside Higher Ed's Student Voice survey, which The Implicator previously covered, found that 85 percent of students had used generative AI for coursework in the previous year, while 25 percent had used it to complete assignments and 19 percent to write full essays.

The 25-percent substitution figure is the one universities have to plan around, because it separates AI-assisted work from work handed off to AI entirely. Tyton Partners' Time for Class 2025 report found that 42 percent of students used GenAI weekly or daily in spring 2025, compared with 30 percent of instructors. More instructors said AI had increased their workload than said it had reduced it: 71 percent reported spending time monitoring for cheating, and 61 percent reported redesigning assessments.

Baker's ancient Greek art history professor put the campus version of that gap bluntly: "It's all we talk about."

Stanford and Princeton restore proctored exams

By April 2026, Baker wrote, students were taking exams by hand in blue books, a practice he set against Stanford's century-old ban on faculty proctoring, which was meant to show "confidence in the honor" of students. Stanford's own Academic Integrity Working Group said its proctoring pilot had grown from seven courses in spring 2024 to more than 50 courses by the most recent quarter it reported. The group also recommended oral exams and in-class writing for high-stakes assessments in courses that restrict AI use.

The Daily Princetonian reported that Princeton's stricter rule, scheduled to take effect July 1, 2026, will require proctoring for all in-person exams and represents the most significant change to an honor system founded in 1893. In the paper's 2025 senior survey, which drew more than 500 responses, 29.9 percent of seniors said they had cheated and 44.6 percent said they knew of Honor Code violations they had not reported.

The proctoring response works inside a closed exam room and does not extend to take-home essays, lab reports or coding assignments, which is why both campuses are also recommending oral defenses and in-class writing for restricted-AI courses.

Detectors face false-positive and policy limits

Turnitin's own documentation says an AI writing score "should not be used as the sole basis for adverse actions against a student." Vanderbilt disabled Turnitin's AI detector after the company switched it on with less than 24 hours' notice, and it laid out the arithmetic behind the decision: at Turnitin's claimed 1 percent false-positive rate, the 75,000 papers submitted in 2022 could have produced about 750 incorrectly labeled papers.

Vanderbilt's current guidance says "a report to the Undergraduate Honor Council cannot be based solely on an artificial intelligence detector score." The University of North Florida cites reliability and privacy concerns, warning that checking student work with open-source detectors may create problems under FERPA, the federal student-privacy law, and that such checks cannot meet the burden of proof for a misconduct finding.

In a 2023 study, Stanford researchers Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu and James Zou reported that seven GPT detectors misclassified TOEFL essays by non-native English writers as AI-generated at an average false-positive rate of 61.22 percent; 89 of the 91 essays in the test set were flagged by at least one detector.

Career pressure pushes universities toward process verification

CNBC reported on May 18, 2026, that Dartmouth has raised $30 million to subsidize student internships, with eligible students receiving up to $6,500. CUNY, with 180,000 undergraduates, is integrating career advising, paid internships and apprenticeships into its degree programs. The CNBC and SurveyMonkey student survey that accompanied the report found that two-thirds of respondents were pessimistic about the job market, 49 percent were considering changing the skills they had been building, and 40 percent were open to switching their field of study because of AI.

Joseph Catrino, Dartmouth's career-design director, told CNBC: "Higher education needs to do better. We need to step up and help students be prepared." Both the Dartmouth internship program and CUNY's apprenticeship integration grade students partly on supervisor evaluations of work performed on the job.

Missouri's faculty guidance answers the question "Is using ChatGPT considered cheating?" with "It depends," tying the answer to instructor permission rather than a campus-wide ban. Tyton found only 28 percent of institutions had formal institution-wide GenAI policies and 32 percent were still developing them, which leaves day-to-day rules sitting with individual professors. The Stanford working group's recommendation pairs proctored exams with oral defenses and revision memos that document how a student arrived at a finished assignment. Princeton's mandatory proctoring takes effect July 1, 2026.

Frequently Asked Questions

Why are universities returning to blue books and proctored exams?

AI tools make it harder to treat take-home work as proof of learning. Proctored exams, blue books and oral checks give instructors controlled settings where students must show what they know without outside assistance.

Does the article say all student AI use is cheating?

No. The survey data separates broad coursework use from substitution. Inside Higher Ed found 85 percent used AI for coursework, while 25 percent used it to complete assignments and 19 percent to write full essays.

Why can’t universities rely on AI detectors?

Detector vendors and universities warn that scores can be wrong and should not be used alone. Vanderbilt, UNF and Turnitin all say misconduct findings require human review and other evidence.

What is process verification?

Process verification asks students to show how work was produced. Examples include drafts, source notes, prompt logs when allowed, revision memos, oral defenses, in-class writing and supervised workplace evaluations.

What changes for faculty?

Faculty need clearer assignment-level rules and more assessment designs that test reasoning, not just final output. Tyton found that monitoring for cheating and redesigning assessments are already adding to instructor workload.

Analysis

Marcus Schuler

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco.