Camera obscura: the case against AI in classrooms

A good friend of mine is a university professor. Over the past year or so, he and I have talked about what the arrival of fluent AI text generators would mean for teaching.

With the recent and sudden availability of free AI chatbots, schools have abruptly found themselves on the vanguard of crafting policies around human–AI interaction. When I asked my friend how his university was handling it, he said the faculty was “freaking out”. But why? I asked. Hasn’t it been apparent for a few years that this day was imminent?

Nobody expected it to arrive so quickly, he said.

(Though I am currently co-counsel in two lawsuits challenging generative AI, this piece is about the ethical and pedagogical consequences of chatbots. Not their legality, which, like that of all generative AI, is very much unresolved.)

As a rule, I believe schools should probably forbid chatbots and other generative AI. I also believe that most won’t, but that elite schools probably will, and that over time this will lead to some weird effects on the education and labor markets—but I will leave that to the economists to sort out.

Research means more than fact-checking

When I first used GitHub Copilot, I said it “essentially tasks you with correcting a 12-year-old’s homework … I have no idea how this is preferable to just doing the homework yourself.” What I meant is that often, the focus of programming is not merely producing code that solves a problem. Rather, the code tends to be the side effect of a deeper process, which is to learn and understand enough about the problem to write the code. The authors of the famous MIT programming textbook Structure and Interpretation of Computer Programs call this virtuous cycle procedural epistemology. We could also call it by its less exotic name: research.

Is this always the case? Of course not—as a working programmer, sometimes you really just want the answer, because you can’t always afford the cost of doing research for every little thing. In those cases, it suffices to adapt someone else’s solution and move on.

But school is different

Broadly speaking, what we’re teaching students with writing assignments is how to make a persuasive argument. And the raw material of persuasive arguments is credible evidence. What makes evidence credible? Often, the source of the evidence. Thus, learning how to gather sources and weigh their credibility is a hugely important writing skill for every student, up there with learning how to avoid plagiarism.

Legal writing is maybe the most concentrated example. Legal writing is always an argument from authority: you have to cite to laws and cases that support your argument. As a legal writer you also need to be able to confront and distinguish authority contrary to your argument, e.g.—“that case was overruled”, “that case doesn’t apply in the Ninth Circuit”, “the facts of that case are different”, and so on. The same applies to statutes. As my federal-tax professor in law school used to say of the tax code: “When you think you’ve found the answer—keep reading!” Why? Because often, the exceptions to what you just read would appear farther down the page.

Research—whether you’re a lawyer, a programmer, or a student writer—is a process, not an outcome. The goal is not just to get an answer, but to learn and understand. To be at least thorough, at best exhaustive. To persuade yourself that you not only have the right answer, but that you didn’t overlook a better one.

Which brings us to the worst deficiency of the chatbots: they’re all about outcome, not process. Ask a question; get an answer. It’s usually easy to verify whether a certain answer is, at a coarse level, accurate or not. It’s much harder to verify whether the answer is well supported by credible evidence.

Fact-checking chatbots is not enough

A business-school professor recently wrote that he was allowing his class to draft essays using chatbots, with the proviso that each student had to verify all facts, and would remain accountable for errors.

I understand why teachers might feel that embracing AI is a more workable détente than banning it outright. And if it is embraced, a policy of holding students accountable for its errors is better than not.

And yet. Simply adopting that policy means that a key goal of writing assignments has been obliterated. A student who has a chatbot-provided answer might learn a little more by verifying that answer. But that student will never learn the broader intellectual and historical context for that answer. Nor will they learn whether a better answer exists, because they have no path by which to discover it. (Programmers may recognize this as the algorithmic problem of getting stuck at a local maximum.) In that sense, research is a form of insurance: by starting with a broad scope of material and winnowing down, we can feel confident that we are arriving at the best answer.
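For programmers who want the local-maximum problem made concrete, here is a toy sketch (the function `f` and its two peaks are invented for illustration): a greedy hill climber that only accepts improving steps stops at the first peak it reaches, never learning that a better one exists elsewhere.

```python
def hill_climb(f, start, step=1, iters=100):
    """Greedy hill climbing: move to a neighbor only if it improves f.
    Stops at the first peak it finds, which need not be the global one."""
    x = start
    for _ in range(iters):
        best = max((x - step, x, x + step), key=f)
        if best == x:
            break  # no neighbor improves: stuck at a (possibly local) maximum
        x = best
    return x

# A landscape with two peaks: a small one at x=2 and a taller one at x=8.
def f(x):
    return -(x - 2) ** 2 + 4 if x < 5 else -(x - 8) ** 2 + 9

print(hill_climb(f, start=0))   # → 2 (the nearby, smaller peak)
print(hill_climb(f, start=10))  # → 8 (the taller peak)
```

Where you end up depends entirely on where you start—which is the insurance that broad research provides: it widens the starting scope instead of climbing greedily from a single given answer.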

By comparison, a chatbot is just a pinhole camera. Like every pinhole camera, the chatbot is severely constrained. A chatbot is generating a sequence of text probabilistically weighted by what it’s seen in the training data. That means that chatbots are more likely to produce answers that are more common in the training data, even if those answers are popular for bad reasons. It also means that chatbots will struggle with novel or contrarian questions, because there is no antecedent in the training data. Good writers rely on research to ensure their work isn’t subject to these blind spots and biases.
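The way common answers crowd out everything else can be sketched in a few lines (the corpus and answer labels are invented for illustration; real language models are vastly more sophisticated, but the frequency-weighting principle is the same): sampling in proportion to frequency reproduces whatever dominated the training data, for good reasons or bad.

```python
from collections import Counter
import random

# Toy "training data": the same question answered many ways, with one
# answer far more common than the rest — perhaps for bad reasons.
corpus = ["answer A"] * 90 + ["answer B"] * 9 + ["answer C"] * 1
counts = Counter(corpus)

def generate(rng):
    """Sample an answer with probability proportional to its frequency
    in the corpus — the gist of probability-weighted generation."""
    answers = list(counts)
    weights = [counts[a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]

rng = random.Random(0)
samples = Counter(generate(rng) for _ in range(1000))
print(samples.most_common(1)[0][0])  # → "answer A", because it dominated the input
```

Note also what the sketch cannot do: ask it a question with no antecedent in the corpus and there is simply nothing to sample from—the pinhole admits only what the training data lets through.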

“But the chatbots will get better.” True. Ideally, classroom policies about generative AI will be, like other academic-integrity policies, designed to be resilient to technological improvements. Furthermore, a chatbot that could replace human research would have to aim for qualitatively different goals than those of today.

I expect that generative AI will migrate toward specialty models, as opposed to today’s one-size-fits-all approach. For instance, in law there are books known as “secondary sources” that summarize cases and law on certain topics, thereby allowing lawyers conducting research to quickly survey the boundaries. Could some of these publications eventually be replaced by AI? Maybe. Though the virtue of these publications is the fact that another lawyer read everything for you, establishing the credibility of the summary. (Indeed, that’s why these secondary sources often cost a lot.) A legal AI that’s wrong 10% of the time would be, in practice, no better than one that’s wrong 90% of the time, because either way it can’t be trusted.

Is it already too late?

Unfortunately, I think the question of how to permissively incorporate generative AI into the classroom is going to seem quaint in 6–12 months when schools are contending with enormous amounts of undetectable cheating and plagiarism enabled by these tools.

Grading systems will get distorted. Highly skilled students will complain, with good reason, that it’s unfair that they have to compete against robots. (This is why elite institutions will forbid AI: doing so will help them attract those students.) Furthermore, humanities departments, which have long been in decline, will face new arguments along the lines of “well, if you can ace that subject with an AI, it must not be worthwhile.”

It’s possible that schools will end up forbidding generative AI by default more often than I predict. They may get so far behind the curve that they have no choice but to use the nuclear option.

Update, 15 days later

A thought experiment—above, I mentioned a professor who was allowing students to draft essay assignments using a chatbot. Let’s call that person Professor Alice, and the chatbot MegaChat.

Suppose Professor Bob creates an “AI Essay Validator”. He has students write their essays as usual, but then they have to send them through the essay validator, which produces the output that each student hands in.

Students start to notice that in terms of point of view and argument, some essays are changed very little by the validator; others are changed dramatically. (BTW I’m not cryptically inviting an exploration of whether chatbots have political or normative biases. They have probabilistic tendencies, which is all that matters here.)

Later it’s revealed that under the hood, Professor Bob’s AI Essay Validator simply throws away each student’s original essay and generates a new one using MegaChat.

My hunch is that moral and ethical objections would be lodged against Professor Bob’s system for systematically restricting student expression and imposing viewpoints.

If you agree, then two questions:

  1. Functionally speaking, how is Professor Alice’s system different from Professor Bob’s (aside from eliminating steps that don’t affect the outcome)?

  2. If the two systems are functionally equivalent, then why should we regard Professor Alice’s system as less objectionable?

Perhaps Professor Alice gets a bonus point for honesty; perhaps Professor Bob’s students get value from the exercise of synthesizing their own thoughts. But I would argue that both professors end up limiting their students’ expression in the same way—through the filter of MegaChat.