A good friend of mine is a university professor. Over the past year or so, he and I have talked about what the arrival of fluent AI text generators would mean for teaching.
With the recent and sudden availability of free AI chatbots, schools have abruptly found themselves on the vanguard of crafting policies around human–AI interaction. When I asked my friend how his university was handling it, he said the faculty was “freaking out”. But why? I asked. Hasn’t it been apparent for a few years that this day was imminent?
Nobody expected it to arrive so quickly, he said.
(Though I am currently co-counsel in two lawsuits challenging generative AI, this piece is about the ethical and pedagogical consequences of chatbots. Not their legality, which, like that of generative AI generally, remains very much unresolved.)
As a rule, I believe schools should probably forbid chatbots and other generative AI. I also believe that most won’t, but that elite schools probably will, and that over time this will lead to some weird effects on the education and labor markets—but I will leave that to the economists to sort out.
When I first used GitHub Copilot, I said it “essentially tasks you with correcting a 12-year-old’s homework … I have no idea how this is preferable to just doing the homework yourself.” What I meant is that often, the focus of programming is not merely producing code that solves a problem. Rather, the code tends to be the side effect of a deeper process, which is to learn and understand enough about the problem to write the code. The authors of the famous MIT programming textbook Structure and Interpretation of Computer Programs call this virtuous cycle procedural epistemology. We could also call it by its less exotic name: research.
Is this always the case? Of course not—as a working programmer, sometimes you really just want the answer, because you can’t always afford the cost of doing research for every little thing. In those cases, it suffices to adapt someone else’s solution and move on.
Broadly speaking, what we’re teaching students with writing assignments is how to make a persuasive argument. And the raw material of persuasive arguments is credible evidence. What makes evidence credible? Often, the source of the evidence. Thus, learning how to gather sources and weigh their credibility is a hugely important writing skill for every student, up there with learning how to avoid plagiarism.
Legal writing is maybe the most concentrated example. It is always an argument from authority: you have to cite to laws and cases that support your argument. As a legal writer you also need to be able to confront and distinguish authority contrary to your argument, e.g.—“that case was overruled”, “that case doesn’t apply in the Ninth Circuit”, “the facts of that case are different”, and so on. The same applies to statutes. As my federal-tax professor in law school used to say of the tax code: “When you think you’ve found the answer—keep reading!” Why? Because the exceptions to what you just read often appear farther down the page.
Research—whether you’re a lawyer, a programmer, or a student writer—is a process, not an outcome. The goal is not just to get an answer, but to learn and understand. To be at least thorough, at best exhaustive. To persuade yourself that you not only have the right answer, but that you didn’t overlook a better one.
Which brings us to the worst deficiency of the chatbots: they’re all about outcome, not process. Ask a question; get an answer. It’s usually easy to verify whether a certain answer is, at a coarse level, accurate or not. It’s much harder to verify whether the answer is well supported by credible evidence.
A business-school professor recently wrote that he was allowing his class to draft essays using chatbots, with the proviso that each student had to verify all facts, and would remain accountable for errors.
I understand why teachers might feel that embracing AI is a more workable détente than banning it outright. And if it is embraced, a policy of holding students accountable for its errors is better than no accountability at all.
And yet. Even with that policy, a key goal of the writing assignment has been obliterated. A student who has a chatbot-provided answer might learn a little more by verifying that answer. But that student will never learn the broader intellectual and historical context for that answer. Nor will they learn whether a better answer exists, because they have no path by which to discover it. (Programmers may recognize this as the algorithmic problem of getting stuck at a local maximum.) In that sense, research is a form of insurance: by starting with a broad scope of material and winnowing down, we can feel confident that we are arriving at the best answer.
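To make that parenthetical concrete, here is a toy Python sketch (the scoring function, search range, and step size are all invented for illustration): a greedy search that accepts only the nearest improvement stalls on a minor peak, while a broad survey of the whole range, the analogue of starting wide and winnowing down, reaches the higher one.

```python
# Toy illustration of getting stuck at a local maximum.
# The scoring function, range, and step size are invented for this example.

def score(x):
    # Two peaks: a minor one near x = 2 and a much higher one near x = 8.
    return max(0, 3 - (x - 2) ** 2) + max(0, 10 - (x - 8) ** 2)

def hill_climb(x, step=0.5):
    """Greedy search: accept a neighbor only if it improves on the current answer."""
    while True:
        best = max([x - step, x + step], key=score)
        if score(best) <= score(x):
            return x  # stuck: no nearby improvement, even though a higher peak exists
        x = best

def broad_scan(lo=0.0, hi=10.0, step=0.5):
    """Survey the whole range first, then keep the best candidate."""
    candidates = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    return max(candidates, key=score)

print(hill_climb(1.0))  # 2.0 -- stalls at the minor peak
print(broad_scan())     # 8.0 -- finds the higher peak
```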
A chatbot, by comparison, is just a pinhole camera. Like every pinhole camera, it offers a severely constrained view: it generates a sequence of text probabilistically weighted by what it’s seen in the training data. That means chatbots are more likely to produce answers that are common in the training data, even if those answers are popular for bad reasons. It also means that chatbots will struggle with novel or contrarian questions, because there is no antecedent in the training data. Good writers rely on research to ensure their work isn’t subject to these blind spots and biases.
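As a rough illustration of what “probabilistically weighted by the training data” means, here is a deliberately simplified word-counting sketch (real chatbots use neural networks trained on enormous corpora, not bigram counts, and the tiny corpus below is invented): the continuation that appears most often in the training text is the one most likely to come out, whether or not it is popular for good reasons.

```python
import random
from collections import Counter, defaultdict

# Deliberately simplified stand-in for a language model: a word-level bigram
# sampler over a tiny invented corpus.
training_text = (
    "the court held for the plaintiff . "
    "the court held for the plaintiff . "
    "the court held for the defendant ."
).split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    following[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev` in training."""
    counts = following[prev]
    return random.choices(list(counts), weights=list(counts.values()))[0]

# After "the", the sampler says "court" most often, and "plaintiff" twice as
# often as "defendant" -- the common continuation wins, right or wrong.
print(Counter(next_word("the") for _ in range(1000)))
```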
“But the chatbots will get better.” True. Ideally, classroom policies about generative AI will be, like other academic-integrity policies, designed to be resilient to technological improvements. Furthermore, a chatbot that could replace human research would have to aim for qualitatively different goals from those of today’s models.
I expect that generative AI will migrate toward specialty models, as opposed to today’s one-size-fits-all approach. For instance, in law there are books known as “secondary sources” that summarize cases and law on certain topics, thereby allowing lawyers conducting research to quickly survey the boundaries of a topic. Could some of these publications eventually be replaced by AI? Maybe. Though the virtue of these publications is that another lawyer read everything for you, establishing the credibility of the summary. (Indeed, that’s why these secondary sources often cost a lot.) A legal AI that’s wrong 10% of the time would be, in practice, no better than one that’s wrong 90% of the time, because either way its output can’t be trusted without checking.
Unfortunately, I think the question of how to permissively incorporate generative AI into the classroom is going to seem quaint in 6–12 months when schools are contending with enormous amounts of undetectable cheating and plagiarism enabled by these tools.
Grading systems will get distorted. Highly skilled students will complain, with good reason, that it’s unfair that they have to compete against robots. (This is why elite institutions will forbid AI: doing so will help them attract those students.) Furthermore, humanities departments, which have long been in decline, will face new arguments along the lines of “well, if you can ace that subject with an AI, it must not be worthwhile.”
It’s possible that more schools than I predict will end up forbidding generative AI by default. They may get so far behind the curve that they have no choice but to use the nuclear option.
A thought experiment—above, I mentioned a professor who was having students draft essay assignments using a chatbot. Let’s call that person Professor Alice, and the chatbot MegaChat.
Suppose Professor Bob creates an “AI Essay Validator”. He has students write their essays as usual, but then they must run them through the validator and hand in its output.
Students start to notice that in terms of point of view and argument, some essays are changed very little by the validator; others are changed dramatically. (BTW I’m not cryptically inviting an exploration of whether chatbots have political or normative biases. They have probabilistic tendencies, which is all that matters here.)
Later it’s revealed that under the hood, Professor Bob’s AI Essay Validator simply throws away each student’s original essay and generates a new one using MegaChat.
My hunch is that moral and ethical objections would be lodged against Professor Bob’s system on the grounds that it systematically restricts student expression and imposes viewpoints.
If you agree, then two questions:
Functionally speaking, how is Professor Alice’s system different from Professor Bob’s (aside from eliminating steps that don’t affect the outcome)?
If the two systems are functionally equivalent, then why should we regard Professor Alice’s system as less objectionable?
Perhaps Professor Alice gets a bonus point for honesty; perhaps Professor Bob’s students get value from the exercise of synthesizing their own thoughts. But I would argue that both professors end up limiting their students’ expression in the same way—through the filter of MegaChat.