I’ve heard that AI, or more properly, Large Language Models (LLMs), are a disaster for colleges and universities. Many people take this to be an indictment of the students, and there is some truth to that, but they’re missing the degree to which this is a damning indictment of Academia. If your tests give excellent grades to statistical text generators, you weren’t testing what you thought you were and the grades you gave didn’t mean what you thought they meant.
Of course, it’s been an open secret that grades have meant less and less over the years. The quality of both students and professors has been going down, though no one wants to admit it. This is, however, a simple consequence of the number of students and professors growing so much over the last 50 or so years. In the USA, something like 60% of people over the age of 25 have attended college, and close to 40% hold a degree. 60% of people can’t all be in the top 1%. 40% of people can’t all be in the top 1% either. At most, in fact, 1% of people can be in the top 1%. When a thing becomes widespread, it must trend toward mediocrity.
So this really isn’t a surprise. Nor, frankly, is it a surprise that universities held on to their prestige so much longer than they deserved it: very few human beings have the honesty to give up the good opinion of others that they don’t deserve, and the more people who pile onto a Ponzi scheme, the more people have a strong interest in keeping up the pretence.
Which is probably why academics are reacting so desperately and so foolishly to the existence of ChatGPT and other LLMs. They’re trying to prevent people from using the tools in the hope that this will prop up their social status. But this is a doomed enterprise. The mere fact that a statistical text generator can get excellent grades means that the grades are no longer worth more than the statistical text generator. And to be clear: this is not a blow to humanity, only to grades.
To explain what I mean, let me tell you about my recent experiences using LLM-powered tools for writing software. (For those who don’t know, my day job is being head of the programming department at a small company.) I’ve been using several, mostly preferring GitHub Copilot for inline suggestions and Aider with DeepSeek V3 0324 for larger amounts of code generation. They’re extremely useful tools, but also extremely limited. Kind of in the way that a backhoe can dig an enormous amount of dirt compared to a shovel, but it still needs an operator to decide where to dig.
What I and all of my programmer friends who have been trying LLM-powered tools have found is that “vibe coding,” where you just tell the LLM what you want and let it design the thing, tends to be an unmaintainable disaster above a low level of complexity. Where these tools shine, however, is in implementing the “leaf nodes” of a decision tree. A decision tree is a name for how human beings handle complex problems: we can’t actually solve complex problems, but we can break them down into a series of simpler problems that, when they’re all solved, solve the complex problem. Usually these simpler problems are still complex, so they need to be broken down into yet-simpler problems. This process of breaking each sub-problem down eventually ends in problems simple enough that any (competent) idiot can directly solve them. These are the leaf nodes of the decision tree, and these simple problems are what LLMs are actually good at.
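As a rough sketch of what that looks like in practice (the task and the function names here are invented purely for illustration), the human lays out the tree and the LLM fills in the leaves:

```python
# Hypothetical decomposition of "summarize recent sales" into leaf nodes.
# The top-level design is the human's job; each leaf is the kind of small,
# well-trodden problem an LLM fills in correctly on the first try.
from datetime import date
from statistics import median


def parse_iso_date(text: str) -> date:
    """Leaf node: turn '2024-05-17' into a date object."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


def format_cents(cents: int) -> str:
    """Leaf node: render an integer number of cents as '$12.34'."""
    return f"${cents // 100}.{cents % 100:02d}"


def median_sale(amounts_in_cents: list[int]) -> int:
    """Leaf node: median of a list of sale amounts."""
    return int(median(amounts_in_cents))


def summarize_sales(rows: list[tuple[str, int]]) -> str:
    """Interior node: deciding how the pieces fit together is the human's job."""
    amounts = [cents for _, cents in rows]
    first = min(parse_iso_date(day) for day, _ in rows)
    return f"{len(rows)} sales since {first}, median {format_cents(median_sale(amounts))}"


print(summarize_sales([("2024-05-01", 1999), ("2024-05-03", 4250), ("2024-05-07", 1200)]))
```

Each of those leaf functions is the sort of thing you can hand to an LLM with one sentence of instruction and expect back correct on the first try; the interior node, deciding what the pieces are and how they fit, is where the actual work is.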
This is because what LLMs actually do is apply transforms in very high-dimensional spaces, or, in less technical language, they reproduce patterns that existed in their training data. They excel at any problem which can be modeled as taking input and turning it into a pattern that existed in their training data, with the details of the input substituted for the details in the training data. This is why they’re so good at solving the problems that any competent idiot can solve: solutions to those problems were abundant in their training data.
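To pick a concrete (and invented) example, ask any of these models for the kind of helper function that shows up in thousands of repositories and you will get back something very much like this:

```python
# The sort of pattern that saturates the training data: a "slugify" helper.
# The model isn't reasoning its way to this; it's substituting your details
# into a shape it has seen thousands of times.
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn 'Hello, World!' into 'hello-world' for use in a URL."""
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    ascii_text = re.sub(r"[^a-zA-Z0-9]+", "-", ascii_text).strip("-")
    return ascii_text.lower()


print(slugify("Hello, World!"))  # hello-world
```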
The LLMs will, of course, produce code for more complex things for which the solution did not already exist in their training data, but the quality of these solutions usually ranges from terrible to not-even-a-solution. (There are lots of people who will take your money and promise you more than this; there are always people who will use hype to try to separate people from their money. I’ve yet to hear of a case where they are not best ignored.)
Now, I’ve encountered this exact problem of a test being rendered obsolete by LLMs. In hiring programmers, I’ve had excellent results making the first interview a programming sample, given as a written specification, that candidates had five business days to complete. (To prove good faith, I’d give them my own implementation right after they submitted theirs.) The specification was a single page and fairly detailed, but it left room for creativity, too. However, these days you can throw it into any high-end LLM and get a perfectly workmanlike result. It is obviously not useful as a first interview anymore.
One possible response would be to try to prevent the use of LLMs, such as by asking people to write the sample in front of me (e.g., during a video call with a shared screen). But what would be the point of that? If we hired the person, I’d expect them to use LLMs as a tool at work. (Used properly, they increase productivity and decrease stress.)
It only took a minute or two of thinking about this to realize that the problem is not that LLMs can implement the programming sample, but that the programming sample was only barely getting at what I wanted to find out about the person. What I want to know is whether they can design good software, not whether they can rapidly implement the same kind of code that everyone (competent) has written at least ten times.
So I came up with a different first-interview sample. Instead of having people do something which is 10% what I want to see and 90% detail work, I have switched to asking candidates to design a data format for our products, balancing size efficiency against robustness and future expansion, based on where they think our products might go. This actually gets at what I want to know, namely what the person’s judgement is like, and uses very little of their time on anything an LLM could do faster.
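To give a sense of the trade-offs involved (this sketch is purely illustrative; it is not our actual format, nor anything a candidate has submitted), one classic shape for such a format is a small fixed header plus tagged, length-prefixed fields, so that readers built today can skip fields added tomorrow:

```python
# Illustrative only: one way to balance size, robustness, and future expansion.
# A fixed header carries a magic number (to reject garbage), a version, and a
# payload length; the payload is a sequence of (tag, length, value) fields so
# that readers can skip tags they don't recognize yet.
import struct

MAGIC = 0xD07A                  # arbitrary value chosen for this example
HEADER = struct.Struct("<HBH")  # magic, version, payload length (little-endian)


def encode(fields: dict[int, bytes], version: int = 1) -> bytes:
    payload = b"".join(
        struct.pack("<BB", tag, len(value)) + value for tag, value in fields.items()
    )
    return HEADER.pack(MAGIC, version, len(payload)) + payload


def decode(blob: bytes) -> tuple[int, dict[int, bytes]]:
    magic, version, payload_len = HEADER.unpack_from(blob)
    if magic != MAGIC or payload_len != len(blob) - HEADER.size:
        raise ValueError("corrupt record")  # robustness: fail loudly, not silently
    fields, offset = {}, HEADER.size
    while offset < len(blob):
        tag, length = struct.unpack_from("<BB", blob, offset)
        fields[tag] = blob[offset + 2 : offset + 2 + length]
        offset += 2 + length
    return version, fields


blob = encode({1: b"\x2a", 2: b"sensor-7"})
print(decode(blob))  # (1, {1: b'*', 2: b'sensor-7'})
```

The tag/length/value layout is only one of many reasonable answers (the one-byte length field, for instance, already limits how far it can stretch); what the exercise is really after is whether the candidate can defend trade-offs like these.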
I haven’t hired anyone since making this change, so I’m not in a position to say how well this particular solution to the problem works. I’m only bringing it up to show the kind of thinking that is necessary: asking yourself what it is that you are actually trying to get at, rather than just assuming that your approach is getting at it. (In my defense, the original programming sample did work quite a lot better for its intended purpose than FizzBuzz, which we had used before it. So it was very much a step in the right direction.)
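For anyone who hasn’t run into it, FizzBuzz asks for nothing more than the following, which gives a sense of how low that bar was:

```python
# FizzBuzz, the old bar: print 1..100, but "Fizz" for multiples of 3,
# "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both.
for n in range(1, 101):
    if n % 15 == 0:
        print("FizzBuzz")
    elif n % 3 == 0:
        print("Fizz")
    elif n % 5 == 0:
        print("Buzz")
    else:
        print(n)
```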
That Academia’s response to LLMs is to try to just get rid of them, rather than to use them to figure out what the weaknesses in its testing have been, tells you quite a lot about what a hollow shell Academia has become.