AI Exposes a Major Problem with Universities

I’ve heard that AI, or more properly, Large Language Models (LLMs), are a disaster for colleges and universities. Many people take this to be an indictment of the students, and there is some truth to that, but they’re missing the degree to which this is a damning indictment of Academia. If your tests give excellent grades to statistical text generators, you weren’t testing what you thought you were and the grades you gave didn’t mean what you thought they meant.

Of course, it’s been an open secret that grades have meant less and less over the years. The quality of both students and professors has been going down, though no one wants to admit it. This is, however, a simple consequence of the number of students and professors growing so much over the last 50 or so years. In the USA, something like 60% of people over the age of 25 have attended college, and close to 40% have a degree. 60% of people can’t all be in the top 1%. Neither can 40% of people. At most, in fact, 1% of people can be in the top 1%. When a thing becomes widespread, it must trend toward mediocrity.

So this really isn’t a surprise. Nor, frankly, is it a surprise that Universities held on to their prestige for so much longer than they deserved it: very few human beings have the honesty to give up the good opinion of others that they don’t deserve, and the more people who pile onto a Ponzi scheme, the more people have a strong interest in keeping up the pretense.

Which is probably why Academics are reacting so desperately and so foolishly to the existence of ChatGPT and other LLMs. They’re trying to prevent people from using the tools in the hope that this will preserve their social status. But this is a doomed enterprise. The mere fact that a statistical text generator can get excellent grades means that the grades are no longer worth more than the statistical text generator. And to be clear: this is not a blow against humanity, only against grades.

To explain what I mean, let me tell you about my recent experiences with using LLM-powered tools for writing software. (For those who don’t know, my day job is being head of the programming department at a small company.) I’ve been using several, mostly preferring GitHub Copilot for inline suggestions and Aider with DeepSeek V3 0324 for larger amounts of code generation. They’re extremely useful tools, but also extremely limited, kind of in the way that a backhoe can dig an enormous amount of dirt compared to a shovel but still needs an operator to decide what to dig.

What I and all of my programmer friends who have been trying LLM-powered tools have found is that “vibe coding,” where you just tell the LLM what you want and it designs it, tends to be an unmaintainable disaster above a low level of complexity. However, where these tools shine is in implementing the “leaf nodes” of a decision tree. A decision tree is a name for how human beings handle complex problems: we can’t actually solve complex problems, but we can break them down into a series of simpler problems that, when they’re all solved, solve the complex problem. But usually these simpler problems are still complex, and so they need to be broken down into yet-simpler problems. And this process of breaking each sub-problem down eventually ends in problems simple enough that any (competent) idiot can just directly solve them. These are the leaf nodes of the decision tree. And these simple problems are what LLMs are actually good at.
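To make that concrete, here’s a minimal sketch; the task and all of the function names are invented for illustration, not taken from any real codebase. The top-level function is the part a person has to design, and each helper under it is a leaf node, the kind of small, well-trodden problem an LLM will usually get right on the first try.

    import csv
    from collections import defaultdict

    def parse_sale(row):
        # Leaf node: turn one CSV row into a (date, amount) pair.
        return row["date"], float(row["amount"])

    def total_by_date(sales):
        # Leaf node: sum the amounts for each date.
        totals = defaultdict(float)
        for date, amount in sales:
            totals[date] += amount
        return dict(totals)

    def format_report(totals):
        # Leaf node: render the totals as lines of text.
        return "\n".join(f"{date}: ${amount:,.2f}"
                         for date, amount in sorted(totals.items()))

    def daily_sales_report(path):
        # The non-leaf part: deciding what the pieces are and how they fit together.
        with open(path, newline="") as f:
            sales = [parse_sale(row) for row in csv.DictReader(f)]
        return format_report(total_by_date(sales))

An LLM has seen a thousand variations of each of those leaf functions; the decision that the problem breaks into exactly these pieces is the part it can’t be trusted with.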

This is because what LLMs actually do is perform transforms in highly multi-dimensional spaces, or in less technical language, they reproduce patterns that existed in their training data. They excel at any problem which can be modeled as taking input and turning it into a pattern that existed in their training data, but with the details of the input substituted for the details in the training data. This is why they’re so good at solving the problems that any competent idiot can solve: solutions to those problems were abundant in their training data.

The LLMs will, of course, produce code for more complex things for which the solution did not already exist in their training data, but the quality of these solutions usually ranges from terrible to not-even-a-solution. (There are lots of people who will take your money and promise you more than this; there are always people who will use hype to try to separate people from their money. I’ve yet to hear of a case where they were not best ignored.)

Now, I’ve encountered the exact problem of a test being rendered obsolete by LLMs. In hiring programmers, I’ve had excellent results making the first interview a programming sample, based on a specification that people had 5 business days to complete. (To prove good faith, I’d give them my implementation of it right after they submitted theirs.) It was a single-page, fairly detailed specification, but it left room for creativity, too. However, you can throw it into any high-end LLM these days and get a perfectly workmanlike result. This is obviously not useful as a first interview anymore.

One possible response would be to try to prevent the use of LLMs, such as by asking people to write it in front of me (e.g. during a video call with a shared screen). But what would be the point of that? If we hired the person, I’d expect them to use LLMs as a tool at work. (Used properly, they increase productivity and decrease stress.)

It only took a minute or two of thinking about this to realize that the problem is not that LLMs can implement the programming sample, but that the programming sample was only slightly getting at what I wanted to find out about the person. What I want to know is whether they can design good software, not whether they can rapidly implement the same kind of code that everyone (competent) has written ten times at least.

So I came up with a different first interview sample. Instead of having people do something which is 10% what I want to see and 90% detail work, I have switched to asking candidates to design a data format for our products, focusing on size efficiency balanced against robustness and room for future expansion, based on where they think our products might go. This actually gets at what I want to know (what is the person’s judgement like?) and uses very little of their time on anything an LLM could do faster.
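To give a flavor of the trade-offs involved, here’s a rough sketch; it’s invented for this post, not our actual format or anyone’s submission. The idea is to spend a few bytes on a magic number, a version, and a length, so that old readers can detect newer data and skip records they don’t understand, while keeping the payload itself compact.

    import struct

    MAGIC = b"DF"                    # 2 bytes: lets a reader sanity-check the stream
    VERSION = 1                      # 1 byte: room for future, incompatible changes
    HEADER = struct.Struct(">2sBI")  # magic, version, payload length (big-endian)

    def pack_record(payload: bytes) -> bytes:
        # 7 bytes of overhead per record buys validation and forward compatibility.
        return HEADER.pack(MAGIC, VERSION, len(payload)) + payload

    def unpack_record(data: bytes) -> bytes:
        magic, version, length = HEADER.unpack_from(data)
        if magic != MAGIC:
            raise ValueError("not one of our records")
        if version > VERSION:
            raise ValueError("record written by a newer version")
        return data[HEADER.size:HEADER.size + length]

The point of the exercise isn’t the code, though; it’s hearing the candidate reason about which bytes are worth spending and what they expect to need room for later.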

I haven’t hired anyone since making this change, so I’m not in a position to say how well this particular solution to the problem works. I’m only bringing it up to show the kind of thinking that is necessary—asking yourself what it is that you are actually trying to get at, rather than just assuming that your approach is getting at that. (In my defense, it did work quite a lot better for the intended purpose than FizzBuzz, which we had used before. So it was very much a step in the right direction.)

That Academia’s response to LLMs is to try to just get rid of them, rather than to use them to figure out what the weaknesses in their testing have been, tells you quite a lot about what a hollow shell Academia has become.

What Should Christians Make of AI?

In this video, I answer a viewer’s question about what Christians should make of AI. (It’s really the same thing that everyone should make of AI.)

Basically, there are two senses of AI:

  1. Like us
  2. Something that does what we would do by intelligence.

All AI that exists is AI in sense 2, not in sense 1, though sense 1 wouldn’t be a massive problem if it did exist.

Testing Computer Programs

My oldest son, who does not yet know how to program, told me a great joke about programmers testing the programs they’ve written:

A programmer writes the implementation of a bartender. He then goes into the bar and orders one beer. He then orders two beers. He orders 256 beers. He orders 257 beers. He orders 9,999 beers. He orders 0.1 beers. He orders zero beers. He orders -1 beers. Everything works properly.

A customer walks in and asks where the bathroom is. The bar catches fire.

It’s funny ’cause it’s true.
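In code, the joke looks roughly like this. It’s a made-up sketch, not anything either of us actually wrote; the Bartender class and the tests are just there to show the shape of the problem.

    class Bartender:
        # The programmer's implementation: great at the thing it was written for.
        def order(self, drink, quantity):
            return f"{quantity} x {drink}, coming right up"

        def ask(self, question):
            raise RuntimeError("the bar catches fire")  # nobody wrote this path

    def test_ordering_beers():
        bar = Bartender()
        for quantity in (1, 2, 256, 257, 9_999, 0.1, 0, -1):
            assert bar.order("beer", quantity)  # every quantity works properly

    def test_where_is_the_bathroom():
        Bartender().ask("Where is the bathroom?")  # this is the one that burns the bar down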

It’s easy, when you design a tool, to test that it works for the purpose the tool exists for. What it’s very easy to miss is all of the other possible uses of the tool. To take a simple example: when you’re making a screwdriver, it’s obvious to test the thing for driving screws. It’s less obvious to test it as a pry bar, a chisel, an awl, or a tape dispenser.

This disparity is inherent in the nature of making tools versus using them. Tools are made by tool-makers. The best tool makers use their own tools, but they are only one person. Each person has his way of solving a problem, and he tends to stick to that way because he’s gotten good at it. When he goes to make a tool, he makes it work well for how he will use it, and often adds features for the variations he can think of on how it might be used to solve the problems the tool is meant for. If he’s fortunate enough to have the resources to talk to other people who will use the tool, he’ll ask them and probably get some good ideas on alternative ways to use it. But he can’t talk to everyone, and he especially can’t talk to the people who haven’t even considered using the tool he hasn’t made yet.

That last group is especially difficult, since there’s no way to know what they will need. But they will come, because once the tool exists, people whose problems it will at least partially solve will start using it for them, since they’re better off with it than without it, even though the tool was never meant to do that.

This isn’t much of a problem with simple tools like a screwdriver, since it doesn’t really have any subtleties to it. This can be a big problem with complex tools, and especially with software. When it comes to software design, you can talk to a bunch of people, but mostly you have to deal with this through trial-and-error, with people reporting “bugs” and you going, “why on earth would you do that?” and then you figure it out and (probably) make changes to make that use case work.

The flip side is a bit more generally practical, though: when considering tools, you will usually have the most success with them if you use them for what they were designed to do. The more you use a tool for some other purpose, the more likely you are to run into problems with it and discover bugs.

For me this comes up a lot when picking software libraries. Naive programmers will look at a library and ask, “can I use this to do what I want?” With more experience, you learn to ask, “was this library designed to do what I want to do?” Code re-use is a great thing, as is not re-inventing the wheel, but it needs to be balanced against whether the tool was designed for your intended use, or whether you’re going to be constantly fighting it. You can use the fact that a car’s differential will let its drive wheels spin in the mud to dig holes, but that will stop working when car manufacturers come out with limited-slip differentials, because they’re making cars for transportation, not for digging holes.

That’s not to say that one should never be creative in one’s use of a tool. Certainly there are books which work better for propping up a table than they do for being read. Just be careful when you do.