AI Exposes a Major Problem with Universities

I’ve heard that AI, or more properly, Large Language Models (LLMs), are a disaster for colleges and universities. Many people take this to be an indictment of the students, and there is some truth to that, but they’re missing the degree to which this is a damning indictment of Academia. If your tests give excellent grades to statistical text generators, you weren’t testing what you thought you were and the grades you gave didn’t mean what you thought they meant.

Of course, it’s been an open secret that grades have meant less and less over the years. The quality of both students and professors has been going down, though no one wants to admit it. This is, however, a simple consequence of the number of students and professors growing so much over the last 50 or so years. In the USA, something like 60% of people over the age of 25 have attended college, with close to 40% having a degree. 60% of people can’t all be in the top 1%. Neither can 40%. At most, in fact, 1% of people can be in the top 1%. When a thing becomes widespread, it must trend toward mediocrity.

So this really isn’t a surprise. Nor, frankly, is it a surprise that Universities held on to prestige for so much longer than they deserved it—very few human beings have the honesty to give up the good opinion of others that they don’t deserve, and the more people who pile onto a Ponzi scheme, the more people have a strong interest in trying to keep up the pretence.

Which is probably why Academics are reacting so desperately and so foolishly to the existence of chatGPT and other LLMs. They’re trying to prevent people from using the tools in the hope that this will keep up their social status. But this is a doomed enterprise. The mere fact that a statistical text generator can get excellent grades means that the grades are no longer worth more than the statistical text generator. And to be clear: this is not a blow against humanity, only against grades.

To explain what I mean, let me tell you about my recent experiences with using LLM-powered tools for writing software. (For those who don’t know, my day job is being head of the programming department at a small company.) I’ve been using several, mostly preferring GitHub Copilot for inline suggestions and Aider using DeepSeek V3 0324 for larger amounts of code generation. They’re extremely useful tools, but also extremely limited. Kind of in the way that a backhoe can dig an enormous amount of dirt compared to a shovel, but it still needs an operator to decide what to dig.

What I and all of my programmer friends who have been trying LLM-powered tools have found is that “vibe coding,” where you just tell the LLM what you want and it designs it, tends to be an unmaintainable disaster above a low level of complexity. Where it shines, however, is in implementing the “leaf nodes” of a decision tree. A decision tree is a name for how human beings handle complex problems: we can’t actually solve complex problems directly, but we can break them down into a series of simpler problems that, when they’re all solved, solve the complex problem. Usually these simpler problems are still complex, and so they need to be broken down into yet-simpler problems. This process of breaking each sub-problem down eventually ends in problems simple enough that any (competent) idiot can just directly solve them. These are the leaf nodes of the decision tree, and these simple problems are what LLMs are actually good at.
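To make the decomposition concrete, here is a hypothetical sketch in Python (an invented toy task, not anything from my actual work): the root function solves the complex problem only by composing sub-problems, and the leaf functions at the bottom are exactly the sort of thing an LLM can bang out reliably.

```python
def parse_line(line):
    """Leaf node: split 'name,amount' into a (name, float) pair.
    Simple enough that an LLM will produce it correctly on the first try."""
    name, amount = line.split(",")
    return name.strip(), float(amount)


def total_by_name(lines):
    """Mid-level node: combine the parsed lines into per-name totals.
    Still simple, but already composing a leaf-node solution."""
    totals = {}
    for line in lines:
        name, amount = parse_line(line)
        totals[name] = totals.get(name, 0.0) + amount
    return totals


def format_report(totals):
    """Leaf node: render the totals as sorted 'name: amount' lines."""
    return "\n".join(
        f"{name}: {amount:.2f}" for name, amount in sorted(totals.items())
    )


def sales_report(lines):
    """Root of the tree: the 'complex' problem, solved entirely by
    composing the solutions to the simpler sub-problems."""
    return format_report(total_by_name(lines))
```

The design work—deciding that the problem splits into parsing, aggregating, and formatting, and what each piece’s interface should be—is the part the human still has to do; the LLM is the backhoe that digs out each leaf.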

This is because what LLMs actually do is perform transforms in highly multi-dimensional spaces, or in less technical language, they reproduce patterns that existed in their training data. They excel at any problem which can be modeled as taking input and turning it into a pattern that existed in their training data, but with the details of the input substituted for the details in the training data. This is why they’re so good at solving the problems that any competent idiot can solve—solutions to those problems were abundant in their training data.

The LLMs will, of course, produce code for more complex things for which the solution did not already exist in their training data, but the quality of these solutions usually ranges from terrible to not-even-a-solution. (There are lots of people who will take your money and promise you more than this; there are always people who will use hype to try to separate people from their money. I’ve yet to hear of a case where they are not best ignored.)

Now, I’ve encountered the exact problem of a test being rendered obsolete by LLMs. In hiring programmers, I’ve had excellent results making the first interview a programming sample specification that people had 5 business days to complete. (To prove good faith, I’d give them my implementation of it right after they submitted theirs.) It was a single-page, fairly detailed specification, but it left room for creativity, too. However, you can throw it into any high-end LLM these days and get a perfectly workmanlike result. This is obviously not useful as a first interview anymore.

One possible response would be to try to prevent the use of LLMs, such as by asking people to write it in front of me (e.g. during a video call with a shared screen). But what would be the point of that? If we hired the person, I’d expect them to use LLMs as a tool at work. (Used properly, they increase productivity and decrease stress.)

It only took a minute or two of thinking about this to realize that the problem is not that LLMs can implement the programming sample, but that the programming sample was only slightly getting at what I wanted to find out about the person. What I want to know is whether they can design good software, not whether they can rapidly implement the same kind of code that everyone (competent) has written ten times at least.

So I came up with a different first interview sample. Instead of having people do something which is 10% what I want to see and 90% detail work, I have switched to asking the candidates to write a data format for our products, focusing on size efficiency balanced with robustness and future expansion based on where they think our products might go in the future. This actually gets at what I want to know—what is the person’s judgement like—and uses very little of their time doing anything an LLM could do faster.
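To give a flavor of the trade-offs such a design exercise probes, here is a purely illustrative sketch (invented record layout and field names—nothing to do with our actual products): a version byte buys future expansion, fixed-width packed fields buy size efficiency, and a checksum buys robustness.

```python
import struct
import zlib

FORMAT_VERSION = 1


def encode_record(device_id, temperature_c, flags):
    """Pack one record into 10 bytes.
    Layout: version (u8), device id (u16), temperature in
    hundredths of a degree (i16), flags (u8), then a CRC32 footer."""
    body = struct.pack(
        "<BHhB", FORMAT_VERSION, device_id, round(temperature_c * 100), flags
    )
    return body + struct.pack("<I", zlib.crc32(body))


def decode_record(data):
    """Unpack a record, checking integrity and version."""
    body, (crc,) = data[:-4], struct.unpack("<I", data[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("corrupt record")  # robustness: detect bit rot
    version, device_id, temp, flags = struct.unpack("<BHhB", body)
    if version > FORMAT_VERSION:
        # future expansion: old readers refuse records they can't understand
        raise ValueError("record from a newer format version")
    return device_id, temp / 100, flags
```

What I’m actually listening for in an interview answer is the candidate’s reasoning about exactly these kinds of choices—why fixed-point instead of a float, whether the CRC is worth four bytes per record, how a version 2 reader should treat version 1 records—not the packing code itself, which an LLM can write faster than either of us.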

I haven’t hired anyone since making this change, so I’m not in a position to say how well this particular solution to the problem works. I’m only bringing it up to show the kind of thinking that is necessary—asking yourself what it is that you are actually trying to get at, rather than just assuming that your approach is getting at that. (In my defense, it did work quite a lot better for the intended purpose than FizzBuzz, which we had used before. So it was very much a step in the right direction.)

That Academia’s response to LLMs is to try to just get rid of them, rather than to use them to figure out what the weaknesses in their testing have been, tells you quite a lot about what a hollow shell Academia has become.

GPT4 And Movie Scripts

I asked a friend to have GPT4 (the successor to chatGPT) write a prompt for itself to write a movie script. Here’s the prompt it came up with:

Craft an engaging and original movie script that revolves around an enigmatic object with mysterious powers. The story should include a diverse and dynamic ensemble of characters, each with their own unique backstory, who must come together to uncover the truth behind the object. The plot should feature unexpected twists and turns, with a focus on character development and emotional depth, while incorporating elements of suspense, action, and heartwarming moments. Please provide a complete script, including character dialogues, scene descriptions, and visual details.

I love how generic this description is. To a great degree that’s what chatGPT/GPT4 specializes in; it’s an engine for generating the most probable text that the people who wrote what it was trained on would say. That is, it is a cliché engine.

I should also note, in fairness, that chatGPT/GPT4 is woefully underpowered to generate a full movie script. Those are usually about 100 pages long (the rule of thumb is 1 minute per page for a competently written script), while chatGPT/GPT4 is limited to 4096 “tokens” (where a “token” is a chunk of text usually smaller than a word, almost more like a syllable, though not quite). That said, it’s interesting to take a look at what little it can do.

So, here’s the script generated from the prompt:

Title: The Enigma of the Eon Orb

INT. MUSEUM – NIGHT

A dimly lit room is filled with ancient artifacts. We focus on a particular exhibit: a small, glowing ORB, encased in glass. This is the EON ORB.

Right off the bat, this is not right for a movie script. A movie script describes what the audience sees and hears; it does not give exposition to the reader as if it’s a narrator in a novel. Movie scripts are essentially the blueprints for creating a movie; you can’t do anything with “This is the EON ORB.” You could probably get away with “The camera focuses on the exhibit with the EON ORB, though it’s only labeled ‘mysterious artifact.'” That lets the prop department know which prop to use.

EXT. MUSEUM – NIGHT

Establishing shot of the museum. A heist is taking place; masked THIEVES break in and steal the Eon Orb.

Again, this is not how movie scripts work. In a movie script you have to say things like how many people we see, vaguely what they look like, and it needs to describe in detail what they do. This is more like the prompt you might give to an improv group.

Further, how are we supposed to see that they steal the EON ORB from outside the museum? There is a lot of detail available in HD movies these days, especially if you spring for the 4K version, but showing a heist inside a museum from the outside of the museum is asking a lot even of 4K.

CUT TO:

INT. CIA HEADQUARTERS – DAY

AGENT JAMES PARKER (40s, experienced, determined) is briefed about the theft. He learns about the Eon Orb’s mysterious powers and its ability to manipulate time.

Again, not how movie scripts work.

Also, if people knew about the Eon Orb’s mysterious powers and its ability to manipulate time, why was it just left in a museum, rather than in a securely guarded vault?

PARKER
(serious)
We need to assemble a team to retrieve this artifact before it falls into the wrong hands.

Who is he talking to? Why does he believe that it has not already fallen into the wrong hands?

CUT TO:

INT. VARIOUS LOCATIONS – DAY

That’s not how movie scripts work! You can’t cut to several different places. Or is this meant to be split-screened like the Ang Lee Hulk movie, which I never saw but heard was unwatchable because he showed several scenes simultaneously in split-screen?

Parker recruits a diverse group of experts for his team:

This is more like a summary than a script, and barely even that.

DR. OLIVIA MARTINEZ (30s, intelligent, strong-willed), an archaeologist with knowledge about the Eon Orb
LENA JONES (late 20s, athletic, tech-savvy), a skilled thief with a troubled past
RAYMOND “RAY” SMITH (40s, charismatic, mysterious), a history professor with expertise in ancient artifacts
RUBY ZHANG (early 20s, quiet, empathetic), a gifted artist who can recreate any object she sees

OK, the archaeologist with knowledge about the Eon Orb I get. A history professor with expertise in ancient artifacts… I guess is defensible. Why is the CIA recruiting a tech-savvy skilled thief with a troubled past? Don’t they have tech-savvy people with thief-like skills but without the troubled past in the CIA already? And what on earth could the point be of recruiting an artist who can recreate any object she sees? Is she supposed to make commemorative figurines to help them remember their adventures once it’s all over?


INT. CIA HEADQUARTERS – DAY

The team gathers, and Parker briefs them on the mission.

Yeah, this isn’t how you write a movie script, but it’s also pointless because it’s summarizing one of the few lines of dialog that comes immediately after.

PARKER
We have intel that a powerful criminal organization is behind the theft. Our mission is to retrieve the Eon Orb before they discover its true power.

The team agrees to work together, despite their differences.

GPT4 really believes in “tell, don’t show.” That said, why does having different jobs mean that they have to agree to work together “despite their differences”? Also, isn’t that what they did when they were recruited? Were they only recruited to hear the briefing and then they’d decide on what they were committing to?

EXT. VARIOUS LOCATIONS – DAY

The team travels to different parts of the world, following clues and leads. Along the way, they encounter dangerous situations and form strong bonds.

This is impressively generic, even by chatGPT standards. Also, it’s not a movie script. I need a macro for that…

INT. ABANDONED WAREHOUSE – NIGHT

The team discovers the ORGANIZATION’s hideout. They devise a plan to retrieve the Eon Orb.

The place where they discover the hideout of the ORGANIZATION (what a creative name for a criminal organization!) is outside of an abandoned warehouse? This is also where they devise their plan? They walked up to it planless and only when they were sitting around outside did they devise a plan???

Also, why is the team who is going to infiltrate a hideout in an abandoned warehouse a CIA agent, an archaeologist, a history professor, a thief with a troubled past, and an artist? Remember, these weren’t thrown together by coincidence, they were recruited by the CIA agent. When it comes to sneaking into a criminal organization’s hideout, there’s no one he’d rather have by his side than these people with (possibly aside from the thief) no relevant skills?

INT. ABANDONED WAREHOUSE – NIGHT

The team infiltrates the hideout in an intense action sequence. They succeed in retrieving the Orb, but they accidentally activate its powers.

CUT TO:

EXT. ANCIENT CITY – DAY

The team is transported back in time to an ancient civilization, where they meet the original GUARDIANS of the Eon Orb. They learn about the Orb’s true purpose: to maintain balance between the past, present, and future.

GUARDIAN
You must return the Orb to its rightful place in your time. The balance of the world depends on it.

Wow. What a plot. So the Eon Orb’s purpose is to maintain the balance of past, present, and future. It does this by being in the time where it’s supposed to be. So why does it have a user interface which you can accidentally trigger that sends it to the wrong time? Seriously. The thing’s function is best accomplished by not doing anything with it. Why does it have the ability to let people screw that up when literally nothing good can come of it? That would be like putting a self-destruct button on a sandbag used for holding floodwaters back. That’s not helpful.

The team faces moral dilemmas and confronts their personal pasts. They grow and learn from their experiences.

Man did this follow the prompt to feature “unexpected twists and turns, with a focus on character development and emotional depth”.

INT. ABANDONED WAREHOUSE – NIGHT

They return to the present, but are discovered by the Organization. A thrilling chase ensues.

I guess we’re supposed to take the script’s word for it that the chase was thrilling.

EXT. CITY STREETS – NIGHT

The team is cornered, but they use their individual skills to outsmart the Organization and escape.

I really want to know which skill the history professor used to outsmart the Organization and escape. No, scratch that, I want to know which object the soft-spoken artist copied perfectly during a thrilling chase which allowed her to escape.

INT. MUSEUM – NIGHT

The team returns the Eon Orb to its rightful place, restoring balance to the world.

The script never actually mentioned anything bad happening because the Eon Orb was gone, but OK.

EXT. MUSEUM – DAY

The team says their goodbyes, having formed lifelong bonds.

MARTINEZ
We’ve come a long way. I’m proud to call you all my friends.

RAY
Our paths crossed for a reason. I’ll never forget our journey together.

LENA
Who knows? Maybe we’ll cross paths again someday.

RUBY
(softly)
I hope so.

PARKER
Until then

Oh wow is that dialog original!

GPT4 suggests that the title of the movie should be “The Enigma of the Eon Orb,” but I suspect “Action Team Cliché: Origins” would be a better title.

I guess this wasn’t an entirely fair request. It’s not easy to write an original, engaging, heart-warming movie that focuses on character development. Let’s give it something easier.

Here’s GPT4’s attempt at a script for a sequel to Star Wars Episode VII: The Force Awakens:

Title: Star Wars Episode VIII: Shadows of the Past

INT. RESISTANCE BASE – DAY

Commander POE DAMERON is briefing a group of pilots, including REY and FINN.

POE
We’ve discovered an old Imperial outpost on the distant planet, Tython. Our intel indicates that there might be valuable information on the whereabouts of other potential Force-sensitive individuals.

REY
We can’t let the First Order get their hands on that information.

FINN
Agreed. We have to get there first.

EXT. SPACE – DAY

The Millennium Falcon and several X-wing fighters blast off towards Tython.

EXT. TYTHON – DAY

The Resistance team arrives on Tython and enters the ancient outpost.

INT. IMPERIAL OUTPOST – DAY

The team explores the outpost, uncovering hidden rooms and ancient artifacts.

REY
(sensing something)
There’s a strong presence of the Force here.

FINN
Let’s find that information and get out of here.
As they continue to search, they discover a holocron with the data they need. However, they also awaken an ancient DARK SIDE ENTITY, which attacks them.

INT. IMPERIAL OUTPOST – BATTLE

The team fights the dark side entity. Rey’s powers are tested as she struggles to protect her friends.

EXT. TYTHON – DAY

The team escapes the outpost, but the entity follows them. A massive battle ensues between the Resistance and the dark side entity.

REY
We can’t defeat it alone. We need help.

POE
Rey, you have the power within you. Trust the Force.

Rey takes a deep breath and focuses her energy. She harnesses the power of the Force and defeats the entity.

EXT. RESISTANCE BASE – DAY

The team returns with the information and is greeted by General Leia Organa.

LEIA
Well done. With this information, we can find and train others like you, Rey.

FINN
The First Order won’t know what hit them.

REY
We’ll be ready.

The movie ends with the promise of a new generation of Jedi and the hope of defeating the First Order once and for all.

Yup. It was better than the one Rian Johnson wrote. (If you’re curious, here’s my twenty-seven-thousand-word review of Star Wars Episode VIII: The Least Jedi.)

Of course, that’s a very low bar to clear. You have to dig in order to find it.