ChatGPT is Frequently Misunderstood

A while back I looked into a paper that described the actual mathematics/programming behind ChatGPT (really, behind GPT-3, which is basically the same thing but with a less sophisticated front-end). I find it interesting how often the thing is misunderstood.

ChatGPT is, as it will candidly tell you, a Large Language Model. That language part is very important. It does not model facts or concepts; it has no understanding of anything. It does not try to have an understanding of anything. What it does is model language, but not in the way that grammarians model language. It models how language is used in practice. That is, it is a model of the sort of things that people actually say.

Without getting into the details of how the model works, it is enough to know that it was trained by taking in the appearance and order of words within approximately everything that the people at OpenAI were able to scrape off of the internet, as of about two years before they went public with ChatGPT.
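To make that concrete, here is a deliberately crude toy in Python. It is not how GPT-3 actually works (GPT-3 uses a huge neural network over word fragments, not literal word-pair counts), but it shows the sense in which "training" means nothing more than recording how words appear, and in what order, in text that people wrote.

```python
# A toy "training" run: record which words follow which words in a pile of text.
# GPT-3 learns far richer statistics with a neural network, but the raw
# material is the same: the appearance and order of words in existing text.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1

# After "training", the model knows only how often words followed other words.
print(follow_counts["the"])  # Counter({'cat': 2, 'dog': 2, 'mat': 1, 'rug': 1})
print(follow_counts["sat"])  # Counter({'on': 2})
```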

(This, btw, is the big improvement in ChatGPT over previous large language models; the T stands for “transformer,” a particular way of using matrices which allows the model to be “trained” in parallel, which in turn allows for massively larger training sets than had previously been used in large language models. That said, it’s interesting to note that you can’t increase the training set to much larger than “approximately all of the text that has ever been written,” so on this basis alone we’d expect to see improvements in large language models slowing down after ChatGPT, because simply training on more text is no longer an option. By contrast, this is not a problem for image-generating AI training. Generating massively larger numbers of images is quite straightforward.)
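For the curious, here is a minimal sketch (Python with numpy) of the attention step the transformer is built around. It leaves out the masking, stacking, and learned-embedding details; the only point relevant to this post is the shape of the computation: every position in the input is handled by the same few matrix multiplications, all at once, which is what makes the parallel training possible. All of the sizes and values below are made up.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X has one row per position; every position is transformed together."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # one matrix multiply each
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # each position scored against the others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # a weighted mix of all positions

rng = np.random.default_rng(0)
seq_len, d = 5, 8                             # tiny made-up sizes
X = rng.normal(size=(seq_len, d))             # stand-in for 5 word vectors
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8): all 5 positions at once
```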

The output of ChatGPT can loosely but accurately be described as “the words that the people who wrote the text ChatGPT was trained on would probably say following whatever you said to it.” It doesn’t understand subjects and has no concept of what it is saying, or of the truth or falsity of what it is saying; it only has a concept of the probability of words appearing in a given order, based on the words that came earlier.
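Picking up the toy word-count model from a few paragraphs back, “producing output” is then nothing more than repeatedly asking which word is probable next and picking one. Notice that nothing in this loop knows or cares whether the result is true.

```python
import random

# follow_counts is the table built in the toy "training" sketch earlier.
follow_counts = {
    "the": {"cat": 2, "dog": 2, "mat": 1, "rug": 1},
    "cat": {"sat": 1, "chased": 1},
    "dog": {"sat": 1},
    "sat": {"on": 2},
    "on": {"the": 2},
    "chased": {"the": 1},
}

def continue_text(start_word, length=5):
    """Repeatedly pick a probable next word, given only the previous word."""
    words = [start_word]
    for _ in range(length):
        options = follow_counts.get(words[-1])
        if not options:
            break
        next_words, counts = zip(*options.items())
        words.append(random.choices(next_words, weights=counts)[0])
    return " ".join(words)

print(continue_text("the"))  # e.g. "the cat sat on the rug"
```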

Like all modern “AI” stuff, the “AI” part takes input and produces output that a human being would not recognize as related to what the AI is being used for. Translating from a use case into the input of the AI model, and translating from its output back to the use case, is the work of programs written in normal programming languages that the programmers understand quite well. This front-end software is responsible for things like ChatGPT being able to refer back to previous subjects, or handle special cases like refusing to tell people how to commit crimes, or adding a disclaimer to anything fitness-related it says that you should consult a doctor. They seem to be constantly adding more to this front-end, such as the ability to take instructions like, “write a sentence that ends in the word apple.” A language model only produces sequences of words that are probable given its training set; following directions is not part of that. Thus any amount of following directions is entirely in the front-end, and consists of programmers looking at what kinds of instructions people give it and writing front-end code to handle those cases.
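A purely hypothetical sketch of the kind of front-end described above: ordinary code sitting between the user and the model, rejecting some requests and decorating some answers. Every name, keyword list, and rule here is invented for illustration; OpenAI’s actual front-end code is not public.

```python
# Invented example rules; the real front-end's rules and structure are unknown.
FITNESS_WORDS = {"exercise", "workout", "diet"}
BLOCKED_WORDS = {"hotwire", "counterfeit"}

def front_end(user_message, generate):
    """generate is the underlying language model: text in, probable text out."""
    if any(word in user_message.lower() for word in BLOCKED_WORDS):
        return "Sorry, I can't help with that."
    reply = generate(user_message)
    if any(word in reply.lower() for word in FITNESS_WORDS):
        reply += " (As always, consult a doctor before changing your routine.)"
    return reply

# Usage with a stand-in "model" that just returns a canned answer:
print(front_end("Suggest a workout plan",
                lambda msg: "A light workout three times a week is a good start."))
```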

I’ve heard people say things like, “if they add the ability for it to check whether the things it is saying are true,” but no one has developed an AI which can identify what parts of a sentence constitute facts that could be checked. Consider a simple sentence like, “Mary, Queen of Scots, once placed a bet on a game of tennis.”

What even are the facts that the sentence asserts? Let’s list them:

  • A person named Mary exists (or perhaps is a fictional character in a published work)
  • She was queen of the Scots
  • The Scots are a people, specifically the people who live in Scotland
  • Scotland exists (or existed)
  • There was at least one event at which this Mary made a bet
  • That bet was on a game of tennis (anywhere)
  • A game of tennis was played during the lifetime of Mary, Queen of Scots

Some of these facts do imply others. For example: if Mary did in fact bet on a game of tennis, then it must be the case that a game of tennis was played during Mary’s lifetime. That is not necessarily the order you want to check them, though; it is common when fact checking to start with the easy-to-check facts.

There are complications, though. When we say “tennis,” do we mean the medieval game played on an indoor court in which the walls (and some roofs) were in play, or do we mean “lawn tennis,” which is the modern use of “tennis”? Is the sentence asserting something about Mary relating to the modern game of tennis, or only to the game which was more popularly played in her day and from which the modern game called “tennis” (more properly, “lawn tennis”) is derived?

ChatGPT doesn’t even begin to have anything within its model relevant to answering any of these questions.
