The Dunning-Kruger Effect

(This is the script for my video about the Dunning-Kruger effect. While I wrote it to be read out loud by someone who inflects words like I do, i.e. by me, it should be pretty readable as text.)

Today we’re going to be looking at the Dunning-Kruger effect. This is the other topic requested by PickUpYourPantsPatrol (once again, thanks for the request!), and if you’ve disagreed with anyone on the internet in the last few years, you’ve probably been accused of suffering from it.

Perhaps the best summary of the popular version of the Dunning-Kruger effect was given by John Cleese:

The problem with people like this is that they have no idea how stupid they are. You see, if you are very very stupid, how can you possibly realize that you are very very stupid? You’d have to be relatively intelligent to know how stupid you are. There’s a wonderful bit of research by a guy called David Dunning who’s pointed out that to know how good you are at something requires exactly the same skills as it does to be good at that thing in the first place. This means, if you’re absolutely no good at something at all, then you lack exactly the skills you need to know that you are absolutely no good at it.

There are plenty of things to say about this summary, as well as about the curious problem that if an idiot is talking to an intelligent person, absent reputation being available, there is a near-certainty that both will think the other an idiot. But before I get into any of that, I’d like to talk about the Dunning-Kruger study itself, because I read the paper which Dunning and Kruger published in 1999, and it’s quite interesting.

The first thing to note about the paper is that it actually discusses four studies which the researchers did, trying to test specific ideas about incompetence and self-evaluation which the paper itself points out were already common knowledge. For example, they have a very on-point quotation from Thomas Jefferson. But, they note, this common wisdom that fools often don’t know that they’re fools had never been rigorously tested in the field of psychology, so they tested it.

The second thing to note about this study is that (as I understand is very common in psychological studies) their research subjects were all students taking psychology courses who received extra credit for participating. Now, these four studies were conducted at Cornell University, and the subjects were all undergraduates, so generalizing to the larger population is immediately suspect, since there’s good reason to believe that undergraduates at an Ivy League university have more than a few things in common which they don’t share with the rest of humanity. This is especially the case because the researchers were testing self-evaluation of performance, which is something that Cornell undergraduates were selected for and have a lot invested in. They are, in some sense, the elite of society, or so at least I suspect most of them have been told, even if not every one of them believes it.

Moreover, the tests which they were given (which I’ll go into detail about in a minute) were all academic tests, given to people who were there because they had generally been good at academics. Ivy League undergraduates are perhaps the people most likely to give falsely high impressions of how good they are at academic tests. This is especially the case if any of these were freshman classes (they don’t say), since a freshman at an Ivy League school has impressed the admissions board but hasn’t had the opportunity to fail out yet.

So, right off the bat the general utility of this study in confirming popular wisdom is suspect; popular opinion may have to stand on its own. On the other hand, this may be nearly the perfect study to explain the phenomenon Nassim Nicholas Taleb described as Intellectual Yet Idiot: credentialed people who have the role of intellectuals yet little of the knowledge and none of the wisdom for acting the part.

Be that as it may, let’s look at the four studies described. The first study is in many ways the strangest, since it was a test of evaluating humor. They created a compilation of 30 jokes from several sources, then had a panel of 8 professional comedians rate these jokes on a scale from 1 to 11. After throwing out one outlier, they took the mean ratings as the “correct” answers, then gave the same test to “65 Cornell undergraduates from a variety of courses in psychology who earned extra credit for their participation”.

They found that the people with the bottom quartile of test scores, who by definition have an average rank at about the twelfth percentile, guessed (on average) that their rank was the 66th percentile. The bottom three quartiles overestimated their rank, while the top quartile underestimated their rank, thinking that they were in the (eyeballing it from the graph) 75th percentile when in fact (again, by definition) they were in the 88th.

This is, I think, the least interesting of the studies, first because the way they came up with “right” and “wrong” answers is very suspect, and second because this isn’t necessarily about mis-estimation of a person’s own ability, but could be entirely about mis-estimating their peers’ ability. The fact that everyone put their average rank in the class at between the 66th percentile and 75th percentile may just mean that in default of knowing how they did, Cornell students are used to guessing that they got somewhere between a B- and a B+. Given that they were admitted to Cornell, that guess may have a lot of history behind it to back it up.
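
The “by definition” ranks for the bottom and top quartiles are easy to check. Here is a minimal Python sketch, assuming a hypothetical class of 100 students and the midpoint convention for percentile rank (both are assumptions made for illustration, not details taken from the paper):

```python
from statistics import mean

# Hypothetical class size, chosen only so the quartiles divide evenly.
N = 100

# Midpoint convention (an assumption): the student ranked r-th from the
# bottom sits at percentile 100 * (r - 0.5) / N.
percentiles = [100 * (rank - 0.5) / N for rank in range(1, N + 1)]

quartile = N // 4
bottom_mean = mean(percentiles[:quartile])   # the lowest-scoring quarter
top_mean = mean(percentiles[-quartile:])     # the highest-scoring quarter

print(f"Bottom quartile average percentile: {bottom_mean}")
print(f"Top quartile average percentile: {top_mean}")
```

Under these assumptions the bottom quartile averages the 12.5th percentile and the top quartile the 87.5th, which is where the roughly 12th and 88th percentile figures come from regardless of how anyone actually scored.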

The next test, though unfortunately only given to 45 Cornell students, is far more interesting both because it used 20 questions on logical reasoning taken from an LSAT prep book—so we’re dealing with questions where there is an unambiguously right answer—and because in addition to asking students how they thought they ranked, they asked the students how many questions they thought that they got right. It’s that last part that’s really interesting, because that’s a far more direct measure of how much the students thought that they knew. And in this case, the bottom quartile thought that they got 14.2 questions right while they actually got 9.6 right. The top quartile, by contrast, thought that they got 14 correct when they actually got 16.9 correct.

So, first, the effect does in fact hold up with unambiguous answers. The bottom quartile of performers thought that they got more questions right than they did. So far, so good. But the magnitude of the error is not nearly as great as it was for the ranking error, especially for the bottom quartile. Speaking loosely, the bottom quartile knew half of the material and thought that they knew three quarters of it. That is a significant error, in the sense of being a meaningful error, but at the same time they thought that they knew about 48% more than they did, not 48,000% more than they did. The 11 Cornell undergraduates in the bottom quartile did have an over-inflated sense of their ability, to be sure, but they also had a basic competence in the field. To put this in perspective, the top quartile only scored 76% better than the bottom quartile.
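
For anyone who wants to check those percentages, here is a minimal Python sketch using only the averages quoted above for the logic test. The variable and function names are illustrative choices, not anything from the paper.

```python
# Average scores quoted above for the 20-question logic test.
TOTAL_QUESTIONS = 20
bottom_actual, bottom_estimated = 9.6, 14.2
top_actual = 16.9


def percent_more(value, baseline):
    """How much larger value is than baseline, as a percentage of baseline."""
    return (value - baseline) / baseline * 100


# How far the bottom quartile's estimate exceeded their real score.
bottom_overestimate = percent_more(bottom_estimated, bottom_actual)

# How much better the top quartile actually did than the bottom quartile.
top_vs_bottom = percent_more(top_actual, bottom_actual)

# Share of the test the bottom quartile got right versus believed they got right.
share_known = bottom_actual / TOTAL_QUESTIONS * 100
share_believed = bottom_estimated / TOTAL_QUESTIONS * 100

print(f"Bottom quartile over-estimate: {bottom_overestimate:.0f}%")
print(f"Top quartile vs bottom quartile: {top_vs_bottom:.0f}% better")
print(f"Knew {share_known:.0f}% of the test, believed {share_believed:.0f}%")
```

The over-estimate comes out at about 48% and the gap between the top and bottom quartiles at about 76%, with the bottom quartile knowing 48% of the test and believing they knew 71%.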

The next study was on 84 Cornell undergrads who were given a 20-question test of standard English grammar taken from a National Teacher Examination prep guide. This replicated the basic findings of the previous study, with the bottom quartile estimating they got 12.9 questions right versus a real score of 9.2. (Interestingly, the top quartile very slightly over-estimated their score as 16.9 when it was actually 16.4.) Again, all these are averages so the numbers are a little wonky, but anyway this time the bottom quartile over-estimated their performance by 3.7 points, or 40%. And again, they got close to half the questions right, so this isn’t really a test of people who are incompetent.

There’s another thing to consider in both of these studies, which is how many questions the bottom quartile thought they got wrong. On the logic test they estimated 5.8 errors and on the grammar test 7.1 errors, and while these were under-estimates, they were correct that they did in fact get that many wrong. Unfortunately these are aggregate numbers (asked after they handed the test in, I believe) so we don’t know their accuracy on gauging whether they got particular questions wrong, but on the logic test they correctly estimated about 56% of their errors and on the grammar test about 66% of their errors. That is, while they did unequivocally have an over-inflated sense of their performance, they were not wildly unrealistic about how much they knew. But of course these are both subjects they had studied in the past, and their test scores did demonstrate at least basic competence with them.
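
The error counts can be worked out directly from the scores quoted for the two 20-question tests, taking the estimated errors to be 20 minus the estimated number correct. A minimal Python sketch, with names chosen just for illustration:

```python
TOTAL_QUESTIONS = 20

# Bottom-quartile averages quoted above: (estimated correct, actual correct).
tests = {
    "logic": (14.2, 9.6),
    "grammar": (12.9, 9.2),
}


def error_summary(estimated_correct, actual_correct, total=TOTAL_QUESTIONS):
    """Return estimated errors, actual errors, and the share of errors anticipated."""
    estimated_errors = total - estimated_correct
    actual_errors = total - actual_correct
    share_anticipated = estimated_errors / actual_errors * 100
    return estimated_errors, actual_errors, share_anticipated


summaries = {name: error_summary(*scores) for name, scores in tests.items()}

for name, (estimated, actual, share) in summaries.items():
    print(f"{name}: expected {estimated:.1f} errors, made {actual:.1f} "
          f"({share:.1f}% anticipated)")
```

On these figures the bottom quartile anticipated 5.8 of their 10.4 errors on the logic test (about 56%) and 7.1 of their 10.8 errors on the grammar test (about 66%).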

The fourth study is more interesting, in part because it was on a more esoteric subject: it was a 10-question test, given to 140 Cornell undergrads, about set selection. Each problem described 4 cards and gave a rule which they might match. The question was which card or cards needed to be flipped over to determine if those cards do match the rule. Each question was like that, so we can see why they only asked ten questions.

They were asked to rate how they did in the usual way, but then half of them were given a short packet that took about 10 minutes to read explaining how to do these problems, while the other half was given an unrelated filler task that also took about 10 minutes. They were then asked to rate their performance again, and in fact the group who learned how to do the problems did revise their estimate of their performance, while the other group didn’t change it very much.

And in this test we actually see a gross mis-estimation of ability by the incompetent. The bottom quartile scored on average 0.3 questions correct, but initially thought that they had gotten about 5.5 questions correct. For reference, the top quartile initially thought that they had gotten 8.9 questions correct while they had in fact gotten all ten correct. And at the second rating, the bottom quartile of the untrained group slightly raised their estimate of their score (by six tenths of a question), while the bottom quartile of the trained group reduced their estimate by 4.3 questions. (In fact the two groups had slightly different performances which I averaged together, so the bottom quartile of the trained group ended up estimating that they got exactly one question right.)

This fourth study, it seems to me, is finally more of a real test of what everyone wants the Dunning-Kruger effect to be about. An average of 0.3 questions right corresponds roughly to 11 of the 35 people in the bottom quartile getting one question right while the rest got every question wrong. The incompetent people were actually incompetent. Further, they over-estimated their performance by over 1700%. So here, finally, we come to the substance of the quote from John Cleese, right?
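
Here is a minimal Python sketch of the arithmetic behind those figures, using only the averages quoted above (the names are illustrative). Note that the size of the over-estimate depends on whether one reports the estimate as a multiple of the actual score or as the excess over it.

```python
# Figures quoted above for the bottom quartile of the fourth study.
PARTICIPANTS = 140
QUESTIONS = 10
mean_actual = 0.3      # average questions answered correctly
mean_estimated = 5.5   # average questions they initially believed correct

quartile_size = PARTICIPANTS // 4

# Total correct answers across the whole bottom quartile.
total_correct = mean_actual * quartile_size

# The estimate expressed as a multiple of the real score ...
estimate_as_multiple = mean_estimated / mean_actual

# ... and as a percentage excess over the real score.
percent_overestimate = (mean_estimated - mean_actual) / mean_actual * 100

# Share of the test they believed they had gotten right.
share_believed = mean_estimated / QUESTIONS * 100

print(f"People in the bottom quartile: {quartile_size}")
print(f"Correct answers across the quartile: {total_correct:.1f}")
print(f"Estimate is {estimate_as_multiple:.1f}x the actual score")
print(f"Over-estimate: {percent_overestimate:.0f}%")
print(f"Believed share of the test: {share_believed:.0f}%")
```

About 10.5 correct answers spread over 35 people is where the figure of roughly 11 people with a single right answer comes from. The estimate is about 18 times the actual score, which is an over-estimate of roughly 1,700% when measured the same way as the percentages for the earlier tests.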

Well… maybe. There are two reasons I’m hesitant to say so, though. The first is the fact that these are still all Cornell students, so they are people who are used to being above average and doing well on tests and so forth. Moreover, virtually all of them would never have been outside of academia, so it is very likely that they’ve never encountered a test which was not designed to be passable by most people. If nothing else, it doesn’t reflect well on a teacher if most of his class gets a failing grade. And probably most importantly, the skills necessary to solve these problems are fairly close to the sort of skills that Ivy League undergrads are supposed to have, so the similarity between this skillset, at which they are incompetent, and a skillset at which they are presumably competent might well have misled them.

The second reason I’m hesitant to say that this study confirms the John Cleese quote is that the incompetent people estimated that they got 55% of the questions right, not 95% of the questions right. That is to say, incompetent people thought that they were merely competent. They didn’t think that they were experts.

In the conclusion of the paper, Dunning and Kruger talked about some limitations of their study, which I will quote because the passage is well written and I want to do them justice.

We do not mean to imply that people are always unaware of their incompetence. We doubt whether many of our readers would dare take on Michael Jordan in a game of one-on-one, challenge Eric Clapton with a session of dueling guitars, or enter into a friendly wager on the golf course with Tiger Woods.

They go on to note that in some domains, knowledge is largely the substance of skill, like in grammar, whereas in other places knowledge and skill are not the same thing, like basketball.

They also note that there is a minimum amount of knowledge required to mistake oneself for competent. As the authors say:

Most people have no trouble identifying their inability to translate Slovenian proverbs, reconstruct an 8-cylinder engine, or diagnose acute disseminated encephalomyelitis.

So where does this leave us with regard to the quote from John Cleese? I think that the real issue is not so much about the inability of the incompetent to estimate their ability, but the inability of the incompetent to reconcile new ideas with what they do actually know. Idiots may not know much, but they still know some things. They’re not rocks. When a learned person tells them something, they are prone to reject it not because they think that they already know everything, but because it seems to contradict the few things they are sure of.

There is a complex interplay between intelligence and education—and I’m talking about education, mind, not mere schooling—where intelligence allows one to see distinctions and connections quickly, while education gives one the framework of what things there are that can be distinguished or connected. If a person lacks the one or the other—and especially if they lack both—understanding new things becomes very difficult because it is hard to connect what was said to what is already known, as well as to distinguish it from possible contradictions to what is already known. If the learned, intelligent person isn’t known by reputation to the idiot, the idiot has no way of knowing whether the things said don’t make sense to him because they are nonsense or because they are too much sense, and a little experience of the world is enough to make many if not most people sufficiently cynical to assume the former.

And I think that perhaps the best way to see the difference between this and the Dunning-Kruger effect is by considering the second half of the fourth experiment: the incompetent people learned how to do what they initially couldn’t. That is, after training they became competent. That is not, in general, our experience of idiots.

Until next time, may you hit everything you aim at.