Awful Scientific Paper: Cognitive Bias in Forensic Pathology Decisions

I came across a rather bad paper recently titled Cognitive Bias in Forensic Pathology Decisions. It’s impressively bad in a number of ways. Here’s the abstract:

Forensic pathologists’ decisions are critical in police investigations and court proceedings as they determine whether an unnatural death of a young child was an accident or homicide. Does cognitive bias affect forensic pathologists’ decision-making? To address this question, we examined all death certificates issued during a 10-year period in the State of Nevada in the United States for children under the age of six. We also conducted an experiment with 133 forensic pathologists in which we tested whether knowledge of irrelevant non-medical information that should have no bearing on forensic pathologists’ decisions influenced their manner of death determinations. The dataset of death certificates indicated that forensic pathologists were more likely to rule “homicide” rather than “accident” for deaths of Black children relative to White children. This may arise because the base-rate expectation creates an a priori cognitive bias to rule that Black children died as a result of homicide, which then perpetuates itself. Corroborating this explanation, the experimental data with the 133 forensic pathologists exhibited biased decisions when given identical medical information but different irrelevant non-medical information about the race of the child and who was the caregiver who brought them to the hospital. These findings together demonstrate how extraneous information can result in cognitive bias in forensic pathology decision-making.

OK, let’s take a look at the actual study. First, it notes that black children’s deaths were more likely to be ruled homicides (instead of accidents) than white children’s deaths, in the state of Nevada, between 2009 and 2019. More accurately: among deaths of children under 6 that received some form of unnatural-death ruling, the deaths of black children were significantly more likely than the deaths of white children to be ruled homicide rather than accident.

It’s worth looking at the actual numbers, though. Of all of the deaths of children under 6 in Nevada between 2009 and 2019, 8.5% of the deaths of black children were ruled a homicide by forensic pathologists while 5.6% of the deaths of white children were ruled a homicide. That’s not a huge difference. They use some statistics to make it look much larger, of course, because they need to justify why they did an experiment on this.
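Since the post gives only the two percentages (the underlying counts aren’t quoted here), a quick sketch of the arithmetic shows how the same gap can be framed to look small or large: about 2.9 percentage points in absolute terms, but roughly 1.5x in relative terms, which is the sort of framing that inflates it.

```python
# Back-of-envelope arithmetic on the reported rates only (the actual counts
# behind 8.5% and 5.6% are not given here, so this is purely illustrative).
black_rate = 0.085  # share of black children's deaths ruled homicide
white_rate = 0.056  # share of white children's deaths ruled homicide

absolute_gap = black_rate - white_rate            # ~2.9 percentage points
relative_risk = black_rate / white_rate           # ~1.52x
odds_ratio = (black_rate / (1 - black_rate)) / (white_rate / (1 - white_rate))

print(f"{absolute_gap:.1%} absolute, {relative_risk:.2f}x relative, OR {odds_ratio:.2f}")
```

The point of the contrast: “1.5 times as likely” sounds dramatic; “2.9 percentage points” does not, even though both describe the same numbers.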

In fairness to the authors, they do correctly note that these statistics don’t really mean much on their own, since black children might actually have been murdered more often than white children in Nevada during that period. The statistics reveal no cognitive bias if the pathologists were simply correct about real discrepancies.

So now we come to the experiment: They got 133 forensic pathologists to participate. They took a medical vignette about a child under six who was discovered motionless on the living room floor by their caretaker, brought to the ER, and died shortly afterwards. “Postmortem examination determined that the toddler had a skull fracture and subarachnoid hemorrhage of the brain.”

The participants were broken up into two groups, which I will call A and B. 65 people were assigned to A and 68 to B. All participants were given the same vignette, except that, to be consistent with typical medical information, the race of the child was specified. Group A’s information stated that the child was black, while group B’s information stated that the child was white. OK, so they then asked the pathologists to give a ruling on the child’s death as they normally would, right?

No. They included information about the caretaker. This is part of the experiment to determine bias, because information about the caretaker is not medically relevant.

OK, so they said that the caretaker had the same race as the child?

Heh. No. Nothing that would make sense like that.

The caretaker of the black child was described as the mother’s boyfriend, while the caretaker of the white child was the child’s grandmother. Their race was not specified, though for the caretaker of the white child it can be (somewhat) inferred from the blood relation, depending on what one-drop rule one assumes the investigators used to determine that the child is white. Someone who is 1/4 black, where the caretaker grandmother was the black grandparent, might well be identified as white; or perhaps the one-drop rule is applied and the grandmother could be at most 1/8 black for her grandchild to qualify, to the racist experimenters, as white. Why do they leave out the race of the caretaker despite clearly wanting to draw conclusions about it? Why, indeed.

More to the point, these are not at all comparable things. It is basic human psychology that people are far less likely to murder their own descendants than people unrelated to them. Moreover, males are more likely to commit violent crimes than females (with some asterisks: there is some evidence that women may be even more likely to hit children than men are and simply get away with it more often, because people prefer to look away when women are violent; in any event, the general expectation is that a male is more likely to be violent than a female). Finally, young people are significantly more likely to be violent than older people.

In short, in the vignette given to group A, the dead child is black and the caretaker who brought them in is given 3 characteristics, each of which, on its own, makes violence statistically more likely. In group B, the dead child is white and the caretaker who brought them in is given 3 characteristics, each of which, on its own, makes violence statistically less likely. For Pete’s sake, culturally, we use grandmothers as the epitome of non-violence and gentleness! At this point, why didn’t they just give the caretaker of the black child multiple prior convictions for murdering children? Heck, why not have him give such medically extraneous information as repeatedly saying, “I didn’t hit him with the hammer that hard. I don’t get why he’s not moving.” I suppose that would have been too on-the-nose.

Now, given that we’re comparing a child in the care of mom’s boyfriend to a child in the care of the child’s grandmother, what do they call group A? Boyfriend Condition? Nope. Black Condition. Do they call group B Grandma Condition? Nope. White Condition.

OK, so now that we have a setup clearly designed to achieve a result, what are the results?

None of the pathologists rated the death “natural” or “suicide.” 78 of the 133 pathologists ruled the child’s death “undetermined” (38 from group A, 40 from group B). That is, 58.6% of pathologists ruled it “undetermined.” Of the minority who ruled conclusively, 23 ruled it accident and 32 ruled it homicide. (That is, 17.2% of all pathologists ruled it accident and 24% of all pathologists ruled it homicide.)

In group A, 23 pathologists ruled the case homicide, 4 ruled it accident, and 38 ruled it undetermined. In group B, 9 ruled it homicide, 19 ruled it accident, and 40 ruled it undetermined.

This is off from an exactly equal outcome by approximately 14 out of 133 pathologists. That is, if about 7 pathologists in group A had ruled accident instead of homicide, and 7 pathologists in group B had ruled homicide instead of accident, the results would have been equal between both groups. As it was, this is a big enough difference to get statistical significance, which only tells you that chance alone is unlikely to fully explain a difference this large. What it doesn’t do is show a pervasive trend. If about 11% of the participants had reversed their ruling, the experiment would have shown that the 18.6% of forensic pathologists on an email list of board-certified pathologists who responded to the study were paragons of impartiality.
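To make the swap arithmetic concrete, here is my own back-of-envelope check (the paper presumably ran its own test, which I am not reproducing): a Pearson chi-square statistic on the reported counts, and the same statistic after the hypothetical 7-and-7 swap described above.

```python
# My own arithmetic, not taken from the paper: chi-square on the reported
# counts, then on the counts after 7 rulings in each group are flipped.

def chi_square(table):
    """Pearson chi-square statistic for a table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Columns: homicide, accident, undetermined.
reported = [[23, 4, 38],   # group A ("Black Condition")
            [9, 19, 40]]   # group B ("White Condition")
swapped = [[16, 11, 38],   # 7 homicide rulings in A flipped to accident
           [16, 12, 40]]   # 7 accident rulings in B flipped to homicide

print(round(chi_square(reported), 2))  # ~15.9, above the 5.99 cutoff (df=2, p=.05)
print(round(chi_square(swapped), 2))   # ~0.03, nowhere near significance
```

So the reported split clears the conventional significance bar comfortably, and flipping 14 of 133 rulings erases it entirely: significance here hangs on a modest minority of participants.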

There’s an especially interesting aspect to the last paragraph of the conclusion:

Most important is the phenomenon identified in this study, namely demonstrating that biases by medically irrelevant contextual information do affect the conclusions reached by medical examiners. The degree and the detailed nature of these biasing effects require further research, but establishing biases in forensic pathology decision-making—the first study to do so—is not diminished by the potential limitation of not knowing which specific irrelevant information biased them (the race of the child, or/and the nature of the caretaker). Also, one must remember that the experimental study is complemented and corroborated by the data from the death certificates.

The first part is making a fair point, which is that the study does demonstrate that it is possible to bias the forensic pathologist by providing medically irrelevant information, such as the caretaker being far more likely to have intentionally hurt the child. Why didn’t they make all of the children white and just have half of the vignettes include a caretaker with multiple previous felony convictions who, while inebriated, repeatedly states, “I only hit the little brat with a hammer four times”? If we’re only trying to see whether medically irrelevant information can bias the medical examiner, that would do it too. But what’s up with varying the race of the child?

While it’s probably just sensationalism, since race-based results are currently hot, it may also be a tie-in to that last sentence: “Also, one must remember that the experimental study is complemented and corroborated by the data from the death certificates.” This sentence shows a massive problem with the researchers’ understanding of the nature of research. Two bad data sources which corroborate each other do not improve each other.

To show this, consider a randomly generated data source. Instead of giving a vignette, just have another set of pathologists randomly answer “A,” “B,” or “C.” Then decide that A corresponds to undetermined, B to homicide, and C to accident. There’s a good chance that people won’t pick these evenly, so you’ll get a disparity. If it happens to match, it doesn’t bolster the study to say “the results, it must be remembered, also agreed with the completely blinded study in which pathologists picked a ruling at random, without knowing which ruling they picked.”
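The random-answer thought experiment takes only a few lines to sketch (this is my illustration, with a fixed seed purely for reproducibility):

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# 133 pathologists each pick "A", "B", or "C" at random; only afterwards do
# we decree that A = undetermined, B = homicide, C = accident.
N = 133
picks = [random.choice("ABC") for _ in range(N)]
counts = {label: picks.count(label) for label in "ABC"}
print(counts)
```

Note that because 133 is not divisible by 3, the three counts cannot possibly come out equal: some “disparity” between homicide and accident rulings is guaranteed, and it is pure noise.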

Meaningless data does not acquire meaning by being combined with other meaningless data.

The conclusion of the study is, curiously, entirely reasonable. It basically amounts to the observation that if you want a medical examiner making a ruling based strictly on the medical evidence, you should hide all other evidence but the medical evidence from them. This, as the British like to say, no fool ever doubted. If you want someone to make a decision based only on some information, it is a wise course of action to present them only that information. Giving them information that you don’t want them to use is merely asking for trouble. It doesn’t require a badly designed and interpreted study to make this point.