I Wonder How Many Scientists Would Get This Wrong

I wonder how many scientists would get this wrong:

You want to know if one basketball team is taller than another so you measure the players. On the one team the average is 6’5″ (15 players, standard deviation is 1.3″) while for the other it’s 6’6″ (14 players, standard deviation is 1.4″). What is the p-value for this 1″ difference?

I suspect that far more than zero would do what Gemini did when I presented it with this question:

The p-value for the 1-inch difference in the average heights of the two basketball teams is 0.0571.


Explanation of Statistical Test

To determine if the 1-inch difference in average height is statistically significant, a two-sample t-test was performed. This test compares the means of two independent groups to see if the difference between them is due to chance or if it’s a genuine difference in the populations from which the samples were drawn.

Here’s a breakdown of the calculation:

  • Hypotheses: The null hypothesis (H₀) assumes there is no true difference in the average height of the two teams (μ₁ = μ₂). The alternative hypothesis (Hₐ) assumes there is a difference (μ₁ ≠ μ₂).
  • T-statistic: This value measures the size of the difference relative to the variation in the sample data. A larger absolute value of the t-statistic suggests a stronger difference. The calculated t-statistic for this data is -1.9894.
  • Degrees of Freedom (df): This value, 26.44, is a measure of the amount of information available to estimate the population variance. It’s used to find the correct p-value from the t-distribution.
  • P-value: The p-value, 0.0571, represents the probability of observing a difference in average heights of at least 1 inch (or more extreme) if the two teams were actually the same height on average. A common significance level is 0.05. Since the calculated p-value of 0.0571 is greater than 0.05, we do not have sufficient statistical evidence to reject the null hypothesis.

This means that while Team 2 is an inch taller on average in this sample, this difference is not statistically significant at a 5% level of significance. It’s plausible that the observed difference is simply due to random chance.
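For what it’s worth, the arithmetic itself is easy to reproduce. Here’s a minimal sketch (my own, not Gemini’s working; it assumes a Welch’s unequal-variance two-sample t-test computed from the summary statistics, with the heights in inches):

```python
# A sketch of the calculation Gemini appears to have run: a Welch's
# (unequal-variance) two-sample t-test from the summary statistics alone.
from scipy import stats

mean1, sd1, n1 = 77.0, 1.3, 15   # the 6'5" team, heights in inches
mean2, sd2, n2 = 78.0, 1.4, 14   # the 6'6" team, heights in inches

t, p = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2,
                                  equal_var=False)
print(t, p)   # roughly t = -1.99, p = 0.057
```

The arithmetic checks out; whether any of it should have been run is another matter.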

This is completely wrong because it’s a trick question. There is no p-value for this 1″ difference because the entire population was measured.

The p-value, used to determine whether a result is statistically significant, is, in theory, the computed probability of the result having been produced by random chance if the null hypothesis (usually, that no causative relationship or correlation exists between the two measured things) is true. However, that’s not how it’s actually used in practice.

The p-value that is used in practice is actually the probability of the results having been generated by random chance simply by how the members of each group being compared were selected from the total population.

Despite how this may sound, it’s actually a really easy concept, so let me illustrate it with an example. Suppose you want to know whether 1 beer is enough to get people drunk enough that they’re bad at playing ping pong, so you select two groups of fifty people each, give one group the beer, and give the other group a non-alcoholic beer. (Let’s make favorable assumptions on all of the practical stuff, like having taken baseline measurements of their ping pong skill beforehand.)

Now, we know ahead of time that there is variation in the population in alcohol tolerance. Some people can safely operate heavy machinery after six shots of vodka and some people are under the table after one, with most people falling somewhere in between those two extremes (even apart from variations in body weight). So here’s the thing: what are the odds that instead of both groups being exactly representative of the population as a whole, your randomly assigning people from the population to one of the two groups just happened to put more alcohol-tolerant people into the 1-beer group than is representative of the whole population? The laws of probability tell us that if you run this experiment enough times, at least once you’ll randomly have all high-tolerance people in the alcoholic beer group and at least one other time you’ll have all low-tolerance people in the alcoholic beer group.
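If you want to see how often that kind of skew shows up, here’s a toy simulation (entirely my own illustration, with invented numbers for “alcohol tolerance”): take a fixed population, split it at random into two groups of fifty over and over, and count how often the groups end up noticeably unbalanced even though nobody did anything to anyone.

```python
# Toy illustration (invented numbers): how often does purely random assignment
# produce two groups whose average "alcohol tolerance" differs noticeably?
import random
import statistics

random.seed(0)
population = [random.gauss(100, 15) for _ in range(100)]  # 100 people, varying tolerance

trials = 10_000
skewed = 0
for _ in range(trials):
    shuffled = random.sample(population, len(population))  # a random permutation
    beer_group, control_group = shuffled[:50], shuffled[50:]
    diff = statistics.mean(beer_group) - statistics.mean(control_group)
    if abs(diff) > 5:  # an arbitrary cutoff for "noticeably unbalanced"
        skewed += 1

print(f"{skewed / trials:.1%} of purely random splits were skewed by more than 5 points")
```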

What people are measuring by p-value, in almost all scientific experiments, is how likely this kind of skew is. They are not measuring the effect of, for example, random errors in the actual measurements taken. (Conveniently for the researchers, those are always assumed to be perfect, or at least to always balance out.)

This is why the question I started this post with is a trick question: it’s got all of the information that you’d find in a random trial, presented in roughly the way you’d get it in a random trial, except there was no random selection. Within the hypothetical of the question, the entire population we care about—the two basketball teams—was measured. If you want to be super nit-picky, you can say that the p-value is zero since there was no chance of this being produced by random selection, in the same sense that the probability of a coin set down on the table with the head facing up turning up tails is zero.

But the thing is, there are scientists—evidence points to it being an awful lot of scientists—who don’t actually know this is what they’re doing when they run a p-value calculation. And, of course, there are contexts where this isn’t awful. If they’re copying a format for experiments where this happens to work, well, then, it happens to work. But because they don’t understand what it’s about, they don’t realize that the p-value is the absolute floor for how meaningless the result is. That is to say, if the scientist does a randomized (controlled) trial and gets a p-value of .045, which is below the industry threshold of .05, this means that there is a 4.5% chance that the results are meaningless if everything else is perfect.

Imperfections in instruments, fluctuations in the things to be measured, defects in the measuring tools, biases in the researchers, flaws in the study design—these all raise the chance of the results being meaningless, potentially by quite a lot.

Of course, if you want to be cynical about it, it’s in scientists’ best interests to not know any of this.

Programmed to Overeat?

One of the causes that you will see put forward as to why so many people are overweight, fat, or obese is that we evolved for a food-scarce environment and now live in a food-rich environment, so our natural inclination to eat everything available and store fat for the lean times is no longer adaptive. This hypothesis has a natural conclusion about how to not get fat: limit what you eat and always be hungry. To lose weight, limit what you eat even more and always be hungrier until you’re thin, then just limit what you eat and always be hungry.

Like the idea that carbs are more filling than fats because carbs have 4 Calories per gram while fats have 9 Calories per gram, so carbs take up more room in your stomach, this is one of those ideas that it’s strange anyone says with a straight face, at least if they’ve spent more than a few days living as a human being. Because if you have any experience of living as a human being, this is just obviously false. And there’s a super-obvious thing which disproves both: dessert.

Observe any normal people eating dinner and they will eat until they are full and don’t want to eat anymore. Then bring out some tasty treats like pie, ice cream, etc. and suddenly they have room in their stomach after all. This simple experiment, which virtually all people have participated in themselves in one form or another, irrefutably disproves both of those hypotheses.

You can also easily see this if you have any experience of animals which actually do eat all food that’s available until they physically can’t, such as the cichlid fish called the Oscar.

By Tiia Monto – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=53098090

You feed oscars feeder fish, and they will keep eating them until there is no more room left in their stomach, throat, and mouth. They, literally, only stop eating once their mouths are full and nothing more fits in them. They then swim around with several tails sticking out of their mouths until their stomach makes room and they can move everything down.

That’s what a hunger signal with no feedback mechanism to stop because the creature evolved in a food-scarce environment looks like. (Oscars who are fed a lot grow extremely rapidly and very large.)

But you can also disprove this from the other direction. Yes, lots of people are fat, but they’re not fat-mouse fat.

Fat mouse was created by lesioning the part of the brain responsible for satiety. Fat mouse then kept eating and eating, without stop, rapidly ballooning until it was nearly spherical. (Incidentally, are we to believe that normal mice have a satiety limit to their eating because mice evolved in a food-rich environment? When you look at field mice, is “abundant food” really the first thing that comes to mind?)

Now, it’s possible to attempt to save the food-scarce-environment hypothesis by modifying it, saying that we’re genetically predisposed to being fat and unhealthy because that worked out in a food-scarce environment, but not too fat, for whatever reason. This suffers from being arbitrary, but then it is the prerogative of evolution to be arbitrary (obviously nothing needs to make any sense if you’re an atheist, but for the rest of us the influence of fallen angels on evolution, within the limits God permits them to work, has the same result—that’s one of the things that confuses atheists).

Of course, the problem with even this modified hypothesis is that there are plenty of naturally thin people and if you talk to them they’re not constantly hungry and denying themselves the food needed for satiety at every moment.

There’s also the problem of the timing of the rapid fattening of the population. Yes, it took place at a time when food was abundant, but there have been sections of the population for whom food is abundant as far back as there is recorded history. They were not all obese. More recently, in the 1800s, upper middle class and rich people could easily afford enough food to get fat on, yet they were not all obese. And in much of history, when food was scarce, people’s preferences in women were for plump women. Just look up paintings of Venus:

Which makes sense in that context—when people mostly don’t have enough food, women who manage to be plump in this environment are healthier, can have more children, survive the rigors of pregnancy, take care of the children, etc. Hence when painting a goddess of beauty, they painted her to the standards of their day and made her plump. But they didn’t make her obese.

To be fair, you can find the Venus of Willendorf:

By Oke – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1152966

But this dates to a time (30,000 years ago) when food was supposedly scarce and—so the hypothesis goes—no one actually looked like that because they were in the environment their constant food cravings were adapted to.

Ultimately, what I find so odd about the programmed-to-overeat hypothesis of modern obesity is not that it’s obviously false. It’s that it’s obviously false and the people who push it have clearly never considered the evidence against it.

You don’t see this with, for example, Young Earth Creationists. They have explanations for why radio-isotope dating doesn’t work and how geology is all wrong and fossil records are being misinterpreted because the dinosaurs were all animals that didn’t make it onto the Ark, etc. etc. etc. Say what you want about Young Earth Creationists, they at least take their ideas seriously.

As far as I can tell, the people saying that we’re programmed to overeat are just saying things.

Stupid Things Said About Saturated Fat

Dietary saturated fat has been blamed for all manner of health problems, but the evidence for this ranges from low quality to complete garbage. That the evidence quality is low is not surprising, since there are good reasons to believe that saturated fat is healthy for humans.

The first and most important reason is that saturated fat is the kind of fat that humans make if they have extra carbohydrates or proteins around and need to store the energy. And that’s going to be a large fraction of the carbs we eat. And when I say a large fraction, I do mean large. A 200 pound athlete would be able to store about 500 grams of glycogen in his muscles and another 100 grams in his liver. (And less than 10g of glucose in his bloodstream, which tends to be nearly constant anyway, so we can ignore this.) But the thing is: these are very rarely empty, especially if one regularly eats carbs. And if you’re following any kind of normal American diet, you’re eating a lot of carbs. If you follow the USDA food pyramid and eat a 2000 Calorie diet (which is roughly the Calorie requirement of a small person who isn’t very active) you’re probably eating at least 250 grams of carbohydrate per day. So your glycogen stores will start off mostly full, and while your body will try to use up the glucose in your muscles, your brain, etc., it can’t do that very quickly, and it does need to get the glucose out of the blood quickly, so the overwhelming majority of it will get converted to fat. (This is less true for people who spend most of the day moving, such as people who work some kinds of manual labor jobs, but that’s not typical. And humans love to rest after eating.)
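As a back-of-the-envelope check (the 55% carbohydrate fraction is my own round-number assumption for a food-pyramid-style diet; the glycogen figures are the ones above):

```python
# Back-of-the-envelope comparison of daily carb intake vs. glycogen capacity.
# The 55% carbohydrate fraction is an assumption for a food-pyramid-style diet;
# the glycogen figures are rough numbers for a 200 lb athlete.
calories_per_day = 2000
carb_fraction = 0.55
cal_per_gram_carb = 4

carbs_per_day_g = calories_per_day * carb_fraction / cal_per_gram_carb
muscle_glycogen_g = 500
liver_glycogen_g = 100

print(f"Carbohydrate intake: about {carbs_per_day_g:.0f} g/day")               # ~275 g
print(f"Glycogen capacity:   about {muscle_glycogen_g + liver_glycogen_g} g")  # 600 g
```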

(Whether a large fraction of the protein one eats gets converted to fat depends on whether one gets an unusually high amount of protein in one’s diet. Most people can’t use more than about 1 gram of protein per pound of lean body mass per day, but most people also eat less than that in protein.)

Oh, I should mention that it’s actually very normal for the human body to use fat as fuel. When insulin isn’t high (its job being to make cells take up glucose, which in the process suppresses the fat cells from putting fatty acids into the blood), our fat cells regularly break fat (which is insoluble in water) down into fatty acids (which are soluble in water) and put them in our bloodstream so we have a constant, dependable supply of energy. Like anything which can be said about biology in human language this is a massive oversimplification, but at its level of generality it’s correct and important.

Anyway, the primary output of de novo lipogenesis (making fat from scratch) is palmitic acid, which is a saturated fatty acid. This can be converted into other fatty acids such as stearic acid (another saturated fat) and oleic acid (an omega-9 unsaturated fat) and many others, but human beings—and mammals in general—tend to leave it as palmitic acid, then take three of them and attach them to a glycerol backbone, making a fat (a triglyceride). We do this because it allows the fat to be stored very compactly without needing any water around it, which is extremely weight-efficient. This is important for animals because moving weight requires energy, so the lighter we can store the energy the more efficient it is. Saturated fats pack together especially well, which is why animals with very high energy needs like mammals prefer them.

So believing that saturated fat is bad for us requires believing that our bodies turn most of the carbohydrates we take in into something that’s bad for us.

Incidentally, this all happens in the liver. Since fats are insoluble in water (they don’t form a solution; this is why oil floats at the top of water rather than dissolving in it like salt), the liver can’t get these fats to the rest of the body by just sticking them in the bloodstream. That would be a disaster. So it creates transport crates for the fats called “lipoproteins”. These start out as VLDL—Very Low Density Lipoprotein. They’re very low density because they’re crammed full of fats, which are less dense than water. These transport crates are then dumped into the bloodstream where the proteins on the outside enable them to interact nicely with the water in our blood and move about without causing problems. These transport crates do something which can be analogized to docking at cells, and then the cells take some of the fats inside. As this process happens the lipoproteins shrink and their density goes up. Thus they eventually turn into plain old “LDL” (low density lipoprotein). Interestingly, High Density Lipoprotein (HDL) is not what they become once depleted; instead HDL is made empty in the liver and sent out to collect cholesterol and related molecules.

Interestingly, dietary fats get transported by a different system. The intestines create a similar but larger kind of lipoprotein transport crate called a chylomicron. These shuttle dietary fats from the intestines through the blood to our cells.

In both cases, you can see that the idea that “saturated fat congeals and clogs your arteries” is nonsense, even apart from saturated fat congealing at room temperature, not body temperature. The most liquid fat in the world would be terrible to have loose in one’s blood since it doesn’t mix with water, so the human body doesn’t do that. The fats don’t matter at all as they’re being transported.

Where they can matter is once they’ve been added to fat cells and the fat cells break them down into fatty acids and put those into the blood. (This is a tightly regulated process to make sure that energy is available at all times.) That’s because these fatty acids, in addition to being an energy source, are also precursors for hormones and can also interact with various receptors. (This is where things like omega-3 versus omega-6 come in.)

This is also why you see claims that eating large amounts of saturated fat induces insulin resistance in rats. Now, before we proceed, I do want to mention that it’s important to remember that, while animal models can be useful, rats aren’t humans and their exact dietary requirements are a bad guide for the ideal diet for human beings. You shouldn’t feed bears, pigs, dogs, or cats like rats for optimal health, and there’s no reason to believe you should feed us like rats (or bears, pigs, dogs, or cats), either. (You can’t feed us like cows—we’re not built to get a meaningful number of Calories from fibrous plant matter.) So these studies on rats are, at best, interesting. That very large grain of salt taken, what the studies find is that various kinds of fats which are pro-inflammatory, when taken in large quantities, promote inflammation which can induce insulin resistance. The study I linked to found that the effect went away for saturated fat if the rats were fed about 10% of their fat as fish oil, which is rich in omega-3 fatty acids like DHA and EPA, which are anti-inflammatory. That is, it’s all about the net effect of the entire diet, not one particular component and not about the fact that the fats are fats. (Again, in rats; how pro- or anti-inflammatory the various fatty acids are in humans may be similar or very different, on a per-molecule basis. And there’s probably significant individual variation, too.)

Inflammation, by the way, is not inherently bad. Inflammation is a very useful reaction; it’s how our bodies deal with damage such as clotting in a cut, immune responses to foreign invaders, muscle damage from exercise, and so forth. The problem is when pro-inflammation factors dominate to produce more inflammation than is necessary for the circumstances. Quite a few problems happen when a balanced system becomes imbalanced.

Incidentally, while palmitic acid (the dominant fatty acid in mammal-produced fat) seems to be mildly pro-inflammatory, omega-6 fatty acids may be significantly more pro-inflammatory. And they’ve been making up a much larger proportion of western diets—especially of American diets—since the introduction of corn oil and other heavily processed seed oils.

About That Olly Murs Before-And-After Poll

I saw a bunch of commentary, back and forth, on a poll asking men and women whether Olly Murs (a British celebrity) looked better before or after he dieted down to very low bodyfat levels:

Twitter polls are hardly wonderful, but if we assume honesty, men voted 2:1 that he looked better after the transformation and women voted 3.8:1 that he looked better before the transformation. To be fair, neither is a wonderful picture of Mr. Murs; both could be done with more flattering lighting and posture.

The big problem is that neither of these photos is really the ideal body composition in the world in which we actually live. The body composition on the right shows off his musculature better, but he’s into levels of leanness which are sub-ideal for actual living. On the left, he is carrying a bit more fat than the ideal. That said, I think he looks far more likely to be a good husband and father on the left, and I suspect that this is the question most women were answering (who would ask which photo looks better for selling underwear, or some other purpose?), and the reasons for this are interesting.

The first thing we need to consider is that the general preference people have for leanness in modern America (perhaps in the West more generally) is actually contextual. We live in a time of abundance and there are various types of being unhealthy that cause people to become fat, so not being fat is an excellent marker for generally good health (including a reasonable mental understanding of health and a general pattern of decent-enough food and exercise choices). However, in places where food is scarce, people tend to find bodyfat more attractive than leanness, and rightly so, because in a food-scarce context, bodyfat shows a lot of very good things about your health and ability to take care of a human being (you). In some studies in such cultures, men rate women with a BMI that technically makes them obese above women with a bodyfat level that we in modern America would call healthy.

So taking this into account, what does the extremely lean body composition on the right tell us in this context? Well, it’s hard to achieve and has no real benefits (unless you’re a rock climber or similar kind of athlete). Further, it’s bordering on (if not beyond) the level of leanness where health is severely impacted; it would not be surprising if his testosterone levels are depressed, for example. (If they aren’t, they will be if he gets a little leaner.) His body probably won’t like being this lean at his age, which means he’ll probably have a harder time dealing with stress and being cheerful. Also, he’s probably stronger in the photo on the left. It’s not understood why, but the typical experience of people who start out less than completely obese and lose fat is that they get weaker. I can speak to this from personal experience—I once trashed my performance at a powerlifting meet by losing around 20 pounds in 5 months (the meet was right after), and I wasn’t even lean at the end of that. The human body just really dislikes caloric deficits and really dislikes not having sufficient fat reserves. And since we are our bodies and aren’t a ghost in a machine, that affects our psyche, too.

There’s also the issue that human beings simply have limited time, effort, and willpower. We’re finite creatures. So if a man is spending a lot of his effort and willpower being extremely lean, he’ll have less left over for other things, like social interactions.

Plus there’s the issue that concern for appearance simply has different connotations in men and women. It can be taken to the point of vanity in either sex, but it takes a greater amount of concern in women to reach the level where it is the sin of vanity. There are complex reasons for this, but the easiest to understand is that since a woman directly uses her body in very intensive ways to care for her young children (especially when they’re inside of her), she needs to put more effort into ensuring that her body is in good condition. And care for appearance is a way of signaling this, including to herself. We human beings do not know ourselves perfectly, and so habits that require spare energy to keep up are useful signposts to ourselves that we’re taking care of all of the important stuff. If we let those things go, it might well be because there are bigger problems, warranting investigation. This is instinctual, of course, not conscious, but it’s nonetheless practical.

Males use their bodies to care for their wife and children too, of course, but less directly, which means that far more workarounds are possible (e.g. you can grow crops even if you have to hobble around on a wooden leg by making up for it with extra strength in your arms and cleverness in making tools). And we tend to use our bodies in fewer ways, meaning we can concentrate more energy and effort into those ways. This specialization and indirection mean that we can be tougher and also, within limits, substitute skill for health. And so we need to take less care of ourselves (not none), and consequently need to sign-post it less. Thus it takes less for males to cross the line into vanity.

Now, none of this is to say that Mr. Murs was at the peak of attractiveness in the “before” picture. In it, he’s carrying a little extra fat beyond the minimum necessary for optimal health. If he’d lost only about fifteen pounds, almost everyone would agree he looked better. But with the two options we have, he looks happy in the picture on the left and unhappy in the picture on the right. And that counts for a lot.

I think part of how to get at this is to ask the question: which guy looks like you would want him as a friend? I know that for myself, the guy on the left looks like he’d be a lot more fun to hang out with.

Science Is Only As Good As Its Instruments

There’s a popular myth that science progressed because of a revolution in the way people approach knowledge. This is a self-serving myth created in the 1600s by people who wanted to claim special authority. This is why they came up with the marketing term “The Enlightenment” for their philosophical movement. If you look into the actual history of science, scientific discoveries pretty much invariably arose a little while after the technology which enabled their discovery was invented.

There is a reason the heliocentric (really, Copernican) theory of the solar system did not win acceptance until a little while after the invention of the telescope. There is a reason why we did not get cell biology until a little while after the invention of the microscope. If you dig into the history of specific scientific discoveries, it’s often the case that several people discovered the same thing within months of each other and the person we credit with the discovery is generally the one who published first.

This is not to say that there are never flashes of insight or brilliance. So far as I can tell Einstein’s theory that E=mc² was not merely the obvious result of measuring things using new technology. That said, it would almost certainly never have happened had radioactivity not been discovered a decade earlier, which would not have been possible without certain kinds of photographic plates existing (radioactive decay was discovered by Henri Becquerel in the 1890s when uranium salts he was studying for phosphorescence exposed photographic plates wrapped in black paper, which showed that something else was going on besides phosphorescence; the Curies and many further experiments had clarified what was going on by the time Einstein was working on the mass-energy equivalence).

Which gets me to modern science: there are a lot of things that we want to know, for which the relevant technology does not seem to exist. Nutrition is a great example. What are the long-term health effects of eating a high carbohydrate diet? How can you find out? It’s not practical to run a double-blind study of one group of people eating a high carbohydrate diet and the other eating a low-carbohydrate diet for fifty years. The current approach follows the fundamental principle of science (assume anything necessary in order to publish): it studies people for a few weeks or months, and measures various things assumed to correlate perfectly to good long-term health. That works for publishing, but if you’re more concerned with accuracy to reality than you are with being able to publish (and if you’re reading the study, you have to be), that’s more than a little iffy. Then if you spend any effort digging into the actual specifics, let’s just say that the top ten best reasons to believe these assumptions are all related to group-think and the unpleasantness of being in the out-group. (Please actually look into this for yourself; the only way you’ll know what happens if you don’t just take people’s word for something is by not taking their word for it, including mine.)

And the problem with science, at the moment, when it comes to things like long-term nutrition is that the technology to actually study it just isn’t there. (It’s different if you want to study things like acute stimulation of muscle protein synthesis related to protein intake timing or the effects on serum glucose in the six hours following a meal.) And when the technology to do good studies doesn’t exist, all that can exist are bad studies.

This is why we see so much of people turning to anecdotes and wild speculation. Anecdotes and wild speculation are at least as good as bad studies. And when the bad studies tend to cluster (for obvious reasons unrelated to truth) on answers that seem very likely to be wrong, anecdotes and wild speculation are better than bad studies.

That doesn’t mean that anecdotes and wild theories are good. It would be so much better to have good studies. But we can’t have good studies just because we want them, just as people before the microscope couldn’t have cell biology no matter how much they wanted it. The ancient Greeks would have loved to have known about bacteria and viruses, but without microscopes, x-ray crystallography, and PCR, they were never going to find out about them.

As, indeed, they didn’t.

The World’s Top Scientists and Doctors

There’s a cartoon going around which shows a man pointing at his computer and calling out, “Honey, come look! I’ve found some information all the world’s top scientists and doctors missed!” It’s been roundly and deservedly criticized, but I’d like to focus on a few points I haven’t seen touched on.

The first point is the level of generality that is used (“all the world’s”) when “top” scientists and doctors are all specialists. If the guy has discovered some information about whether dietary fructose causes insulin resistance, what does it matter that the world’s greatest geologists don’t know it? Who cares whether the best heart surgeons know it? Would anyone be surprised if the world’s greatest ophthalmologist knows nothing about it? The cartoon makes it sound like tens of thousands of brilliant people have all been studying the exact question the guy has been researching, but the reality of specialization is that the number of people who are actively studying whatever exactly the guy may have found may well number less than a dozen. There’s no guarantee that this small handful of people are among the best and the brightest, except in the narrow sense that someone who took bronze in a competition with only three people in his division is among the best in the world, if “the world” means the people who showed up at that meet.

This, of course, is even assuming that anyone is actively studying the field. The inclusion of “doctors” suggests that what the man has found relates to health, and the number of things being studied in health is absolutely dwarfed by the things that there are to study. It’s entirely possible that there are no experts in the specific subject that the guy believes he’s found information in because no one has funded research into it in the last twenty years. And even if they had, it’s entirely possible to be an expert in only one aspect of a subject; a scientist who conducted the world’s greatest trial on the effect of aspirin in reducing heart attack incidence may be completely ignorant as to whether it’s effective for treating lower back pain.

Then we come to the thorny problem that many people are not courageous enough to consider: who has declared these people to be the world’s top scientists and doctors? Was it themselves? In theory, there is no one more qualified to identify the best in a field than the best in the field. But, of course, a man saying that he’s the greatest is worthless. So is it the world’s average doctors and scientists? But how do they know that these other people are better than they are? How did they even form this opinion? Where would a heart surgeon get the information necessary to know how good another heart surgeon is? Do they, in their copious free time, watch each other perform surgery? And what of researchers? Are we to suppose that scientists drop in and conduct audits of each other’s labs to see how well they’re actually conducting their research? Or does this all come from people who are not experts at all, observing? That might be valid for doctors like heart surgeons for whom we can collect easily evaluated data such as “how often was the surgery successful” and “how often did the patient die on the table”. Though even there, any system which relies on measurement can be gamed. A surgeon can look fabulous by only accepting the healthiest patients compared to one who takes on the riskiest patients. And most fields in science and medicine do not admit of even this kind of measurement. No one expects everyone with chronic back pain to become pain free, and the only reliable way to judge a doctor’s nutritional advice is to wait until all his patients die and see how old they were, and what their quality of life was over the years. Since they may well outlive the doctor, this is useless.

So suppose you find a doctor who says that fructose induces insulin resistance and you need to limit your sugar intake, while a government-sponsored doctor says that you should eat as much fructose as you want but limit your fat intake. How do you know that the government-sponsored doctor is the top doctor and not merely the doctor with the best political connections? How do you know that the doctor with the plain office is not, in fact, the top doctor, in terms of ability?

People really want infallible oracles that they can query for whatever knowledge they want, but it’s just not available.

And, truth to tell, even if they found it, most people would reject it because they wouldn’t like the answers that it gives.

Calories In vs. Calories Out

When it comes to health and fitness, and in particular to reducing the amount of fat on one’s body, the dominant story within our culture, at least from the sort of people who present themselves as experts, is that fat gain or loss is just Calories-in-vs-Calories-out, so just take however many Calories you burn and eat less than that until you’re thin.

Now, obviously there is some truth to this because if you stop eating you will waste away until you die, and you will be very thin shortly before your death. (Though, interestingly, if you autopsy the corpses of people who’ve starved to death you will find tiny amounts of fat still remaining.) Of course, the problem with just not eating until you’re thin is that starvation makes you unfit for pretty much any responsibilities and it’s also bad for your health. (Among many problems, if you literally stop eating your muscles will substantially atrophy, including your heart.)

So the big question is: is there a way to eat fewer Calories than you burn while remaining a functioning adult who can do what the people you have responsibilities to need you to do, which doesn’t wreck your health?

The good news is that there are methods that accomplish this balance. The bad news is that (at least as far as I can tell) there’s no one method that works for everyone.

Since this post is about the Calories-in-vs-Calories-out mantra (from here on out, Ci-Co), I’m only going to discuss moderate Calorie restriction—oversimplifying, aiming for a deficit that results in about half a percent of bodyweight reduction per week, for a period of 6-12 weeks, before returning to maintenance for an approximately equal length of time. (This is a version of what bodybuilders do and they’re probably the experts at losing fat because bodybuilding can be described, not entirely inaccurately, as competitive dieting.)
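To put rough numbers on that (the 200 pound bodyweight is just an example of mine, and 3,600 Calories per pound of fat is the same rough figure used a little further down):

```python
# Rough numbers for a cut aiming at half a percent of bodyweight per week.
# The 200 lb bodyweight is an example; 3600 Cal per pound of fat is a rough figure.
bodyweight_lb = 200
cal_per_lb_fat = 3600

weekly_loss_lb = bodyweight_lb * 0.005                  # 1.0 lb/week in this example
daily_deficit_cal = weekly_loss_lb * cal_per_lb_fat / 7
print(f"Target: {weekly_loss_lb:.1f} lb/week, "
      f"roughly {daily_deficit_cal:.0f} Cal/day below maintenance")
```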

Now, at first glance, this isn’t too far off what the Ci-Co people seem to be saying. However, it’s very different in practice, and those differences will be illuminating, because they’re all things that the Ci-Co people get wrong.

The first big problem with trying to implement Ci-Co is: what on earth is your daily Calorie expenditure? There are highly accurate ways of measuring this which are extremely expensive, with most being infeasible outside of a laboratory. Apart from that, there’s no good short term way. The best way—which is what bodybuilders do—is to carefully measure your Calorie intake and your weight over a period of time, then see what your weight does, and calculate your Calorie expenditure from your intake plus what your weight did. For example: suppose you take in 3000 Cal/day and over 14 days you lose a pound. A pound of fat contains roughly 3600 Calories, so your actual average expenditure was 3000 + (3600/14) ≈ 3257 Cal/day. From there you can refine your intake to achieve what you want. (Bodybuilders also have phases where they put on muscle, which means gaining weight, so they will have to eat at a surplus to provide energy for building the extra muscle tissue.)
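Here’s that bookkeeping as a tiny function (a sketch of the arithmetic above, not anyone’s official protocol; a negative weight change means weight lost):

```python
# Estimate average daily Calorie expenditure from tracked intake and weight change.
# 3600 Cal per pound of fat is a rough figure; negative weight change = weight lost.
def estimated_expenditure(avg_intake_cal, weight_change_lb, days, cal_per_lb=3600):
    # Losing weight means you burned more than you ate; gaining means less.
    return avg_intake_cal - (weight_change_lb * cal_per_lb) / days

print(estimated_expenditure(3000, -1.0, 14))   # about 3257 Cal/day
```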

This looks nothing like what the Ci-Co people suggest, which usually amounts to either taking the USDA random-number of 2000 or else using an online tool which estimates your Calorie expenditure from your height, weight, and some description of how active you are. These are generally accurate to +/- 50%, which is not obviously distinguishable from useless. Using myself as an example, entering 6′ and 215 pounds with high activity, it estimated my maintenance Calories as 2900 and a weight loss target of 2450. I’ve actually been using the MacroFactor app to track approximately 100% of what I eat and weighing myself every morning when I wake up. It estimates my maintenance Calories as about 3900 Cal/day and I’m losing a little over a pound a week with a target of 3200 Cal/day. On days when I eat about 2800 Calories I go to bed hungry and am very hungry the next day. If I tried to lose weight at 2400 Cal/day, within a week or two I’d be constantly ravenous, unable to concentrate, barely able to do my job (I’m a programmer), and miserable to be around.

Because here’s the thing: the human body can tolerate small (consistent) Calorie deficits without worrying, but if they become too large the body freaks out and concludes that something very, very bad is going on and the top priority for the foreseeable future is getting through it. That means two things, both very bad for losing fat:

  1. Spending all your waking hours trying to find enough food
  2. Reducing your Calorie expenditure as much as possible to conserve what energy you do have until the bad times have passed.

The second point is probably the bigger deal. What the Ci-Co people don’t realize is that your Calorie expenditure is nowhere near fixed. If your body thinks it’s a good idea, you can maintain on a surprisingly large number of Calories. If your body thinks it’s a good idea, you can maintain on a surprisingly small number of Calories. The former looks like having a lot of energy and feeling good. The latter looks like being tired and cold all the time.

Even worse, there is reason to believe—though this is nowhere nearly as well established—that if you make your body freak out and think it needs to survive a famine too many times, it will start to prepare for the next famine as soon as food becomes readily available again, much as people who’ve been broke a few times and also had good times tend to live like misers and save money the next time things go well. (In the case of your body, this means gaining the fat you will need to survive the next famine, just like bears put on a ton of fat in summertime in order to get through the coming winter.)

This is why the other critical part of how bodybuilders diet is that they only do it for 6-12 weeks at a time, then take long maintenance breaks at their new weight. (The variability is because they pay attention to how their body reacts, and if it seems to be starting to freak out, they stop losing weight and start maintaining so it doesn’t have to adapt to the diet—there are many factors which go into how long it’s possible to diet before the body starts to freak out.) This relatively short fat-loss window ensures that the body never goes into surviving-famine mode. And the maintenance Calories are not a fixed number, either. They can easily increase for a few weeks as your body gets used to the extra food and raises your metabolism because it seems safe to do so.

When you put this all together, it’s why the Ci-Co people give the laws of thermodynamics a bad name. It may be perfectly true that losing weight is the result of one number that’s not easy to measure being lower than another number that’s impractically expensive to measure and impossible to usefully estimate, but knowing that that’s true has no practical value.

For a much more entertaining take on a closely related subject, check out Tom Naughton’s post Toilet Humor And The HOW vs. WHY Of Getting Fat.


This post was about the problems with Calories In vs Calories Out, but I would be remiss not to point out that everything I said up above about how bodybuilders reduce fat is predicated on having a reasonably well-regulated metabolism to begin with. There are all sorts of ways for the human metabolism to become dysregulated and if yours is dysregulated your odds of successfully reducing fat are much lower until you figure out what’s wrong and fix it. In my own case, I’m about 99% certain that at times in my life I’ve induced insulin insensitivity in my body through excessive fructose consumption. (I can eat a pound of chocolate for lunch if I let myself and there was a period back when I was in grad school when I was drinking full-sugar Mountain Dew and eating cake mix out of the box with a spoon. That stuff has more sugar in it than flour. This was during a period when I was unemployed and depressed as well as young and dumb, and I had not yet shaken off being raised during the low-fat craze of the 1980s and 1990s.) I believe some extensive low-carb eating has allowed my body to mostly reset its relationship with insulin and at this point I’m only willing to eat candy/ice cream/etc. on Christmas, Easter, and my birthday. That said, when I’m cutting (reducing fat), I find it much easier and more successful if I go back to eating low carb or even keto.

That’s me; I suspect that many people are in a similar boat because fructose is way more common in processed food than people normally realize and it’s reasonably well established that extremely high fructose consumption (much higher than anything you’d get from any reasonable intake of fresh fruit, btw) can induce non-alcoholic fatty liver disease, which seems to have a causative relationship with insulin resistance/metabolic syndrome. That said, this does not describe everyone who’s got excess fat. There are tons of things that can go wrong to dysregulate one’s metabolism/appetite, some of them dietary, some of them endocrine, and some I don’t even begin to have an idea about. The human body is unbelievably complex and there are a lot of ways it can malfunction. There’s really no substitute for trying things and seeing what works. And at least we know that it’s a good idea to get regular exercise no matter how much excess fat you’re carrying. It may not make you lean, but it will certainly make you healthier and happier than if you don’t do it. After the first few months.

Oh yeah—and I’m no expert, so please do your own research and don’t take my word for it.

What Makes an Expert

I was recently re-watching the 2009 documentary Fat Head, mostly for nostalgia because I enjoyed it and it did me a lot of good back when I watched it circa 2010.

If you haven’t seen it and are curious, it’s available (officially, from its distributor) on YouTube. (Weirdly, it’s age-restricted so I can’t embed it.)

This was back when the documentary Super Size Me blaming McDonalds for people being fat was only five years old and people still remembered it. Fat Head was a response-documentary criticizing Super Size Me, but it actually spent more of its time discussing the lipid hypothesis (the idea that fat and especially saturated fat causes heart disease) and the problems with it. Throughout the documentary, Tom Naughton (the filmmaker and narrator) continually refers to “the experts,” by which he mostly means the people who give official advice, such as the USDA giving food recommendations or various medical organizations telling everyone to reduce their saturated fat intake as much as possible.

“Expert,” of course, ordinarily means a person who is extremely knowledgeable in a subject or very good at it. But “expert” is also a social designation for special people to whom ordinary people are supposed to defer, generally with the assumption that they are expert in the first sense. But this introduces a problem: how do you know that someone is an expert in the first sense?

The easy way to do this is to be an expert yourself. An expert will generally be good at recognizing expertise, as well as recognizing what is not expertise. That’s great, but if you’re an expert yourself you don’t need to know who else is an expert so you can defer to them.

So what if you’re not an expert?

Well, it gets a lot harder.

You can, of course, punt the problem to someone that you trust, but that is a general solution: it works for literally every question. How do you calculate the circumference of a circle given its diameter? Ask someone you trust.

But let’s assume, for the sake of discussion, that you want to find an expert and aren’t going to just have someone else do the work. How can you do this—again, assuming that you, yourself, are not an expert?

It certainly cannot be in the same way that an expert would, that is, by evaluating how the person does what they do. There is something left, though it’s not nearly so efficient: you can see whether the person can achieve what only an expert could achieve.

In most of the places where this is possible, it’s fairly obvious. If you want to know if a man is an expert archer, you ask him to shoot at a few things which are very difficult to hit. If you want to know if a man is an expert lock pick, you ask him to pick a difficult lock.

There are some intermediate situations, which do not admit of demonstrations which only take a moment. If you want to know if a man is an expert painter, it is not practical to ask him to go to all of the trouble of painting a painting in your sight. But you can ask him to show you paintings which he has painted, and then after he shows you some impressive paintings you have only the ordinary problem of finding out whether he’s an honest man and really is the one who painted them.

But then we come to problems which are far more difficult. How can you tell if a man is an expert teacher? The only practical effect of a good teacher is a learned student. If you have access to the students to test them, you mostly can only tell in the negative—a student who obviously knows nothing—since the whole reason to seek out a teacher is to be taught. (There are exceptions for things such as being an expert in Greek but not in teaching Greek, and you want to find an excellent teacher for your child. Let us set that aside as a special case which is easier than the one we’re trying to deal with.) However, even in the best case this is not a pure evaluation of the teacher because the end result also depends upon the quality of the student. This is clear in the case of athletics. Some people have bodies which are proportioned exceedingly well for the sport and when this is married to a disposition which finds physical activities intuitive, they would come to be very good in their sport regardless of who their teacher is; an excellent teacher will make them better but a bad teacher will still make them good (unless he gets them injured).

Medicine is an interesting hybrid of this. It is possible to evaluate a trauma surgeon mostly based on results because how well one patches up a man after a knife wound or a gunshot or a bear mauling does not depend very much on the constitution of the victim. It does depend on the wound, of course, but it’s not that hard to evaluate wounds based on criteria such as their rate of blood flow or the amount of the victim which is missing.

It is nowhere near as possible to evaluate an internal medicine doctor’s treatment of chronic conditions. The human body is an unbelievably complex thing—I mean that literally; most people can’t believe the complexity involved. Biology keeps on making new discoveries that things are more complex than previously believed. All of this complexity can go wrong, and there are far fewer kinds of symptoms. In short, we have no way of evaluating what is actually wrong with a patient or how bad it actually is. Not everything is fixable; how much of what doesn’t get better is the fault of the doctor and how much is the fault of the disease? We have no way of knowing, certainly not for the purpose of evaluating the doctor.

So what about the kinds of experts who give health and nutrition advice?

The first thing to notice is that the time scales are not favorable. Being healthy over decades is a thing that takes decades, and that’s a really long time over which to evaluate someone’s advice in order to determine whether their advice is worth following. And we’ve also got a problem much like in evaluating internal medicine doctors: we’re talking about how to optimize an unbelievably complex system (the human body). Worse, though, is that this kind of advice is general, and the population itself varies. There’s absolutely no reason to believe that the same dietary advice is equally good advice for all members of the population. For all we know, Frenchmen do better eating baguettes than Germans do and Germans are healthier eating sausages than Frenchmen are. For all we know, there might be two brothers and one does well on pasta while the other will get fat and sick on it. At least internal medicine doctors treat individual patients; experts who give general advice on health and nutrition give the same advice to everyone. That might be fine—no one should eat uranium, for example—but it’s not obviously fine. For all we know (without being experts ourselves) universal dietary guidelines are intrinsically a bad idea that no true expert would give, just as no true fencing expert fences with reverse grip or by holding the tip and trying to thrust the hilt into his opponent.

But even if we grant the idea, for some reason, that a true expert would give general dietary advice, how do we evaluate the expertise of a particular expert giving it? The effect that we could measure would be the superior health and fitness of the people who follow this advice compared to what they would have had if they didn’t follow it.

OK, but how on earth do you measure that? How do you identify the people who follow the advice? How do you figure out how healthy they would have been had they not followed the advice?

That last part is important because it’s extremely easy for advice which does nothing to select for people who are generally superior. To give a silly but clear example: if you give advice on how to grow taller and it’s to dunk a basketball ten times a day, every day, and then measure the average height of the adherents and the average height of the non-adherents, you’ll find that the adherents are, in fact, taller. No taller than they would have been otherwise, but certainly taller than the non-adherents. Or if your advice for strength is to pick up a three hundred pound rock and carry it five hundred feet each day, you’ll certainly find that the adherents are stronger than the non-adherents, since only very strong people will even try to follow this advice. In like manner, if you recommend that people eat a pound of arugula a day, it’s quite possible that only people who are very healthy would even consider putting the stuff in their mouth given how much (if you don’t disguise its flavor with oil or sugar) it tastes like poison. (Because it is; the bitter taste of many plants comes from natural pesticides they make in order to dissuade bugs from eating them. These are just poisons that have little to no effect on us since we’re mammals and not insects.)
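Here’s a toy version of the dunking example (all numbers invented for illustration): the advice changes nobody’s height, yet the adherents still come out taller, simply because only tall people can follow it.

```python
# Toy model of selection bias: the "dunk ten times a day" advice does nothing,
# but only people tall enough to dunk can follow it. All numbers are invented.
import random
import statistics

random.seed(0)
heights = [random.gauss(69, 3) for _ in range(100_000)]   # heights in inches

dunk_cutoff = 74   # invented: only people at least this tall can follow the advice
adherents = [h for h in heights if h >= dunk_cutoff]
non_adherents = [h for h in heights if h < dunk_cutoff]

print(f"Adherents:     {statistics.mean(adherents):.1f} in. on average")
print(f"Non-adherents: {statistics.mean(non_adherents):.1f} in. on average")
# The adherents come out "taller" even though the advice changed nobody's height.
```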

The basic answer is that you can’t. Not to any important degree.

There’s a related issue to the question of “how can you tell if someone is an expert?” and that’s “how does someone become an expert?” It’s related because, oversimplifying, the way you become an expert is to evaluate whether you can do what an expert can do and then change what you’re doing until you can do those things. If there’s no way to evaluate whether you’re getting better at the things an expert could do, there’s no way to tell whether the things that you’re doing are making you any better, which means that there’s no way to actually become an expert. (I’ve oversimplified quite a bit; this really deserves its own blog post.)

So what does that mean for fields where it’s not possible to tell who’s an expert?

Effectively, it means that there are no experts in that field.

A Funny Place for Advice

I was recently at a pharmacy where there was a small TV tucked into a corner displaying something I found rather odd:

If you have a hard time reading the text, it says:

Things To Remember When Lifting Weights
When doing squats, remember to keep your knees behind your toes at all times. You also want to make sure that your back is straight and strong and your head is faced forward.
—The Ginger Marie Blog

Those who are familiar with how to squat properly will know that the advice to always keep your knees behind your toes is a myth. Nothing bad happens if your knees go in front of your toes and many people need their knees to go in front of their toes to get full depth—especially Olympic weightlifters who regularly bottom out their squat (so called “ass to grass” squatting). As with all lifting, it’s a bad idea to suddenly do it with near-maximal loads instead of working up to it, of course—but that’s true of all ways of doing all lifts. Walking up to a lift you’ve never done before and maxing out on it is a useful ingredient in maximizing your injury risk—though it should be borne in mind that strength sports have pretty low injury risks compared to most other sports. But still, do work up to your maximal lift attempts. You’ll also lift more that way.

Also, does anyone really need to be told to keep their head facing forward when squatting? I’ve never seen anyone even attempt to look over their shoulder while squatting.

That’s not really why I bring this up, though. A pharmacy is a very strange place to get strength training advice in a corner overlaid on top of a picture of people on exercise bikes. To give a sense of how odd this is, imagine walking into a powerlifting gym and behind one of the machines is a TV which shows a picture of technicians putting someone into an MRI machine and the text on top says:

Things To Remember When Taking Medicine: When taking an antibiotic, always drink a large glass of whole milk with it, finish the antibiotics course unless otherwise directed by a doctor, and stand upright on the ground while taking it.

This, by the way, is the front page of The Ginger Marie Blog, as of the time I’m writing this post:

I do not say a word against Ms. Ginger Marie, but I must confess I’m curious as to why this particular site was chosen as the place from which to get advice on proper squat technique.

Though, to be fair to Ms. Marie, when I try searching the site for the word “squats” I don’t come up with anything. A Google search for “The Ginger Marie Blog” and “squats” and “knees” also turns up no results, though that may not mean much since Google has been pretty bad for the last year or two. Still, it’s possible that the random TV in my local pharmacy is misattributing its dubious advice to Ms. Marie.

I wonder if we can blame AI for this? Perhaps a large language model mangled a quote from a publication like Marie Claire and then mangled the attribution, as well. This seems like the sort of thing that AI might do.

As dystopias go, this is a much nicer one than what most dystopian movies portray.

Naturalistic Superstition

According to the Catholic Encyclopedia, there are four species of superstition:

  • improper worship of the true God (indebitus veri Dei cultus);
  • idolatry;
  • divination;
  • vain observances, which include magic and occult arts.

What most, or possibly all, of these have in common is the desire to control things beyond one’s power. Creating idols, for example, is the attempt to localize God (or some minor power) into a place where one can interact with it on one’s own terms, so one can convince it to do what one wants through worship. (Interestingly, this is the purpose of the golden calf—it is not supposed to be a strange god. Once it was cast, the people said, “This is your God1, O Israel, who brought you out of Egypt.” The whole point is that they want to be able to worship it directly, rather than having to wait for Moses to come back down from the mountain.)

I will not waste your time, dear reader, pointing out how divination and vain observances are attempts to go beyond one’s power.

The exact same thing—the vain attempt to go beyond one’s own power—can be done in entirely naturalistic ways. From my observations, it behaves in exactly the same ways as superstition does. But we don’t have a word for it.

I suspect that we’ve all seen this sort of thing. Vitamins and other supplements are a very common form of it. Vitamins are real, of course, as are all manner of nutrients. But people attribute all sorts of powers to these things which they have no reason to believe that the things have, and with no curiosity whatever to find out what their real powers are.

People go from the fact that vitamins are miracle cures for vitamin deficiency diseases to holding that the vitamins have unlimited powers to confer their effects. They ignore that the vitamins work by doing something, and that the body does not need an unlimited amount of that thing. For example, vitamin C is used by the body in the process of making collagen (it’s just one of many things involved, but it’s notable here because our bodies can’t make it). If you have no vitamin C, you stop being able to make collagen, and the parts of your body that need to make new collagen start to greatly suffer from not being able to make it. But contained in this is the natural limit to the effects of vitamin C: once your body has made all of the collagen it wants to make, more vitamin C does no good. (I’m oversimplifying, of course, because vitamin C is used elsewhere in the body, too, but to the best of human knowledge it’s the same story all over—once you have enough, your body can do what it needs to and more does nothing.) It’s like building a house. If you don’t have enough wood, you will build a rickety and drafty house. If you have twice as much wood as you need, you will have a well built house and a big pile of wood. If you have three times as much wood as you need, you will have an equally well built house and a pile of wood that’s twice as big.

Taking large amounts of vitamins as if their effect scales with their dose is directly analogous to superstition, especially to the improper worship of God (such as holding that if one says a prayer in a particular way it will automatically be granted exactly the way you ask for it). Then we come to other ways which are more analogous to divination and vain observances: attributing vague positive benefits to things.

Examples of this sort of thing are saying that garlic is “anti-cancer” or that 5G makes chickens lay fewer eggs. Cancer isn’t even one thing, and there’s no reason to suppose that a somewhat improved packetization scheme for data in the radio transmissions used to transmit data to and from cellular phones could have any effect whatever on the way that chickens lay eggs. (I suspect that the fear of 5G was actually about millimeter-wave cell bands, but those are deployed in very few places because they’re so high frequency that they penetrate approximately nothing; on millimeter-wave bands standing in front of your cell phone is enough to block reception. So far as I’m aware they’ve only been deployed in a few cities and in a few sports stadiums. Most phones don’t even bother incurring the expense of supporting millimeter-wave radio.)

The world is a strange place, we know very little about it, and all sorts of things have effects that we do not know that they have. The problem is not the supposition that effects we do not understand are occurring. The problem is the wild mismatch of certainty to evidence. This is selectively believing in our ignorance; it is believing in it only where one wants to. Is it possible that despite us having no idea how, garlic can cure all forms of cancer? Yes. But there’s just as much reason to believe that garlic causes cancer, or that garlic causes cancer if you take more than twice as much garlic as you eat olive oil, or that garlic causes strokes if you eat more of it than you eat oregano. Lots of things are possible. When one has moved from possible to probable or certain only out of the desire to achieve the effect, this is the naturalistic analog to superstition.

And I really wish we had a word for it.


1. Technically the Greek is plural and many English translations render it as “These are your gods,” but I suspect the translations which take this to be a plural of respect are the more likely to be correct. (An example of the plural of respect is a king saying “we” instead of “I”.) The Jews were certainly not monotheistic at this point, but it makes no sense for them to attribute bringing them out of Egypt to multiple gods, and still less sense to call one calf multiple gods. No matter how you take it with respect to “theoi”, you certainly have the problem of a plural being used to refer to a single thing, the calf.