How to lie with statistics I: confounders.

In 1954, Darrell Huff published the now-classic How to Lie with Statistics, an introduction to basic statistics for the layman. The book became a bestseller, and its title became a trope to such a degree that it has inspired numerous spin-offs, one recent example being the excellent article How Not to Lie with Ethnography (published in Sociological Methodology).

Some time ago, the Danish newspaper Politiken published an article (in Danish) based on a “survey of more than one million meals”, which found that increased cheese consumption is associated with a higher BMI. The headline went further, stating explicitly that “More people become overweight as a result of eating cheese at breakfast”.

The article illustrates two statistical fallacies:

  1. The belief that a large sample size in itself guarantees the quality of the research,
  2. Comparing groups that differ on parameters other than the one we’re interested in.

I won’t discuss the first one here (although it is striking how often one encounters it), but will focus on the second.

Even though the study measures only body mass index and the frequency of cheese consumption, it is phrased in strongly causal language. Furthermore, consumption is poorly measured, since only frequency is recorded and not quantity: gorging on an entire Brie once a week would count as less consumption than having two slices of Edam two or three times a week.
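A toy calculation makes the gap between frequency and quantity concrete. The portion sizes (250 g for a whole Brie, 10 g per slice of Edam) are purely assumed for illustration, as is the helper `weekly_grams`; "twice or thrice a week" is taken as three times.

```python
# Hypothetical portion sizes, chosen only for illustration.
BRIE_WHOLE_G = 250  # grams in one whole Brie (assumed)
EDAM_SLICE_G = 10   # grams in one slice of Edam (assumed)


def weekly_grams(times_per_week, grams_per_time):
    """Total cheese eaten per week, in grams."""
    return times_per_week * grams_per_time


# (times per week, grams each time) for the two habits described in the text.
habits = {
    "whole Brie once a week": (1, BRIE_WHOLE_G),
    "two slices of Edam three times a week": (3, 2 * EDAM_SLICE_G),
}

for name, (times, grams) in habits.items():
    # A frequency-only survey records just `times`; the grams are invisible to it.
    print(f"{name}: frequency {times}/week, {weekly_grams(times, grams)} g/week")
```

Under these assumptions, a frequency-only measure scores the Edam habit three times higher than the Brie habit, even though it amounts to 60 g of cheese per week against 250 g.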

Leaving this aside, we still have the problem that, since only BMI and cheese consumption were measured, we cannot rule out that the correlation is spurious, for the simple reason that people who eat cheese more often may differ substantially from their comparison groups. They may be older, for instance, or they may prefer a different cuisine altogether.

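The claim that increased cheese consumption leads to overweight may very well be true, since cheese is quite calorie-dense. But it remains under-identified: we have no idea of the size of the effect, or indeed whether it exists at all! The observations are even compatible with a situation where cheese leads to weight loss. If, say, older people both eat cheese more often and tend to have a higher BMI, the age difference alone could produce a positive association between cheese and BMI, masking a negative effect of the cheese itself.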
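How cheese could lower BMI and still show a positive raw association can be illustrated with a small simulation. This is a sketch under entirely invented assumptions: age is the only confounder, and the probabilities, the effect sizes and the helper names `simulate` and `bmi_gap` are hypothetical, not taken from the Politiken survey. Comparing cheese eaters with non-eaters within each age group (stratification) is one simple way to hold the confounder fixed.

```python
import random
from statistics import mean


def simulate(n=100_000, seed=1):
    """Simulate a hypothetical population in which age confounds cheese and BMI.

    Assumed model (illustrative numbers only):
      - half the population is 'older'
      - older people eat cheese frequently with prob 0.7, younger with prob 0.2
      - BMI = 23 + 4 * older - 1 * cheese + Gaussian noise (sd 1)
    So, holding age fixed, frequent cheese eating *lowers* BMI by one point.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        older = rng.random() < 0.5
        cheese = rng.random() < (0.7 if older else 0.2)
        bmi = 23 + 4 * int(older) - 1 * int(cheese) + rng.gauss(0, 1)
        people.append((older, cheese, bmi))
    return people


def bmi_gap(people):
    """Mean BMI of frequent cheese eaters minus mean BMI of everyone else."""
    eaters = [bmi for _, cheese, bmi in people if cheese]
    others = [bmi for _, cheese, bmi in people if not cheese]
    return mean(eaters) - mean(others)


people = simulate()
# Naive comparison: ignores age entirely.
naive = bmi_gap(people)
# Stratified comparison: cheese eaters vs. non-eaters within each age group.
by_age = {older: bmi_gap([p for p in people if p[0] == older]) for older in (False, True)}

print(f"naive gap: {naive:+.2f}")
for older, gap in by_age.items():
    print(f"gap among {'older' if older else 'younger'} people: {gap:+.2f}")
```

With these assumed numbers, the naive gap comes out at roughly +1 BMI point, while the gap within each age group is close to -1: the pooled comparison mixes the effect of cheese with the effect of age. Stratifying only works here because the simulation knows that age is the sole confounder; a real study would have to measure such variables in the first place, which is exactly what the survey did not do.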

The title of this blog post may seem misleading, since we have not established whether cheese consumption actually leads to weight gain in the population. However, drawing on the classical account of knowledge as justified true belief (the account Gettier famously challenged), we can at least conclude that even if what the article’s authors believe is true, they are not justified in believing it on the basis of this study. Whether that counts as a lie may be more of a philosophical question.