andreas baumann, numbers guy.

statistics, religion, game theory, sociology.

How to lie with statistics I: confounders.

In 1954, Darrell Huff wrote the now-classic How to lie with statistics, an introduction to basic statistics for the layman. The book became a bestseller, and the title itself became such a trope that it has spawned numerous spin-offs, one recent example being the excellent article How not to lie with ethnography (published in Sociological Methodology, link here).

Some time ago, the Danish newspaper Politiken published an article based on a “survey of more than one million meals”, finding that increased cheese consumption is associated with a higher BMI (article here, nb: in Danish) with a title explicitly stating that “More people become overweight as a result of eating cheese at breakfast”.

The article illustrates two statistical fallacies, of which I’ll focus on the second:

  1. That a large sample size is in itself a guarantee of quality of the research,
  2. Comparison of groups differing on other parameters than the one we’re interested in.

I won’t be talking about 1) right now – although it is interesting how often one sees it – but rather 2).

Even though the study just measures body mass index and frequency of cheese consumption, it is phrased in strongly causal language. Furthermore, consumption is badly measured, as the study merely records frequency and not quantity; gorging on an entire Brie once a week would be measured as less consumption than having two slices of Edamer twice or thrice a week.

Leaving this aside, we still have the problem that, since we’ve measured only BMI and cheese consumption, we cannot rule out that the correlation is spurious, for the simple reason that people who consume cheese more often may differ systematically from their comparison groups. They may be older, or they may prefer another form of cuisine.

The statement that increased cheese consumption leads to overweight may very well be true, since cheese is quite dense in calories. But it remains under-identified, since we have no idea about the size of the effect, or indeed whether it really exists!
However, the same observations are also possible in a situation where cheese leads to weight loss, for instance if older people both eat cheese more often and have a higher BMI for unrelated reasons.
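To make this concrete, here is a small simulation sketch in Python. Everything in it is made up for illustration: the confounder (age), the coefficients and the assumed true effect of -0.5 are assumptions, not numbers from the Politiken survey. It builds a world in which cheese lowers BMI, yet a naive comparison of cheese and BMI shows a positive association because older people both eat more cheese and weigh more.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """What is left of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

random.seed(1)
n = 20000
TRUE_EFFECT = -0.5  # assumption: each unit of cheese LOWERS BMI by 0.5

# Assumed data-generating process: age drives both cheese consumption and BMI.
age = [random.uniform(20, 70) for _ in range(n)]
cheese = [0.1 * a + random.gauss(0, 1) for a in age]
bmi = [22 + 0.15 * a + TRUE_EFFECT * c + random.gauss(0, 1)
       for a, c in zip(age, cheese)]

# Naive analysis: regress BMI on cheese only, as the survey effectively does.
naive = slope(cheese, bmi)

# Adjusted analysis: remove the linear effect of age from both variables first.
adjusted = slope(residuals(cheese, age), residuals(bmi, age))

print("naive slope:   ", round(naive, 2))
print("adjusted slope:", round(adjusted, 2))
```

The naive slope comes out positive while the age-adjusted slope lands close to the assumed negative effect. The adjustment only works here because the simulation knows that age is the only confounder; with real survey data we would not have that luxury.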

The title of this blog post may seem misleading, because we have not identified whether cheese consumption actually leads to weight gains in the population. However, in the spirit of Gettier, we can at least conclude that even if what the originators believe is true, they are not justified in believing it on the basis of this study – whether that should be counted as a lie may be more of a philosophical question.


Mark inflation in the Danish Gymnasium?

In 2007, the Danish high school (Gymnasium) was reformed, both with regard to grading and with regard to structure. I graduated high school in 2007, meaning I was marked on the old scale, but since I took a gap year, I entered university along with the first cohort of students marked on the new scale. An official scale for converting average marks was devised, and much was made of its fairness, and to be honest, I’ve never met anyone who had trouble due to the new scale.


At the same time, the gymnasium expanded in social scope: whereas in 2001 59% of students went on to some form of gymnasium (including the vocational varieties HTX and HHX), 71% did so in 2012 – mostly at the expense of vocational training. Naturally, a proportion of these students may be what we would call marginal students in terms of academic aptitude – the gymnasium is the qualification required for entering most forms of tertiary education and the system is free (and students over the age of 18 receive a bursary).

This implies that, all else being equal, we should expect the marks on average to fall when a larger proportion of a cohort enters. Similarly, the proportion of high marks (a 10, 11 or 13 on the old scale, a 10 or 12 on the new) should increase only slightly, if at all, since persons at this level of aptitude should have been expected to be well absorbed already at 59% of a cohort entering the system. But what really happened? Below is a picture of the development for the general gymnasium (STX). On the left, we see the proportion of high marks, on the right the average mark as recalculated using the official scale.
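The demographic argument can be sketched numerically in Python. The sketch rests on two strong assumptions that are mine, not the data's: academic aptitude is standard normal across a cohort, and the gymnasium admits exactly the top 59% or 71% by aptitude. Under those assumptions the mean aptitude of entrants is the mean of a normal distribution truncated from below.

```python
from statistics import NormalDist

std_normal = NormalDist()  # assumed aptitude distribution: mean 0, sd 1

def mean_aptitude_of_entrants(share):
    """Mean aptitude of the top `share` of a cohort, if entry were by aptitude alone."""
    cutoff = std_normal.inv_cdf(1 - share)   # aptitude of the marginal entrant
    return std_normal.pdf(cutoff) / share    # mean of a normal truncated below at cutoff

mean_2001 = mean_aptitude_of_entrants(0.59)  # 59% of the cohort enters
mean_2012 = mean_aptitude_of_entrants(0.71)  # 71% of the cohort enters

print(round(mean_2001, 2), round(mean_2012, 2))
```

Under these assumptions the mean aptitude of entrants is lower at 71% than at 59%, which is the direction of change we would expect average marks to follow if grading standards stayed constant.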


As can be seen, both increase. From 2005 to 2013, the proportion of marks in the high category increased by 71%. The average mark increased much more slowly, but the increase is still indicative of mark inflation, given what we should expect from the demographic considerations outlined above. In short, there is reason to suspect that the gymnasium is really suffering from mark inflation.

Why should we care? Since marks form the basis for university admission, they directly impact life outcomes. In a situation with mark inflation, high marks become less informative, since they do not discriminate well among students. This means that mark inflation reduces the efficiency of using average marks as a way of allocating university places.

I currently have a request for more complete data lodged with the bureau responsible for education statistics, so I hope that I will be able to revisit this more thoroughly.

Euphoriant substitution: a small step towards a much better world.

People want to have a good time; that much is a given. More stringently, we may argue that man is a pleasure-seeking animal and that, for this very reason, he will go to great lengths to achieve pleasure. One of the ways is through the use of euphoriants (broadly construed), that is, chemicals used with the purpose of inducing pleasure. This much seems to hold across cultures: if there is access to euphoriants, people will use them – and they will even incur great disutility to enable this use (case in point: “prison wine”, a truly horrible concoction brewed in prisons).


However, these euphoriants affect the body differently. They induce different highs, and some do great harm to the organism, while others are less harmful. They also differ in price. These differences form the basis for substitution in euphoriant use: I might choose to smoke a joint instead of drinking a couple of pints with a friend, because I have work tomorrow and know from experience that I tolerate cannabis much better than alcohol, etc.* This forms the basis of the utilitarian argument for marijuana reform: because some people tolerate cannabis better, do less harm to their environment when using cannabis instead of alcohol, etc., they should be allowed to use it. Simply put, cannabis is harmful, but if it is less harmful than alcohol in typical use, people substituting into cannabis use would be a societal good.


What I find interesting is that while pharmaceutical innovation is generally high, we have seen very little innovation from pharma companies when it comes to euphoriants. Certainly, there is black market innovation, which has led to the so-called “designer drugs”, but these drugs are plagued by the normal problems with illegal drugs: no follow-up on effects, no testing prior to release, etc. The reason for this lack of corporate innovation when it comes to euphoriants is certainly not low ROI; the drug market is as large a market as ever. Rather, it is that in many legal frameworks, it is the very capacity of inducing euphoria that is illegal, not the adverse effects – for example, in the Danish Law on Euphoriants (§1, link in Danish).


It is well known that MDMA (ecstasy) has very few adverse effects compared with other euphoriants**. For this reason, one could argue that MDMA should be legalised to allow more efficient substitution into this drug from more harmful drugs such as alcohol. However, my main point is that pharma companies should be able to develop varieties of, say, MDMA that reduce the adverse effects even further and allow substitution into these drugs, reducing total harm much more. This would be a small but important step towards a better world. However, as long as it is the very ability to induce euphoria that is illegal, we will not see progress in this area.

*) This blog does not endorse the use of any euphoriant except mathematics.
**) Of course, the only euphoriant truly free of adverse effects is this.

A bit on methodology.

Let me start by telling you something you (probably) didn’t learn in science class: the reluctance to adopt the heliocentric model did not stem from religious convictions, but from the fact that for many years, the geocentric models provided a better fit to the observations. One of the main arguments for adopting the heliocentric model was not empirical, but theoretical: it was not credible that Nature should have been created with the clumsiness of the epicycles. The heliocentric model was much simpler and more beautiful. Of course, this preference for beautiful models also led to reluctance to replace the perfect circles with the more mundane ellipse.


This conflict between theoretically beautiful and empirically functional models is everywhere in economics and social science. In economics, the two eminent economists Ronald Coase (of eponymous theorem fame) and Milton Friedman argued opposite positions. 


Friedman proposed a “black-box” view of scientific models in the article “The Methodology of Positive Economics” (1953), wherein the ability to predict future data is not merely one criterion for economic models, but the only one. This led him to view assumptions – the building blocks of theories – not as statements about the real world, but as “symbols”, wherein one could compress as much information as desired to make the theorems one intended to state stand out as particularly beautiful, as evidenced in this quote:

Truly important and significant hypotheses will be found to have “assumptions” that are wildly inaccurate descriptive representations of reality, and, in general, the more significant the theory, the more unrealistic the assumptions (in this sense)


Coase (in the article “How should economists choose?”) argued that theories are not merely black boxes to produce predictions; they should also inform our understanding of the real world. We evaluate our beliefs about the world through the validation of complex theories, and this may lead us to accept or reject certain assumptions. 


I’ve noticed a thing about presenting papers: when you present a paper to a gathering of economists, they’re keen on addressing endogeneity issues, functional form etc. However, when you present a paper to a gathering of sociologists, they typically attack the theoretical foundations of your work: are your models of how actors make decisions theoretically well-founded? Maybe the sociologists are the ones who really have taken Coase’s advice to economists to heart. 

Equating probabilistic and deterministic statements.

I haven’t written anything on this blog for a long time, and I regret to announce that my return is marked by a link – albeit, a very great one.

Larry Wasserman has this wonderful post explaining the essence of Simpson’s paradox. Since the error of equating a probabilistic statement of the form “X is on average better than Y” with the statement “X is better than Y” is probably the most common of all misinterpretations of statistics, I urge you very much to read his post.
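A toy illustration of that error, sketched in Python with made-up numbers. It is not taken from Wasserman’s post and it is not Simpson’s paradox itself; it only shows that “X is on average better than Y” can hold while X is worse than Y in most individual comparisons.

```python
from itertools import product

x_outcomes = [1, 1, 1, 1, 10]   # made-up scores for option X
y_outcomes = [2, 2, 2, 2, 2]    # made-up scores for option Y

mean_x = sum(x_outcomes) / len(x_outcomes)
mean_y = sum(y_outcomes) / len(y_outcomes)

# Compare every X outcome with every Y outcome and count how often X wins.
pairs = list(product(x_outcomes, y_outcomes))
share_x_wins = sum(x > y for x, y in pairs) / len(pairs)

print(mean_x, mean_y, share_x_wins)  # 2.8 2.0 0.2
```

X has the higher mean, yet it beats Y in only one comparison out of five.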

“How did you ge…

“How did you get a hold of my razor?! Quick, give it back, before somebody gets hurt!”

William of Ockham (reputed)

Testing the fairness of a coin II.

In my last post, I discussed how to test a coin for fairness – a common topic in introductory probability theory classes. Commonly, fairness is defined as \Pr(H)=\Pr(T)=0.5 \wedge \mathrm{Cov}(X_i,X_{i-1})=0. So far, so good.

However, consider the following Stata code

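* create 2000 empty observations
set obs 2000
* fill x with the repeating pattern 1 0 0 1 (given twice so the period is unambiguous)
egen x = fill(1 0 0 1 1 0 0 1)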

which generates 2000 observations following the {1, 0, 0, 1,…} pattern. Then, we can easily generate a lagged variable and calculate the autocorrelation

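* lag_x holds the previous observation of x (missing for the first row)
gen lag_x = x[_n-1]
* pairwise correlation between x and its lag, with significance level
pwcorr x lag_x, sig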

We find that x satisfies \Pr(X=1)= \Pr(X=0) = 0.5 and \mathrm{Cov}(X_i, X_{i-1}) = 0 (or rather, we do not find evidence to reject these two conditions). However, clearly the pattern isn’t random. The human mind spots patterns in data very easily, which is one of the (many, many) reasons you should always graph your data.

Actually, if we consider the 2-tuples (x_{i-1},x_i) instead, we find something funny. First of all, the 2-tuples lend themselves very easily to interpretation as binary representations. Secondly, when they are read that way, the linear correlation between (x_{i-1},x_i) and (x_{i+1},x_{i+2}) is equal to -1 (see note 1). In other words, while there is no first-order correlation, the chain can be completely specified from a second-order function!
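For readers without Stata, the same check can be sketched in Python. This is my own translation, not the code used in the post: the helper `pearson` is a hand-rolled correlation, and I read the 2-tuples as non-overlapping pairs, which is an assumption about what the post intends.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [1, 0, 0, 1] * 500  # 2000 observations of the 1 0 0 1 pattern

share_ones = sum(x) / len(x)          # proportion of ones
lag1 = pearson(x[1:], x[:-1])         # first-order autocorrelation

# Read non-overlapping pairs as two-digit binary numbers: (1,0) -> 2, (0,1) -> 1.
pair_values = [2 * x[i] + x[i + 1] for i in range(0, len(x), 2)]
pair_corr = pearson(pair_values[1:], pair_values[:-1])

print(share_ones, round(lag1, 3), round(pair_corr, 3))
```

The share of ones is one half and the lag-1 correlation is very close to zero, while the correlation between successive pair values is -1: the pair values simply alternate between 2 and 1.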

Certainly, this example is trivial. The human eye is indeed very keen to pick out such sequences. But what about a 237th-order generating function? Should our definition of randomness require that there exists no Xth order generating function for the sequence we’re talking about?

1) I find justification for this translation procedure in the fact that the translation is bijective, and – as such – there is direct (mechanical) translatability between sequence 1 and sequence 2.

Testing the fairness of a coin.

One of the examples most often used in the introduction of probability is the toss of a coin: we toss a coin, say, 1000 times, and then calculate the frequency of one outcome, such as heads. We then define a fair coin as having \Pr(\mathrm{Heads})=\Pr(\mathrm{Tails})=0.5. Now we often perform certain procedures to “test” whether the coin is fair – technically, we compute the probability of a result at least as extreme as the observed proportion under the assumed model (“the fair coin”).

However, consider a coin that produces the following results: in odd throws, it will come up heads, and in even throws, it will come up tails. With this coin, we will have a long-run frequency of heads of one-half. So, we find no reason to reject the null hypothesis of “fair coin”.

However, clearly this coin doesn’t produce random results. While it is a fair coin with regards to the proportion of heads or tails, it displays strong auto-correlation. This means that, given the results of throw n, we are very confident in predicting throw n+1. This means that we should take one more aspect into consideration when determining whether or not a coin is fair: independence of the throws.
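A minimal Python sketch of this point, under my own choice of test: an exact two-sided binomial test that sums the probabilities of all outcomes no more likely than the observed one. The helper names are hypothetical. The alternating coin passes the proportion test with flying colours, while its lag-1 correlation is as far from independence as it can be.

```python
from math import comb

def binom_two_sided_p(heads, n):
    """Exact two-sided p-value for H0: Pr(Heads) = 0.5, given `heads` in n throws."""
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    cutoff = pmf[heads] * (1 + 1e-9)          # small tolerance for float ties
    return sum(p for p in pmf if p <= cutoff)

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

throws = [1, 0] * 500   # heads (1) on odd throws, tails (0) on even throws

p_value = binom_two_sided_p(sum(throws), len(throws))
lag1 = pearson(throws[1:], throws[:-1])

print(round(p_value, 3), round(lag1, 3))
```

With exactly 500 heads in 1000 throws the p-value is 1, so the proportion test gives no reason at all to reject fairness, while the lag-1 correlation of -1 says that each throw is perfectly predictable from the previous one.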


Now, consider a series of 8 throws (I guess a mathematician would call them 8-tuples). What about finding the sequence “HHHTHT” in these 8-tuples? Clearly, there are 2^8 = 256 possible 8-tuples, of which the following contain the sequence:

  1. xxHHHTHT
  2. xHHHTHTx
  3. HHHTHTxx

There are 2^2 = 4 8-tuples of type 1), 4 of the second and 4 of the third – i.e., 12 8-tuples containing the series (no 8-tuple can contain it twice, so nothing is counted double). We should expect that around 4.7% of the 8-tuples contain this series. However, it is possible to produce a number of 8-tuples (an outcome space) that makes the coin fair according to the standards above, even though it contains a much greater number of this series than one would expect – which leads me to my point:

We often call statistics the science of uncertainty (or randomness); but randomness is not a concept that is unique to statistics. Consider the information-theoretical approaches to uncertainty: the two rules above for a fair coin are equivalent to maximizing the Shannon entropy over the parameter space (p,\rho). When moving ahead in statistics, we should always ponder what we really mean by “the null model”.
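The count of 8-tuples containing “HHHTHT” can be checked by brute force. A short Python sketch that enumerates all 256 possible 8-tuples and counts those containing the series:

```python
from itertools import product

PATTERN = "HHHTHT"

# All 2^8 sequences of heads and tails of length 8, written as strings.
tuples = ["".join(t) for t in product("HT", repeat=8)]
matches = [t for t in tuples if PATTERN in t]

print(len(tuples), len(matches), len(matches) / len(tuples))  # 256 12 0.046875
```

The enumeration confirms 12 of 256, i.e. about 4.7%.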

The end of liberal religion?

A classical theme in the sociology of religion is secularisation: the disappearance of religion as such in the modern world, the disintegration of the “sacred canopy” (Berger) or the “disenchantment of the world” (Weber). The secularisation thesis was pretty much uncontested until around the 1970s, when the New Religious Movements (NRMs) drawing adherents from the counterculture of the 60s and the Iranian Revolution led scholars to question the dogma of disappearing religion.

However, even though religion might be returning, maybe we need to think about what forms of religion are disappearing and what forms remain. The characteristic NRMs of the 60s were rather obscure religions such as ISKCON (probably better known as the Hare Krishna movement), Transcendental Meditation and the like. The Iranian revolution marked the upsurge of a more literalist Shia Islam. The Moral Majority of the 1980s wasn’t built on Episcopalianism or Presbyterianism, classic mainline American Protestant denominations; rather, it relied heavily on Evangelicals and Baptists.

Looking at the American or European religious landscapes of today, you notice something odd: the liberal religion of yesterday, the faiths of Tillich or Bultmann, seems to be on the wane. The world of the future may be the world of Dawkins and Khomeini, not Tillich or Eisenhower.

How do we measure religiosity?

I’ve recently been thinking about how to accurately measure religiosity. Of course, this depends heavily on what your idea about religiosity is (what “religiosity” really means in your theoretical framework).

Say religion is a good, and religiosity is a measure of preference for spending resources – time, money, devotion (an intensely scarce good) – on this good versus other goods or services. This is consistent with some of the common items used to measure religiosity, such as church attendance or incidence of prayer. This works for both exogenous and endogenous conceptions of religiosity.

Your problem then lies in defining spending on religious goods and services. Is yoga religion? Is astrology? Imagine a situation where you want to measure consumption of soft drinks. “Soft drink” is a pretty stable and uncontested concept, so you don’t really have a problem in defining this. Measuring religiosity by measuring participation in traditional church activities corresponds to measuring soft drink consumption by Coke sales: you miss a lot of substitution effects (what if people start drinking 7-Up instead of Coke?).

Another way to do it is to ask respondents to rate themselves on, say, a ten-point scale. This leads to two problems:

  • People might rate themselves compared to their friends, not the population as a whole. This is a problem exactly because people select into social groups consisting of people like themselves. If you’re very religious, you are much more likely to know someone even more religious than yourself than you would be to know such a person if you were practically areligious.
  • Religion is a contested concept. When studying religion, you very often run into groups who claim not to be religious, but clearly seem to be religious in some sense. Maybe this problem can be alleviated by terming it “spirituality”.

Yet another way of thinking about measuring it is in terms of cognitive activation. One should expect that very religious persons activate the cognitive religion domain more often, and that it is more salient in their world-interpretation. One would have to measure this by asking questions that explore, say, the religious connotations of ill health or related measures.