andreas baumann, numbers guy.

statistics, religion, game theory, sociology.

Month: February, 2013

What we should teach and you should learn.

Normally, the stats curriculum taught to social scientists tends to emphasize inferential techniques such as analysis of variance and regression over descriptive and dimensionality-reducing techniques such as factor analysis or cluster analysis.

Sooner or later, we’re going to have to change this.

The future of data analysis – whether in the natural sciences, computer science or social science – is dealing with “Big Data” (or – as I’ve heard it called – organic data, as opposed to the designed data of an RCT). When we deal with these complex sets, we typically have a very large set of information (both in terms of data points and dimensionality), and most of the measures contained in it are poor measures of the latent traits we’re really interested in. For exactly this reason, it’s going to be more and more important for us to be able to extract the latent traits from the data.

For this reason, I think that majors in sociology, political science, etc. should move towards emphasizing

  • Data harvesting in real life: how to extract the information you’re interested in from social media or a database.
  • Data processing: How to adapt the data set for use in an investigation.
  • Dimensionality reduction: How to get from the manifest variables in the data set to the latent variables you’re interested in.

Most of this should come as an “add-on” to the regression courses currently offered, not as a substitute. Indeed, maybe a shared foundation in introductory linear algebra is the way to tie these two sets of statistics courses together.
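To make the dimensionality-reduction point concrete, here is a minimal sketch of the kind of exercise I have in mind, using principal component analysis via scikit-learn on simulated data; the two-trait structure and all the numbers are made up for illustration, not taken from any real survey:

```python
# Minimal sketch: recovering a low-dimensional latent structure from noisy
# manifest variables with PCA. All data are simulated for illustration.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 1000
latent = rng.normal(size=(n, 2))              # two latent traits per respondent
loadings = rng.normal(size=(2, 20))           # each trait loads on 20 survey items
manifest = latent @ loadings + rng.normal(scale=2.0, size=(n, 20))  # noisy items

pca = PCA(n_components=5)
pca.fit(manifest)
print(pca.explained_variance_ratio_)          # the first two components dominate
```

The point of an exercise like this is precisely the workflow sketched above: start from many noisy manifest variables and work back to the handful of latent dimensions you actually care about.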

 


Cui bono? Or: context matters.

The non-steroidal anti-inflammatory drug (NSAID) diclofenac has come under fire recently, after a paper showed that it has a risk profile similar to that of the NSAID rofecoxib (brand name Vioxx), which was withdrawn from the market for exactly this reason.

NSAIDs function by inhibiting two enzymes, COX-1 and COX-2. Inhibition of COX-1 is typically correlated with stomach complaints, and for this reason, research into selective COX-2 inhibitors has been conducted. However, findings such as the ones in the paper referenced seem to indicate that selective COX-2 inhibition is related to an increased risk of cardiovascular events. For this reason, the authors recommend that doctors prescribe non-selective COX inhibitors, and that countries consider banning diclofenac.

The problem with the paper referenced is that it fails to consider differential applications of NSAIDs. The two major user groups of these drugs are people engaging in sports (and suffering related injuries) and people suffering from arthritis.

Many studies – including the meta-analysis in the paper referenced above – indicate an increase of roughly 40% in the risk of cardiovascular events when using selective COX-2 inhibitors. For this reason, the authors recommend banning drugs such as diclofenac.

However, while I applaud the authors’ commitment to revealing this increased risk, it is only a problem for one of the user groups, namely the arthritis sufferers, among whom cardiovascular risk is very prevalent. For the user group consisting of athletes and athletic people – a group with very low baseline cardiovascular risk – the impact of a 40% relative increase is very small. On the other hand, the adverse effects relating to stomach complaints affect this group as well, and banning selective inhibitors would therefore lead this group to suffer increased adverse effects without a practically significant reduction in risk.
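To make the relative-versus-absolute distinction concrete, here is a tiny sketch with two purely hypothetical baseline risks (the numbers are mine for illustration, not from the paper):

```python
# Illustration only: how the same 40% relative risk increase translates into
# absolute risk for two made-up baseline risks of a cardiovascular event.
relative_increase = 0.40

for group, baseline in [("arthritis patients (high baseline)", 0.10),
                        ("young athletes (low baseline)", 0.001)]:
    absolute_increase = baseline * relative_increase
    print(f"{group}: baseline {baseline:.3%}, "
          f"absolute increase {absolute_increase:.3%}")
```

With a 10% baseline, the same relative increase adds 4 percentage points of risk; with a 0.1% baseline, it adds 0.04 percentage points – small enough that the extra stomach complaints from non-selective inhibitors can easily outweigh it.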

Summa summarum: the adverse effects of drugs cannot be evaluated without considering differential impacts on different user groups. For this reason, the choice between non-selective COX inhibitors for the elderly and selective inhibitors for otherwise healthy people with low cardiovascular risk is – in my opinion – better handled in the dispensation part of the system than in the regulation part. Physicians can – and should – make contextual decisions, which perform better than catch-all decisions in terms of individual welfare.

A sociological defence of the free market.

In my opinion, there are two strands of defence for the free market: an economic approach that tends to emphasize the optimal allocation of resources, and a philosophical approach emphasizing the emergence of the market from the natural rights of people.

But what are the sociological arguments in favor of the free market? Some of the classical sociologists were not terribly fond of modernity and capitalism: think of Weber’s famous work on the relation between Protestantism (Puritanism) and the rise of capitalism:

For as asceticism was carried out of the monastic cells and into working life, and began to dominate inner-worldly morality, it did its part in building that mighty cosmos of the modern economic order – bound to the technical and economic conditions of mechanised, machine-based production – which today determines, with overwhelming force, the style of life of every individual born into this mechanism, not only of those directly engaged in economic acquisition, and which will perhaps determine it until the last hundredweight of fossil fuel has burnt out. In Baxter’s view, the care for external goods should lie about the shoulders of his saints only like “a thin cloak that can be thrown off at any time”. But fate decreed that the cloak should become a shell as hard as steel. As asceticism undertook to remodel the world and to work itself out within it, the external goods of this world gained an increasing and finally inescapable power over man, as never before in history. Today its spirit – whether for good, who knows? – has escaped from this shell. Victorious capitalism, at any rate, since it rests on mechanical foundations, no longer needs this support. (Weber, “Die protestantische Ethik und der Geist des Kapitalismus”, Mohr-Verlag, pp. 203-4)

Likewise, Durkheim was famously cautious about how society was meant to stick together in the modern, industrialised world – the question of social solidarity.

This is also why some conservatives are inherently sceptical towards the market: capitalism is inherently anomic. But this is also what we should emphasize in evaluating capitalism – it has an enormous capacity for empowering the actor.

Most human societies were massively dominated by the structures they contained. This is what we typically think about when we address the lack of social mobility in pre-modern societies. Some vocations were only open to some persons, and people could do very little to amend their situations. Furthermore, due to the level of development in society, most societies had very little variation in professions: most people lived off the land in some way or other.

Capitalism, the division of labour and the market changed this fundamentally by bringing about differentiation and growth; fewer people had to work to produce the most basic of goods, such as foodstuffs. The combination of the disjuncture between heritage and personal career (personified in the idea of the nouveau riche) and the freedom for creative work created by a more effective division of labour meant that an enormous number of people were empowered to create their own lives.

In my view, that is the basis of the sociological defence of the market.

Who are “the people”?

In Denmark, the debate is currently raging over whether to reintroduce fixed book prices for the first twelve months on the market; this would give booksellers and the traditional publishing houses better conditions. On the other hand, it shrinks the supply of cheap crime novels and cookbooks available in supermarkets – very possibly to the dismay of the ordinary consumer.

This led the Consumers’ Council (Forbrugerrådet) to argue against the regulation of the book market from a consumer’s point of view (nb: link is in Danish). It’s rather rare to see the Consumers’ Council argue against regulation, so that alone makes the topic interesting.

There is some evidence pointing to the Council being justified in its arguments; market constraints tend to lead to less adaptability between the demand and supply sides of the market. In that sense, regulation can harm consumers, because it reduces the pressure on publishers to adjust production to market demand.

However, what is misguided in this consideration is that it only considers current consumers. One thing that is remarkable about books, in contrast to almost any other product, is that they tend to last forever – not the physical copies, but the content. Right beside my computer I have the collected works of Shakespeare, who died almost 400 years ago – and yet his work lives on.

Considerations involving future generations are not uncommon in politics; for example, they’re very common in assessments of environmental impact. When we perform cost-benefit analyses of long-run impacts, we typically introduce an intertemporal discount rate: how much weight should we place on effects in the future versus effects now? There are several reasons for this, one of the main ones being uncertainty: we weigh events taking place in a thousand years less than those that would take place tomorrow, ceteris paribus, because our uncertainty is necessarily greater for the more distant events ^1.
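As a toy illustration of what the discount rate does (the benefit size and the rates are mine, not from any actual cost-benefit analysis): the present value of a benefit B received t years from now, at annual discount rate r, is B / (1 + r)^t.

```python
# Toy illustration: present value of a benefit of 100 received t years from
# now, under two annual discount rates. All numbers are illustrative.
def present_value(benefit, rate, years):
    return benefit / (1 + rate) ** years

for rate in (0.01, 0.05):
    for years in (1, 50, 1000):
        pv = present_value(100, rate, years)
        print(f"r={rate:.0%}, t={years:>4}: PV = {pv:.6f}")
```

Even at a 1% rate, a benefit a thousand years out is worth a fraction of a cent today; complete discounting, as in the book-market case below, simply sets all future benefits to zero.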

In the above case, the book market was analysed under complete discounting; that is, no future benefits were considered at all. However, I believe that this is wrong. Books are culture; that is, there should be a concern for the eternal, not just the temporal, when regulating the market.

This also explains why the Danish Conservatives are for this regulation: a central idea in conservative ideology is the generational contract. This is the idea that since we received this society from our parents, we have a duty to hand it on to our children in good condition. This is a common argument behind conservative conservationism.

I’d argue that this idea can be extended to the cultural domain: we need to support books that aren’t necessarily profitable, but might have lasting cultural impacts. And actually, I’d argue that fixed book prices are a good way to do this: based on historical evidence, they should increase publishers’ willingness to take risks, without political “winner-picking” in the production of cultural goods.

^1) One of the main criticisms against Bjørn Lomborg was that he set the intertemporal discount rate much higher than others working within environmental economics.

Statisticians a…

Statisticians and Computer Scientists have done a pretty poor job of thinking of names for procedures. Names are important. No one is going to use a method called “the Stalin-Mussolini Matrix Completion Algorithm.” But who would pass up the opportunity to use the “Schwarzenegger-Shatner Statistic.” (Larry Wasserman)

Sampling frames matter.

Sampling is the basis for all survey research and, correspondingly, for a lot of research in the social and biomedical sciences. However, how to sample is not always pondered at the level necessitated by the research question. In the post below, I illustrate this problem.

A suggested sampling frame.

Let’s say we want to investigate some quantity X in the population – we’re interested in the entire population. We want a representative sample for a number of reasons (hint: CLT), and therefore we adopt the sampling frame suggested below:

Assumptions:

  • Every household has one and only one landline telephone.
  • All landline telephone numbers are listed in the phone registry.

Procedure

  • Pick a number at random from the phone registry.
  • Dial that number, and if the phone is picked up, ask to talk to the person in the household whose birthday is next.
  • Repeat until desired sample size is achieved.

Sounds pretty random, doesn’t it – at least under the assumptions mentioned? Take a moment to think about it.

The point is, of course, that this frame isn’t as random as it looks: every person is sampled with a probability inversely proportional to the number of persons in their household, so people in large households are underrepresented. But how bad can that be?

A simulation
Let’s say that we’re hired by a municipality interested in knowing to what degree people use the public pools. In a municipality of 120,000 people, the household distribution is that 57% of households contain one person, 29% two persons and 14% four persons ^1.

Persons living in single-person households tend to be students, young adults and the elderly. They don’t frequent the pools that much: 50% of them don’t visit the pool at all, while the other half only visit the pool once a year.

Persons in two-person households are young couples and couples whose children have moved out. While the first category doesn’t really go to the pool, the other category tends to be heavy users. 25% never go to the pool, 25% go once a year, 25% go five times a year and 25% go eight times.

Persons in four-person households are families; they use the public pools extensively. 25% of them go five times a year, 25% go six times a year and 50% go eight times a year.

This of course gives us a mean of 3.58 with a standard deviation of 3.25.

Let’s draw a simple random sample of these people (n = 1000) and look at the results. We find a mean of 3.53 and a 95% CI of [3.33;3.73] – not bad.

However, if we sample according to the scheme above, we would have found a mean of 2.24 and a 95% CI of [2.06;2.42] – rather far from the true mean, which isn’t contained within the CI.
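A minimal sketch of this simulation is below – not the original code, just the household shares and made-up visit distributions from above; the exact estimates will wobble a bit with the random seed.

```python
# Sketch: simple random sampling of persons vs. the phone-registry scheme
# (one respondent per randomly chosen household). Numbers are the made-up
# ones from the post.
import numpy as np

rng = np.random.default_rng(1)

sizes = np.array([1, 2, 4])            # household sizes
shares = np.array([0.57, 0.29, 0.14])  # share of households of each size

# Yearly pool visits for a person in each household type: (values, probabilities)
visits = {1: ([0, 1], [0.5, 0.5]),
          2: ([0, 1, 5, 8], [0.25, 0.25, 0.25, 0.25]),
          4: ([5, 6, 8], [0.25, 0.25, 0.5])}

def draw_visits(size, count):
    vals, probs = visits[size]
    return rng.choice(vals, size=count, p=probs)

def mean_and_ci(sample):
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    return m, (m - 1.96 * se, m + 1.96 * se)

n = 1000

# Simple random sample of persons: a person belongs to a household type with
# probability proportional to (share of households) * (household size).
person_probs = shares * sizes / (shares * sizes).sum()
srs_sizes = rng.choice(sizes, size=n, p=person_probs)
srs = np.concatenate([draw_visits(s, (srs_sizes == s).sum()) for s in sizes])

# Phone-registry scheme: households sampled with equal probability, one
# respondent per household, so large households are underrepresented.
phone_sizes = rng.choice(sizes, size=n, p=shares)
phone = np.concatenate([draw_visits(s, (phone_sizes == s).sum()) for s in sizes])

print("Simple random sample:", mean_and_ci(srs))
print("Phone-registry frame:", mean_and_ci(phone))
```

The first estimate lands near the true mean, while the second sits far below it, for exactly the reason given above: the phone frame samples each person with probability inversely proportional to household size, and the heavy pool users live in the large households.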

In sum: sampling frames matter. And they matter more than you think.

^1) I’m making all these numbers up.

The US economy is both overregulated and underregulated at the same time.


Nothing is ever simple.

Assessing risk, vol I: Doctors’ orders.

Humans seem to be incorrigibly bad at assessing risk. Some risks are routinely underestimated, while others are routinely overestimated. People differ not only with respect to which risks they’re willing to take, but also in how they assess those risks.

Think about an offer of some cash reward for a one-in-a-thousand chance of certain, immediate death. How much money would you take for that risk? When you ask people about this, many of them wouldn’t accept even a million dollars for that one-in-a-thousand chance, despite the fact that those of us who regularly bike around large, trafficked cities take far greater risks every day (and with a lot lower compensation!).

It’s not that we’re unaware of the risks associated with everyday behaviour; they might not be very salient to us, but we’re nonetheless aware of them. People don’t smoke because they’re unaware of the numerous risks associated with it; they just disregard these risks (see for example CMAJ 2000)^1.

But this is all a question of habit. What’s more interesting is that even professionals relatively often display a limited understanding of the distribution of risk within their own field of expertise^2. One such example may be doctors and their evaluations of, and recommendations on, provoked (medically indicated) abortions in response to teratogenic risks.

A teratogenic risk is a risk to the fetus during pregnancy. A lot of things pose teratogenic risks – smoking, drinking larger amounts of alcohol, some pharmaceuticals.

What’s interesting is that surveys of physicians have found that they tend to overestimate teratogenic risks and recommend abortion more often than would be indicated under the current best assessment of the risk (Amer. J. Roentgenol. 2004; Teratology 1992; Pharmacy Practice 2008).

Why do physicians overestimate these risks? One reason might be that they prefer to err on the side of caution: rather than face a patient giving birth to a handicapped child (and the physician perhaps facing a malpractice lawsuit for not advocating an abortion), they recommend that the patient have an abortion and begin a new pregnancy. If this is the case, one should expect the rate of overestimation of teratogenic risk to be highest for drugs particularly potent in the first trimester, because of the time discounting invoked^3. That, of course, assumes rational considerations on the part of the physician – which may be a debatable frame. I haven’t had the chance to assess whether the overestimation of teratogenic risk in the surveys referenced conforms to this pattern.

However, another reason for overestimating might be the same cognitive bias as in the example of the bet of one million dollars against a tiny chance of certain death: we tend to be more risk-averse towards behaviour we don’t know than towards behaviour we face every day, because we update our risk models based on our experience. This would explain why people seem to downplay the adverse health effects of acetaminophen (Tylenol / Paracetamol) for pregnant mothers, while grossly overestimating the teratogenic and fetotoxic effects of cocaine use in pregnant mothers.

^1) Why do people then smoke? Maybe they lie when surveyed and they really don’t believe that smoking poses these risks. Maybe they don’t have the character to quit. Maybe they discount their future risk against present utility and choose to smoke. Or maybe, they’re just stupid.
^2) One obvious field is of course investors, but I feel that the example offered above gives a better illustration.
^3) It’s generally considered more invasive and traumatic to abort late in the pregnancy than early in the pregnancy.

Institutional heritage in sports.

The Baltimore Ravens just won the Super Bowl. I only watched the start of the game – American football doesn’t really interest me – but I picked up that their victory was unexpected. I had suspected as much, because I had never heard of the Ravens, unlike the Steelers or the Packers.

I read the Wiki page on the Super Bowl, and one thing I noticed is the rather even distribution of victories amongst teams, compared with sports such as the English football championship (currently the Premier League), where Man U has won ten times in the last twenty years.

What factors determine the amount of institutional heritage in sports? I’ve no idea – do you?

Why registration of studies matters.

Recently, I linked to a petition asking for signatures for the implementation of fixed protocols in clinical studies, such that all studies must be registered and all results reported, to avoid the “publication bias” problem, in which people only report findings that conform to some scheme of interest.
This matters for a number of reasons: it taints this wondrous human endeavour that we call science, but there are also more pragmatic reasons, such as the fact that private medico firms use a lot of resources trying to replicate scientific findings that might be the result of chance. To illustrate this, I’d like to tell you about two statistical concepts: significance and power (i).

A significant finding is one which, under some criterion, is unlikely under the null hypothesis. Think about comparing two means m_1 and m_2, with the difference D = m_1 - m_2. The null hypothesis is D = 0. This could be the height of girls and boys, where you wonder whether there is a systematic difference in their mean heights.
The thing is, when comparing their mean heights, you want to have some idea about whether or not the difference stems from random variation. Say, you could have sampled the tallest boys and the smallest girls by chance. The significance of your findings models this under the assumptions of the null hypothesis – i.e., that there is no difference in reality. You run a test, and you find that the significance (p-value) of D is 0.04. This means that, if the null hypothesis were true, the chance of seeing a value of D at least as extreme (in absolute value) as the one observed is 4%.
One thing you should know about significance: under the null (in a correctly designed RCT), the p-value is uniformly distributed on [0;1] – meaning that every interval of the same length is equally likely. This means that even when there is no effect (no difference in mean heights, in the example above), one would still find a significant result from time to time by chance alone – at the 5% level, in about 5% of studies.
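A quick way to convince yourself of this is to simulate many studies in which the null is true and look at the p-values; a sketch using a two-sample t-test on simulated heights with no true difference (the numbers are illustrative):

```python
# Sketch: when there is no true difference, p-values are uniform on [0, 1],
# so about 5% of null studies come out "significant" at the 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
pvals = []
for _ in range(10_000):
    boys = rng.normal(loc=170, scale=10, size=50)
    girls = rng.normal(loc=170, scale=10, size=50)   # same true mean as boys
    pvals.append(stats.ttest_ind(boys, girls).pvalue)

pvals = np.array(pvals)
print((pvals < 0.05).mean())   # close to 0.05
```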
Power is not quite as intuitively understood, in my opinion, but it relates to the ability to detect an effect of a certain size: when you’re looking for a small effect, you need more data points than when you’re looking for a large effect. Normally, in designing tests, there is a trade-off between significance and power: if you want to be very sure that you’re not picking up results by chance, you lose some of your ability to detect small effects. Think about this for a moment.

The problem with clinical trials is that if you test, say, 50 candidate drugs under a 5% significance criterion, you end up with a 92.3% chance of finding a significant effect for at least one of the drugs, even if none of them work (0.923 = 1-0.95^50). This is not in itself problematic: it’s a consequence of our probability-based modelling of the real world, and things can be done about it (ii).

The problem is: you’re the researcher in question, and you report the one drug that you found had a positive effect, without reporting the others. Why should you? They didn’t turn up significant results, did they? Then your candidate drug is picked up on, and more extensive testing is performed with larger sample sizes, etc. Maybe they can’t replicate your result, because it was the result of a multiple comparison problem.
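The arithmetic behind that 92.3% figure, as a one-liner sketch:

```python
# Probability of at least one "significant" result among 50 truly
# ineffective drugs, each tested at the 5% level.
alpha, k = 0.05, 50
print(1 - (1 - alpha) ** k)   # ~0.923
```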

Furthermore, when more research is performed in a field, you’ll see effect sizes shrink: because the pilot studies tend to be smaller than subsequent studies, they detect effects in the right tail of the effect distribution (iii). Having some central form of registration for studies allows us to ponder all the evidence together: studies showing a significant effect, and studies not showing any effect. This allows practitioners to choose better options for patients, and it allows medico firms to concentrate their efforts on drugs where the data do seem to indicate something promising, instead of basing their efforts on statistical artifacts. A wise man proportions his belief to the evidence, as David Hume said. Let’s act as wise men and not proportion our belief to those bits of evidence that seem convenient to us.

 

(i) These are very elementary introductions. If you’re interested in a more rigorous approach, consult an introduction to mathematical statistics, such as Hogg, McKean & Craig.
(ii) What to do about multiple comparisons problems is a matter of discussion: a modern response is to use multilevel Bayesian modelling of the estimates.
(iii) One thing that is interesting is that research into the supermorbidity and -mortality of smokers does not seem to exhibit this pattern of shrinking effect sizes, which might suggest that studies don’t control for enough covariates.
http://www.amazon.com/Introduction-Mathematical-Statistics-Robert-Hogg/dp/0130085073