In my last post, I discussed how to test a coin for fairness, a common topic in introductory probability theory classes. Commonly, fairness is defined as $P(\text{heads}) = P(\text{tails}) = 0.5$. So far, so good.

However, consider the following Stata code:

set obs 2000

egen x = fill(1 0 0 1 1 0 0 1)

which generates 2000 observations following the {1, 0, 0, 1, …} pattern. Then, we can easily generate a lagged variable and calculate the first-order autocorrelation:

gen lag_x = x[_n-1]

pwcorr x lag_x, sig

We find that x satisfies $P(x_i = 1) = 0.5$ and $\mathrm{Cov}(x_i, x_{i-1}) = 0$ (or rather, we do not find evidence to reject these two conditions). However, the pattern clearly isn’t random. The human mind spots patterns in data very easily, which is one of the (many, many) reasons you should always graph your data.
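To see why the lag-1 covariance vanishes, it may help to average over one full period of the pattern. The adjacent pairs $(x_{i-1}, x_i)$ cycle through (1, 0), (0, 0), (0, 1), (1, 1), each equally often. Treating one period as the whole population (an assumption; the 2000-observation sample only approximates it, since the first observation has no lag), a short sketch of the calculation is:

```latex
\begin{align*}
E[x_i] &= \tfrac{1}{4}(1 + 0 + 0 + 1) = \tfrac{1}{2}, \\
E[x_i x_{i-1}] &= \tfrac{1}{4}(0 + 0 + 0 + 1) = \tfrac{1}{4}, \\
\mathrm{Cov}(x_i, x_{i-1}) &= E[x_i x_{i-1}] - E[x_i]\,E[x_{i-1}]
  = \tfrac{1}{4} - \tfrac{1}{2} \cdot \tfrac{1}{2} = 0.
\end{align*}
```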

Actually, if we split the sequence into non-overlapping 2-tuples instead, we find something funny. First of all, the 2-tuples lend themselves very easily to interpretation as binary representations: (1, 0) reads as 2 and (0, 1) reads as 1, so the sequence becomes 2, 1, 2, 1, …. Secondly, when read this way, the linear correlation between each value and the one before it is equal to -1! In other words, while there is no first-order correlation, the chain can be completely specified by a second-order function!
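The 2-tuple check can be sketched in Stata roughly as follows. This is a minimal illustration rather than the only way to do it: the variable names y and lag_y are made up for this example, and the encoding (first element as the high bit, i.e. 2*first + second) is an assumption. Reading the bits in the other order only swaps the labels 1 and 2 and leaves the correlation unchanged.

```stata
* Rebuild the example data
clear
set obs 2000
egen x = fill(1 0 0 1 1 0 0 1)

* Encode each pair (x[_n-1], x[_n]) as a 2-bit integer (assumed: first bit is the high bit)
gen y = 2*x[_n-1] + x

* Keep only the even rows, so that each remaining row is one non-overlapping pair
keep if mod(_n, 2) == 0

* Lag the encoded pairs and correlate each pair with its predecessor
gen lag_y = y[_n-1]
pwcorr y lag_y, sig
```

Since y simply alternates between 2 and 1, every value equals 3 minus its predecessor, and the reported correlation should be -1.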

Certainly, this example is trivial. The human eye is indeed very keen to pick out such sequences. But what about a 237th-order generating function? Should our definition of randomness require that there exists no Xth-order generating function for the sequence we’re talking about?

1) I find justification in this translational procedure because the translation is bijective and, as such, there is direct (mechanical) translatability between sequence 1 and sequence 2.