The Apes—Points To Ponder

The statistical problem

Where scientists cannot create a closed system, they will attempt to verify a hypothesis statistically. If an experiment does not always give the same result but, when repeated, gives the expected result significantly more often than chance alone would explain, the hypothesis is deemed to be supported, if not proven. Common examples of this are found in the field of medicine.

When searching for the cause of a particular disease, epidemiologists will conduct experiments to try to identify the responsible pathogen. If you review such experiments, you will find that there is normally a control group of people in the locality under investigation, who show no symptoms of the disease. Their health is compared to a group suffering from the disease. If the pathogen is isolated, it will normally be found, by test, to exist in the bodies of most of the infected group—but not all of them. In the control group it will be found to be absent in the bodies of most, but not all. This is a strong sign that the identified pathogen is the cause.

You might protest that the pathogen cannot be isolated in every one of the infected group, and that it can be found in one or two of the “uninfected” group. But the human body is a very complex system and there can be great variability from one such system to another. The few in the uninfected group who show signs of the pathogen may have very robust immune systems and antibodies that can cope with it. On the other side of the line, those who showed no evidence of the pathogen but had symptoms of the disease may have been affected by undetectable levels of the pathogen.
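To make the comparison between the two groups concrete, here is a minimal sketch in Python of how such a result might be summarised. The counts are hypothetical, invented purely for illustration, and the function names are not taken from any particular study or library. The sketch computes an odds ratio and a one-sided Fisher exact p-value, which is one common way (among several) of asking whether an imbalance this large could plausibly arise by chance.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (assumes b and c are non-zero)."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    This is the chance of seeing `a` or more in the top-left cell if
    pathogen and disease were unrelated, with all row and column totals fixed.
    """
    row1 = a + b        # size of the symptomatic group
    col1 = a + c        # everyone in whom the pathogen was found
    n = a + b + c + d   # all participants
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total


# Hypothetical counts, invented purely for illustration.
#                     pathogen found   pathogen not found
# symptomatic group         34                 6
# control group              3                37
a, b, c, d = 34, 6, 3, 37

print(f"odds ratio: {odds_ratio(a, b, c, d):.1f}")
print(f"one-sided p-value: {fisher_one_sided(a, b, c, d):.2e}")
```

A large odds ratio with a very small p-value says only that the association is unlikely to be a fluke of sampling. It says nothing, by itself, about which way the causation runs or whether some third factor is responsible.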

In any event, with epidemiology, that is merely the beginning of the story. The next steps are to proceed from these results to identify how infection by the pathogen occurs (by contagion, by insect bite, etc.) and to find ways to prevent transmission. Where such campaigns are successful it is clear that the pathogen has been nailed.

The point is that the statistics only demonstrated a correlated association. Such associations do not prove causation; they only indicate the possibility of causation. Nevertheless, such statistical data is often taken to demonstrate causation, even among scientists. The fault is not in statistics itself, but in its abuse.

The book Spurious Correlations by Tyler Vigen presents many excellent and amusing examples of correlations that clearly have no direct relation to causation. They include:

Figures from 1999 to 2009 demonstrate a 99.79% correlation between US spending on science, space and technology and US suicides by hanging, strangulation and suffocation.
Figures from 1996 to 2008 demonstrate a 95.23% correlation between Math doctorates awarded in the US and the amount of uranium stored at US nuclear power plants.
Figures from 1999 to 2009 demonstrate a 95.45% correlation between US crude oil imports from Norway and US drivers killed in a collision with a railway train.

At above 95%, all of these are very high correlations, demonstrating how slippery correlation can be in any scientific context. And yet, contemporary science cannot proceed without using statistical correlation. If a scientist can present high correlation along with a convincing explanation of why A causes B, the hypothesis is likely to be given credence. Contemporary science is obliged to walk this line.
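One plausible reason such pairings score so highly is that both quantities simply drift in the same direction over the same years. The following Python sketch illustrates that idea with two synthetic series. The numbers are generated, not real data, and the names `series_a` and `series_b` are hypothetical. Each series is a straight-line trend plus its own independent noise, so neither has any influence on the other.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def diffs(xs):
    """Year-over-year changes, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(11)       # eleven annual observations

# Synthetic, unrelated series: each is a trend plus independent noise.
series_a = [100 + 5.0 * t + rng.uniform(-2.0, 2.0) for t in years]
series_b = [20 + 0.8 * t + rng.uniform(-0.5, 0.5) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(diffs(series_a), diffs(series_b))

print(f"correlation of the raw series:    {r_levels:.4f}")
print(f"correlation of the yearly changes: {r_changes:.4f}")
```

The raw series correlate very strongly because they share a trend, not because one drives the other. Comparing the yearly changes removes that shared trend and tends to give a much weaker figure, although with only ten points it can still be sizeable by chance.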