A number of news stories claim that a recent paper by Lovejoy proves that the probability that the warming of the past century is entirely due to natural causes is less than one percent. I find the conclusion plausible enough, but, so far as I can tell, there is no way that it can be derived in the way Lovejoy is said to have derived it.

The first problem, probably the fault of the reporters rather than of Lovejoy himself, is a misinterpretation of what the confidence result produced by classical statistics means. If you analyze a body of data and reject the null hypothesis at the .01 level, that means that, if the null hypothesis is true, the probability of seeing evidence against it at least as strong as what was observed is less than .01. That is the probability of the evidence conditional on the null hypothesis. It does not imply that the probability that the null hypothesis is true, given evidence that strong, is less than .01. That would be the probability of the null hypothesis conditional on the evidence. The two sound similar but are in fact entirely different.

My standard example is to imagine that you pull a coin out of your pocket, toss it without inspecting it, and get heads twice. The null hypothesis is that it is a fair coin, the alternative hypothesis that it is a double headed coin. The chance of getting two heads if it is a fair coin is only .25. It does not follow that, after getting two heads, you should conclude that the probability is .75 that the coin is double headed. That probability depends on how common double headed coins are to begin with, which the two tosses tell you nothing about. For previous discussions of this issue see this one in the context of *World of Warcraft* and this in the context of DNA analysis of mummies.
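To make the coin example concrete, here is a minimal sketch of the calculation that would actually give the probability that the coin is double headed. It uses Bayes' rule, and it requires a prior: the fraction of coins in pockets that are double headed. That prior is an assumption supplied by the reader, not something from the example above; the function name and the value of one in a thousand are purely illustrative.

```python
def posterior_double_headed(prior: float, heads: int) -> float:
    """Probability the coin is double headed after `heads` heads in a row.

    prior: assumed probability, before any toss, that the coin is
           double headed (hypothetical input, not given in the post).
    heads: number of consecutive heads observed.
    """
    # Likelihood of the evidence under each hypothesis.
    likelihood_double = 1.0          # a double headed coin always shows heads
    likelihood_fair = 0.5 ** heads   # a fair coin shows heads each time with p = .5

    # Bayes' rule: P(double | evidence).
    numerator = prior * likelihood_double
    denominator = numerator + (1.0 - prior) * likelihood_fair
    return numerator / denominator


if __name__ == "__main__":
    # Assumed prior: one coin in a thousand is double headed.
    print(posterior_double_headed(prior=0.001, heads=2))
```

With that assumed prior the posterior comes out at roughly .004, nowhere near .75. The posterior reaches .8 only if you start out believing that half of all coins are double headed.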

The second problem is that, so far as I can tell, there is no way Lovejoy could have calculated the probability that natural processes would produce 20th century warming from the data he was using, which consisted of a reconstruction of world temperature from 1500 to the present. The paper is complicated enough that I may be misinterpreting it, but I think his procedure went essentially as follows:

Assume that changes in global temperature prior to 1880 were due to random natural causes. Use the data from 1500 to 1875 to estimate the probability distribution of natural variation in global temperature. Given that distribution, calculate the probability that natural variation would produce as much warming from 1880 to 2008 as occurred. That probability is less than .01. Hence reject the assumption that warming from 1880 on was entirely due to natural causes at the .01 level.

The problem with this procedure is that data from 1500 on can only give information on random natural processes whose annual probability is high enough so that their effect can be observed and their probability calculated within that time span. Suppose there is some natural process capable of causing a global temperature rise of one degree C in a century whose annual probability is less than .001. The odds are greater than even that it will not occur even once in Lovejoy's data. Hence he has no way of estimating the probability that such a process exists. The existence of such a process would provide an explanation of 20th century warming that does not involve human action. So he cannot estimate, from his data, how likely it is that natural processes would have produced observed warming, which is what he is claiming to do. 20th century warming would, in that case, be what Taleb refers to as a Black Swan event. If one swan in a thousand is black, an observer who looks at five hundred swans will probably find all of them white and may conclude, incorrectly, that the probability of a black swan is zero.
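The "greater than even" claim is easy to check. The sketch below assumes that the event is independent from year to year and rounds the length of the record to 500 years; both are simplifying assumptions of mine, and the function name is illustrative.

```python
def prob_never_observed(annual_probability: float, years: int) -> float:
    """Chance that an event with the given annual probability occurs
    zero times in `years` years, assuming independence between years."""
    return (1.0 - annual_probability) ** years


if __name__ == "__main__":
    # Annual probability .001 over an assumed 500 year record.
    print(prob_never_observed(0.001, 500))
```

With an annual probability of exactly .001 the result is about .61, and it is higher still for any smaller annual probability. The same arithmetic covers the swans: with one black swan in a thousand, a sample of five hundred is all white about 61 percent of the time.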

How does Lovejoy solve that problem? If I correctly read the paper, the answer is:

Stated succinctly, our statistical hypothesis on the natural
variability is that its extreme probabilities ... are
bracketed by a modified Gaussian...

In other words, he is simply assuming a shape for the probability distribution of natural events that affect global climate. Given that assumed shape, he can use data on the part of the distribution he does observe to deduce the part he does not observe. But he has no way of testing the hypothesis, since it is a hypothesis about a part of the curve for which he has no data.
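A toy illustration of why the assumed shape matters: two distributions can be nearly indistinguishable over the range where data exist and still disagree enormously about the far tail. The sketch below compares a standard Gaussian with a hypothetical mixture in which a rare second component, with weight .001 and ten times the spread, is added. This is not Lovejoy's model, and every number in it is made up for the example.

```python
import math


def gaussian_tail(k: float) -> float:
    """P(X > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def mixture_tail(k: float, rare_weight: float, rare_scale: float) -> float:
    """P(X > k) when X is standard normal with probability 1 - rare_weight
    and normal with standard deviation rare_scale otherwise.
    Both parameters are hypothetical, chosen only for illustration."""
    common = (1.0 - rare_weight) * gaussian_tail(k)
    rare = rare_weight * gaussian_tail(k / rare_scale)
    return common + rare


if __name__ == "__main__":
    for k in (1.0, 2.0, 5.0):
        # Tail probabilities under the pure Gaussian and under the mixture.
        print(k, gaussian_tail(k), mixture_tail(k, 0.001, 10.0))
```

At one or two standard deviations the two tail probabilities differ by only a small fraction, so a few hundred observations would be hard pressed to tell the distributions apart. At five standard deviations the mixture's tail probability is roughly a thousand times the Gaussian's. Which of the two you assume largely determines the answer you get for extreme events.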

If I am correctly reading the paper—readers of this post are welcome to correct me if they think I am not—that means that Lovejoy has not only not proved what reporters think he has, he has not proved what he thinks he has either. A correct description of his result would be that the probability that natural processes would produce observed warming, *conditional on his assumption about the shape of the probability distribution for natural processes that affect global temperature*, is less than .01.

One obvious question is whether this problem matters, whether, on the basis of data other than what went into Lovejoy's paper, one can rule out the possibility of natural events capable of causing rapid warming that occur too infrequently for their probability to be deduced from the past five hundred years of data. I think the answer is that we cannot. The figure below is temperature data deduced from a Greenland ice core. It shows periods of rapid warming, some much more rapid than what we observed in the 20th century, occurring at intervals of several thousand years. The temperature shown is local not global; we do not have the sort of paleoclimate reconstructions that would be needed to spot similar episodes on a global scale. But the fact that there are natural sources of very rapid local warming with annual frequency below .001 is an argument against ruling out the possibility that such sources exist for global warming as well.