"Climategate"—Forget the Emails: What Will the Hacked Documents Tell Us?


Last week thousands of emails and documents were hacked from the Climatic Research Unit (CRU) at Britain's University of East Anglia. The emails revealed a lot of scientific nastiness and some efforts to suppress climate research with which CRU leaders disagreed.

But I've been wondering what insights the hacked documents and models might hold about how CRU researchers put their climate data together. I assume that climate catastrophe skeptics are analyzing the CRU documents now. In one very preliminary analysis, statistician William Briggs is not impressed by the CRU climatologists' statistical acumen:

How do we know the temperature?

We have no direct observations of temperature—neither at the Earth's surface nor in the atmosphere—for the vast majority of history. So how can we know what the temperature used to be in the absence of actual measurements?

We can't. We can only guess.

That's right, all those squiggly-line pictures of temperature you see from before roughly 1900 are reconstructions; the lines are what is spit out of statistical models. They are therefore merely guesses. Even stronger, we have no way to know exactly how good these reconstructions are. If we did, then, obviously, we would know the actual temperatures, because the only way to know the actual model error is to compare the model's predictions against the real data (which we don't know).

To emphasize: the actual—as opposed to theoretical model—error is unknown. But we must try and estimate this error—it is of utmost importance—otherwise we cannot make decisions about the reconstructions….
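To make Briggs's point concrete, here is a toy calibration exercise of my own (entirely synthetic numbers, nothing to do with CRU's actual data or methods; Python with numpy assumed). A proxy is fit to temperature over the instrumental overlap period and then used to "reconstruct" earlier temperatures; the only error you can actually compute is the in-sample calibration error, not the error of the reconstruction itself:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: 150 years of instrumental temperature (the overlap period)
    # and a proxy (say, tree-ring width) that tracks it imperfectly.
    temp_overlap = 0.01 * np.arange(150) + rng.normal(0, 0.2, 150)   # temperature anomaly
    proxy_overlap = 2.0 * temp_overlap + rng.normal(0, 0.5, 150)     # noisy proxy

    # Calibrate: fit a linear proxy-to-temperature relationship by least squares.
    slope, intercept = np.polyfit(proxy_overlap, temp_overlap, 1)

    # The in-sample calibration error is the only error we can measure directly.
    fitted = np.polyval([slope, intercept], proxy_overlap)
    calib_rmse = np.sqrt(np.mean((fitted - temp_overlap) ** 2))
    print(f"calibration RMSE (knowable): {calib_rmse:.3f}")

    # "Reconstruct" 850 pre-instrumental years from proxy values alone.
    proxy_past = rng.normal(0, 1.2, 850)
    temp_reconstructed = np.polyval([slope, intercept], proxy_past)

    # The reconstruction's true error is unknowable without the actual past
    # temperatures -- which is exactly Briggs's point.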

CRU document

One example from something called a "SOAP-D-15-berlin-d15-jj" document. A non-native English speaker shows a plot of various proxy reconstructions from which he wanted to "reconstruct millennial [Northern Hemisphere] temperatures." He said, "These attempts did not show, however, converge towards a unique millennial history, as shown in Fig. 1. Note that the proxy series have already undergone a linear transformation towards a best estimate to the CRU data (which makes them look more similar, cf. Briffa and Osborn, 2002)."

In other words, direct effort was made to finagle the various reconstructions so that they agreed with preconceptions. Those efforts failed. It's like being hit in the head with a hockey stick.
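For what it's worth, here is a minimal sketch of what a linear transformation of proxy series toward a target record can look like (again synthetic data and numpy; the actual Briffa and Osborn, 2002, procedure may well differ in detail). Once each series is shifted and scaled to best match the same target, the adjusted series naturally look more alike than the raw ones do:

    import numpy as np

    rng = np.random.default_rng(1)

    # A stand-in for the instrumental (CRU) record and three unrelated synthetic proxies.
    target = np.cumsum(rng.normal(0, 0.3, 100)) / 10
    proxies = [np.cumsum(rng.normal(0, 1.0, 100)) for _ in range(3)]

    def fit_linearly_to(series, target):
        """Shift and scale `series` so it best matches `target` in a least-squares sense."""
        a, b = np.polyfit(series, target, 1)
        return a * series + b

    adjusted = [fit_linearly_to(p, target) for p in proxies]

    # After the transformation every proxy shares the target's level and scale,
    # so the adjusted series agree with each other far better than the raw ones did.
    for raw, adj in zip(proxies, adjusted):
        print(f"raw std {raw.std():6.2f}  ->  adjusted std {adj.std():.2f}")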

Briggs then goes on to list nine possible sources of error that could arise in reconstructing proxy temperature data and finds that only one was properly accounted for by CRU researchers:

Sources of reconstruction uncertainty

Here is a list of all the sources of error, variability, and uncertainty and whether those sources—as far as I can see: which means I might be wrong, but willing to be corrected—are properly accounted for by the CRU crew, and their likely effects on the certainty we have in proxy reconstructions:

  1. Source: The proxy relationship with temperature is assumed constant through time. Accounted: No. Effects: entirely unknown, but should boost uncertainty.
  2. Source: The proxy relationship with temperature is assumed constant through space. Accounted: No. Effects: A tree ring from California might not have the same temperature relationship as one from Greece. Boosts uncertainty.
  3. Source: The proxies are measured with error (the "on average" correlation mentioned above). Accounted: No. Effects: certainly boosts uncertainty.
  4. Source: Groups of proxies are sometimes smoothed before input to models. Accounted: No. Effects: a potentially huge source of error; smoothing always increases "signal", even when those signals aren't truly there. Boosts uncertainty by a lot.
  5. Source: The choice of the model m(). Accounted: No. Effects: results are always stated as if the model is true; potentially huge source of error. Boosts uncertainty by a lot.
  6. Source: The choice of the model m() error term. Accounted: Yes. Effects: the one area where we can be confident of the statistics.
  7. Source: The results are stated as estimates of the model parameters. Accounted: No. Effects: most classical (frequentist and Bayesian) procedures state uncertainty results about parameters, not about actual, physical observables. Boosts uncertainty by anywhere from two to ten times.
  8. Source: The computer code is complex, multi-part, and multi-authored. Accounted: No. Effects: many areas for error to creep in; code is unaudited. Obviously boosts uncertainty.
  9. Source: Humans with a point of view release results. Accounted: No. Effects: judging by the tone of the CRU emails, and what is at stake, certainly boosts uncertainty.

There you have it: all the potential sources of uncertainty (I've no doubt forgotten something), only one of which is accounted for in interpreting results. Like I've been saying all along: too many people are too certain of too many things.
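Briggs's fourth item is the easiest one to check for yourself: smooth two completely unrelated noise series before comparing them, and the apparent correlation between them tends to balloon, because smoothing throws away independent data points. A minimal sketch, using made-up noise and numpy (my illustration, not anything from the CRU files):

    import numpy as np

    rng = np.random.default_rng(2)

    def smooth(x, window=21):
        """Simple centered moving average."""
        return np.convolve(x, np.ones(window) / window, mode="valid")

    a = rng.normal(size=1000)   # two completely independent noise series
    b = rng.normal(size=1000)

    raw_corr = np.corrcoef(a, b)[0, 1]
    smooth_corr = np.corrcoef(smooth(a), smooth(b))[0, 1]

    print(f"raw correlation:      {raw_corr:+.3f}")   # near zero
    # The smoothed correlation is usually much larger in magnitude, because smoothing
    # slashes the effective number of independent data points.
    print(f"smoothed correlation: {smooth_corr:+.3f}")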

I hope that a lot of independent researchers will be taking close looks at the CRU documents to check on the accuracy of their interpretations of climate data. Of course, this wouldn't be an issue if climate researchers had made their data publicly available in the first place.

Whole Briggs analysis here.

Quick Addendum: It turns out my hopes for independent analysis are being fulfilled. Over at CBS News, correspondent Declan McCullagh reports that independent programmers are now looking into CRU code and finding some pretty disturbing things:

In addition to e-mail messages, the roughly 3,600 leaked documents posted on sites including Wikileaks.org and EastAngliaEmails.com include computer code and a description of how an unfortunate programmer named "Harry"—possibly the CRU's Ian "Harry" Harris—was tasked with resuscitating and updating a key temperature database that proved to be problematic. Some excerpts from what appear to be his notes, emphasis added:

I am seriously worried that our flagship gridded data product is produced by Delaunay triangulation—apparently linear as well. As far as I can see, this renders the station counts totally meaningless. It also means that we cannot say exactly how the gridded data is arrived at from a statistical perspective—since we're using an off-the-shelf product that isn't documented sufficiently to say that. Why this wasn't coded up in Fortran I don't know—time pressures perhaps? Was too much effort expended on homogenisation, that there wasn't enough time to write a gridding procedure? Of course, it's too late for me to fix it too. Meh.

I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO and one with, usually overlapping and with the same station name and very similar coordinates. I know it could be old and new stations, but why such large overlaps if that's the case? Aarrggghhh! There truly is no end in sight… So, we can have a proper result, but only by including a load of garbage!

One thing that's unsettling is that many of the assigned WMO codes for Canadian stations do not return any hits with a web search. Usually the country's met office, or at least the Weather Underground, show up – but for these stations, nothing at all. Makes me wonder if these are long-discontinued, or were even invented somewhere other than Canada!

Knowing how long it takes to debug this suite—the experiment endeth here. The option (like all the anomdtb options) is totally undocumented so we'll never know what we lost. Right, time to stop pussyfooting around the niceties of Tim's labyrinthine software suites—let's have a go at producing CRU TS 3.0! since failing to do that will be the definitive failure of the entire project.

Ulp! I am seriously close to giving up, again. The history of this is so complex that I can't get far enough into it before my head hurts and I have to stop. Each parameter has a tortuous history of manual and semi-automated interventions that I simply cannot just go back to early versions and run the update prog. I could be throwing away all kinds of corrections—to lat/lons, to WMOs (yes!)…

As the leaked messages, and especially the HARRY_READ_ME.txt file, found their way around technical circles, two things happened: first, programmers unaffiliated with East Anglia started taking a close look at the quality of the CRU's code, and second, they began to feel sympathy for anyone who had to spend three years (including working weekends) trying to make sense of code that appeared to be undocumented and buggy, while representing the core of CRU's climate model.
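For readers wondering what the Delaunay triangulation gridding that worries "Harry" actually involves: the station locations are triangulated, and each grid point gets a value interpolated linearly within whatever triangle it falls in, so it depends on only the three stations at that triangle's corners rather than on how many stations sit nearby. A minimal sketch with made-up station data (scipy's off-the-shelf interpolator assumed; this is not CRU's actual code):

    import numpy as np
    from scipy.interpolate import griddata

    rng = np.random.default_rng(3)

    # Made-up station data: scattered (lon, lat) locations with temperature anomalies.
    stations = rng.uniform(low=[-10.0, 35.0], high=[30.0, 60.0], size=(200, 2))
    anomalies = rng.normal(0, 1, 200)

    # A regular half-degree grid to interpolate onto.
    grid_lon, grid_lat = np.meshgrid(np.arange(-10, 30, 0.5), np.arange(35, 60, 0.5))

    # method="linear" triangulates the stations (Delaunay) and interpolates linearly
    # inside each triangle: every gridded value is a weighted average of exactly three
    # stations, regardless of how many stations happen to sit in that area.
    gridded = griddata(stations, anomalies, (grid_lon, grid_lat), method="linear")

    print(gridded.shape)   # interpolated anomaly field (NaN outside the triangulated hull)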

One programmer highlighted the error of relying on computer code that, if it generates an error message, continues as if nothing untoward ever occurred. Another debugged the code by pointing out why the output of a calculation that should always generate a positive number was incorrectly generating a negative one. A third concluded: "I feel for this guy. He's obviously spent years trying to get data from undocumented and completely messy sources."

Programmer-written comments inserted into CRU's Fortran code have drawn fire as well. The file briffa_sep98_d.pro says: "Apply a VERY ARTIFICAL correction for decline!!" and "APPLY ARTIFICIAL CORRECTION." Another, quantify_tsdcal.pro, says: "Low pass filtering at century and longer time scales never gets rid of the trend—so eventually I start to scale down the 120-yr low pass time series to mimic the effect of removing/adding longer time scales!"
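That second comment describes splitting a series into a slow, century-scale component and everything faster, then hand-scaling the slow part. A rough sketch of the general maneuver (synthetic data; a plain moving average standing in for whatever 120-year low-pass filter the CRU code actually uses):

    import numpy as np

    rng = np.random.default_rng(4)

    # A synthetic annual series: a slow oscillation plus year-to-year noise.
    years = np.linspace(0, 12, 600)
    series = np.sin(years) + rng.normal(0, 0.3, 600)

    def low_pass(x, window=121):
        """Crude low-pass filter: a centered moving average."""
        return np.convolve(x, np.ones(window) / window, mode="same")

    slow = low_pass(series)     # century-scale component
    fast = series - slow        # everything quicker than the filter window

    # Scaling down only the slow component rewrites the long-term behaviour (the trend)
    # while leaving the year-to-year wiggles untouched -- which is why a hand-tuned
    # adjustment like this matters so much for what the reconstruction ends up showing.
    adjusted = 0.5 * slow + fast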

Go here for McCullagh's report.

Hat tip to my colleague Peter Suderman for the McCullagh link.