Monday, November 23, 2009

The Teaching Moment From The CRU Data Leak

I care about the data, its corroboration, transparency, and the validity of models. I won't touch the role of consensus in investigatory science; Michael Crichton covered this better than I could, here:

Let's be clear: the work of science has nothing whatever to do with consensus. Consensus is the business of politics. Science, on the contrary, requires only one investigator who happens to be right, which means that he or she has results that are verifiable by reference to the real world. In science consensus is irrelevant. What is relevant is reproducible results. The greatest scientists in history are great precisely because they broke with the consensus.

With the "ClimateGate" incident rippling out from the CRU data leak, people on the skeptic side are having a field day assaulting the bloody barricades of the professional credibility of a government-sponsored climate research team.

I'm not. I know what it's like to merge data sets together to build a complete picture. I have empathy for what the people working at CRU were doing; it's a lot of tedious, mind-numbing work, prone to error, revision, and redoing. It's work that makes sitting down to file your taxes look like playing Sudoku for fun.

I can build a plausible chain of intent for everything the CRU crew did. At no point does it involve "And now, now we shall perpetrate FRAUD!" with maniacal glee. I don't need to throw stones; they're already getting enough hurled at them to build a patio.

I'd like to point out an opportunity. We have a chance to educate on the fundamental science, rather than proselytize in sound bites. That window will close once this gets mulched in the news cycle, and exploiting that window is more important than smearing people. It's time to put away petty revenge, and teach.

We have a window where we can get an open, clear, and public debate on the climate going. In an effort to preserve their reputations, the CRU people will eventually realize that their best course is to present ALL the data, noise and all, along with their methods of filtering it.

We need to let them have that opportunity. And then we need to establish that every data set out there is documented, along with the methods used to filter it, massage it, and demonstrate how it works. I would recommend that we talk to the folks at code repositories like SourceForge about versioning information and data set check-in.

That it takes FOIA requests to get data sets and methodologies released is a travesty in climate science. I propose that policy decisions should only be made based on data sets that are published under an open source or Creative Commons license, such as Creative Commons Attribution-NoDerivs, and that the source code of all statistical tools, and all statistical methods used to interpret and analyze those data sets, be made open as well. We're all being asked to pony up; shouldn't we allow informed citizens to see what it is they're buying with their tax dollars?


  1. Just a few points, but there seems to be some impression that most, if not all, data and models used in climate science are not publicly available. This is utterly untrue.

    The vast majority of data being used, such as temperature data, is open and freely available to anyone who cares (see NASA, NOAA, etc.). In the specific case of CRU (and some other research institutes outside the US), some of the data comes from National Meteorological Offices that have a remit to commercialize their data, and hence will only let scientists use it if they agree to keep it proprietary.

    While I agree this is bad and unneeded on the part of Met Offices, it is undeniable that:
    a) Proprietary data is only a subset of that used by CRU
    b) This data is either used in a proprietary manner or it cannot be analysed by scientists at all.

    So the question is: in the (small number of) cases where data cannot be released, because the owners of the data will not allow it, should any science be done on that data?

    Maybe you think not, but I think there is some value in it. Should the results be allowed to be published after peer review? I still think yes, as people can attempt to verify the results with independent datasets (very valuable!) even if they can't recheck the data themselves. (But again, note that the Met Office is likely to be willing to let other scientists see the data as long as they agree to similar terms.)

    To class this as "taking FOIA requests" doesn't really acknowledge the realities of the situation. In this instance, the FOI requests were denied specifically because of this commercial agreement on the data.

    The above is what I believe to be true. Now for a little opinion: the demand for data, models, and more openness is a canard by certain people who are hunting for the equivalent of gaps in the fossil record in the evolution argument.

    The GISS data and model are right there, open and ready for anyone to look at and critique. How many of the FOIAers have produced criticism of the methodology and data based on this openness?

  2. M, my understanding is that Steve McIntyre and co. have been doing exactly that, where they can. And they have found problems which have later been corrected as a result of their criticism.

    I think the code used to make the hockey stick graphs has not been available, and this is particularly of interest because it is where the claim that the latest warming is unprecedented comes from.

    The GISS model is of less interest, possibly because it is one of a class of models that don't seem to make very good predictions. Nonetheless, it would be interesting to see whether anyone has analysed that code to check how well the claim holds up that these models are tweaked until they hindcast correctly.

  3. McIntyre has commented on and used GISTEMP data, and pointed out corrections which have been incorporated.

    My position remains: If the data sets, methods of analysis and source code are not open source, we should not be using them for public policy generation.

    The way to avoid being besieged by requests for data is to put the data out there. Give up the gate keeper role. Open a dialog.

    When independent attempts to recreate the MBH '98 and '99 studies cannot get the same results by the methods given, using the data given, the MBH '98 and '99 studies should NOT be used in the IPCC AR4 of 2007 (where they showed up in chapter 6, after the Wegman analysis pointed out they couldn't be replicated, and North, for the National Academy of Sciences, couldn't replicate them either). covers this, with excerpts from the paper, the question and response, and the assessment from North.

    I'll be doing a post tomorrow on data sources, signal, noise, and relative weighting on data proxies and instrument information.

  4. Having done this sort of thing for a living... I think I agree with Mann. This code is your LIVELIHOOD. You don't just give it to every Tom, Dick, and Harry. There is also great resistance to giving it to a dishonest hack like McIntyre.

    However, Ken... you make some really good points... I'll think about it.

  5. It is rather difficult to have an honest discussion on the science of global warming when prominent scientists publicly deny the evidence of climate stability and global cooling that they acknowledge in private.
    When the director of CRU at UEA literally hopes the UK Met Office forecast of climate stability until at least 2020 is 'wrong'?
    How does one begin an honest dialogue with millions of dollars of research funding at stake?

    If the director refuses to publicly acknowledge global cooling or climate stability over the past decade, due to a feared backlash within his own scientific community, there is not much room left for 'honest debate'.