Friday, November 27, 2009

More Contributors, an Open Source Model and Data Source

Thanks to a poster (G-Man), I've been pointed to UCAR's CCSM3.1 data set and source code.

I've brought on Larry Ramey (and am hoping to bring on Eric Raymond) to give a walkthrough of what that software does and what its data set means. Ideally, both of them will compile it, run it and cross-check each other.

Larry and Eric are both pretty blunt about what they consider bovine scatology; neither of them is *likely* to be posting, but both will be looking over my posts before they go up. They also take opposite views on the issue, aside from agreeing that both sides should open source everything.

If, in the private review section, they can't agree on something, I'll probably put out three posts: my post, one summarizing Eric's objections, and one summarizing Larry's objections.

One of the reasons why some stuff in climatology is NOT open sourced - including data sets - is because it's funded by private sources and is very much commercial information and practice. Larry asked me what my response would be to someone who had a model that had a mixture of public and private data, some with restrictions on it.

My answer was "I'm sorry, but unless we can show all the steps along the way, we cannot run it - it's outside the purview of this blog." It is my hope that, by showing that a bunch of talented amateurs can, yes, run the open source stuff and give a reasonable explanation of what each model is doing, we can 'clear the air' a bit. Perhaps, around the time of the Fifth Assessment Report, we can post an honest-to-god literature analysis.

That being said, there's a LOT to the UCAR modeling stuff. Indeed, there may be more there than two very smart people can independently confirm or analyze. It's a huge data set.

There will be periodic updates on the progress of that project, plus occasional articles on what the data sources are and on signal versus noise, just so there's something to read here.


  1. You may want to look at what Michael Tobis has to say at

  2. Interesting project. I'll chip in if I can. But I predict you'll get bogged down in the initial attempt to get it to compile and run, and then give up, unless you can get some help from the NCAR folks or a few tame climatology PhD students.

  3. CCSM provides a bulletin board, at

  4. Steve, I don't have a tame Ph.D. student handy, but Larry Ramey has worked with some of these models before, possibly even this one. Plus, he's got a few associates at NCAR.

    I suspect that between Eric, Larry and that CCSM bulletin board, we've got about a 50-60% chance of getting the model running on at least Larry's computer, and about a 25% chance of getting it to run on Eric's in addition to Larry's. I think your help would increase the odds that we at least get it running on Larry's. Dunno if you can help us get it running on Eric's.

    I intend to follow the Jerry Pournelle Computer Column Model, and give a narrative of how we tried to get it to work.

    Hopefully, if we can get it up and running, we can get enough publicity to find a suitably patient climatologist (Ray Pierrehumbert might be willing, if we can get the software to run) to help us understand the workings of the tools.

    If we can get it to run, the plan is to say "Hey, look, we managed to do it, this is how it worked, this is what we learned, this is what we still have our doubts on. Now go do what we did."

  5. Oh, if you're looking for more data, I found something while browsing some PDFs in the CRU data.

    Note 2 on page 3 of FOIA/documents/MannHouseReply.pdf lists a number of URLs that Mann states are "data, descriptions of methods, and results related to my research". The URLs are:

    Hope that helps.

  6. KB: "One of the reasons why some stuff in climatology is NOT open sourced - including data sets - is because it's funded by private sources and is very much commercial information and practice."

    Ken - Your comment above is based on a misunderstanding. Some *meteorological* datasets are not open because they are developed by national weather forecasting services who get part of their operational funding from commercial weather services. They tend to have licenses that allow academics to use the data for free, but restrict use by all others. That's the issue with some of the CRU data.

    *Climatology*, in contrast, is nearly always publicly funded research. There are almost no existing commercial climate services yet. So, it's either open source already, or it's not available because the climate modeling centres never realised anyone would be interested. They will all let you have the code if you want to do bona fide climate research with it, although most of them want you to sign some sort of license agreement. The main exception is the UK Met Office Hadley Centre, who build both climate models and operational weather forecasting models from the same source code, and so have some reason to protect their commercial interests. But they'll still let you have the code if you can convince them you're doing worthwhile research with it.

    Here's my rundown of what's available:

    By the way, if you want to get anywhere, you'll have to build a relationship of mutual respect with any climate modeling group whose code you work with. Given the attitude of some of your team, that might not be possible.

  7. Thank you for the correction on that, Steve.

    You are correct; meteorological data is very distinct from climate data.