Wednesday, December 2, 2009

Amateur Climate Modelling - Getting The Code

One of the easiest models to get at is the UCAR CCSM 3.0 model. You go to their web site (linked above), sign up for a login, and wait for approval. One of our team members did this after 7 PM MST on the Sunday of Thanksgiving weekend. Either there was a research assistant or post-doc working the holiday doing the validations, or the process was automatic; we had the login inside of 30 minutes. For the sake of our trembling remnants of faith in humanity, we're hoping this was automated and not some poor post-doc stuck there.

You do have to give a name, phone number and why you're downloading it. I'm not sure what would happen if I put in, say, "James Inhofe", his Congressional office number (202-224-4721), and "To prove the fraud once and for all!"...but I suspect that, as amusing as that might be, I'll leave that particular exercise in validating wingnuttery to someone else.

Now, scientists are people who learned programming in order to do something else. That is not the same thing as being a production coder, and scientists are traditionally about a decade behind the rest of the computer-science world when it comes to computer languages, largely because they're chained to legacy code and data sets that they lack the budget to bring up to modern standards. (The first person who manages to write a documented, production-grade conversion filter turning the myriad formats of climate data into a JSON archive will probably be offered several graduate students to do with as they please...)

Which means that scientists are even worse about documenting code than most programmers are. You need to twist arms and legs, and threaten to make them teach English Composition, before they'll document code. (The number of graduate students I've known who've been forced to use "cutting edge" computers from 1992 more than fifteen years after they became obsolete is terrifying.)

You can guess our trepidation at opening the documentation. We were expecting spaghetti, intermittently documented in Sanskrit by cut and paste. We were wrong. It's in pretty decent shape for scientific code and documentation. Overall, this is a pleasant surprise, and we chalk it up to the fact that we're on a 3.0 release.

Then comes our first stopping point.

You see, the code says it's only certified to run on an IBM PowerPC machine, which is not the architecture we have. Now, this isn't exactly surprising; these guys aren't going to test the code on every hardware platform out there from the Commodore VIC-20 on. This isn't the sign of any sort of conspiracy; they're just documenting what THEY run it on.

However, architecture differences matter. Brief digression for people who aren't computer geeks (and likely want further affirmation that they don't want to become said):

Computers can't add, subtract, multiply or divide in base 10 (the system we use); they work in base 2. (2 looks like 10, 4 looks like 100, and 5 looks like 101.) This is fine for integer math: 3+7=10 ALWAYS works on a computer. It's not so hot for floating point math (3.3 + 2.3 more or less = 5.6), because certain numbers do not convert cleanly to base 2 representations.

If you paid attention to the computer press in the mid-1990s, you may remember the "Pentium floating point math error". That was, functionally, this problem embedded in real silicon, and to most computer scientists it was a tempest in a thimble; none of them ever trusted non-integer math, because they'd been doing it on computers that would flip a bit when something like 10/7 overflowed. In short, computers don't handle decimal fractions well. One consequence is that banking software actually counts how many pennies you have, and only formats dollars and cents as the very last step, so that it can avoid using floating point math at all.
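To make that concrete, here's a rough little sketch of our own (nothing to do with the model's code) showing decimal fractions misbehaving in binary floating point, plus the integer-pennies trick banking code uses. The exact digits printed can vary a hair by platform, but the point holds on any IEEE 754 machine:

    // Our own illustration of base-2 floating point vs. decimal fractions.
    #include <cstdio>

    int main() {
        // Integer math is exact: 3 + 7 is always 10.
        std::printf("3 + 7 = %d\n", 3 + 7);

        // 0.1, 0.2 and 0.3 have no exact base-2 representation,
        // so the "obvious" equality quietly fails.
        double x = 0.1 + 0.2;
        std::printf("0.1 + 0.2 = %.17f\n", x);                 // 0.30000000000000004
        std::printf("equal to 0.3? %s\n", x == 0.3 ? "yes" : "no");  // no

        // And 3.3 + 2.3 is only "more or less" 5.6 once you look closely.
        std::printf("3.3 + 2.3 = %.17f\n", 3.3 + 2.3);         // 5.59999999999999964

        // The banking trick: count pennies as integers, format dollars last.
        long long cents = 330 + 230;                            // $3.30 + $2.30
        std::printf("$%lld.%02lld\n", cents / 100, cents % 100);  // $5.60
        return 0;
    }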

Our team member doing this project tells a story about finding a bug in a terrain collision algorithm, where the end result of a pair of long chains of calculations showed anomalous errors due to floating point issues. Once it was identified, the fix was to carry the intermediate results as 64-bit floating point values until the final calculation, and only then convert to a 32-bit number; this avoided a couple of hidden rounding errors. There's a lot more on this subject for people who are truly interested.
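The same idea in miniature (a made-up example, not the original collision code): accumulate a long chain of operations in 64-bit doubles and only narrow to a 32-bit float at the very end.

    #include <cstdio>
    #include <vector>

    // Naive version: every intermediate result gets rounded to 32 bits.
    float sum_naive(const std::vector<float>& v) {
        float acc = 0.0f;
        for (float x : v) acc += x;            // rounding error at every step
        return acc;
    }

    // The fix: carry the running total in a double, convert once at the end.
    float sum_wide(const std::vector<float>& v) {
        double acc = 0.0;
        for (float x : v) acc += x;            // error stays down in the noise
        return static_cast<float>(acc);
    }

    int main() {
        // Ten million small values; the true sum should be about 1,000,000.
        std::vector<float> v(10000000, 0.1f);
        std::printf("32-bit accumulation: %f\n", sum_naive(v));   // drifts visibly
        std::printf("64-bit accumulation: %f\n", sum_wide(v));    // stays close
        return 0;
    }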

Anyway - back to the climate model. The question becomes, "How hardware dependent are these models?" Which led to more digging. Basically, computers can screw this stuff up in two places - the hardware and the compiler. For the last four or five years, the hardware has rarely been the point of stress; 64-bit computers are the norm for anything purchased in the last three years, and there are crazy compiler tricks if you need more than a 64-bit register for your numerical operations.
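For what it's worth, you can ask your own compiler what it gives you beyond a plain 64-bit double. On x86 Linux with g++, long double is the 80-bit x87 extended type; other compilers and platforms may simply alias it to double, so treat this as a quick check rather than a guarantee:

    #include <cfloat>
    #include <cstdio>

    int main() {
        // Mantissa bits tell you how much precision each type really carries.
        std::printf("float:       %d mantissa bits\n", FLT_MANT_DIG);   // 24
        std::printf("double:      %d mantissa bits\n", DBL_MANT_DIG);   // 53
        std::printf("long double: %d mantissa bits\n", LDBL_MANT_DIG);  // 64 on x86 g++; varies elsewhere
        return 0;
    }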

Which brings us to the compiler. UCAR has only validated their model with the IBM XL compiler. They've run it with the MIPSpro, Cray, NEC and Compaq compilers. On the Linux side of the fence, which is what we'll probably be running this on, they've used the Portland compiler.

We don't know that g++ or Microsoft Visual C++ 7.1 won't work. We don't know that they will, either, though it seems likely. However, if we show up with an error on a compiler they haven't validated against, they're perfectly within their rights to say "You guys are, um, courageous. We don't have the resources to help you. Write us if it works!"

Now, one of our team members has a strong suspicion. They're using the Portland compiler because they want a debugger that doesn't inspire postal rages and murder sprees. (One of the common rants about open source code is that it's usually documented by computer nerds for computer nerds, and that only the weak want graphical debuggers. We're weak, and want to have lives left over for petting cats and the like.)

A quick look through their forums shows evidence of people trying to compile it with g++/gfortran; this is hopeful, and probably a sign that, as scientists usually do, they'd rather not be caught promising something they can't deliver.

So the next step is to spend some time figuring out what dependencies have to be overcome to get this through a make, and to choose which compiler to use. We'd love to use the Portland compiler; if any of our readers have a spare license they can donate, please let us know.

9 comments:

  1. The ESG account approval process isn't automated. I approved your account. And yes, I do check requests regularly between 6:30am and ~10pm MST every day.

  2. Thanks, G-Man. Worth knowing.

    Can you make sure that Eric Raymond's request gets expedited, if he unburies himself enough to tackle one?

  3. There's also the annual CCSM Workshop, held in the last half of June in Breckenridge, Colorado. It's attended by ~400 scientists and others, and it's a semi-formal meeting to discuss CCSM. See more details at

    http://www.ccsm.ucar.edu/events/workshops.html

  4. Hey G-man.... If you know anyone at the EOL group... put my resume on the top of the stack ok? Thanks. :)

  5. Yeah, mine got approved about half an hour after the request as well. Half an hour with a human in the loop is pretty good. (I guess it's just luck of the draw but still... good show)

Now comes the follow-along-at-home part.

6. Just FYI - CCSM4 will be released in the middle of 2010; it's intended to be buildable with the Intel, Lahey, PGI, IBM and Pathscale Fortran compilers, as well as gfortran.

  7. I suspect that floating point errors may be the least of the problems encountered. It's a concern of course, but I can think of many other things that might influence the results even if the software functions perfectly.

8. Thanks, G-Man. Perhaps by then, we'll have barked our shins enough to know what to do with it.
