Data set gathering is incredibly tedious and expensive, and in many ways nonreplicable. For example, there are plenty of dendrochronology data sets out there (measurements of the width of tree rings as a proxy for rainfall, CO2 and nitrogen in soils); not all of them are useful as climate proxies. Some - coming from very carefully selected points - may be good measurements of local climate effects.
This is particularly important for creating climate records prior to about 1640, when the first scientifically reliable thermometers came about.
One of the important tasks in ANY scientific endeavor is to winnow the data. Particularly in a field like climatology, you have a lot of noisy, geographically discontiguous data sets that need to be interpreted and a chronology built from them. Even the thermometer records from roughly 1860 onwards have issues; Pielke et al. 2007 documents some of the noise signals in the thermometer data. (I'm not going to touch the issue of whether Dr. Pielke's work was blocked from publication by Dr. Karl at NCDC; the paper is a good discussion of where noise comes into contemporary data.)
What this means is that I'm somewhat more forgiving of fudge factors. I've been around practicing scientists enough to know that everything in science has fudge factors, because if you don't limit the variable set, you can't quantify the outcomes with any reliability.
That lack of reliability is why science reports things in probabilities. Which gets turned by people on both sides of the political spectrum into "But that means you're not certain of the outcome. I'll change my position when you can give me absolute certainty." Normally, this comes about when a data set or analysis reveals an uncomfortable truth.
Good science discusses where the uncertainties in the data set and analysis methods are. Unfortunately, good science that does this routinely gets picked apart by agenda-hawks in the method described above.
Between noisy data sets coming from geographically discontiguous areas, and having to state uncertainty percentages, it is NOT unreasonable to say 'the data are clearly wrong'. The data could be from bad instrumentation, the data could be corrupted by factors that aren't being accounted for, or the data could be correct; when you have multiple data sources and one or two of them are clear outliers, there may well be an instance where 'the data are clearly wrong'. (This is why I don't consider the Phil Jones email to be a smoking gun.)
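To make the outlier point concrete, here's a minimal sketch of one common way to flag a reading that disagrees sharply with its peers: the modified z-score based on the median absolute deviation. The function, the numbers, and the threshold are all hypothetical illustrations of the general idea; actual proxy-reconciliation work uses far more careful, physically motivated methods than a one-size-fits-all statistical cutoff.

```python
from statistics import median

def flag_outliers(readings, threshold=3.5):
    """Flag readings far from the group median, using the modified z-score.

    Hypothetical illustration only; 3.5 is a conventional cutoff for the
    modified z-score, not a climatology standard.
    """
    med = median(readings)
    # Median absolute deviation: robust spread estimate that one wild
    # reading can't inflate the way a standard deviation would be.
    mad = median(abs(r - med) for r in readings)
    if mad == 0:
        return [False] * len(readings)
    # 0.6745 scales the MAD so it's comparable to a standard deviation
    # for normally distributed data.
    return [abs(0.6745 * (r - med) / mad) > threshold for r in readings]

# Five made-up proxy estimates of the same regional anomaly (degrees C);
# the fourth disagrees sharply with the rest.
estimates = [0.42, 0.38, 0.45, 1.90, 0.40]
print(flag_outliers(estimates))  # → [False, False, False, True, False]
```

The robustness of the median is the whole trick here: the outlier itself barely moves the baseline it is being judged against, which is exactly the property you want when deciding whether one source among several is 'clearly wrong'.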
A later post is going to cover which data sets are being used for climate science, what their noise sources are, and how those noise sources are corrected for. It will be written for a technical layman (because that's what I really am); I've got friends who know more who'll look things over to make sure that any obvious gaffes are fixed. There will, almost certainly, be elisions of technical information.