Posted by: climatewonk | December 3, 2009

Climate Gate Post 3 — PCA, BCP, OMG!

We left off yesterday with the M&M (McIntyre and McKitrick) paper attempting to cast doubt on the methods, and the choice of bristlecone pines, in the MBH98 600-year paleoclimate temperature reconstruction.

So what exactly is the problem with MBH98?

Three things: Principal Component Analysis (PCA), bristlecone pines and the divergence problem in dendroclimatology.

Real Climate has a post on PCA, which defines it thusly:

A procedure by which a spatiotemporal data set is decomposed into its leading patterns in both time (see ‘Principal Component’) and space (see ‘Empirical Orthogonal Function’) based on an orthogonal decomposition of the data covariance matrix.


From the Wikipedia entry, a bit more detail:

Principal component analysis (PCA) involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.

PCA is the simplest of the true eigenvector-based multivariate analyses. Often, its operation can be thought of as revealing the internal structure of the data in a way which best explains the variance in the data. If a multivariate dataset is visualised as a set of coordinates in a high-dimensional data space (1 axis per variable), PCA supplies the user with a lower-dimensional picture, a “shadow” of this object when viewed from its (in some sense) most informative viewpoint.

PCA is closely related to factor analysis; indeed, some statistical packages deliberately conflate the two techniques. True factor analysis makes different assumptions about the underlying structure and solves eigenvectors of a slightly different matrix.

Simple enough?
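If you'd rather see it than read it, the covariance-matrix description above can be sketched in a few lines of NumPy. This is a toy example on made-up data, not any climate dataset: two underlying signals are mixed into five observed variables, and PCA recovers a small number of components that explain most of the variance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "multivariate" data: 200 observations of 5 correlated variables,
# driven by 2 hidden signals plus a little noise
signals = rng.standard_normal((200, 2))
mixing = rng.standard_normal((2, 5))       # how signals load onto variables
data = signals @ mixing + 0.1 * rng.standard_normal((200, 5))

# PCA = eigendecomposition of the data covariance matrix
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
order = np.argsort(eigvals)[::-1]          # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()        # fraction of variance per component
scores = centered @ eigvecs                # the principal components themselves
```

The first component accounts for as much variance as possible, the second as much of the remainder, and so on, exactly as the Wikipedia definition says; the components are mutually uncorrelated.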

What’s going on with this paleoclimate research?

The instrumental record only goes back a century and a half or so, so we have no direct way to judge whether temperatures today are warmer than, colder than or similar to those in the distant past. We need temperature proxies to help reconstruct the climate of the past. Now some might say this is a fool’s game, but there are many things we cannot measure directly that require some kind of proxy measure. Past climate is one of them.

Based on research on the relationship between environmental conditions and tree growth, dendroclimatology attempts to examine tree growth patterns and their relationship to climate variables today and in the past. This is done to better understand the science of tree growth, but especially in order to create a temperature reconstruction of past climate. Tree ring characteristics are considered to be proxy measures of past climate.

The assumptions, in very simplified language, are:

1. Tree ring width and density are related to environmental variables such as temperature, moisture, insolation, and site conditions such as aspect and location.

2. The relationship between ring width/density and temperature, in environments in which temperature is the limiting factor, is constant over time, in that an x increase/decrease in temperature will result in a y increase/decrease in width/density.

3. By calibrating tree ring width/density to the instrumental temperature record, it is possible to determine the temperature of the past by examining tree ring width/density from tree cores of living and fossil trees.
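Assumption 3, calibrate against the instrumental record and then invert the relationship, can be illustrated with a toy linear model. Every number here is invented for illustration; real calibration uses networks of cores and far more careful statistics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical instrumental period: 150 years of temperature anomalies
# (slight warming trend plus year-to-year noise)
temp = 0.005 * np.arange(150) + 0.3 * rng.standard_normal(150)

# Hypothetical ring widths that respond linearly to temperature
# (true relationship: width = 1.0 + 0.5 * temp, plus measurement noise)
width = 1.0 + 0.5 * temp + 0.1 * rng.standard_normal(150)

# Calibrate: least-squares fit of width = a + b * temp
b, a = np.polyfit(temp, width, 1)

# Invert the fitted relationship to "reconstruct" temperature from a
# ring width measured outside the instrumental period
old_width = 0.95
reconstructed_temp = (old_width - a) / b
```

The whole enterprise rests on assumption 2: the fitted a and b must hold outside the calibration window. If the tree's response to temperature changes over time, the inversion quietly gives wrong answers, which is where the divergence problem comes in.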

Seems logical, and for the most part it seems to hold true. Of course, there are always problems with any set of assumptions about physical phenomena. Are there other factors that may affect ring width/growth, and what effect do they have? How much individual variation exists? Do different species respond more or less to temperature fluctuations? What is the best way of collecting representative samples of trees so that any conclusions drawn from the research are valid and reliable? What possible errors exist in the methods and analysis?

MBH98 and MBH99 used PCA to construct a paleoclimate temperature reconstruction using dendro data, in particular, tree rings. The actual method is quite complicated, and I am not a statistician, but suffice it to say that when McIntyre and McKitrick went looking at MBH98 and MBH99, they brought up a number of criticisms of the work in question.

Here is an excerpt from the abstract of their paper in Geophysical Research Letters:

Their method, when tested on persistent red noise, nearly always produces a hockey stick shaped first principal component (PC1) and overstates the first eigenvalue. In the controversial 15th century period, the MBH98 method effectively selects only one species (bristlecone pine) into the critical North American PC1, making it implausible to describe it as the “dominant pattern of variance”.
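The “red noise” critique is about how the proxy series are centered before PCA. Here is a toy version of the idea (my own sketch, not M&M’s or MBH’s actual code): generate autocorrelated AR(1) noise, then center each series on only the final “calibration” window rather than on its full-length mean, the short-centering step M&M objected to.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_years, n_cal = 50, 500, 100

# AR(1) "red noise" proxies: x_t = phi * x_{t-1} + e_t
phi = 0.9
noise = rng.standard_normal((n_series, n_years))
series = np.zeros((n_series, n_years))
for t in range(1, n_years):
    series[:, t] = phi * series[:, t - 1] + noise[:, t]

# "Short centering": subtract the mean of only the final calibration
# window (the instrumental period) from each series, instead of the
# full-length mean that conventional PCA would use
cal_mean = series[:, -n_cal:].mean(axis=1, keepdims=True)
centered = series - cal_mean

# PC1 via SVD of the (series x time) matrix
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]  # leading principal component, a length-n_years time series
```

With this centering, series whose calibration-window mean happens to drift away from their long-term mean get inflated weight in PC1, which is the mechanism behind M&M’s claim that the method tends to produce hockey-stick-shaped leading components even from pure noise.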

In response, various papers and enquiries looked into the PCA used in MBH98 and MBH99. For example, Wahl and Ammann (2007) argue that:

Our examination does suggest that a slight modification to the original Mann et al. reconstruction is justifiable for the first half of the 15th century (∼ +0.05–0.10 °C), which leaves entirely unaltered the primary conclusion of Mann et al. (as well as many other reconstructions) that both the 20th century upward trend and high late-20th century hemispheric surface temperatures are anomalous over at least the last 600 years.

So, the problem was the way PCA was used in the research to create a temperature reconstruction. M&M argue that the MBH method relied too heavily on a small subset of the data, in particular the bristlecone pines, and that those trees are not reliable for use because of the problem of divergence.

More on divergence later.

For now, there have been a number of investigations into M&M’s contention that the methods used in MBH98 and MBH99 were biased towards creating a hockey stick graph, and that the conclusions are therefore flawed and do not support the claim that surface temperatures in the last half of the 20th century were warmer than at any time in the past 600, 1,000 or 2,000 years.

The National Research Council conducted an investigation in 2006, and its report, Surface Temperature Reconstructions for the Last 2,000 Years, concluded:

The basic conclusion of Mann et al. (1998, 1999) was that the late 20th century warmth in the Northern Hemisphere was unprecedented during at least the last 1,000 years. This conclusion has subsequently been supported by an array of evidence that includes both additional large-scale surface temperature reconstructions and pronounced changes in a variety of local proxy indicators, such as melting on ice caps and the retreat of glaciers around the world, which in many cases appear to be unprecedented during at least the last 2,000 years.

In other words, the NRC report concluded that while there were some small issues with the use of PCA in the ca. 1500 data, the basic conclusions about unprecedented late 20th century warmth held.

The report concludes:

Surface temperature reconstructions for periods prior to the industrial era are only one of multiple lines of evidence supporting the conclusion that climatic warming is occurring in response to human activities, and they are not the primary evidence.

Next: Wegman and Divergence

