Monday, August 1, 2011

Cell weighting schemes for the Earth.

I referred in my CRUTEM3 based reconstruction to two separate weighting schemes, involving dividing the Earth into cells in two different ways. I'll say more about that here. It also tells more about the strengths and weaknesses of the CRUTEM3 data set.


Purpose of weighting.

I'll illustrate this with the conventional scheme for creating cells by 5°x5° lat/lon blocks:


You can see that the stations are unevenly distributed. This is a land/sea analysis using the new CRUTEM3 data set with HADSST2, and the stations marked are those that reported in April 2009. The "stations" in the sea are gridcells for HADSST2, which is also on a 5°x5° grid.

It would be most undesirable to include the stations unweighted - then areas that happened to have many stations reporting would unduly influence the estimate of what is supposed to be a global average. In fact, the usual way of getting areas to balance equally is to add up area averages, over the cells that you see.

TempLS takes a slightly different view, but functionally it is very similar. The cells are weighted according to the inverse of their area density, as well as can be estimated. And the simplest estimate is stations per cell, adjusted for area.

Empty cells

As you can see, in any month quite a lot of cells (marked in yellow) are empty. Conventionally, they are just left out of the sum. TempLS does that in effect too. But this detracts from the purpose of weighting. Leaving them out does not mean no effect; instead, it is as if they were included, but assigned the average global anomaly for that month.

What should be done is to include them with the best estimate of their likely value. A better estimate is something derived from nearby stations. Many cells are empty in the Arctic, for example, and that leads to the relatively high trend there being underrepresented in the global average.

Remedies

The first remedy is to use a different scheme which has a smaller empty cell area. I will describe a simple one. I've described previously a more complex one. It is fairly time-consuming, though.

In V2.2 I have developed another, in conjunction with an equal-cell area division. Nearby cells are assigned extra weight to make up for what the empty cells are missing. This weight is transferred to the stations.

The solution method emulates diffusion. Empty cells transfer weight to their neighbors. Those that have cells keep the weight, but some will have gone to empty cells. So it is transferred again. About four cycles of this gets most of the weight to occupied cells.

There is a trick there. It's like explicit solution of numerical diffusion by "relaxation", and it can be accelerated by over-relaxation. In each step, about 4/3 times the weight is removed - ie empty cells go negative. But they get some from their neighbors.

Implementation

As I mentioned in my earlier post, this is available, and I tried it for the CRUTEM data. The answers were more different than for the simple schemes. This may be correct, but I'm checking.

Anyway, in this post, I mainly wanted to show more about the empty cell issue and how a new cell scheme can help. So here are some more views of problem areas. Africa is bad - S America actually seems better covered in CRUTEM than in GHCN. But maybe April 2009 was a good month.



Somalian pirates seem to be making a dent in SST measures.



The new scheme

The simple scheme is not good near the poles. The elements get small, so the few stations that are there are assigned small weighting. The area uncovered is proportionally greater.

The new scheme retains 5° latitude bands, but divides with fewer cells per band away from the equator. So each cell looks like the 5x5 equatorial block. That's pretty good up to 85°N. Then the last circle is divided into three.

One interesting geometry fact I learnt while doing this. If you slice a sphere equally (by thickness of slice) then the sphere surface area in each slice is equal. The rind from the equator and the near circle at the pole are the same.

This scheme does not have equal thicknesses, but it makes the calculation of cell area easy. Just work out what the thickness of each lat band is (cos lat), and divide that by the number of cells in the band.

OK, so here are the pictures:









Using these cells without diffusion was method 0 of the previous post.

There are many ways a sphere could be divided, but you need a good scheme for then deciding into which cell an arbitrary point (station) falls. With this scheme, you just decide with lat band it is, and then which of the equal divisions of longitude it falls into.


9 comments:

  1. In meteorological analysis history, what you're doing here is in the vein of Cressman analysis.

    http://onlinelibrary.wiley.com/doi/10.1111/j.2153-3490.1954.tb01126.x/abstract

    http://journals.ametsoc.org/doi/abs/10.1175/1520-0493%281959%29087%3C0367%3AAOOAS%3E2.0.CO%3B2

    ReplyDelete
  2. This new scheme you are using looks very promising. Have you calculated yet the difference in using it would have on a reconstruction?

    ReplyDelete
  3. Penguin,
    Thanks for the references. The Cressman scheme is for interpolating a field variable - here I'm looking at weights. Cressman requires fitting a polynomial or equivalent. Then the lat/lon information would be essential.

    ReplyDelete
  4. Robert,
    Yes, in the previous post I showed plots with the two methods (eg this one. What I should have done, and will add as an update, is to calculate the improvement the second scheme makes to empty cell area over a long calc.

    ReplyDelete
  5. There are standard, well established interpolation methods for doing this sort of thing - kriging and optimal interpolation are key terms. I'd be wary of an ad-hoc invention.

    ReplyDelete
  6. Thanks, James, it's an interesting suggestion. It wasn't at first clear what to make of it, since I'm thinking of the problem as one of reassigning weights. But another way of solving the empty cell issue would be to put an interpolated value in the cell, and proceed with unchanged weights. Then I could use kriging or OI.

    Turning that around, since the interpolant is a linear combination of surrounding values, I could take the coefficients and use them as weight increments. That would have the same effect.

    I'd defend the homeliness of my earlier remedy, though. The default, as I think in GISS, is to do nothing, which means in effect interpolating the global average. Anything local is better than that.

    ReplyDelete
  7. Kriging, OI, and Cressman are all used to fill in the empty cells in the first place. There's no need to fill them in first. Indeed, doing so by some other method renders these moot.

    I'm fond of homely methods myself. But since Cressman is exactly a method to assign weights to data you do have in order to fill in areas where you don't have data, and it proceeds iteratively in a diffusive-like manner, it seems worth some time of yours to see what has already been done in the area. OI is superior, but more removed from what you're brewing at home.

    GISS uses something more akin to OI. UKMO ignores the empty cells.

    ReplyDelete
  8. Thanks, PD. I have actually blogged (briefly) on kriging previously. It seems though that in practice krigers use empirical variograms, which seem to have their own homeliness.

    At first sight, what I'm looking for doesn't seem statistical. I have a known distribution of stations, and I want to partition the surface to reproduce a spatial integration formula. I've looked at ways of doing that with triangulation.

    But there is an underlying notion of localness that gives a spatial scale, and depends on spatial correlation. So that's where the variogram comes in. I have the feeling that if I did get a kriging method working, the logical thing to do would be to throw away the cells and derive the weights directly from the variogram.

    I've obviously missed what GISS does - I'll try to track that down.

    ReplyDelete
  9. I've been thinking more about the advice of penguindreams and James about interpolation methods. A basic problem here is that we don't have, at any useful stage of this LS method, anomalies to interpolate, and on land at least (where the main problem is), one can't interpolate raw temperatures from different stations.

    So I have to work through the weighting, and I can't easily use optimal interpolation based on the data. We don't have interpolable data, at least until the LS is done. Maybe I could compute the local offset L and then interpolate an effective anomaly as an iterative process.

    My inclination continues to be to work through the weights. I can invert an interpolation formula, as I described in my response to James. Maybe I can work out an optimal interpolation after calculation of L and use the coefficients

    I still haven't really figured out what GISS does, but for the same reason I doubt that I can use it. GISS has gone to a lot of trouble to create grid-based anomalies. The LS method avoids this pesky stage.

    The simplest method may well be iterative. Do once as is, interpolate the empty cells using the resulting G (anomalies) and L, and do it again.

    ReplyDelete