Remove ambiguity in cell_methods, especially means over subgrid areas
Reported by: jonathan | Owned by: cf-conventions@…
Remove ambiguity about statistics described by cell_methods, especially means over subgrid areas
The aim is to indicate the portion of a cell over which a statistic has been calculated. This is needed in two situations: where statistics for the same quantity are calculated over different portions of the cell and must be distinguished, and where the quantity might be considered undefined over some portion of the cell. The statistic is usually a mean or a sum. The situation most often arises with cells in the horizontal, where the "portions" are different types of surface that don't have defined geographical boundaries. Some examples:
- sea_ice_thickness averaged over the area of sea ice, the area of sea, or the entire area of the cell including land. These can all be written as A/B, where A is the total volume of sea ice in the cell, and B is the area of sea ice, the area of sea or the area of the cell.
- surface_upward_sensible_heat_flux averaged over different surface types within the cell, e.g. land, sea, land ice, forest. These can similarly be written as A/B. Here A (in W) is the area-integral of the flux applying to the given surface type, and B (in m2) is either the area occupied by that type or the cell area. Equivalently, the flux (W m-2) is expressed either per unit area of the particular surface type or per unit area of the grid cell. When the values for different types are given per unit area of the cell, the sum of these values over all types is the mean for the cell as a whole.
- surface_temperature averaged over different surface types within the cell. This is only likely to be given as an average value for each surface type, i.e. formally with B as the area of that type. For example, the temperature is 300 K over the land portion of the cell and 310 K over the forest portion.
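The A/B formulation can be checked with a few lines of arithmetic. The Python sketch below uses a single hypothetical cell; its areas, ice volume and heat-flux integrals are invented for illustration. It shows that one sea-ice volume gives three different means depending on the choice of B. It also shows that flux values per unit cell area sum to the cell mean, while values per unit area of each type do not.

```python
# A single hypothetical grid cell; every number here is an illustrative assumption.
CELL_AREA = 100.0       # m2, whole cell
SEA_AREA = 60.0         # m2, the other 40 m2 is land
SEA_ICE_AREA = 15.0     # m2, part of the sea area
SEA_ICE_VOLUME = 30.0   # m3, total volume of sea ice in the cell (A)

def mean(a, b):
    """Mean written as A/B: a cell total divided by a chosen area."""
    return a / b

# One quantity, three defensible means, depending on the choice of B.
thickness_over_ice = mean(SEA_ICE_VOLUME, SEA_ICE_AREA)   # m
thickness_over_sea = mean(SEA_ICE_VOLUME, SEA_AREA)       # m
thickness_over_cell = mean(SEA_ICE_VOLUME, CELL_AREA)     # m

# Sensible heat flux: area-integral A (W) and area B (m2) for each surface type.
type_area = {"land": 40.0, "sea": 60.0}
type_integral = {"land": 4000.0, "sea": 1200.0}

per_unit_type_area = {t: type_integral[t] / type_area[t] for t in type_area}  # W m-2
per_unit_cell_area = {t: type_integral[t] / CELL_AREA for t in type_area}     # W m-2
cell_mean_flux = sum(type_integral.values()) / CELL_AREA                      # W m-2

if __name__ == "__main__":
    print(thickness_over_ice, thickness_over_sea, thickness_over_cell)
    print(per_unit_type_area, per_unit_cell_area, cell_mean_flux)
```

With these numbers the per-cell-area values (40 and 12 W m-2) add up to the cell mean of 52 W m-2. The per-type-area values (100 and 20 W m-2) only do so after being weighted by each type's area fraction.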
4. Initial Statement of Technical Proposal
In the standard_name guidelines, this issue is partly addressed by using where-phrases. However, this approach is unclear and inadequate. It can't indicate, for instance, whether the sensible heat flux applying to the land portion of the box is expressed per unit area of land or per unit area of the cell. The present proposal follows the one made on the email list in December 2006 http://www.cgd.ucar.edu/pipermail/cf-metadata/2006/001449.html. Following our discussion in Paris in June 2007, the proposal extends the use of cell_methods and coordinates to indicate subgrid variation more precisely, and eliminates where-phrases from standard names.
- If no cell_methods attribute is specified, the default interpretation for an intensive quantity is "point", meaning a local value in area or an instantaneous value in time. For an extensive quantity the default is "sum", meaning the sum over area or time in the cell. No change is proposed to this. It is unproblematic, because point values and integrals do not involve dividing by anything. It remains undefined what value should be given where the quantity does not exist. For sea_ice_thickness where there is no sea ice, for example, the value could be zero or missing, as either would make sense for a point value.
- Delete the existing standard names with where-phrases, making them aliases of names without the where-phrases. There are only nine of them at present:
  - precipitation_flux_onto_canopy_where_land
  - surface_net_downward_radiative_flux_where_land
  - surface_snow_thickness_where_sea_ice
  - surface_temperature_where_land
  - surface_temperature_where_open_sea
  - surface_temperature_where_snow
  - surface_upward_sensible_heat_flux_where_sea
  - water_evaporation_flux_from_canopy_where_land
  - water_evaporation_flux_where_sea_ice
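If these names become aliases, software reading older files would have to recover the base name and the area type from each alias. Below is a minimal Python sketch that assumes every alias has the literal form base_where_type. The helper split_where_name is hypothetical and not part of any CF library. The sketch also flags which types are among the four allowed by the "where" shorthand proposed later in this ticket (land, sea, sea_ice, open_sea). Of the nine names, only surface_temperature_where_snow falls outside that set, so it would need the more general area_type coordinate.

```python
# The nine standard names with where-phrases listed above.
WHERE_NAMES = """
precipitation_flux_onto_canopy_where_land
surface_net_downward_radiative_flux_where_land
surface_snow_thickness_where_sea_ice
surface_temperature_where_land
surface_temperature_where_open_sea
surface_temperature_where_snow
surface_upward_sensible_heat_flux_where_sea
water_evaporation_flux_from_canopy_where_land
water_evaporation_flux_where_sea_ice
""".split()

# Types permitted in the proposed "where" shorthand of cell_methods.
SHORTHAND_TYPES = {"land", "sea", "sea_ice", "open_sea"}

def split_where_name(name):
    """Split a hypothetical alias 'base_where_type' into (base, type)."""
    base, sep, area_type = name.partition("_where_")
    if not sep:
        raise ValueError(f"{name!r} has no where-phrase")
    return base, area_type

if __name__ == "__main__":
    for name in WHERE_NAMES:
        base, area_type = split_where_name(name)
        route = ("'where' shorthand" if area_type in SHORTHAND_TYPES
                 else "area_type coordinate")
        print(f"{name}: {base} + {route} ({area_type})")
```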
- Define a new standard_name of area_type, whose values could be any of the surface_cover types as well as any distinctions of horizontal area which are not surface types, such as "cloud". It is not proposed to standardise the values of area_type at present, but they could be standardised later.
- To provide for greater use of string-valued auxiliary coordinate variables, especially string-valued scalar coordinate variables:
  - To the end of the first paragraph of 6.1, append: Other purposes for string identifiers are also described in Section 6.1.1, "Geographic Regions", and Section 7.3.3, "Statistics applying to portions of cells".
  - To the end of the second paragraph of 6.1, append: If a character variable has only one dimension (the length of the string), it is regarded as a string-valued scalar coordinate variable, analogous to a numeric scalar coordinate variable (Section 5.7).
  - Modify the section on 6.1 in the conformance document to read: A variable of character type that is named by a coordinates attribute is a label variable. This variable must have one or two dimensions. The trailing (CDL order) or sole dimension is for the maximum string length. If there are two dimensions, the leading dimension (CDL order) must match one of those of the data variable.
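The proposed conformance rule for label variables amounts to a check on dimensions. Below is a Python sketch of that check. The function classify_label and its arguments (sequences of dimension names in CDL order) are hypothetical. It inspects only dimension names, not the variable's type or contents.

```python
def classify_label(label_dims, data_dims):
    """Classify a character variable named by a coordinates attribute.

    A hypothetical check of the proposed conformance rule. label_dims and
    data_dims are sequences of dimension names in CDL order; the trailing
    (or sole) dimension of the label variable is the maximum string length.
    """
    if len(label_dims) == 1:
        # Only the string-length dimension: one label for the whole variable.
        return "string-valued scalar coordinate variable"
    if len(label_dims) == 2:
        leading = label_dims[0]
        if leading not in data_dims:
            raise ValueError(f"leading dimension {leading!r} is not a "
                             f"dimension of the data variable {tuple(data_dims)}")
        return f"label variable along dimension {leading}"
    raise ValueError(f"a label variable must have one or two dimensions, "
                     f"not {len(label_dims)}")

if __name__ == "__main__":
    # land_cover2(lc2,maxlen) labelling surface_upward_sensible_heat_flux(lc2,lat,lon)
    print(classify_label(("lc2", "maxlen"), ("lc2", "lat", "lon")))
    print(classify_label(("maxlen",), ("lat", "lon")))
```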
- A cell_methods entry is generically of the form "name: [name: ...] method" (see CF 7.3), where the names are the names of dimensions, scalar coordinate variables, or standard_names. Horizontal area-means are indicated by "lat: lon: mean", if lat and lon are the latitude and longitude dimensions. I propose introducing a special name, area, to indicate horizontal area, so that an area-mean can be written "area: mean". This is more obvious and convenient.
To do this, modify the paragraph in 7.3 beginning "If a data value is representative of variation over a combination of axes" by changing "a longitude-latitude gridbox would have" to "... could have", and appending the following:
To indicate variation over horizontal area, a special name of area is permitted as an alternative to specifying a combination of dimensions. The common case of an area-mean in longitude-latitude gridboxes can thus be shown by cell_methods="area: mean". If they are not longitude and latitude, the horizontal coordinate variables can be identified with axis attributes of X and Y (see Chapter 4, Coordinate Types).
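One way a reader might interpret the special name is to expand it back into explicit dimension names. The Python sketch below assumes the coordinate variables are supplied as a dict of attribute dicts. The helpers horizontal_names and expand_area are hypothetical. Latitude and longitude are recognised here only by standard_name, although CF also allows recognition by units. The rewrite is purely textual, so it would also alter "area:" inside a parenthesised comment.

```python
import re

def horizontal_names(coords):
    """Names of the two coordinates spanning horizontal area (hypothetical).

    coords maps each coordinate variable name to a dict of its attributes.
    Latitude and longitude are recognised here only by standard_name;
    otherwise the coordinates with axis attributes X and Y are used.
    """
    lonlat = [n for n, a in coords.items()
              if a.get("standard_name") in ("latitude", "longitude")]
    if len(lonlat) == 2:
        return sorted(lonlat)
    xy = [n for n, a in coords.items() if a.get("axis") in ("X", "Y")]
    if len(xy) != 2:
        raise ValueError("cannot identify the two horizontal coordinates")
    return sorted(xy)

def expand_area(cell_methods, coords):
    """Replace each 'area:' name with the equivalent combination of names."""
    names = " ".join(n + ":" for n in horizontal_names(coords))
    return re.sub(r"\barea:", lambda m: names, cell_methods)

if __name__ == "__main__":
    lat_lon = {"lat": {"standard_name": "latitude"},
               "lon": {"standard_name": "longitude"}}
    print(expand_area("time: point area: mean", lat_lon))
```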
- Since CF 7.3 on Cell methods is quite long, I propose:
  - To rename 7.3 as Statistical variation within cells. This title would explain more of what it is about and parallels the title of 7.4 on Climatological statistics.
  - To insert subsection headings of 7.3.1 Statistics for more than one axis, starting with the paragraph "If more than one ...", 7.3.2 Recording the spacing of the original data and other information, starting "To indicate more precisely", and 7.3.4 Use of standard names in cell methods, starting "The convention of specifying". (A new subsection on portions of cells will be inserted as 7.3.3 between the second and third existing subsections.)
- Insert a new subsection in CF 7.3 entitled Statistics applying to portions of cells before the paragraph beginning "The convention of specifying", as follows:
By default, the statistical method indicated by cell_methods is assumed to have been evaluated over the entire cell. Sometimes it is necessary to evaluate different values of a quantity for different portions of a cell. To indicate this, one of two conventions may be used.
The first convention is the more general. In this convention, a string-valued coordinate variable or string-valued scalar coordinate variable (see Section 6.1, "Labels") indicates the portion of the cell. Variables with standard_names of land_cover, surface_cover or area_type are suitable. With this approach, a coordinate variable of size greater than one allows values of a quantity to be given for various area types in one data variable. This is often needed in land surface models, for example, since they deal with many types within each surface gridbox. In this convention, the cell_methods entry is of the form "name: method" as usual, where name could be area, but the statistical method applies to the selected portion of the cell only, e.g. a mean over the sea-ice area.
The second convention is a shorthand for the commonest cases. In this convention, a cell_methods entry may be given of the form "name: method where type", in which type may be land, sea, sea_ice, or open_sea (sea area not occupied by sea ice). The phrase "where type" should be interpreted as exactly equivalent to supplying a scalar or size-one coordinate variable of area_type with value type.
Example. Means over land and sea.
dimensions:
  lat=73;
  lon=96;
  maxlen=20;
  lc2=2;
variables:
  float surface_temperature(lat,lon);
    surface_temperature:cell_methods="area: mean where land";
  float surface_upward_sensible_heat_flux(lc2,lat,lon);
    surface_upward_sensible_heat_flux:coordinates="land_cover2";
    surface_upward_sensible_heat_flux:cell_methods="area: mean";
  char land_cover2(lc2,maxlen);
data:
  land_cover2="land","sea";
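The shorthand is defined as exactly equivalent to a scalar area_type coordinate, so a reader can normalise it away. The Python sketch below uses a hypothetical helper, expand_where. It turns the surface_temperature entry of the example, "area: mean where land", into "area: mean" plus a scalar area_type coordinate with value land. This is the same form that surface_upward_sensible_heat_flux uses with its two-valued land_cover2 label. The sketch handles a single "where" phrase and does not treat parenthesised comments specially.

```python
import re

# Types the proposal allows in the "where" shorthand.
SHORTHAND_TYPES = {"land", "sea", "sea_ice", "open_sea"}

def expand_where(cell_methods):
    """Split off a 'where type' phrase (hypothetical helper).

    Returns the cell_methods string without the phrase, together with the
    equivalent scalar coordinate as a (standard_name, value) pair, or None
    if there is no 'where' phrase.
    """
    match = re.search(r"\bwhere\s+(\w+)", cell_methods)
    if match is None:
        return cell_methods, None
    area_type = match.group(1)
    if area_type not in SHORTHAND_TYPES:
        raise ValueError(f"{area_type!r} is not a shorthand type; "
                         "use an area_type coordinate instead")
    rest = cell_methods[:match.start()] + cell_methods[match.end():]
    return " ".join(rest.split()), ("area_type", area_type)

if __name__ == "__main__":
    print(expand_where("area: mean where land"))
```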
In any case, other coordinate variables may also implicitly restrict the portion of the cell considered by the statistical method. For example, the horizontal area of the ocean decreases with increasing depth. An area-mean as a function of depth in the ocean is therefore formed over different areas at different depths. This is not indicated explicitly in cell_methods. As described in Section 7.3.4 "Use of standard names in cell methods", a labeled axis of region may restrict the portion of a latitude-longitude gridbox to be considered.
If the method is mean, the cell_methods entry may be further supplemented by the phrase "over type", where type can be land, sea or all, and all means the entire area of the cell. A cell_methods entry of the form "mean where type1 over type2" indicates the mean is calculated by summing over the type1 portion of the cell and dividing by the area of the type2 portion. A cell_methods entry of the form "mean over type" indicates the mean is calculated by summing over the entire cell and dividing by the area of the type portion.
Example. Thickness of sea-ice and snow on sea-ice averaged over sea area.
variables:
  float snow_thickness(lat,lon);
    snow_thickness:cell_methods="area: mean where sea_ice over sea";
    snow_thickness:standard_name="lwe_thickness_of_surface_snow_amount";
    snow_thickness:units="m";
  float sea_ice_thickness(lat,lon);
    sea_ice_thickness:cell_methods="area: mean over sea";
    sea_ice_thickness:standard_name="sea_ice_thickness";
    sea_ice_thickness:units="m";
In the case of sea-ice thickness, it makes no difference to include "where sea_ice", since the sum of sea-ice thickness over the whole sea area is obviously the same as the sum over the sea-ice area only. In the case of snow thickness, the "where" phrase does make a difference: it excludes snow on land from the average. Omitting the "over" phrase would change both means. sea_ice_thickness would become an average over the entire cell, and snow_thickness an average over the sea-ice area only, instead of both being averages over the sea area.
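The rules for "where" and "over" can be tried out on the example above. The Python sketch below represents one cell as hypothetical subgrid tiles of land, sea ice and open sea, with invented areas and thicknesses. The function area_mean, also hypothetical, sums over the "where" portion and divides by the area of the "over" portion. When only a "where" phrase is present, it divides by the area of that same portion, as in the general convention. The sketch reproduces both points made for this example: "where sea_ice" leaves the sea-ice mean unchanged, but it removes snow on land from the snow mean.

```python
# Hypothetical subgrid tiles of one cell: (type, area in m2, value in m).
# "sea" is taken to be the union of the sea_ice and open_sea tiles.
PARENT = {"sea_ice": "sea", "open_sea": "sea"}

def in_portion(tile_type, portion):
    """True if a tile of this type lies inside the named portion of the cell."""
    return portion == "all" or portion in (tile_type, PARENT.get(tile_type))

def area_mean(tiles, where=None, over=None):
    """Area-mean following 'mean [where type1] [over type2]'.

    The sum runs over the 'where' portion (the whole cell if absent).
    The divisor is the area of the 'over' portion; if that is absent, it is
    the area of the summed portion itself, as with an area_type coordinate.
    """
    summed = where or "all"
    divisor = over or summed
    total = sum(a * v for t, a, v in tiles if in_portion(t, summed))
    area = sum(a for t, a, _ in tiles if in_portion(t, divisor))
    return total / area

# Illustrative values: 40 m2 land, 15 m2 sea ice, 45 m2 open sea.
sea_ice_thickness = [("land", 40.0, 0.0), ("sea_ice", 15.0, 2.0), ("open_sea", 45.0, 0.0)]
snow_thickness = [("land", 40.0, 0.25), ("sea_ice", 15.0, 0.5), ("open_sea", 45.0, 0.0)]

if __name__ == "__main__":
    # "area: mean over sea" and "area: mean where sea_ice over sea" agree for ice...
    print(area_mean(sea_ice_thickness, over="sea"),
          area_mean(sea_ice_thickness, where="sea_ice", over="sea"))
    # ...but not for snow, because only "where" excludes snow on land.
    print(area_mean(snow_thickness, over="sea"),
          area_mean(snow_thickness, where="sea_ice", over="sea"))
```

With these numbers, sea_ice_thickness is 0.5 m either way. snow_thickness is 0.125 m with "where sea_ice over sea" but about 0.29 m with "over sea" alone.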
- Modify the first bullet of the section on 7.3 in the conformance document to read:
The type of the cell_methods attribute is a string whose value is one or more blank separated word lists, each with the form
dim1: [dim2: [dim3: ...] ] method [where type1] [over type2] [within|over days|years] [(comment)]
where brackets indicate optional words. The valid values for dim1 [dim2 [dim3 ...] ] are dimension names of the associated variable, valid standard names, or the word area. The valid values of method are contained in Appendix D. The valid values for type1 are land, sea, sea_ice, or open_sea. The valid values for type2 are land, sea and all. When the method refers to a climatological time axis, the suffixes for within and over may be appended.
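A generic application could check entries against this grammar. The following Python parser is only a sketch; the function parse_cell_methods and the dict layout it returns are hypothetical. It checks the vocabularies allowed after "where" and "over", and it enforces the restriction of "over type2" to the mean method. It does not check names against the data variable, methods against the appendix, or the order of the optional phrases. "over" is read as the climatological suffix when followed by days or years, and as "over type2" otherwise.

```python
import re

TYPE1 = {"land", "sea", "sea_ice", "open_sea"}   # allowed after "where"
TYPE2 = {"land", "sea", "all"}                   # allowed after "over" (mean only)
CLIMATOLOGY = {"days", "years"}                  # allowed after "within"/"over"

def parse_cell_methods(attr):
    """Parse a cell_methods string into a list of entry dicts (sketch)."""
    # A parenthesised comment is kept as a single token even if it has blanks.
    tokens = re.findall(r"\([^)]*\)|[^\s()]+", attr)
    entries, i = [], 0
    while i < len(tokens):
        entry = {"names": []}
        # One or more "name:" tokens open each entry.
        while i < len(tokens) and tokens[i].endswith(":"):
            entry["names"].append(tokens[i][:-1])
            i += 1
        if not entry["names"] or i == len(tokens):
            raise ValueError(f"expected 'name: method' near token {i} of {attr!r}")
        entry["method"] = tokens[i]
        i += 1
        # Optional phrases run until the next "name:" token.
        while i < len(tokens) and not tokens[i].endswith(":"):
            word = tokens[i]
            if word.startswith("("):
                entry["comment"] = word[1:-1]
                i += 1
                continue
            if word not in ("where", "over", "within") or i + 1 == len(tokens):
                raise ValueError(f"unexpected {word!r} in {attr!r}")
            arg = tokens[i + 1]
            if word == "where" and arg in TYPE1:
                entry["where"] = arg
            elif word == "over" and arg in TYPE2 and entry["method"] == "mean":
                entry["over"] = arg
            elif word in ("within", "over") and arg in CLIMATOLOGY:
                entry[word + "_period"] = arg
            else:
                raise ValueError(f"invalid phrase {word!r} {arg!r} in {attr!r}")
            i += 2
        entries.append(entry)
    return entries
```

For example, "time: point area: mean where sea_ice over sea" yields two entries, and the second has where="sea_ice" and over="sea".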
These clarifications will particularly benefit those providing or using data from models, when it is important to be clear exactly how area-means have been calculated. The current standard is unclear.
6. Status Quo
The necessary clarification could be recorded as a comment in parentheses in the cell_methods attribute. This is not usually done, and even if it were, generic applications could not use it to distinguish the possibilities because it is not standardised. For the CMIP3 database, PCMDI described how means should be calculated, e.g. whether sea ice thickness is calculated as the mean over sea ice area or some other area. However, the prescription was not recorded in the netCDF data produced, which is therefore not self-describing.
Change History (37)
comment:1 Changed 9 years ago by jonathan
- Summary changed from Remove ambiguity in by cell_methods, especially means over subgrid areas to Remove ambiguity in cell_methods, especially means over subgrid areas