CF aggregation rules
Reported by: davidhassell
Owned by: cf-conventions@…
In this ticket we propose a set of rules, based on CF metadata, for deciding whether or not two arbitrary CF field constructs may be aggregated into one larger field construct. A field construct (hereafter a field) is as defined in the proposed CF data model (see ticket #68), as are all other terms written in bold. In terms of CF-netCDF files, a field corresponds to a data variable together with all its attributes, coordinate variables, auxiliary coordinate variables, etc.
Aggregation may be thought of as the combination of one field with another to create a new field that occupies a larger space. In practice, this means combining two fields so that their data arrays are concatenated along exactly one dimension, as are their coordinate arrays which span that dimension, in such a way that the aggregated field conforms to the CF data model (and is therefore CF-netCDF compliant).
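As a rough illustration of what concatenation along exactly one dimension means, here is a minimal Python sketch. It is not the cf-python implementation and it does not reproduce the proposed rules: the `Field` class, `concat` and `aggregate_pair` are hypothetical names invented for this example, data arrays are plain nested lists, and the only condition checked is that the coordinates of every non-aggregating axis are identical.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Toy stand-in for a field (hypothetical, not the CF data model)."""
    axes: list    # axis identities in data order, e.g. ["time", "latitude"]
    coords: dict  # axis identity -> list of 1-D coordinate values
    data: list    # nested lists, one nesting level per axis


def concat(a, b, depth):
    """Concatenate nested lists a and b along the axis at nesting level depth."""
    if depth == 0:
        return a + b
    return [concat(x, y, depth - 1) for x, y in zip(a, b)]


def aggregate_pair(f, g, axis):
    """Join f and g along one named axis; all other axes must match exactly."""
    if f.axes != g.axes or axis not in f.axes:
        raise ValueError("fields do not span the same axes")
    for name in f.axes:
        if name != axis and f.coords[name] != g.coords[name]:
            raise ValueError(f"coordinates differ along {name!r}")
    # Only the coordinates spanning the aggregating axis are concatenated.
    coords = dict(f.coords)
    coords[axis] = f.coords[axis] + g.coords[axis]
    data = concat(f.data, g.data, f.axes.index(axis))
    return Field(list(f.axes), coords, data)


# Two one-timestep fields on the same latitudes (made-up values).
jan = Field(["time", "latitude"], {"time": [0], "latitude": [-45, 45]}, [[1.0, 2.0]])
feb = Field(["time", "latitude"], {"time": [31], "latitude": [-45, 45]}, [[3.0, 4.0]])
both = aggregate_pair(jan, feb, "time")
```

The result spans two time steps on the unchanged latitude axis. A real implementation would need to compare much more metadata before deciding that two fields may be aggregated.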
The CF-netCDF convention at present applies only to individual files, but there is a common and increasing need to be able to treat a collection of files as a single dataset, and the CF standard does not define how this should be done. Like the ticket for the CF data model, this ticket does not propose any change to the CF standard. Our purpose is to write down general abstract rules for CF field aggregation which are consistent with the abstract CF data model.
These proposed CF aggregation rules make no reference to the netCDF file format. They are built solely on the abstract CF data model. As such, they may be applied equally to fields stored in CF-netCDF files or to fields contained in a memory representation of the CF data model. To support the CF data model, we produced the cf-python software, and the latest version of that software includes an aggregation function based on these aggregation rules. This function can be used to combine CF-netCDF files by aggregating the fields they contain.
Our proposed rules are more flexible than the existing schemes that we are aware of. They are similar to the NcML aggregation types JoinExisting and JoinNew, but are more general in various ways. For example, the aggregating dimension need not be the outer dimension, nor be in the same position in different fields. Also, when combining fields from various netCDF files, the netCDF variable names need not match, because the variables are identified by their metadata instead of by their names. Any number of fields may ultimately be aggregated along one or more dimensions by repeated aggregations between pairs of fields. Our software can handle this general approach, but it needs optimisation.
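The following Python sketch illustrates three of these points for the simple case of 2-D fields: axes are matched by identity rather than by position, the netCDF variable name plays no part, and several fields are combined by repeated pairwise aggregation. It is an assumption-laden toy, not the cf-python algorithm. The dictionary layout, the `ncvar` key and the functions `transpose`, `align` and `aggregate_pair` are all hypothetical.

```python
from functools import reduce


def transpose(rows):
    """Swap the two axes of a 2-D nested list."""
    return [list(col) for col in zip(*rows)]


def align(field, axes):
    """Return the field's 2-D data with its axes in the requested order."""
    if field["axes"] == axes:
        return field["data"]
    if field["axes"] == axes[::-1]:
        return transpose(field["data"])
    raise ValueError("fields do not span the same axes")


def aggregate_pair(f, g, axis):
    """Join two 2-D fields along `axis`; the result uses f's axis order."""
    axes = f["axes"]
    if axis not in axes:
        raise ValueError(f"unknown axis {axis!r}")
    other = next(a for a in axes if a != axis)
    if f["coords"][other] != g["coords"][other]:
        raise ValueError(f"coordinates differ along {other!r}")
    a, b = f["data"], align(g, axes)
    # Concatenate along the outer or the inner axis, as required.
    data = a + b if axes.index(axis) == 0 else [x + y for x, y in zip(a, b)]
    coords = {other: f["coords"][other],
              axis: f["coords"][axis] + g["coords"][axis]}
    return {"axes": axes, "coords": coords, "data": data}


# Three fields with different variable names; the second stores its axes
# in the opposite order. All values are made up.
lat = [-45, 45]
f1 = {"ncvar": "tas", "axes": ["time", "latitude"],
      "coords": {"time": [0], "latitude": lat}, "data": [[1, 2]]}
f2 = {"ncvar": "temp", "axes": ["latitude", "time"],
      "coords": {"time": [31], "latitude": lat}, "data": [[3], [4]]}
f3 = {"ncvar": "t2m", "axes": ["time", "latitude"],
      "coords": {"time": [59], "latitude": lat}, "data": [[5, 6]]}

# Repeated aggregation between pairs of fields.
combined = reduce(lambda f, g: aggregate_pair(f, g, "time"), [f1, f2, f3])
```

Here the `ncvar` entries are never consulted. In the proposal the decision rests on CF metadata, for which the axis identities and coordinate values above are only a crude stand-in.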
This proposal is closely related to the CF data model (ticket #68), and we would welcome comments on that ticket as well as the present one.
David Hassell (d.c.hassell at reading.ac.uk)
Jonathan Gregory (j.m.gregory at reading.ac.uk)