Draft of the CF Data Model 1.5

This page is the working draft of the CF data model for CF 1.5, under the terms of reference of ticket #88. The original discussion in ticket #68 has been closed, and a new ticket #95 has been opened to adapt this draft model into a community-accepted model.

UML Sketch

This sketch represents the model in its draft state, illustrating the relationships between types.

Controlled Vocabularies

standard_name

Where a type has a 'standard_name' attribute, its value must be taken from the list of standard names at http://cf-pcmdi.llnl.gov/documents/cf-standard-names/

Constructs, Types

markup notes

  • An editor may add text, but this must be in italics.
  • An editor may raise a question about a section of text; this must be marked up as bold italics.
  • A suggested textual replacement may be indicated by (current text | replacement text).

This facilitates communication on the Trac ticket.

Field construct

The central concept of the data model is a field construct. A field construct corresponds to exactly one data array together with associated information about the domain in which the data resides (defined by spatio-temporal and other coordinates) and other metadata. This data model makes a central assumption that each field construct is independent.

A Field defines a single phenomenon, using a standard_name, long_name, units and other attributes, all of which are optional. Each element of the data array is a measure of this phenomenon within the sampling domain.

Each field construct must contain:

  • domain_axes: An unordered list of zero or more domain axis constructs. Each domain axis construct declares a dimension of the field.
  • A data array, which contains the data of the field. The shape of the data array is determined by an ordered subset of the domain axes. All domain axes of size greater than one must be included in the subset, but domain axes of size one may optionally be omitted, because their position in the order of domain axes makes no difference to the order of data elements in the array. If there are no domain axes of greater size than one, the single datum may be a scalar instead of an array. If the data array has more than one element, they must all be of the same data type, which may be numeric, character or string.

Each field construct may optionally contain:

  • dimension_coords: An unordered collection of ( dimension coordinate constructs | DimCoord instances).
    • Each ( dimension coordinate construct | DimCoord instance) provides physical coordinates to locate the cells at unique positions along a single domain axis.
  • auxiliary_coords: An unordered collection of (auxiliary coordinate constructs | AuxCoord instances). Each (auxiliary coordinate construct | AuxCoord instance) provides physical coordinates to locate the cells along one or more domain axes.
  • Yet to be worked upon: An unordered collection of cell measure constructs.
  • Yet to be worked upon: A cell methods construct, which refers to the domain axes (but not their sizes).
  • Yet to be worked upon: An optional unordered collection of (transform constructs | Transforms) (corresponding to CF-netCDF formula_terms and grid_mapping).
  • Other properties, which are metadata that do not refer to the domain axes, and serve to describe the data the field contains. Properties may be of any data type (numeric, character or string) and can be scalars or arrays. These properties correspond to attributes in the netCDF file, but we use the term "property" instead because not all CF-netCDF attributes are properties in this sense.
  • Attributes which define the phenomenon of the Field, using the controlled names: standard_name, long_name, units
    • further attributes which use a name not controlled by the CF semantics.
  • Yet to be worked upon: A list of ancillary fields, which contain metadata about the elements of the field's data array.

Collectively, the (domain axis, dimension coordinate, auxiliary coordinate, cell measure and cell method constructs | domain_axes, dimension_coords, auxiliary_coords and cell_measures) describe the domain in which the data resides and the sampling regime from that domain. Thus a field construct can be regarded as a domain with data (in that domain | sampled from that domain).
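
The rule relating the data array shape to the domain axes can be sketched in code. The following is only an illustration under stated assumptions: the function name check_data_shape and the representation of the domain axes as a mapping from a name to a size are hypothetical and are not part of the data model.

```python
from typing import Dict, Sequence, Tuple


def check_data_shape(axis_sizes: Dict[str, int],
                     data_axes: Sequence[str],
                     data_shape: Tuple[int, ...]) -> None:
    """Raise ValueError if a data array shape breaks the field construct rules.

    axis_sizes: hypothetical mapping of domain axis name to size.
    data_axes:  the ordered subset of domain axes spanned by the data array.
    """
    unknown = [name for name in data_axes if name not in axis_sizes]
    if unknown:
        raise ValueError(f"data array spans undeclared axes: {unknown}")
    if len(set(data_axes)) != len(data_axes):
        raise ValueError("a domain axis may be spanned at most once")
    # Axes of size greater than one must all be spanned;
    # axes of size one may be omitted.
    omitted = [name for name, size in axis_sizes.items()
               if size > 1 and name not in data_axes]
    if omitted:
        raise ValueError(f"axes of size > 1 omitted: {omitted}")
    expected = tuple(axis_sizes[name] for name in data_axes)
    if tuple(data_shape) != expected:
        raise ValueError(f"shape {tuple(data_shape)} != expected {expected}")


axes = {"time": 1, "lat": 3, "lon": 4}
check_data_shape(axes, ["lat", "lon"], (3, 4))             # size-one axis omitted
check_data_shape(axes, ["time", "lat", "lon"], (1, 3, 4))  # size-one axis kept
check_data_shape({"time": 1}, [], ())                      # single scalar datum
```

Both of the first two calls describe the same field: including or omitting the size-one time axis changes the shape of the array but not the order of its elements.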

The Field in a NetCDF file

The CF-netCDF formula_terms (see also Transform constructs) and ancillary_variables attributes make links between field constructs. These links are fragile and it might not always be possible for data processing software to maintain a consistent set of such links when writing fields to files or manipulating them in memory.

CF-netCDF considers fields which are contained in single netCDF files. In a dataset contained in a single netCDF file, each data variable corresponds to one field construct. This data model has a broader scope. It applies also to data contained in memory and to datasets comprising several netCDF files. A field construct may span data variables in more than one file, for instance from different ranges of a time coordinate. Rules for aggregating data variables from several files into a single field construct are needed but are not defined by CF version 1.5; such rules are regarded as the concern of data processing software.

Technically, data variables stored in CF-netCDF files are often not independent, because they share coordinate variables. However, we view this solely as a means of saving disk space, and we assume that software will be able to alter any field construct in memory without affecting other field constructs. For instance, if the coordinates of one field construct are modified by averaging the field values over one dimension, it will not affect any other field construct.

Explicit tests of domain consistency will be required to establish whether two data variables have the same coordinates or share a subset of these coordinates. Such tests are necessary in general if CF is applied to a dataset comprising more than one file, because different variables may then reside in different files, with their own coordinate variables (this text may be better in the previous section). Within a netCDF file, tests for the equality of coordinates between different data variables may be simplified if the data variables refer to the same coordinate variable.
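
A minimal sketch of such a consistency test follows. It assumes, purely for illustration, that each data variable's coordinates are held as a mapping from a coordinate name to a list of values; the function name shared_coordinates is hypothetical. A fuller test would presumably also compare units and other properties, which this sketch ignores.

```python
from typing import Dict, List, Sequence


def shared_coordinates(a: Dict[str, Sequence[float]],
                       b: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of the coordinates that two variables have in common."""
    shared = []
    for name in sorted(a.keys() & b.keys()):
        # Identity is the cheap case: both variables refer to the same
        # coordinate variable. Otherwise compare the values explicitly.
        if a[name] is b[name] or list(a[name]) == list(b[name]):
            shared.append(name)
    return shared


lat = [-45.0, 0.0, 45.0]
temperature = {"lat": lat, "time": [0.0, 1.0]}
pressure = {"lat": lat, "time": [0.0, 2.0]}
common = shared_coordinates(temperature, pressure)
same_domain = set(common) == set(temperature) == set(pressure)
```

Here the two variables share only the lat coordinate, so same_domain is False.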

Domain axis construct

A (domain axis construct | DomainAxis) must contain:

  • A size, which is an integer that must be greater than zero, but could be equal to one.
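
As an illustration only, the size constraint could be enforced when a domain axis is created. The class below is a sketch; the model does not prescribe any implementation, and the class layout is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainAxis:
    """Sketch of a domain axis construct: it holds nothing but a size."""
    size: int

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.size, bool) or not isinstance(self.size, int):
            raise TypeError("size must be an integer")
        if self.size < 1:
            raise ValueError("size must be greater than zero")


time_axis = DomainAxis(1)   # a size of one is allowed
lat_axis = DomainAxis(73)
```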

Dimension coordinate construct

A (dimension coordinate construct | DimCoord) (may | must) contain:

  • A one-dimensional numerical coordinate array of the size specified (for the domain axis | by the referencing DomainAxis).
    • If the size is one, the single coordinate value may be a scalar instead of an array.
    • If the size is greater than one, the elements of the coordinate array must:
      • all be of the same numeric data type,
      • all have different non-missing values,
      • and be monotonically increasing or decreasing.
    • (Dimension coordinate constructs | DimCoord instances) cannot have string-valued (coordinates | coordinate arrays).

A (dimension coordinate construct | DimCoord) may contain:

  • A two-dimensional numerical boundary array, whose slow-varying dimension (first in CDL, second in Fortran) equals the size specified by the domain axis construct, and whose fast-varying dimension is two, indicating the extent of the cell.
  • For climatological time dimensions, the bounds are interpreted in a special way indicated by the cell methods.
  • Sometimes the bounds are the important information for locating the cell, and the coordinates are notional, especially for extensive quantities.
  • (Properties | Attributes) (in the same sense as for the (field construct | Field)) serving to (describe the coordinates | define the coordinate's phenomenon).
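
The constraints on the coordinate array and the boundary array can be checked mechanically. The sketch below assumes the arrays are plain Python lists and treats NaN as a missing value; both choices, and the function name check_dim_coord, are assumptions made for illustration.

```python
import math
from numbers import Real
from typing import Optional, Sequence


def check_dim_coord(values: Sequence[float], axis_size: int,
                    bounds: Optional[Sequence[Sequence[float]]] = None) -> None:
    """Raise ValueError if the arrays break the dimension coordinate rules."""
    if len(values) != axis_size:
        raise ValueError("coordinate array size must equal the domain axis size")
    for v in values:
        # Strings, None and NaN are rejected: numeric, non-missing values only.
        if isinstance(v, bool) or not isinstance(v, Real) or math.isnan(v):
            raise ValueError(f"not a numeric, non-missing value: {v!r}")
    if len({type(v) for v in values}) > 1:
        raise ValueError("elements must all be of the same numeric data type")
    # Strictly increasing or strictly decreasing values are necessarily distinct.
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    if not (increasing or decreasing):
        raise ValueError("values must be distinct and monotonic")
    if bounds is not None:
        # One (lower, upper) pair per cell: shape (axis size, 2) in CDL order.
        if len(bounds) != axis_size or any(len(cell) != 2 for cell in bounds):
            raise ValueError("bounds must have shape (axis size, 2)")


check_dim_coord([-45.0, 0.0, 45.0], 3,
                bounds=[[-67.5, -22.5], [-22.5, 22.5], [22.5, 67.5]])
check_dim_coord([1000.0], 1)  # a size-one axis has nothing to order
```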

The Dimension Coordinate in a NetCDF file

A dimension coordinate construct corresponds to a netCDF coordinate variable, whose name is the same as the name of its single dimension, or a CF-netCDF numeric scalar coordinate variable. A CF-netCDF string-valued coordinate variable or string-valued scalar coordinate variable corresponds to an auxiliary coordinate construct (not a dimension coordinate construct), with a domain axis that is not associated with any dimension coordinate construct.

In this data model we permit a domain axis construct not to have a dimension coordinate construct if there is no appropriate numeric monotonic coordinate. That is the case for a dimension that runs over ocean basins or area types, for example, or for a domain axis that indexes timeseries at scattered points. Such domain axes do not correspond to a continuous physical quantity. (They will be called index dimensions in CF version 1.6.)

Auxiliary coordinate construct

An auxiliary coordinate construct must contain:

  • A coordinate array whose shape is determined by the (domain axes | DomainAxis instances) in the order listed, optionally omitting any domain axes of size one.
  • If all domain axes are of size one, the single coordinate value may be a scalar instead of an array.
  • (If the array has more than one element, they | elements) must all be of the same data type (numeric, character or string), but they do not have to be distinct or monotonic.
  • Missing values are not allowed (in CF version 1.5).

An auxiliary coordinate construct may contain:

  • A boundary array with all the dimensions, in the same order, as the coordinate array, and an additional dimension (following the coordinate array dimensions in CDL, preceding them in Fortran) equal to the number of vertices of each cell.
  • (Properties | Attributes) (in the same sense as for the field construct) serving to describe the coordinates.
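
The position of the vertex dimension can be illustrated with a small helper. The function name bounds_shape and its order argument are hypothetical; the coordinate shape is assumed to be given in the dimension order of the language in question.

```python
from typing import Sequence, Tuple


def bounds_shape(coord_shape: Sequence[int], n_vertices: int,
                 order: str = "cdl") -> Tuple[int, ...]:
    """Return the boundary array shape implied by a coordinate array shape."""
    if order == "cdl":
        # The vertex dimension follows the coordinate dimensions.
        return tuple(coord_shape) + (n_vertices,)
    if order == "fortran":
        # The vertex dimension precedes the coordinate dimensions.
        return (n_vertices,) + tuple(coord_shape)
    raise ValueError("order must be 'cdl' or 'fortran'")


# A two-dimensional latitude coordinate lat(y=2, x=3) with four vertices per cell.
cdl_shape = bounds_shape((2, 3), 4)                       # (2, 3, 4)
# The same array declared in Fortran has its dimensions in the reverse order.
fortran_shape = bounds_shape((3, 2), 4, order="fortran")  # (4, 3, 2)
```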

Auxiliary Coordinates in CF NetCDF

Auxiliary coordinate constructs correspond to auxiliary coordinate variables named by the coordinates attribute of a data variable in a CF-netCDF file. CF requires there to be auxiliary coordinate constructs of latitude and longitude if there is two-dimensional horizontal variation but the horizontal coordinates are not latitude and longitude.

In CF-netCDF, a string-valued auxiliary coordinate construct can be stored either as a character array with an additional dimension (last dimension in CDL) for maximum string length, or represented by a numerical auxiliary coordinate variable with a flag_meanings attribute to supply the translation to strings.
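
The two storage options can be sketched as follows. This is an illustration in plain Python rather than a netCDF API: the function names are hypothetical, padding with spaces is an assumption, and the second function also relies on the CF flag_values attribute, which the text above does not mention, to pair each numeric value with a word of flag_meanings.

```python
from typing import List, Sequence


def to_char_array(strings: Sequence[str], pad: str = " ") -> List[List[str]]:
    """Store strings as rows of characters, padded to the maximum length.

    The trailing dimension (last in CDL) is the maximum string length.
    """
    width = max((len(s) for s in strings), default=0)
    return [list(s.ljust(width, pad)) for s in strings]


def from_flags(values: Sequence[int], flag_values: Sequence[int],
               flag_meanings: str) -> List[str]:
    """Translate numeric coordinate values to strings.

    flag_meanings is a single blank-separated string, one word per flag value.
    """
    lookup = dict(zip(flag_values, flag_meanings.split()))
    return [lookup[v] for v in values]


basins = ["atlantic", "pacific", "indian"]
chars = to_char_array(basins)  # 3 rows of 8 characters each
names = from_flags([2, 1, 3], [1, 2, 3], "atlantic pacific indian")
```

Either way, the data model sees the same string-valued auxiliary coordinate construct; only the encoding in the file differs.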

Legacy text (for reference only)

Constructs, Types

Field

The central concept of the data model is a Field. A Field corresponds to exactly one data array together with associated information about the domain and sampling in which the data resides (defined by spatio-temporal and other coordinates) and other metadata. This data model makes a central assumption that each Field is independent.

The Field defines a domain and one phenomenon described over that domain. It contains a multi-dimensional array of data values, which may include missing data, and the metadata which define the domain.

Each Field may contain the following, all of which are optional.

  • An ordered domain_axes collection, of DomainAxis instances.
  • A data array whose shape is determined by the domain axes in the order listed, optionally omitting any domain axes of size one.
    • (It is possible to omit domain axes of size one because their position in the order of domain axes makes no difference to the order of data elements in the array.)
      • but it does affect the shape of the array
    • If there are no domain axes of greater size than one, the single datum may be a scalar instead of an array.
    • The data array must be of a single data type, which may be numeric, character or string.
  • A dimension_coords collection of DimCoord instances:
    • A dimension coord member provides physical coordinates to define and locate the cells at unique positions along a single DomainAxis.
    • Each member is referenced by the Field using a qualified association, exclusively mediated by one DomainAxis instance;
      • i.e. a DomainAxis may reference one or zero DimCoords as a member of the Field's dimension_coords
  • An auxiliary_coords collection of AuxCoord and DimCoord instances.
    • An auxiliary coord provides physical coordinates to locate the cells along one or more DomainAxis instances.
  • A cell_measures collection of CellMeasure instances.
  • A cell methods construct, which refers to the domain axes (but not their sizes).
  • A cell_methods container of CellMethod instances referencing elements of the dimension_coords collection.
  • A collection of Transform constructs
  • Attributes: key:value pairs which serve to describe the data the field contains.
  • Other properties, which are metadata that do not refer to the domain axes, and serve to describe the data the field contains. Properties may be of any data type (numeric, character or string) and can be scalars or arrays. These properties correspond to attributes in the netCDF file, but we use the term "property" instead because not all CF-netCDF attributes are properties in this sense.
  • A list of ancillary fields (corresponding to the CF-netCDF ancillary_variables attribute, which identifies other data variables that provide metadata).

Collectively, the domain_axes, dimension_coordinates, auxiliary_coordinates, cell_measures and cell_methods describe the domain and sampling in which the data resides. Thus a Field can be regarded as a domain with data in that domain.

DomainAxis

A DomainAxis declares a degree of freedom of the field. It must contain:

  • A size: a positive integer (greater than zero).

DimCoord

A DimCoord instance must contain:

  • A one-dimensional numerical coordinate array of the size specified by a referencing DomainAxis.
    • If the size is one, the single coordinate value may be a scalar instead of an array.
    • If the size is greater than one, the elements of the coordinate array must all be of the same numeric data type, they must all have different non-missing values, and they must be monotonic: increasing or decreasing.
    • missing values are not allowed in the array
    • The coordinate array must be strictly monotonic
    • DimCoord instances cannot have string-valued coordinates.

and may contain:

  • A two-dimensional numerical boundary array, whose slow-varying dimension (first in CDL, second in Fortran) equals the size specified by the referencing DomainAxis, and whose fast-varying dimension is two indicating the extent of the cell.
    • For climatological time dimensions, the bounds are interpreted in a special way indicated by the cell methods.
    • Sometimes the bounds are the important information for locating the cell, and the coordinates are notional, especially for extensive quantities.
  • Attributes: key:value pairs, describing the DimCoord instance definition
  • Properties (in the same sense as for the field construct) serving to describe the coordinates.

AuxCoord

An AuxCoord must contain:

  • A coordinate array whose shape is determined by the referencing DomainAxes in the order listed
    • optionally omitting any domain axes of size one. If all domain axes are of size one, the single coordinate value may be a scalar instead of an array.
    • If the array has more than one element, they must all be of the same data type (numeric, character or string), but they do not have to be distinct or monotonic.
    • Missing values are not allowed (in CF version 1.5). (Editor's question: I thought they were.)

and may also contain:

  • A boundary array with all the dimensions, in the same order, as the coordinate array, and an additional dimension (following the coordinate array dimensions in CDL, preceding them in Fortran) equal to the number of vertices of each cell.
  • Attributes: key:value pairs, describing the AuxCoord instance definition
  • Properties (in the same sense as for the field construct) serving to describe the coordinates.

CellMethod

CellMeasure

Transform

Notes

Qualified Associations

The associations between the Field and its Coordinates and CellMeasures are qualified associations. These are UML concepts which denote a managed association without mandating how this association is managed; only the constraints of the relationship are detailed.

For example, a Field may define zero or one dimension_coordinates (DimCoord instances) for each DomainAxis of the Field.

Last modified on 04/11/13 01:52:09
