Ticket #68: cfdm_0.7.html

File cfdm_0.7.html, 19.8 KB (added by davidhassell, 8 years ago)

version 0.7 of the proposed CF data model

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
<title>Draft CF data model, proposed version 0.7</title>
<body bgcolor="#efefdf">
<h1 align="center">Draft CF data model</h1>
<h2 align="center">Proposed version <font color="green">0.7</font></h2>
<p>In <a href="https://cf-pcmdi.llnl.gov/trac/ticket/88">CF trac
ticket 88</a>, proposed by Mark Hedley and accepted on 5th August
2012, it has been decided that CF should adopt a data model. The data
model will be a logical abstraction of the concepts of CF data and
metadata, and the relationships that exist between these concepts, but
will not define an application programming interface (API) for CF.
Adopting a data model is believed to offer the following benefits:
</p><ul>
<li>Providing an orientation guide to the CF Conventions Document.
</li><li>Guiding the development of software compatible with CF.
</li><li>Facilitating the creation of an API which
"behaves" or "feels" like CF and is intuitive to use.
</li><li>Providing a reference point
for gap analysis and conflict analysis of the CF specification.
</li><li>Providing a communication tool for discussing CF concepts and
proposals for changes to the CF specification.
</li><li>Setting the groundwork to expand CF beyond netCDF files.
</li></ul>
<p>The present document proposes a data model corresponding to
the <a href="http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5">CF
metadata standard</a> (version 1.5). The data model avoids prescribing
more than is needed for interpreting CF as it stands, in order to
avoid inconsistency with future developments of CF. This document is
illustrated by
the <a href="http://www.met.rdg.ac.uk/%7Edavid/newCF_0.7.pdf">accompanying
UML diagram</a> of the data model.
</p><p>As well as describing the CF data model, this document also
comments on how it is implemented in netCDF. Since the CF data model
could be implemented in file formats other than netCDF, it would be
logically better to put the information about CF-netCDF in a separate
document, but when introducing the data model for the first time, we
feel that this document would be harder to understand if it omitted
reference to the netCDF information. We propose that these two kinds of
information should be separated in a later version of the data model. Some parts
of the CF standard arise specifically from the requirements or
restrictions of the netCDF file format, or are concerned with
efficient ways of storing data on disk; these parts are not logically
part of the data model and are only briefly mentioned in this
document.
</p><p>In this document, we use the word "construct" because we feel
it to be a more language-neutral term than "object" or "structure".
The constructs of this data model might correspond to objects in an
object-oriented (OO) language.
</p><h2>Field construct</h2>
The central concept of the data model is a <b>field construct</b>. In
a dataset contained in a single netCDF file, each data variable
usually corresponds to a field construct, but a field construct might
be a combination of several data variables. In a dataset comprising
several netCDF files, a field construct may span data variables in
more than one file, for instance from different ranges of a time
coordinate (multi-file datasets of this kind are to be introduced by
Gridspec in CF version 1.7). Rules for
aggregating data variables from one or several files into a single
field construct are needed but are not defined by CF version 1.5; such
rules are regarded as the concern of data processing software.
<p>This data model makes a central assumption that each field
construct is independent. Data variables stored in CF-netCDF files are
often not independent, because they share coordinate variables.
However, we view this solely as a means of saving disk space, and we
assume that software will be able to alter any field construct in
memory without affecting other field constructs. For instance, if the
coordinates of one field construct are modified, no other field
construct will be affected. Explicit tests of equality will be required
to establish whether two data variables have the same
coordinates. Such tests are necessary in general if CF is applied to a
dataset comprising more than one file, because different variables may
then reside in different files, with their own coordinate
variables. <font color="green">In a netCDF file, tests for the
equality of coordinates between different data variables may be
simplified if the data variables refer to the same coordinate
variables.</font>
</p><p>Each field construct may have
</p><ul>
<li><font color="green">An ordered list of zero or more <b>domain axis
    constructs</b>.</font>
</li><li>A <b>data array</b> whose shape is determined by
    the <font color="green">domain axes</font> in the order listed,
    optionally omitting any <font color="green">domain axes</font> of
    size one. If there are no <font color="green">domain axes</font>
    of size greater than one, the data array may be a
    scalar. <font color="green">If there are no domain axes, the data
    array must be a scalar.</font> <font color="green">Domain
    axes</font> of size one can be omitted because their position in
    the order of <font color="green">domain axes</font> makes no
    difference to the order of data elements in the array. The
    elements of the data array must all be of the same data type,
    which may be numeric, character or string.
</li><li><font color="green">An unordered collection of <b>dimension
    coordinate constructs</b>.</font>
</li><li>An unordered collection of <b>auxiliary coordinate constructs</b>.
</li><li>An unordered collection of <b>cell measure constructs</b>.
</li><li>A <b>cell methods construct</b>, which refers to
the <font color="green">domain axes</font> (but not their sizes).
</li><li>An unordered collection of <b>transform constructs</b>.
</li><li>Other <b>properties</b>, which are metadata that do not refer to
    the <font color="green">domain axes</font>, and serve to describe
    the data the field contains. Properties may be of any data type
    (numeric, character or string) and can be scalars or arrays. They
    are attributes in the netCDF file, but we use the term "property"
    instead because not all CF-netCDF attributes are properties in
    this sense.
</li><li>A list of <b>ancillary fields</b>. This corresponds to the
    CF-netCDF <tt>ancillary_variables</tt> attribute, which identifies
    other fields that provide metadata.
</li></ul>
All the components of the field construct except the data array are
optional.
<font color="green">
Collectively, the domain axis, dimension coordinate, auxiliary
coordinate, cell measure and cell methods constructs describe
the <b>domain</b> in which the data resides. Thus a field construct
can be regarded as a domain with data in that domain.
</font>
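<p>A minimal Python sketch may help to make the data-array shape rule and the independence of field constructs concrete. Everything in it is hypothetical and chosen only for illustration: the classes <tt>DomainAxis</tt> and <tt>Field</tt>, the flattened <tt>values</tt> list, the name-keyed <tt>coordinates</tt> dictionary and the helpers <tt>allowed_shapes</tt>, <tt>field_from_file</tt> and <tt>same_coordinates</tt> are assumptions, not taken from CF or from any existing CF software.</p>

```python
from copy import deepcopy
from dataclasses import dataclass, field
from math import prod
from typing import Any, Dict, List, Tuple


@dataclass
class DomainAxis:
    size: int  # must be an integer greater than zero

    def __post_init__(self) -> None:
        if not isinstance(self.size, int) or self.size < 1:
            raise ValueError("a domain axis size must be an integer >= 1")


def allowed_shapes(axes: List[DomainAxis]) -> List[Tuple[int, ...]]:
    """Data-array shapes permitted by domain axes in the order listed.

    Axes of size one may each be omitted; all other axes must appear, in
    order. With no axes at all, only the scalar shape () is possible.
    """
    shapes = [()]
    for axis in axes:
        extended = [s + (axis.size,) for s in shapes]
        # A size-one axis can also be left out without reordering the data.
        shapes = extended + shapes if axis.size == 1 else extended
    return sorted(set(shapes))


@dataclass
class Field:
    axes: List[DomainAxis]      # ordered list of domain axis constructs
    shape: Tuple[int, ...]      # shape of the data array
    values: List[Any]           # data array, flattened in C order
    coordinates: Dict[str, List[float]] = field(default_factory=dict)
    properties: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.shape not in allowed_shapes(self.axes):
            raise ValueError(f"shape {self.shape} is not allowed by the domain axes")
        if len(self.values) != prod(self.shape):
            raise ValueError("number of data values does not match the shape")
        if len({type(v) for v in self.values}) > 1:
            raise TypeError("all data elements must have the same data type")


def field_from_file(axes, shape, values, coordinates, properties) -> Field:
    """Build an in-memory field that shares nothing with other fields,
    even when the file stores one coordinate variable for several of them."""
    return Field(deepcopy(axes), tuple(shape), list(values),
                 deepcopy(coordinates), deepcopy(properties))


def same_coordinates(a: Field, b: Field, name: str) -> bool:
    """Explicit equality test: sharing of storage is never assumed."""
    ca, cb = a.coordinates.get(name), b.coordinates.get(name)
    return ca is not None and ca == cb


# Usage: two fields read from one file that shares a latitude variable.
lat = [-45.0, 0.0, 45.0]
axes = [DomainAxis(1), DomainAxis(3)]  # e.g. a size-one time axis, then latitude
tas = field_from_file(axes, (3,), [280.0, 300.0, 281.0],
                      {"latitude": lat}, {"standard_name": "air_temperature"})
pr = field_from_file(axes, (1, 3), [1e-5, 3e-5, 2e-5],
                     {"latitude": lat}, {"standard_name": "precipitation_flux"})
tas.coordinates["latitude"][0] = -50.0  # alter one field in memory only
```

<p>Copying on read is just one way, assumed here, of honouring the independence assumption; software could equally share storage and copy only when a field is modified, provided the observable effect is the same.</p>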
<p>The CF-netCDF <tt>formula_terms</tt> (see also <b>Transform
constructs</b>) and
<tt>ancillary_variables</tt> attributes make links between field constructs.
These links are fragile.
If a field construct is written to a file, it is not required that any
other field constructs to which it is linked are also written to the file.
If an operation alters one field
construct in a way which could invalidate a relationship with another field
construct, the link should be broken. Users of software will have to be
aware of these relationships and remake them if applicable and useful.
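<p>The breaking of a fragile link can be sketched as follows, again in hypothetical Python. Subsetting is taken as an example of an operation which could invalidate the relationship between a field and its ancillary fields. The names <tt>Field</tt>, <tt>subset</tt> and <tt>relink</tt> are illustrative assumptions, and the length check in <tt>relink</tt> merely stands in for whatever consistency test real software would apply.</p>

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Field:
    name: str
    values: List[Any]
    # Links corresponding to ancillary_variables: fields providing metadata.
    ancillary_fields: List["Field"] = field(default_factory=list)


def subset(f: Field, start: int, stop: int) -> Field:
    """Return a new field holding f.values[start:stop].

    The ancillary fields still describe the full, unsubset data, so the
    relationship may no longer be valid; the links are therefore broken
    rather than silently carried over."""
    return Field(f.name, f.values[start:stop], ancillary_fields=[])


def relink(f: Field, ancillary: Field) -> Field:
    """Remake a link at the user's request, after a basic consistency check."""
    if len(ancillary.values) != len(f.values):
        raise ValueError("ancillary field does not match the data it describes")
    f.ancillary_fields.append(ancillary)
    return f


# Usage: a sea surface temperature field linked to a status-flag field.
flags = Field("status_flag", [0, 0, 1, 0])
sst = Field("sea_surface_temperature", [290.1, 290.4, 291.0, 290.8], [flags])
part = subset(sst, 1, 3)            # the link to `flags` is dropped here
relink(part, subset(flags, 1, 3))   # the user subsets the flags and relinks
```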
<font color="green"></p><h2>Domain axis construct</h2>
A domain axis construct must contain
<ul>
<li>A <b>size</b> (an integer greater than zero), which can be equal
    to one.
</li></ul>
</font>
<font color="green"></p><h2>Dimension coordinate construct</h2>
A dimension coordinate construct indicates the physical meaning and
locations of the cells for a unique <b>domain axis</b> of the field.
</font>
<font color="green">A dimension coordinate construct may contain</font>
<ul>
<li>A scalar or one-dimensional numerical <b>coordinate array</b> of
    the size specified for the <font color="green">domain
    axis</font>. The elements of the coordinate array must all be of
    the same numeric data type, they must all have different
    non-missing values, and they must be monotonically increasing or
    decreasing. Dimension coordinate constructs cannot have
    string-valued coordinates. In this data model, a CF-netCDF
    string-valued coordinate variable or string-valued scalar
    coordinate variable corresponds to an auxiliary coordinate
    construct (not a dimension coordinate construct), with
    a <font color="green">domain axis</font> which is not associated
    with a <font color="green">dimension coordinate construct</font>.
</li><li>A two-dimensional <b>boundary coordinate array</b>, whose
    slow-varying (second in Fortran) dimension equals the size
    specified by the <font color="green">domain axis</font> construct,
    and whose fast-varying dimension is two, indicating the extent of
    the cell. For climatological time dimensions, the bounds are
    interpreted in a special way indicated by the cell methods.
</li><li>Properties (in the same sense as for the field construct) serving
    to describe the coordinates.
</li></ul>
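<p>The constraints on a dimension coordinate construct can be expressed as checks made on construction. The sketch below is hypothetical: the class <tt>DimensionCoordinate</tt> and its fields are assumptions, a scalar coordinate is held as a one-element list, bounds are held as (lower, upper) pairs so that the size-two dimension varies fastest, and NaN is assumed to be how a missing value would appear.</p>

```python
import math
from dataclasses import dataclass, field
from numbers import Real
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class DimensionCoordinate:
    axis_size: int        # size of the domain axis this construct describes
    array: List[float]    # coordinate array (a scalar is held as length 1)
    bounds: Optional[List[Tuple[float, float]]] = None  # shape (axis_size, 2)
    properties: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        a = self.array
        if len(a) != self.axis_size:
            raise ValueError("array size must equal the domain axis size")
        if not all(isinstance(x, Real) and not isinstance(x, bool) for x in a):
            raise TypeError("dimension coordinates must be numeric, not strings")
        if len({type(x) for x in a}) > 1:
            raise TypeError("all coordinate values must have the same numeric type")
        if any(math.isnan(x) for x in a):  # NaN stands in for a missing value
            raise ValueError("missing values are not allowed")
        steps = [b - c for c, b in zip(a, a[1:])]
        # Strictly increasing or strictly decreasing implies all values differ.
        if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
            raise ValueError("values must be distinct and monotonic")
        if self.bounds is not None and (
            len(self.bounds) != self.axis_size
            or any(len(cell) != 2 for cell in self.bounds)
        ):
            raise ValueError("bounds must have shape (axis_size, 2)")


# Usage: latitude with cell bounds.
lat = DimensionCoordinate(
    3, [-60.0, 0.0, 60.0],
    bounds=[(-90.0, -30.0), (-30.0, 30.0), (30.0, 90.0)],
    properties={"standard_name": "latitude", "units": "degrees_north"},
)
```

<p>A string-valued axis, such as one running over ocean basins, fails the numeric check; in the data model its labels would instead form an auxiliary coordinate construct, and the domain axis would have no dimension coordinate construct.</p>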
<p>In this data model we permit a domain axis not to have a coordinate
array if there is no appropriate numeric monotonic coordinate. That is
the case for a dimension that runs over ocean basins or area types,
for example, or for a domain axis that indexes timeseries at scattered
points. Such domain axes do not correspond to a continuous physical
quantity. (They will be called <b>index dimensions</b> in CF version
</p><h2>Auxiliary coordinate construct</h2>
<font color="green">
An auxiliary coordinate construct provides auxiliary information for
interpreting the cells of an ordered list of one or more <b>domain
axes</b> of the field.</font>
An auxiliary coordinate construct must contain
<ul>
<li>A coordinate array <font color="green">whose shape is determined
by the domain axes in the order listed, optionally omitting any domain
axes of size one</font>. The elements of the coordinate array must all
be of the same data type (numeric, character or string), but they do
not have to be distinct or monotonic. Missing values are not allowed
(in CF version 1.5).
</li></ul>
and may also contain
<ul>
<li>A boundary coordinate array with the same dimensions, in the same
order, as the coordinate array, plus an additional fastest-varying dimension
(first dimension in Fortran) equal to the number of vertices of each cell.
</li><li>Properties serving to describe the coordinates.
</li></ul>
Auxiliary coordinate constructs correspond to auxiliary coordinate
variables named by the <tt>coordinates</tt> attribute of a data
variable in a CF-netCDF file. CF recommends that there be auxiliary
coordinate constructs of latitude and longitude if there is
two-dimensional horizontal variation but the horizontal coordinates
are not latitude and longitude. As for dimension coordinate constructs,
auxiliary coordinate constructs of different field constructs are
independent in the data model.
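<p>The shape rules for auxiliary coordinate constructs and their bounds can be sketched with nested Python lists. The helper names (<tt>shape_of</tt>, <tt>leaf_types</tt>, <tt>allowed_shapes</tt>, <tt>check_auxiliary</tt>) and the use of <tt>None</tt> for a missing value are assumptions of this illustration. Unlike the dimension coordinate case, no distinctness or monotonicity test is made.</p>

```python
from typing import Any, List, Optional, Set, Tuple


def shape_of(a: Any) -> Tuple[int, ...]:
    """Shape of a rectangular nested list; anything else is a scalar, ()."""
    if not isinstance(a, list):
        return ()
    inner = {shape_of(x) for x in a}
    if len(inner) > 1:
        raise ValueError("ragged array")
    return (len(a),) + (inner.pop() if inner else ())


def leaf_types(a: Any) -> Set[type]:
    """Types of all the elements of a nested list."""
    if isinstance(a, list):
        return set().union(*(leaf_types(x) for x in a))
    return {type(a)}


def allowed_shapes(axis_sizes: List[int]) -> Set[Tuple[int, ...]]:
    """Shapes given by the axes in order, optionally omitting size-one axes."""
    shapes: Set[Tuple[int, ...]] = {()}
    for n in axis_sizes:
        extended = {s + (n,) for s in shapes}
        shapes = extended | shapes if n == 1 else extended
    return shapes


def check_auxiliary(axis_sizes: List[int], array: Any,
                    bounds: Optional[Any] = None) -> None:
    """Validate a hypothetical auxiliary coordinate spanning the given axes."""
    shape = shape_of(array)
    if shape not in allowed_shapes(axis_sizes):
        raise ValueError(f"shape {shape} does not match domain axes {axis_sizes}")
    types = leaf_types(array)
    if len(types) > 1:
        raise TypeError("all coordinate values must have the same data type")
    if type(None) in types:
        raise ValueError("missing values are not allowed (CF 1.5)")
    # Values need not be distinct or monotonic, so nothing else is checked.
    if bounds is not None:
        bshape = shape_of(bounds)
        if bshape[:-1] != shape or len(bshape) != len(shape) + 1:
            raise ValueError("bounds need the coordinate dimensions plus a vertex dimension")


# Usage: latitude on a 2 x 3 curvilinear grid; values repeat and are not monotonic.
lat2d = [[10.0, 10.5, 11.0],
         [12.0, 12.5, 12.0]]
check_auxiliary([2, 3], lat2d)
check_auxiliary([1, 2, 3], lat2d)  # a leading size-one axis is omitted
# Bounds: an extra, fastest-varying dimension holding 4 vertices per cell.
corners = [[[v - 0.25, v - 0.25, v + 0.25, v + 0.25] for v in row] for row in lat2d]
check_auxiliary([2, 3], lat2d, corners)
# A string-valued auxiliary coordinate, e.g. labels along an ocean-basin axis.
check_auxiliary([3], ["atlantic", "pacific", "indian"])
```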
<h2>Cell measure construct</h2>
<font color="green">A cell measure construct provides information
about the size, shape or location of the cells defined by an ordered
list of one or more <b>domain axes</b> of the field.</font>
A cell measure construct may contain
<ul>
<li>Properties to describe itself.
</li></ul>
and must contain
<ul>
<li>A <b>measure property</b>, which indicates which metric of the space
it supplies, e.g. cell areas.
</li><li>A <b>units property</b> consistent with the measure property,
e.g. m2.
</li><li>A numeric array of metric values <font color="green"> whose shape
    is determined by the domain axes in the order listed, optionally
    omitting any domain axes of size one.</font> The elements of the array
    must all be of the same data type. It is assumed that the metric does not
    depend on any of the domain axes of the field which are not
    specified<font color="green">, along which the values are
    implicitly propagated.</font>
</li></ul>
In CF-netCDF files, cell measure constructs correspond to variables
named by the <tt>cell_measures</tt> attribute of the data variable.
As for dimension coordinate constructs, cell measure constructs of different field
constructs are independent in the data model.
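<p>The implicit propagation of a cell measure along domain axes it does not span resembles broadcasting. The sketch below assumes a field over (time, latitude) and a hypothetical <tt>CellMeasure</tt> of cell areas over latitude only; the function <tt>area_weighted_means</tt> is illustrative, and the units check is deliberately simplified to an exact string match against an assumed table.</p>

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified table of units consistent with each measure.
# Real software would compare units by dimensionality (e.g. "km2" is also
# an area) rather than by exact string.
EXPECTED_UNITS = {"area": "m2", "volume": "m3"}


@dataclass
class CellMeasure:
    measure: str          # which metric of the space is supplied
    units: str            # must be consistent with the measure
    values: List[float]   # metric values along the axes the measure spans

    def __post_init__(self) -> None:
        if EXPECTED_UNITS.get(self.measure) != self.units:
            raise ValueError(f"units {self.units!r} do not suit measure {self.measure!r}")


def area_weighted_means(data: List[List[float]], areas: CellMeasure) -> List[float]:
    """Area-weighted mean over latitude for each time step.

    `data` spans (time, latitude) but `areas` spans latitude only, so the
    same cell areas are reused at every time step: the metric is implicitly
    propagated along the unspecified time axis."""
    if areas.measure != "area":
        raise ValueError("an area measure is required")
    w = areas.values
    if any(len(row) != len(w) for row in data):
        raise ValueError("cell measure does not match the latitude axis")
    total = sum(w)
    return [sum(v * a for v, a in zip(row, w)) / total for row in data]


# Usage: two time steps on two latitude cells of unequal area.
cell_area = CellMeasure("area", "m2", [1.0, 3.0])
means = area_weighted_means([[10.0, 20.0], [0.0, 4.0]], cell_area)
```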
<h2>Cell methods construct</h2>
The cell methods construct describes how the data values represent
variation of the quantity within cells. It corresponds to
the <tt>cell_methods</tt> attribute of the data variable in CF-netCDF
files. It is an ordered list, because the methods specified are not
necessarily commutative. Each entry of the list specifies either one
or more dimensions, or a CF standard name (to describe variation with
respect to a quantity which is not recorded as a dimension of the
field), and a method e.g. <tt>mean</tt> (CF Appendix E). Special
methods indicate climatological time processing.
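<p>Why the order of entries matters can be shown by applying two methods to the same values in both orders. This is a hypothetical sketch: it treats the entries as operations applied to some raw data, which is only a way of illustrating what a <tt>cell_methods</tt> attribute records about data that have already been processed, and its parser (<tt>parse_cell_methods</tt>) handles only the basic <tt>name: method</tt> form.</p>

```python
from typing import Callable, Dict, List, Sequence, Tuple

Entry = Tuple[Tuple[str, ...], str]  # (names, method)

METHODS: Dict[str, Callable[[List[float]], float]] = {
    "mean": lambda xs: sum(xs) / len(xs),
    "maximum": max,
    "minimum": min,
}


def parse_cell_methods(text: str) -> List[Entry]:
    """Split a cell_methods string into an ordered list of entries.

    Only the basic form "name: [name: ...] method" is handled; qualifiers
    such as "where", "over" or "within" and parenthesised comments are
    outside the scope of this sketch."""
    entries: List[Entry] = []
    names: List[str] = []
    for token in text.split():
        if token.endswith(":"):
            names.append(token[:-1])
        elif names:
            entries.append((tuple(names), token))
            names = []
        else:
            raise ValueError(f"method {token!r} is not preceded by a name")
    if names:
        raise ValueError("names at the end of the string have no method")
    return entries


def collapse(data: Dict[Tuple[int, ...], float], axes: List[str],
             names: Sequence[str], method: str):
    """Apply `method` over the named axes of `data`, a dict keyed by indices."""
    if not set(names) <= set(axes):
        raise ValueError(f"unknown axes in {names}")
    keep = [i for i, a in enumerate(axes) if a not in names]
    groups: Dict[Tuple[int, ...], List[float]] = {}
    for idx, value in data.items():
        groups.setdefault(tuple(idx[i] for i in keep), []).append(value)
    reduced = {k: METHODS[method](v) for k, v in groups.items()}
    return reduced, [axes[i] for i in keep]


def apply_cell_methods(data, axes, entries: List[Entry]):
    """Apply the entries in their listed order."""
    for names, method in entries:
        data, axes = collapse(data, axes, names, method)
    return data, axes


# Usage: the same raw (time, lat) values processed in two different orders.
raw = {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 3.0, (1, 1): 2.0}
a, _ = apply_cell_methods(raw, ["time", "lat"], parse_cell_methods("time: maximum lat: mean"))
b, _ = apply_cell_methods(raw, ["time", "lat"], parse_cell_methods("lat: mean time: maximum"))
```

<p>With raw values 1, 5 at the first time and 3, 2 at the second, "time: maximum lat: mean" gives 4.0, whereas "lat: mean time: maximum" gives 3.0, so the two entries do not commute.</p>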
<h2>Transform constructs</h2>
<font color="green">
A transform construct defines a formula for transforming one group of
dimension or auxiliary coordinates into another, consistent group of
dimension or auxiliary coordinates for the same domain.<p>
Either of these groups of coordinates may be absent, in which case it
may be created by applying the transformation, inverting the formula
if necessary.
</p></font>
<font color="green">
A transform also serves to connect consistent dimension coordinate and
auxiliary coordinate constructs (which have an implied common
transformation formula) and to provide coordinate system metadata
(which would be used in any transformation) to a dimension coordinate
or auxiliary coordinate construct.
</font>
A transform construct contains
<ul>
<li>A <b>transform name</b> which indicates the nature of the
    transformation and implies the formulae to be used. A CF-netCDF
    file does not explicitly record the formulae; the application
    software is relied upon to know them.
</li><li>An unordered collection
    of <b>terms</b> <font color="green">corresponding to the variables
    of the transformation formula</font>. These variables may be
    scalar parameters, pointers to dimension or auxiliary coordinate
    constructs of the field construct, or pointers to other field
    constructs. Each member of the collection has a particular role in
    the formula<font color="green">; the collection necessarily includes
    all existing coordinates which relate to this transformation</font>.
</li></ul>
<p>Transform constructs correspond to the functions of the CF-netCDF
attributes
<tt>formula_terms</tt>, which describes how to compute a vertical coordinate
variable from components (CF Appendix D),
and <tt>grid_mapping</tt>, which describes how to transform between
longitude-latitude coordinates and the horizontal coordinates of the field construct
(CF Appendix F).
The transform name is the <tt>standard_name</tt> of a vertical coordinate
variable with <tt>formula_terms</tt>, or the <tt>grid_mapping_name</tt>
of a <tt>grid_mapping</tt> variable.
The scalar parameters are scalar data variables (which should
have <tt>units</tt> if dimensional) named by <tt>formula_terms</tt>,
and attributes of <tt>grid_mapping</tt> variables
(for which the units are specified by the transform construct).
The role of each term in the formulae of the transform construct is
identified by its keyword in a <tt>formula_terms</tt> attribute,
or its attribute name in a <tt>grid_mapping</tt> variable.
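<p>As a hypothetical sketch of how software might use a transform construct, the code below parses a <tt>formula_terms</tt> attribute into roles and applies the formula it knows for the transform name. The formula is that of <tt>atmosphere_sigma_coordinate</tt> in CF Appendix D, reduced to a single column so that the surface pressure is a scalar; the variable names <tt>lev</tt>, <tt>PS</tt> and <tt>PTOP</tt>, and the names <tt>Transform</tt>, <tt>FORMULAE</tt>, <tt>parse_formula_terms</tt> and <tt>compute</tt>, are assumptions made for the example.</p>

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


def parse_formula_terms(attr: str) -> Dict[str, str]:
    """Map each keyword (the term's role in the formula) to a variable name."""
    tokens = attr.split()
    keys, names = tokens[0::2], tokens[1::2]
    if len(keys) != len(names) or not all(k.endswith(":") for k in keys):
        raise ValueError(f"malformed formula_terms: {attr!r}")
    return {k[:-1]: v for k, v in zip(keys, names)}


@dataclass
class Transform:
    name: str              # transform name, e.g. a standard_name
    terms: Dict[str, Any]  # role -> scalar parameter or coordinate values


# The file does not record the formulae, so software must know them. Here one
# is hard-coded: atmosphere_sigma_coordinate (CF Appendix D),
#   p(n,k,j,i) = ptop + sigma(k) * (ps(n,j,i) - ptop),
# simplified to a single column so that ps is a scalar.
FORMULAE: Dict[str, Callable[[Dict[str, Any]], List[float]]] = {
    "atmosphere_sigma_coordinate":
        lambda t: [t["ptop"] + s * (t["ps"] - t["ptop"]) for s in t["sigma"]],
}


def compute(transform: Transform) -> List[float]:
    if transform.name not in FORMULAE:
        raise KeyError(f"no formula known for {transform.name!r}")
    return FORMULAE[transform.name](transform.terms)


# Usage: hypothetical file contents for one column (pressures in Pa).
variables = {"lev": [1.0, 0.5, 0.0], "PS": 100000.0, "PTOP": 1000.0}
roles = parse_formula_terms("sigma: lev ps: PS ptop: PTOP")
sigma_transform = Transform("atmosphere_sigma_coordinate",
                            {role: variables[var] for role, var in roles.items()})
pressure = compute(sigma_transform)
```

<p>A <tt>grid_mapping</tt> transform could be handled in the same way, with the <tt>grid_mapping_name</tt> selecting the formula and the attributes of the <tt>grid_mapping</tt> variable supplying the terms.</p>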
</p><h2>Other properties</h2>
The other properties recognised by this CF data model correspond to attributes
listed in CF Appendix A.
For field constructs, the allowed properties are
Some of these can be global attributes in a CF-netCDF file.
In this data model, it is assumed that any relevant global attribute
is also an
attribute of every data variable, although it is superseded if the data
variable has its own attribute.
Each field construct in the model has its own independent set of properties.
For dimension coordinate and auxiliary coordinate constructs, the allowed properties are
Coordinate constructs of time are optionally climatological;
this property is indicated by the presence of the <tt>climatology</tt>
attribute.
In any field, any given value of the <tt>axis</tt> attribute can occur
no more than once among all the dimension and auxiliary coordinates of
that field.
The CF data model allows field, dimension coordinate
and auxiliary coordinate constructs
to have other properties not defined by CF, provided they do not
conflict with CF, but since they are not part of the
CF standard, the data model does not provide any interpretation of them.
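<p>The rule that a relevant global attribute applies to every data variable unless superseded, and the rule that each <tt>axis</tt> value occurs at most once per field, could be sketched as below. The function names are hypothetical, and the set of global attributes treated as relevant is an assumption made for the example, not a list taken from CF.</p>

```python
from typing import Any, Dict, Iterable, Set


def field_properties(global_attrs: Dict[str, Any], var_attrs: Dict[str, Any],
                     relevant: Set[str]) -> Dict[str, Any]:
    """Properties of one field: relevant global attributes apply to every
    data variable, but an attribute of the variable itself supersedes them.
    A new dict is returned, so each field gets an independent set."""
    inherited = {k: v for k, v in global_attrs.items() if k in relevant}
    return {**inherited, **var_attrs}


def check_axis_values(coordinate_properties: Iterable[Dict[str, Any]]) -> None:
    """Each value of the axis property may occur at most once among all the
    dimension and auxiliary coordinate constructs of one field."""
    seen: Set[str] = set()
    for props in coordinate_properties:
        axis = props.get("axis")
        if axis is None:
            continue
        if axis in seen:
            raise ValueError(f"axis {axis!r} is used by more than one coordinate")
        seen.add(axis)


# Usage. RELEVANT is an assumed choice of global attributes for this example.
RELEVANT = {"institution", "source", "history", "references", "comment"}
global_attrs = {"Conventions": "CF-1.5", "institution": "Example Centre",
                "comment": "file-level comment"}
var_attrs = {"standard_name": "air_temperature", "comment": "variable comment"}
props = field_properties(global_attrs, var_attrs, RELEVANT)
check_axis_values([{"axis": "X"}, {"axis": "Y"}, {"standard_name": "latitude"}])
```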
<p>The attributes
<tt>valid_min</tt> and
of data variables and coordinate variables are checks on the validity of
the values, which could be verified on input and written on output.
In this CF data model we assume they do not constrain any manipulations
which might be done on the data in memory,
and they are not part of the data model.
</p><p>The attributes
<tt>_FillValue</tt> and
of data variables specify how missing data is indicated in the data array.
This data model supports the idea of missing data, but does not depend on
any particular method of indicating it, so these attributes
are not part of the model.
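<p>Because the data model depends only on the idea of missing data, software might translate the file's convention into an in-memory marker on input and choose a convention again on output. In this hypothetical sketch the marker is Python's <tt>None</tt>, only a <tt>_FillValue</tt>-style sentinel is handled, and the function names are illustrative.</p>

```python
import math
from typing import List, Optional


def read_missing(values: List[float], fill_value: Optional[float]) -> List[Optional[float]]:
    """Convert a file's fill-value convention into a format-neutral one.

    None marks a missing element in memory; this choice is an assumption of
    the sketch. A NaN fill value is matched with isnan, because NaN != NaN."""
    if fill_value is None:
        return list(values)
    if math.isnan(fill_value):
        return [None if math.isnan(v) else v for v in values]
    return [None if v == fill_value else v for v in values]


def write_missing(values: List[Optional[float]], fill_value: float) -> List[float]:
    """Choose a representation again on output, possibly a different one."""
    return [fill_value if v is None else v for v in values]


# Usage: read with one fill value, write with another; in memory, only the
# fact that an element is missing is kept.
in_memory = read_missing([271.5, -999.0, 273.0], -999.0)
on_disk = write_missing(in_memory, 1.0e20)
```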
</p><p>The attributes
<tt>flag_values</tt> and
are all used in methods of compressing the data to save space
in CF-netCDF files,
with or without loss of information.
They are not part of this data model because these operations do not
logically alter the data,
except that the <tt>compress</tt> attribute implies two alternative
interpretations of coordinates (compressed or uncompressed).
The "feature type" attribute and associated new conventions,
to be introduced in CF version 1.6,
will provide a way of packing multiple
fields of the same kind of discrete sampling geometry
(timeseries, trajectories, etc.) into a single CF-netCDF data variable,
in order to save space, since a multidimensional representation with
common coordinate variables is typically very wasteful in such cases.
This is a kind of compression. The data model would regard each instance
of the feature type as an independent field construct.
However, the "feature type" attribute itself is metadata that would be
a property of the field construct, and so part of the data model.
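<p>The two interpretations implied by the <tt>compress</tt> attribute can be illustrated with compression by gathering, in which only selected points of a grid (for instance land points) are stored, together with a list of their positions. In this hypothetical sketch the positions are assumed to be zero-based indices into the grid flattened in C (row-major) order; the function names and the example values are illustrative.</p>

```python
from typing import List, Optional, Sequence, Tuple


def uncompress_gathered(compressed: Sequence[float], indices: Sequence[int],
                        dim_sizes: Sequence[int]) -> List[Optional[float]]:
    """Scatter gathered values back onto the full (flattened) grid.

    `indices` plays the role of the list variable whose compress attribute
    names the gathered dimensions. Each index is assumed to be zero-based
    and to count through the grid in C (row-major) order. Points that were
    not stored come back as missing (None)."""
    if len(compressed) != len(indices):
        raise ValueError("one index is needed per stored value")
    total = 1
    for n in dim_sizes:
        total *= n
    full: List[Optional[float]] = [None] * total
    for value, i in zip(compressed, indices):
        if not 0 <= i < total:
            raise IndexError(f"index {i} is outside the grid")
        full[i] = value
    return full


def gathered_coordinates(indices: Sequence[int], lat: Sequence[float],
                         lon: Sequence[float]) -> List[Tuple[float, float]]:
    """The compressed view of the coordinates: one (lat, lon) per stored point."""
    return [(lat[i // len(lon)], lon[i % len(lon)]) for i in indices]


# Usage: three land points on a 2 x 3 grid, gathered over "lat lon".
lat, lon = [-30.0, 30.0], [0.0, 120.0, 240.0]
landpoint = [1, 2, 5]
values = [280.0, 281.5, 279.0]
full = uncompress_gathered(values, landpoint, [len(lat), len(lon)])
points = gathered_coordinates(landpoint, lat, lon)
```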
</p><p>The attributes
<tt>formula_terms</tt> and
have various special or structural functions in the CF-netCDF file format.
Their functions and
the relationships they indicate are reflected in the structure
of this data model,
and these attributes do not correspond directly to
properties in the data model.
17th December 2012
<br><a href="http://www.met.rdg.ac.uk/%7Edavid/cfdm_recast_0.6.html">Version 0.6 of 12th December 2012</a>
<br><a href="http://www.met.rdg.ac.uk/%7Ejonathan/CF_metadata/cfdm_0.5.html">Version 0.5 of 16th October 2012</a>
<br><a href="http://www.met.rdg.ac.uk/%7Ejonathan/CF_metadata/cfdm_0.4.html">Version 0.4 of 5th August 2012</a>
<br><a href="http://www.met.rdg.ac.uk/%7Ejonathan/CF_metadata/cfdm_0.3.html">Version 0.3 of 6th February 2012</a>
<br><a href="http://www.met.rdg.ac.uk/%7Ejonathan/CF_metadata/cfdm_0.2.html">Version 0.2 of 1st August 2011</a>
<br><a href="http://www.met.rdg.ac.uk/%7Ejonathan/CF_metadata/cfdm_0.1.html">Original version 0.1 of 10th January 2011</a>
<p><a href="http://www.met.reading.ac.uk/%7Ejonathan">Jonathan Gregory</a>,
David Hassell and Mark Hedley