CF data model and reference implementation in Python
Reported by: jonathan | Owned by: cf-conventions@…
In this ticket we do not propose any change to the CF standard. This ticket concerns an abstract model for data and metadata corresponding to the existing standard (version 1.5). As a netCDF convention, the CF standard has not so far included a data model. However, the design of CF implies a data model to some extent. Following the discussion at the GO-ESSP meeting, we now propose that the data model should be made explicit, as an independent element of CF, separate from the CF standards and conformance documents, to be updated for successive CF versions in line with those documents. We consider that defining an explicit data model will contribute to the CF goal of helping in "building applications with powerful extraction, regridding, and display capabilities."
We have drafted a document that describes the data model, with an associated UML diagram to illustrate it. The description follows from the one discussed earlier this year on the CF email list, which pointed out the need for a diagram. The proposed data model avoids prescribing more than is needed for interpreting CF as it stands, in order to avoid inconsistency with future developments of CF.
The document describes both the proposed CF data model and how it is implemented in netCDF. These are distinct purposes. The same data model could be implemented in other file formats that are unlike netCDF, and that would require the description of the model to be separated from the description of its implementation. We have not made that separation in this version of the document because we think that doing so would make the document harder to understand at this stage.
Following discussions on the email list and at GO-ESSP, we are aware that this attempt to describe the CF data model overlaps with other work on data models, especially the Unidata CDM. It will be useful to discuss the relationship between these. The proposed CF data model corresponds less closely to netCDF storage concepts than the CDM does, and in that sense it is more abstract.
We have also developed a minimal implementation of the data model in Python, including documentation. (Note: since putting this up, we've discovered that there is an existing package called cfpython, which is not related to CF. That is confusing, so we might have to change the name of ours.) The software reads and writes CF-netCDF files, and holds the data and metadata in memory in objects called spaces, in a way that is consistent with the data model. It is possible to select a subset of the spaces according to their properties, to extract subspaces by specifying ranges of coordinates or indices along the dimensions, and to modify the metadata. We describe this implementation as "minimal" because it does not provide any processing or graphical functions, and it does not extend to the level of the scientific feature types of the CDM, for instance. This software might be useful:
- To illustrate the data model.
- As a reference implementation of CF. In all versions of CF published so far, every change has been introduced as provisional and is still marked as such, because two demonstrated implementations of a new feature are required before a change is accepted as permanent. This software could provide one implementation. The CF checker could be another. (The cf-python software attempts to interpret netCDF files by reference to the CF convention, but does not require or check complete compliance.)
- As a basis for data processing and graphical software in Python based on CF concepts. For this purpose, the API is the essence. Other code could be written that offers the same API to the CF data model. All Python code using the same API to the minimal CF data model would be interoperable at that level.
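To make the operations described above concrete (selecting spaces by their properties, extracting a subspace by coordinate ranges, and modifying metadata), here is a small self-contained sketch. It is not the cf-python API. The class `Space`, its fields, and the functions `select` and `_take` are hypothetical names invented for this illustration, and the data values are made up. It assumes coordinate ranges are inclusive and that data are held as nested Python lists.

```python
from dataclasses import dataclass


@dataclass
class Space:
    """Hypothetical in-memory container: one data array plus its metadata."""
    properties: dict  # CF-style attributes, e.g. standard_name, units
    dims: list        # ordered dimension names
    coords: dict      # dimension name -> list of coordinate values
    data: list        # nested lists, one nesting level per dimension

    def match(self, **criteria):
        """Return True if every given property equals the stored value."""
        return all(self.properties.get(k) == v for k, v in criteria.items())

    def subspace(self, **ranges):
        """Return a new Space restricted to inclusive (lo, hi) coordinate ranges."""
        keep = []
        for dim in self.dims:
            values = self.coords[dim]
            if dim in ranges:
                lo, hi = ranges[dim]
                keep.append([i for i, v in enumerate(values) if lo <= v <= hi])
            else:
                # No range given for this dimension: keep every index.
                keep.append(list(range(len(values))))
        new_coords = {d: [self.coords[d][i] for i in idx]
                      for d, idx in zip(self.dims, keep)}
        # Copy the metadata so that edits to the subspace leave the parent intact.
        return Space(dict(self.properties), list(self.dims), new_coords,
                     _take(self.data, keep))


def _take(block, keep):
    """Recursively pick the kept indices along each nesting level."""
    if not keep:
        return block
    return [_take(block[i], keep[1:]) for i in keep[0]]


def select(spaces, **criteria):
    """Return the spaces whose properties match all the criteria."""
    return [s for s in spaces if s.match(**criteria)]


# Illustrative (made-up) data.
temperature = Space(
    properties={"standard_name": "air_temperature", "units": "K"},
    dims=["latitude", "longitude"],
    coords={"latitude": [-45.0, 0.0, 45.0],
            "longitude": [0.0, 90.0, 180.0, 270.0]},
    data=[[280.0, 281.0, 282.0, 283.0],
          [300.0, 301.0, 302.0, 303.0],
          [285.0, 286.0, 287.0, 288.0]],
)
pressure = Space({"standard_name": "air_pressure", "units": "Pa"},
                 ["latitude"], {"latitude": [0.0]}, [101325.0])

selected = select([temperature, pressure], standard_name="air_temperature")
tropics = selected[0].subspace(latitude=(-10.0, 10.0), longitude=(90.0, 180.0))
tropics.properties["long_name"] = "Tropical air temperature"  # modify metadata

print(tropics.coords)  # {'latitude': [0.0], 'longitude': [90.0, 180.0]}
print(tropics.data)    # [[301.0, 302.0]]
```

Any other code exposing the same small set of operations on the same kind of object could be swapped in without changing the calling code, which is the sense in which a shared API would make software interoperable.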
To be clear, we are not proposing the Python software as an element of CF. It could be useful to people dealing with CF-netCDF data, but this proposal is really about the CF data model, which we are proposing as an element of CF. The first of the uses listed above, illustrating the data model, is therefore the most important to this proposal.
We hope that people will consider this proposal. We welcome comments on this ticket on both the data model and the Python API. (However, comments on the code itself would probably be better made by email, unless they are matters of principle.)
We are grateful to Bryan Lawrence and Dominic Lowe for very thoughtful discussions.
Jonathan Gregory (j.m.gregory at reading.ac.uk)
David Hassell (d.c.hassell at reading.ac.uk)