Subconvention for associated files, proposed for use in CMIP6
|Reported by:||jonathan||Owned by:||davidhassell|
Subconvention for associated files
CMIP6, like CMIP5, will store cell_measures variables in a different file from the data variables to which they belong. This is to save storage space, but it is not legal in CF, which is a convention that so far applies only to individual self-contained netCDF files. To relax that restriction requires regarding two or more files as though they were a single dataset. Rules for aggregating files are needed. In this ticket a simple mechanism is proposed which is sufficient for CMIP6 to allow one file to refer to another file.
Note that the file referred to is not necessarily identified by name, because this is fragile and caused some difficulties in CMIP5. This proposal does not say exactly how the file should be found. A further convention specifically for CMIP6, and not part of CF, will be needed for that, and other users of this subconvention would similarly have to adopt their own rule.
4 Status quo and benefits
CMIP6 files will not be CF-conforming without this change. Legalising them is a benefit, and the mechanism will probably be useful in other situations. This proposal arose from email discussions involving Balaji, Jonathan, Steve Griffies and others in September 2014 during discussions about the Ocean Model Development Panel recommendations of diagnostics for CMIP6.
5 Detailed proposal
The proposal is to introduce a subconvention of CF i.e. conventions which are not part of CF, but intended to be used in combination with CF. It is proposed to insert the following text as a named (but not numbered) section of the CF standard document, before Appendix A. The title of the section will be Subconvention for associated files and the text is below. In addition to the following new section, in Table A.1 of Appendix A insert an entry for associated_files, type S, use G, link "Associated files subconvention", description "Indicates where files containing metadata variables can be found".
CF is a convention for individual netCDF files, which implies that if a data variable refers to another variable containing metadata, the variables must be in the same file. This subconvention provides a mechanism to allow data variables in one file to refer to metadata variables in another file or files. When this subconvention is used, the netCDF file containing the data variable should contain CF-n/associated-files in its global Conventions attribute, where n is the CF version number to which it conforms.
The optional global attribute associated_files of the file containing the data variable indicates where the files containing metadata variables can be found. This attribute is a string whose syntax is not standardised. For instance, it could the path to a file, a URL of a file, or a URL of a website where the required file could be found (thus requiring human intervention). Applications which use this subconvention may define their own rules for the syntax and the interpretation of the associated_files attribute.
The metadata variables to which this subconvention applies are those identified by the coordinates, formula_terms, grid_mapping and cell_measures attributes. These metadata variables are identified by name. The named variables may be stored in either the same file as the data variable which refers to them, as usual, or in other files, provided that
- There is only one variable of that name in the data in any of the files concerned (the file containing the data variable and any of the associated files), so that the identification of the metadata variable is unambiguous.
- If the metadata variable is in a different file from the data variable, its dimensions must have names which are also names of dimensions in the file containing the data variable, and these dimensions must have the same sizes as they do in that file. These rules are usual CF conventions when the metadata variable is in the same file as the data variable.
A file containing a data variable:
dimensions: lat=73; lon=96; level=20; variables: float temperature(level,lat,lon); temperature:cell_measures="area: areacell"; temperature:standard_name="air_temperature"; temperature:standard_name="degC"; // global attributes: :Conventions="CF-1.7/associated-files" ; :associated_files="http://some.web.site/areacell.nc";
In this example, the associated_files attribute gives the URL of this file, which contains a metadata variable:
dimensions: lat=73; lon=96; variables: float areacell(lat,lon); areacell:units="m2"; // global attributes: :Conventions="CF-1.7" ;
The variable areacell would need to be in the same file as temperature according to standard CF. This subconvention allows it to be stored in a different file. It would be an error if there was a variable called areacell in both files, since it would be ambiguous which should be used. It would be an error if the latitude and longitude dimensions had names other than lat and lon, or different sizes e.g. lat=180, in the second file, because they must correspond to dimensions of the data variable in the first file.
Change History (54)
comment:4 Changed 18 months ago by stevehankin
comment:7 Changed 17 months ago by stevehankin
comment:13 Changed 17 months ago by stevehankin
comment:23 in reply to: ↑ 21 Changed 17 months ago by caron
comment:53 Changed 3 months ago by davidhassell
- Owner changed from cf-conventions@… to davidhassell
- Status changed from new to accepted