Custom Query (125 matches)

Results (55 - 57 of 125)

Ticket Resolution Summary Owner Reporter
#145 fixed Subconvention for associated files, proposed for use in CMIP6 davidhassell jonathan
Description

1 Title

Subconvention for associated files

2 Moderator

Balaji

3 Purposes

CMIP6, like CMIP5, will store cell_measures variables in a different file from the data variables to which they belong. This is to save storage space, but it is not legal in CF, which is a convention that so far applies only to individual self-contained netCDF files. To relax that restriction requires regarding two or more files as though they were a single dataset. Rules for aggregating files are needed. In this ticket a simple mechanism is proposed which is sufficient for CMIP6 to allow one file to refer to another file.

Note that the file referred to is not necessarily identified by name, because this is fragile and caused some difficulties in CMIP5. This proposal does not say exactly how the file should be found. A further convention specifically for CMIP6, and not part of CF, will be needed for that, and other users of this subconvention would similarly have to adopt their own rule.

4 Status quo and benefits

CMIP6 files will not be CF-conforming without this change. Legalising them is a benefit, and the mechanism will probably be useful in other situations. This proposal arose from email discussions involving Balaji, Jonathan, Steve Griffies and others in September 2014 during discussions about the Ocean Model Development Panel recommendations of diagnostics for CMIP6.

5 Detailed proposal

The proposal is to introduce a subconvention of CF, i.e. a convention which is not part of CF but is intended to be used in combination with CF. It is proposed to insert the following text as a named (but not numbered) section of the CF standard document, before Appendix A. The title of the section will be Subconvention for associated files and the text is below. In addition to the following new section, in Table A.1 of Appendix A insert an entry for associated_files, type S, use G, link "Associated files subconvention", description "Indicates where files containing metadata variables can be found".

CF is a convention for individual netCDF files, which implies that if a data variable refers to another variable containing metadata, the variables must be in the same file. This subconvention provides a mechanism to allow data variables in one file to refer to metadata variables in another file or files. When this subconvention is used, the netCDF file containing the data variable should contain CF-n/associated-files in its global Conventions attribute, where n is the CF version number to which it conforms.

The optional global attribute associated_files of the file containing the data variable indicates where the files containing metadata variables can be found. This attribute is a string whose syntax is not standardised. For instance, it could be the path to a file, a URL of a file, or a URL of a website where the required file could be found (thus requiring human intervention). Applications which use this subconvention may define their own rules for the syntax and the interpretation of the associated_files attribute.

The metadata variables to which this subconvention applies are those identified by the coordinates, formula_terms, grid_mapping and cell_measures attributes. These metadata variables are identified by name. The named variables may be stored in either the same file as the data variable which refers to them, as usual, or in other files, provided that

  • There is only one variable of that name in the data in any of the files concerned (the file containing the data variable and any of the associated files), so that the identification of the metadata variable is unambiguous.
  • If the metadata variable is in a different file from the data variable, its dimensions must have names which are also names of dimensions in the file containing the data variable, and these dimensions must have the same sizes as they do in that file. These rules are usual CF conventions when the metadata variable is in the same file as the data variable.

Example

A file containing a data variable:

dimensions:
  lat=73;
  lon=96;
  level=20;
variables:
  float temperature(level,lat,lon);
    temperature:cell_measures="area: areacell";
    temperature:standard_name="air_temperature";
    temperature:units="degC";
// global attributes:
  :Conventions="CF-1.7/associated-files" ;
  :associated_files="http://some.web.site/areacell.nc";

In this example, the associated_files attribute gives the URL of the following file, which contains the metadata variable:

dimensions:
  lat=73;
  lon=96;
variables:
  float areacell(lat,lon);
    areacell:units="m2";
// global attributes:
  :Conventions="CF-1.7" ;

The variable areacell would need to be in the same file as temperature according to standard CF. This subconvention allows it to be stored in a different file. It would be an error if there were a variable called areacell in both files, since it would be ambiguous which should be used. It would also be an error if the latitude and longitude dimensions in the second file had names other than lat and lon, or different sizes, e.g. lat=180, because they must correspond to dimensions of the data variable in the first file.
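
The two conditions on associated files lend themselves to a mechanical check. The following Python sketch is illustrative only and is not part of the proposed text. It models each file as a plain dictionary rather than reading netCDF, and the function name check_associated, the dictionary layout and the file bad_file are all assumptions made for this example.

```python
# Illustrative only: files are modelled as plain dicts, not read from netCDF.
# The dict layout and all names here are assumptions made for this sketch.

def check_associated(data_file, associated_files, name):
    """Return a list of problems found when resolving metadata variable `name`."""
    all_files = [data_file] + list(associated_files)

    # Condition 1: the name must identify exactly one variable across all files.
    holders = [f for f in all_files if name in f["variables"]]
    if len(holders) == 0:
        return [f"{name}: not found in any file"]
    if len(holders) > 1:
        return [f"{name}: found in {len(holders)} files, reference is ambiguous"]

    # Condition 2: if the variable lives in another file, each of its
    # dimensions must exist in the data file with the same size.
    problems = []
    holder = holders[0]
    if holder is not data_file:
        for dim in holder["variables"][name]:
            if dim not in data_file["dimensions"]:
                problems.append(f"{name}: dimension {dim} is not in the data file")
            elif holder["dimensions"][dim] != data_file["dimensions"][dim]:
                problems.append(f"{name}: dimension {dim} has a different size")
    return problems


# The two files of the example above, plus a hypothetical mismatching one.
data_file = {
    "dimensions": {"lat": 73, "lon": 96, "level": 20},
    "variables": {"temperature": ("level", "lat", "lon")},
}
area_file = {
    "dimensions": {"lat": 73, "lon": 96},
    "variables": {"areacell": ("lat", "lon")},
}
bad_file = {
    "dimensions": {"lat": 180, "lon": 96},
    "variables": {"areacell": ("lat", "lon")},
}

print(check_associated(data_file, [area_file], "areacell"))
print(check_associated(data_file, [bad_file], "areacell"))
```

The first call reports no problems. The second reports that lat has a different size, which is the lat=180 error case described above.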

#146 fixed Decisions by the CF conventions committee about management of versions of CF cf-conventions@… jonathan
Description

This ticket is to record some decisions taken in emails by the CF committee during June and July 2015 about the management of versions of the CF conventions. This ticket should remain open until the required changes to the rules and website have been made, and 1.7 is finished and published.

  • It was agreed to abolish provisional status. This means removing the three paragraphs of http://cfconventions.org/rules.html which mention provisional status, and inserting the following new text to replace them:

If the change, once implemented in the conventions, subsequently turns out to be materially flawed, meaning that data written following the convention could be somehow erroneous or ambiguous, a trac ticket should urgently be opened to discuss whether to revoke the change. If this is agreed by a majority of the committee, a new version of the conventions will be prepared immediately, with the second digit of the version number incremented, and will be recommended to be used instead of the flawed version. The flawed version will be deprecated by a statement in the standard document and the conformance document. However, any data written with the flawed version will not be invalidated, although it may be problematic for users. Errors or lack of clarity in wording, when the convention itself is not at fault, can be corrected by defect tickets on the usual schedule.

  • If a change has to be reversed, the simplest approach would be to prepare a new version minus the one offending change. Anyone could draft the required change in the ticket. The usual procedures in force at the time for producing a new version of the standards document would be followed.
  • The best way to make 1.7 is to get a complete AsciiDoc of 1.6 first. The AsciiDoc source will not have any provisional markup, just 1.6 without showing deletions and modifications, but we can keep the existing PDF with the markup for reference. Once the AsciiDoc is ready, we can proof-read 1.6 made from the new AsciiDoc source, and after that make 1.7 at last by implementing all the currently agreed tickets, without showing provisional status.
  • Appendix G (which will be Appendix Z in 1.7 and thereafter) should describe all the changes implemented in each version in a user-friendly way. This info should come from the tickets, so in future we should make sure that before a ticket is agreed it has a correct short (one or two sentences) description of its effect.
  • In future we aim to have a release at least every six months (provided new tickets have been agreed of course).
  • The work in AsciiDoc is all being done on GitHub. In future, we will not mark up the document to show changes, because GitHub will be able to show the differences between versions. When we have more experience with it, we should revisit the management of the document, including the possibility of adding a third digit to the document version number (for trivial changes), and of using GitHub issues instead of trac.
  • It was agreed to recognise the contribution to the CF convention made by individuals who have provided information management and support, namely John Graybeal (Stanford University), Kyle Halliday (LLNL), Matthew Harris (LLNL), David Hassell (University of Reading), Rosalyn Hatcher (University of Reading), Richard Hattersley (UK Met Office), Mark Hedley (UK Met Office), Velimir Mlaker (LLNL), Jeff Painter (LLNL), Alison Pamment (CEDA). We will maintain this list in addition to the separate lists of original and additional authors of the convention and contributors to the standard names table.

Jonathan Gregory, 14 Dec 2015

#31 fixed Proposal for standard attributes actual_min and actual_max cf-conventions@… jonblower
Description

Summary

It is very useful for data mining and visualization applications to know the minimum and maximum values of a particular variable in a NetCDF file, without needing to extract the entire variable and calculate this in the application. Here we propose a new pair of standard attributes actual_min and actual_max that contain the min and max values of a variable.

Advantages

The proposed new attributes would prevent misuse of the valid_min and valid_max attributes, which are intended to be used to delimit a valid data range, but in fact are often used to denote the actual range of data in a variable. The latter (mis)use leads to incorrect assessment of missing values by tools.

Supports data mining: It is a quick operation to find, say, all those data files that contain temperature values above 30 degC.

Supports visualization: Visualization tools can use the actual_min/max to generate a sensible colour scale range for displaying the contents of a file. (However see caveats below.)

In the context of aggregations (by NcML or otherwise), the actual_min/max could be easily calculated by taking the minimum/maximum of the attributes of the components of the aggregation.
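
As a minimal sketch of that calculation, assuming each component of the aggregation is represented only by its (actual_min, actual_max) pair: the function name aggregate_range and the numbers are hypothetical and chosen for illustration.

```python
# Illustrative only: each component is represented by its
# (actual_min, actual_max) pair; the values below are made up.

def aggregate_range(component_ranges):
    """Combine per-component (actual_min, actual_max) pairs into one pair."""
    ranges = list(component_ranges)
    if not ranges:
        raise ValueError("an aggregation needs at least one component")
    mins, maxs = zip(*ranges)
    # The aggregate minimum is the smallest component minimum,
    # and the aggregate maximum is the largest component maximum.
    return min(mins), max(maxs)


print(aggregate_range([(271.3, 299.8), (268.0, 301.2), (270.5, 296.4)]))
# prints (268.0, 301.2)
```

No data values need to be read from any component, which is what makes the operation cheap.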

Disadvantages

The new attributes represent redundant metadata and could be incorrectly generated or otherwise become inconsistent. Mitigation: Allow the CF-checker to check these attribute values if they are present in a file.

Caveats

These attribute values would not be correct for a subset of data from the file and so any data subsetting tools must be aware of this and recalculate or remove these attributes from any data product subset.
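
A subsetting tool might handle this as in the following sketch, which is an assumption about one possible approach rather than a description of any existing tool. The variable is modelled as a plain list with a dictionary of attributes, the function name subset_with_range is hypothetical, and missing values and valid_range are ignored for brevity.

```python
# Illustrative only: a variable is modelled as a list of values plus a dict
# of attributes. Missing values and valid_range are ignored in this sketch.

def subset_with_range(values, start, stop, attributes):
    """Return values[start:stop] and a copy of the attributes in which
    actual_min/actual_max describe the subset rather than the original."""
    subset = values[start:stop]
    new_attrs = dict(attributes)

    # Remove the stale values first so they can never leak into the subset.
    new_attrs.pop("actual_min", None)
    new_attrs.pop("actual_max", None)

    # Recalculate only if the subset has data; otherwise leave them absent.
    if subset:
        new_attrs["actual_min"] = min(subset)
        new_attrs["actual_max"] = max(subset)
    return subset, new_attrs


values = [3.0, 7.5, -1.2, 4.4, 9.1]
attributes = {"units": "degC", "actual_min": -1.2, "actual_max": 9.1}

subset, subset_attrs = subset_with_range(values, 0, 2, attributes)
print(subset, subset_attrs)
```

Here the subset holds only 3.0 and 7.5, so the recomputed attributes are actual_min=3.0 and actual_max=7.5, while the attributes of the original variable are left unchanged.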

Data values outside the valid_range would not be counted in the actual_min/max. An alternative nomenclature could be actual_valid_min/max (although personally I find this more confusing).

For visualization, the actual_min/max will not always represent the optimal scale range, particularly if examining a restricted geographical area, or when looking at data from a particular elevation. A more sophisticated solution could involve expressing actual_min/max as an array quantity, with a value pair for each elevation in the data volume. This increases the complexity of the solution and places extra burdens on data providers and tool developers (and does not completely solve the problems described).
