Opened 8 months ago

Last modified 6 months ago

#153 new enhancement

Requirements related to specific standard names

Reported by: martin.juckes Owned by: cf-conventions@…
Priority: medium Milestone:
Component: cf-conventions Version:
Keywords: Cc:

Description (last modified by martin.juckes)

A significant number of standard names contain, in their definitions, explicit specifications for additional required metadata. For instance, if the standard_name is region then there are constraints on the allowed values of the data variable. The standard name descriptions cannot include examples or markup, and the rules are not specified as clearly as in the convention text. It also appears that the rules are not checked by the CF checker (at least not the few that I have looked at in detail), and I think the best way to get consistent checking would be to first create a well-structured summary of these rules in the conventions document.

The specific proposal is to add a new Appendix which lists the rules, with examples where appropriate.

It will take some time to complete the list. I propose that we add a provisional list, after agreeing the format and approach, and work towards completion later.

Appendix D: Rules associated with standard names

Some standard names impose additional constraints on the metadata and/or data values of the variables they are associated with. This appendix lists such names, grouped according to the type of constraint, and provides usage examples where needed.

Required Coordinates

A common constraint involves the requirement that a particular coordinate or set of coordinates be present. The following table lists the rules and associated standard names. An explanation of each rule follows below.

Rule, description, standard name(s) and required coordinate(s):

  1. Area Fraction: the fractional area in a cell covered by a particular area type.
     Standard name(s): area_fraction
     Required coordinate(s): area_type
  2. Lifted from: parameters defined in terms of lifting from a reference level.
     Standard name(s): atmosphere_convective_available_potential_energy, atmosphere_convective_inhibition, atmosphere_level_of_free_convection, atmosphere_lifting_condensation_level
     Required coordinate(s): original_air_pressure_of_lifted_parcel
  3. Lifting range: parameters defined in terms of lifting through a specified range.
     Standard name(s): temperature_difference_between_ambient_air_and_air_lifted_adiabatically
     Required coordinate(s): original_air_pressure_of_lifted_parcel, final_air_pressure_of_lifted_parcel
  4. Radiances: for radiance variables a direction must be specified.
     Standard name(s): downwelling_photosynthetic_photon_radiance_in_sea_water and others
     Required coordinate(s): zenith_angle
  5. Reference state: variables which depend on reference air temperature and humidity.
     Standard name(s): mass_concentration_of_pm_*_ambient_aerosol_in_air, mass_fraction_of_pm_*_ambient_aerosol_in_air
     Required coordinate(s): air_temperature, relative_humidity
  6. Wavelength: functions of wavelength.
     Standard name(s): *_per_unit_wavelength_in_air
     Required coordinate(s): radiation_wavelength

In all cases the structure follows the same pattern, illustrated by the following example for rule 1 (Area Fraction):

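   dimensions:
      nchar = 4 ;   // length of the string "crop"
   variables:
   float cropcover(lat,lon);
      cropcover:standard_name = "area_fraction";
      cropcover:coordinates = "crop";   // links the data to the area_type variable
      cropcover:units = "1";
   char crop(nchar);
      crop:standard_name = "area_type";
   data:
      crop = "crop";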
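A checker could test the required-coordinate rules roughly as follows. This is a minimal sketch that assumes a hypothetical dictionary model of a netCDF file (variable name mapped to its dimensions and attributes) rather than any particular netCDF library. It covers only rules 1 to 3 from the list above; the wildcard and "and others" entries would need pattern matching or the full standard name table.

```python
from typing import Dict, List

# Hypothetical in-memory model of a netCDF file (not a real library API):
# variable name -> {"dims": tuple of dimension names, "attrs": dict of attributes}.
Dataset = Dict[str, dict]

# Rules 1-3: standard name -> standard names of the required coordinates.
LIFTED_FROM = [
    "atmosphere_convective_available_potential_energy",
    "atmosphere_convective_inhibition",
    "atmosphere_level_of_free_convection",
    "atmosphere_lifting_condensation_level",
]
REQUIRED_COORDINATES: Dict[str, List[str]] = {
    "area_fraction": ["area_type"],
    "temperature_difference_between_ambient_air_and_air_lifted_adiabatically": [
        "original_air_pressure_of_lifted_parcel",
        "final_air_pressure_of_lifted_parcel",
    ],
    **{name: ["original_air_pressure_of_lifted_parcel"] for name in LIFTED_FROM},
}


def coordinate_standard_names(ds: Dataset, var: str) -> set:
    """Standard names of var's coordinate variables (named after one of its
    dimensions) and auxiliary coordinates (listed in its coordinates attribute)."""
    v = ds[var]
    names = list(v["dims"]) + v["attrs"].get("coordinates", "").split()
    return {ds[n]["attrs"].get("standard_name") for n in names if n in ds}


def missing_coordinates(ds: Dataset, var: str) -> List[str]:
    """Required coordinate standard names that var does not have."""
    standard_name = ds[var]["attrs"].get("standard_name")
    present = coordinate_standard_names(ds, var)
    return [req for req in REQUIRED_COORDINATES.get(standard_name, []) if req not in present]


ds = {
    "cropcover": {"dims": ("lat", "lon"),
                  "attrs": {"standard_name": "area_fraction", "coordinates": "crop", "units": "1"}},
    "crop": {"dims": ("nchar",), "attrs": {"standard_name": "area_type"}},
}
print(missing_coordinates(ds, "cropcover"))  # []
```

Matching on the coordinate's standard_name rather than its variable name reflects the rule's wording: the coordinate may be called anything ("crop" here) as long as it carries the required standard name.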

Other rules

Regions and Area Types

If the standard name of a variable is region or area_type, then the variable must either be of character type or use flag values (the flag_values and flag_meanings attributes) to associate each element with a character string. The string values must be taken from the CF standard region list and the CF area type list, respectively.
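The region/area_type rule could be checked along these lines. This is a minimal sketch, assuming a hypothetical dictionary representation of the variable (dtype, attributes, data values) and hard-coding only an illustrative subset of the CF region and area type lists; a real checker would load the full lists.

```python
from typing import Dict, List

# Illustrative subsets only; a real checker would load the complete
# CF standard region list and area type list instead.
ALLOWED: Dict[str, set] = {
    "area_type": {"land", "sea", "sea_ice"},
    "region": {"global", "atlantic_ocean"},
}


def string_values(var: dict) -> List[str]:
    """Strings represented by a region/area_type variable.

    var is a hypothetical dict: {"dtype": ..., "attrs": {...}, "data": [...]}.
    """
    attrs = var["attrs"]
    if var["dtype"] == "char":
        return list(var["data"])
    if "flag_values" in attrs and "flag_meanings" in attrs:
        # Pair each flag value with the corresponding blank-separated meaning.
        meanings = dict(zip(attrs["flag_values"], attrs["flag_meanings"].split()))
        # A value with no matching meaning becomes a placeholder that will fail the check.
        return [meanings.get(v, f"<undefined flag {v}>") for v in var["data"]]
    raise ValueError("must be of character type or carry flag_values and flag_meanings")


def invalid_values(var: dict) -> List[str]:
    """Strings that are not in the allowed list for the variable's standard name."""
    allowed = ALLOWED[var["attrs"]["standard_name"]]
    return [s for s in string_values(var) if s not in allowed]


flags = {"dtype": "byte", "data": [0, 1, 1],
         "attrs": {"standard_name": "area_type", "flag_values": [0, 1],
                   "flag_meanings": "land sea_ice"}}
print(invalid_values(flags))  # []
```

Translating flags to strings first means the same list comparison serves both storage forms, which is the "choice" the rule allows.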

Quantities representing a layer average or sum

Many "layer" quantities require vertical coordinates with bounds, in particular those whose standard names match the following patterns:

  • *_atmosphere_layer[_*];
  • *_ocean_layer[_*];
  • *_soil_layer[_*];
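The patterns above could be turned into a test on standard names as sketched below. This assumes that "[_*]" denotes an optional suffix beginning with an underscore; that reading of the notation is an interpretation, not something the list states.

```python
import re

# "[_*]" is read here as an optional suffix "_<anything>", so *_soil_layer[_*]
# matches both "..._soil_layer" and "..._soil_layer_...".
LAYER_PATTERNS = [
    re.compile(rf"^.*_{kind}_layer(_.*)?$") for kind in ("atmosphere", "ocean", "soil")
]


def needs_vertical_bounds(standard_name: str) -> bool:
    """True if the standard name matches one of the layer patterns."""
    return any(p.match(standard_name) for p in LAYER_PATTERNS)


print(needs_vertical_bounds("moisture_content_of_soil_layer"))  # True
print(needs_vertical_bounds("atmosphere_boundary_layer_thickness"))  # False
```

Regular expressions are used rather than shell-style globbing because globbing has no convenient way to express an optional suffix.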

Variation of variables in sigma coordinates due to surface pressure change

change_in_energy_content_of_atmosphere_layer_due_to_change_in_sigma_coordinate_wrt_surface_pressure: must have a vertical coordinate variable (axis=Z).

   float deltae(sig);
      deltae:standard_name = "change_in_energy_content_of_atmosphere_layer_due_to_change_in_sigma_coordinate_wrt_surface_pressure";
      deltae:units = "J m-2";
   float sig(sig);
      sig:axis = "Z";
      sig:standard_name = "atmosphere_sigma_coordinate";
      sig:bounds = "sig_bnds";
      sig:units = "1";
   float sig_bnds(sig,nv);  // nv = 2 (lower and upper layer boundary); required because of _atmosphere_layer
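A bounds check such as the one needed for the example above could be sketched as follows. The sketch assumes the same kind of hypothetical dictionary model of a file, plus a map of dimension sizes. It only examines coordinate variables named after one of the variable's dimensions, and it recognises the axis only through the axis attribute; CF also allows auxiliary and scalar coordinates, which this does not handle.

```python
from typing import Dict


def has_bounded_axis(ds: Dict[str, dict], sizes: Dict[str, int], var: str, axis: str = "Z") -> bool:
    """True if var has a coordinate variable with axis=<axis> whose bounds
    variable is dimensioned (coordinate_dim, n) with n == 2.

    Hypothetical model: ds maps variable names to {"dims": (...), "attrs": {...}};
    sizes maps dimension names to lengths.
    """
    for dim in ds[var]["dims"]:
        coord = ds.get(dim)
        if coord is None or coord["attrs"].get("axis") != axis:
            continue
        # ds.get(None) returns None when there is no bounds attribute.
        bounds = ds.get(coord["attrs"].get("bounds"))
        # The CF bounds layout puts the vertex dimension last: (sig, nv).
        if (bounds is not None and len(bounds["dims"]) == 2
                and bounds["dims"][0] == dim and sizes[bounds["dims"][1]] == 2):
            return True
    return False


ds = {
    "deltae": {"dims": ("sig",), "attrs": {"units": "J m-2"}},
    "sig": {"dims": ("sig",), "attrs": {"axis": "Z", "bounds": "sig_bnds", "units": "1"}},
    "sig_bnds": {"dims": ("sig", "nv"), "attrs": {}},
}
print(has_bounded_axis(ds, {"sig": 10, "nv": 2}, "deltae"))  # True
```

Passing the axis as a parameter lets the same test serve the temporal rule below with axis="T".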

Temporal change

Quantities representing a change or a displacement over a time interval require bounds on the time coordinate:

  • change_over_time_*;
  • *_displacement;

Comments for discussion

In some cases the wording of standard_name definitions could be interpreted as a recommendation or suggestion rather than a requirement. If some of these are intended only as suggestions, that should be flagged.

Attachments (3)

CF_Standard_Name_Rules.xml (1.2 KB) - added by martin.juckes 6 months ago.
CF Standard Name Rules
CF_Standard_Name_Rules.json (1.2 KB) - added by martin.juckes 6 months ago.
CF Standard Name Rules Demo (in JSON)
CFStandardNameRules-1.1.xsd (4.8 KB) - added by martin.juckes 6 months ago.
CF Standard Name Rules Schema (based on CF Standard Name Schema)


Change History (16)

comment:1 Changed 8 months ago by martin.juckes

  • Description modified (diff)

comment:2 Changed 8 months ago by martin.juckes

  • Description modified (diff)
  • Summary changed from Requires related to specific standard names to Requirements related to specific standard names

comment:3 Changed 8 months ago by jonathan

Dear Martin

I think this is a good idea, thank you, and I agree with the proposal except that I suggest it would be better to have it as a separate document on the standard name page, like the guidelines, rather than as an appendix in the CF convention document. That is because

  • It relates to standard names only and does not affect the convention.
  • As a separate document, it could be updated more easily and frequently than the convention document, and it would make sense to update it with the standard name table.

The CF checker could consult it in either case. For use by the CF checker or other software, I suppose that this list of constraints should be made available in a form convenient for reading by programs. Ros's opinion would be useful.

Best wishes

Jonathan

comment:4 Changed 8 months ago by martin.juckes

Dear Jonathan,

I'm happy with that proposal, perhaps with one extra line in the conventions document saying that "Use of some standard names introduces additional constraints on the variable attributes and/or values, as detailed in [link to: Requirements Related to Specific Standard Names]."

I also agree on the need for a machine-readable form. I was thinking that something would be needed to assist the proofreading, e.g. a JSON file which can be used to generate a spreadsheet displaying the definitions of all the variables listed under each constraint.

I think the more legible wiki form is also necessary, in order to provide the usage examples. In the earlier email discussion, Roy Lowry suggested encoding the rules in RDF and serving them through the NERC Vocab Server alongside the standard names. This would be neat, but I think it may be worth generating wiki and JSON versions first, in order to get a clearer view of the range of constraints that we are dealing with.

regards, Martin

comment:5 Changed 8 months ago by jonathan

Dear Martin

OK, good. I agree we also need a human-readable version - wiki and JSON would be fine.

Best wishes

Jonathan

comment:6 Changed 8 months ago by martin.juckes

Dear Jonathan,

On second thoughts, however, in connection with updating "frequently and easily", we need to be careful about backward compatibility. E.g. if we introduce it in parallel with CF-1.7, files which were considered valid under CF-1.6 might become invalid. We want, I think, such files to continue to be considered valid under CF-1.6, hence the checker should not use this extension when checking against earlier convention versions. This differs from the policy for the standard name list, for which the latest version is always used. This implies, I think, that this document would need to be clearly versioned in a way which makes the link to convention versions clear, e.g. we might start with 1.7.00 and increment to 1.7.01 etc. until the convention moves to 1.8.

I can see that we want flexibility to add rules about new standard names when the standard name table is updated, and this is far more frequent than convention updates. We need to be careful about dealing with rules for existing standard names which might have been overlooked. Once we have a 1.7.00 version, we should not change any rules for existing standard names until 1.8.00 is launched, though we could perhaps add advisory notes where appropriate.
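The versioning scheme proposed here (rules versions 1.7.00, 1.7.01, ... tied to CF-1.7) could be applied by a checker roughly as sketched below. This is an illustration of this comment's proposal only, not an agreed policy. It assumes the file's Conventions attribute holds a single "CF-x.y" value.

```python
from typing import Iterable, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'1.7.01' -> (1, 7, 1)."""
    return tuple(int(part) for part in text.split("."))


def rules_apply(rules_version: str, conventions: str) -> bool:
    """Under the proposed scheme, rules 1.7.xx apply only to files declaring CF-1.7."""
    if not conventions.startswith("CF-"):
        return False
    return parse_version(rules_version)[:2] == parse_version(conventions[3:])[:2]


def select_rules(available: Iterable[str], conventions: str) -> Optional[str]:
    """Latest rules version matching the file's convention version, or None."""
    candidates = [v for v in available if rules_apply(v, conventions)]
    return max(candidates, key=parse_version, default=None)


print(select_rules(["1.7.00", "1.7.01", "1.8.00"], "CF-1.7"))  # 1.7.01
print(select_rules(["1.7.00", "1.7.01", "1.8.00"], "CF-1.6"))  # None
```

Returning None for CF-1.6 is what keeps files that were valid under earlier versions from being re-checked against the new rules.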

Does this sound workable?

regards, Martin

comment:7 Changed 8 months ago by jonathan

Dear Martin

I appreciate your caution but I think we can be a bit more relaxed. This new document does not have any information which isn't already in the standard name table, so it may be regarded as an adjunct to that. Hence I think the new document should have the same version numbers as the standard name table, though it will probably not be updated every time. Although at the moment the CF checker doesn't verify that the constraints are satisfied, it could already have done so - it's a matter of implementation, not the convention. The constraints are not new. We're simply making it easier to check them. We aren't changing anything about the convention. We also have a choice to make about whether the checker would regard not meeting these constraints as an error (i.e. breaking a requirement) or bad practice (i.e. not respecting a recommendation).

Best wishes

Jonathan

comment:8 Changed 6 months ago by ros

Sorry, coming to this rather late... Happy to have these rules listed in a separate document so long as it is easily readable by the CF Checker. With the standard name and area type tables being in XML, that would obviously require the least amount of work, but I'm not against another format if that would fit other requirements better.

The extra rules have obviously been overlooked and have not made it into the CF Checker document. I'm happy to add these in the next release.

Cheers, Ros.

comment:9 Changed 6 months ago by martin.juckes

Dear Ros,

OK, I think the rules could be expressed in a simple XML format, one element per rule, e.g.

<rule target="area_fraction" requiredCoordinate="area_type"/>

to specify that "area_fraction" must have an area_type variable as a coordinate. Where a variable has more than one required coordinate, I would list each in a separate XML element. Other rules would be of the form

<rule target="...." requiredAxis="Z"/>

to specify that a variable must have a dimension or coordinate with "axis=Z", and

<rule target="...." requiredBoundAxis="Z"/>

to specify additionally that the dimension or coordinate in question must have bounds set.
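Reading rules in this one-element-per-rule form could look like the sketch below, which uses Python's standard xml.etree.ElementTree. The enclosing <rules> element is an assumption, since only individual <rule> elements are given here. Constraints are grouped per target, so the two required coordinates of rule 3 come back as two entries.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

# The <rules> wrapper element is assumed; the comment only shows <rule> elements.
RULES_XML = """<rules>
  <rule target="area_fraction" requiredCoordinate="area_type"/>
  <rule target="temperature_difference_between_ambient_air_and_air_lifted_adiabatically"
        requiredCoordinate="original_air_pressure_of_lifted_parcel"/>
  <rule target="temperature_difference_between_ambient_air_and_air_lifted_adiabatically"
        requiredCoordinate="final_air_pressure_of_lifted_parcel"/>
  <rule target="moisture_content_of_soil_layer" requiredBoundAxis="Z"/>
</rules>"""


def load_rules(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group constraints by target standard name: {name: [(constraint, value), ...]}."""
    rules: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for elem in ET.fromstring(text).iter("rule"):
        target = elem.get("target")
        # Every attribute other than "target" is treated as one constraint.
        for key, value in elem.attrib.items():
            if key != "target":
                rules[target].append((key, value))
    return dict(rules)


for name, constraints in load_rules(RULES_XML).items():
    print(name, constraints)
```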

The region and area_type rules have a choice, which we could encode as follows:

<choice target="region">
  <option type="char"/>
  <option attribute="flag_values"/>
</choice>
<rule target="region" dataValuesIn="CF Standard Region"/>

The data values referred to in the last case should be interpreted as the flag values if present. On the other hand, it may be easier to just have a named test for these two, rather than using a complex schema like this.

Cheers, Martin

comment:10 Changed 6 months ago by ros

Dear Martin,

I think that would work absolutely fine from my point of view. (I have just briefly looked at reading JSON into Python, having no experience with it, and it does look pretty easy if that were deemed more appropriate.)

One thing we would also need to do is put the Standardised Region List into a readable format as it currently only appears to exist on the website as an HTML list. I have just discovered that I did put the check of valid regions into the checker but it never got released - I'll include it in the next release.

Regards, Ros.

comment:11 Changed 6 months ago by martin.juckes

Dear Ros,

Loading JSON files is trivial: a one-line command and then you get a Python object (e.g. a list or a dictionary). You then have to parse the object. The advantage of XML is that there is a well-tested approach to enforcing structure on the object, which, I find, tends to make parsing more reliable. I can easily export it as JSON as well, as I suspect that will make it more accessible to others. Thinking about defining the structure, it will be easier to have rules of the form: <rule targ="some_standard_name"><requiredAxis value="Z"/></rule>.
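The difference described here, where loading takes one line but structure checking has to be written by hand, is illustrated in the sketch below. The JSON layout (a list of objects, each with a "target" and rule-type keys) is a hypothetical example and is not taken from the attached demo file.

```python
import json
from typing import List

# Hypothetical JSON layout: a list of objects with a "target" and rule-type keys.
RULES_JSON = """[
  {"target": "moisture_content_of_soil_layer", "requiredBoundAxis": "Z"},
  {"target": "area_fraction", "requiredCoordinate": "area_type"}
]"""

RULE_TYPES = {"requiredCoordinate", "requiredAxis", "requiredBoundAxis", "charOrFlagIn"}


def parse_rules(text: str) -> List[dict]:
    data = json.loads(text)  # the one-line load: plain lists, dicts and strings
    # Everything below is structure checking that an XML schema would otherwise provide.
    if not isinstance(data, list):
        raise ValueError("expected a list of rules")
    for i, rule in enumerate(data):
        if not isinstance(rule, dict) or not isinstance(rule.get("target"), str):
            raise ValueError(f"rule {i}: missing or non-string 'target'")
        unknown = set(rule) - RULE_TYPES - {"target"}
        if unknown:
            raise ValueError(f"rule {i}: unknown keys {sorted(unknown)}")
    return data


print(len(parse_rules(RULES_JSON)))  # 2
```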

The region/area_type rule could be encoded more simply as

<rule targ="region"><charOrFlagIn value="CF Standard Region"/></rule>

This would then give four rule types to encode: requiredCoordinate, requiredAxis, requiredBoundAxis and charOrFlagIn.
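With a fixed set of rule types, a checker can keep one function per type in a dispatch table. The sketch below assumes a hypothetical, already-extracted summary of each variable (its coordinate standard names, axes, bounded axes and string values). The vocabulary it uses is an illustrative subset only.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative vocabulary; a real checker would load the full CF lists.
VOCABULARIES = {"CF Standard Region": {"global", "atlantic_ocean"}}

# Hypothetical per-variable summary produced by an earlier step:
# {"coord_names": set, "axes": set, "bounded_axes": set, "strings": list}
Summary = dict


def required_coordinate(var: Summary, value: str) -> bool:
    return value in var["coord_names"]


def required_axis(var: Summary, value: str) -> bool:
    return value in var["axes"]


def required_bound_axis(var: Summary, value: str) -> bool:
    return value in var["bounded_axes"]


def char_or_flag_in(var: Summary, value: str) -> bool:
    return all(s in VOCABULARIES[value] for s in var["strings"])


# One checking function per rule type.
CHECKS: Dict[str, Callable[[Summary, str], bool]] = {
    "requiredCoordinate": required_coordinate,
    "requiredAxis": required_axis,
    "requiredBoundAxis": required_bound_axis,
    "charOrFlagIn": char_or_flag_in,
}


def failures(var: Summary, rules: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (rule type, value) pairs that var does not satisfy."""
    return [(kind, value) for kind, value in rules if not CHECKS[kind](var, value)]


soil = {"coord_names": set(), "axes": {"Z"}, "bounded_axes": set(), "strings": []}
print(failures(soil, [("requiredAxis", "Z"), ("requiredBoundAxis", "Z")]))  # [('requiredBoundAxis', 'Z')]
```

Adding a fifth rule type would then mean writing one function and one table entry. The rules file itself would not need a new structure.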

regards, Martin

comment:12 Changed 6 months ago by martin.juckes

Dear Ros,

I agree that we need a machine readable version of the Standardised Region List .. I'll start a new ticket for that,

regards, Martin

Changed 6 months ago by martin.juckes

CF Standard Name Rules

Changed 6 months ago by martin.juckes

CF Standard Name Rules Demo (in JSON)

Changed 6 months ago by martin.juckes

CF Standard Name Rules Schema (based on CF Standard Name Schema)

comment:13 Changed 6 months ago by martin.juckes

Hello Ros,

After looking at the schema of the standard name list, I've adapted that for the rules, with typical entries of the form:

<rule id="moisture_content_of_soil_layer.boundAxis">
   <description>The soil layer must be described by a bounds attribute on a vertical coordinate.</description>
   <target>moisture_content_of_soil_layer</target>
   <requiredBoundAxis>Z</requiredBoundAxis>
</rule>

The "id" has to be unique, so it may need to be extended beyond the standard name to which the rule applies. Ideally, the target standard name should be constrained by the schema to be in the CF list, but I haven't implemented that yet. attachment:CF_Standard_Name_Rules.xml is a demo XML document with 3 rules, and attachment:CF_Standard_Name_Rules.json is the same in JSON. The schema for the XML is attachment:CFStandardNameRules-1.1.xsd.

This approach makes it easier to impose the schema rules on the names of the rules and the associated restrictions on the values (e.g. "requiredBoundAxis" should take a value "X", "Y", ...).
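Python's standard library cannot validate XML against an XSD schema. A checker would either use a third-party package such as lxml (whose etree.XMLSchema class does this) or repeat the key constraints by hand. The sketch below takes the second route for rules in the format shown above. The <rules> root element and the permitted axis set {X, Y, Z, T} are assumptions, not taken from the attached schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# Root element name assumed; the entry follows the example in this comment.
DOC = """<rules>
  <rule id="moisture_content_of_soil_layer.boundAxis">
    <description>The soil layer must be described by a bounds attribute on a vertical coordinate.</description>
    <target>moisture_content_of_soil_layer</target>
    <requiredBoundAxis>Z</requiredBoundAxis>
  </rule>
</rules>"""

AXES = {"X", "Y", "Z", "T"}  # assumed set of permitted axis values


def validate(text: str) -> List[str]:
    """Report duplicate ids, missing targets and axis values outside AXES."""
    problems: List[str] = []
    seen = set()
    for rule in ET.fromstring(text).iter("rule"):
        rid = rule.get("id")
        if rid in seen:
            problems.append(f"duplicate id {rid!r}")
        seen.add(rid)
        if not (rule.findtext("target") or "").strip():
            problems.append(f"{rid!r}: no target")
        for tag in ("requiredAxis", "requiredBoundAxis"):
            for elem in rule.findall(tag):
                if (elem.text or "").strip() not in AXES:
                    problems.append(f"{rid!r}: {tag} value {elem.text!r} not in {sorted(AXES)}")
    return problems


print(validate(DOC))  # []
```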

cheers, Martin
