Opened 3 years ago

Last modified 3 years ago

#151 new enhancement

Clarification of use of standard region names in "region" variables. — at Version 5

Reported by: martin.juckes
Owned by: cf-conventions@…
Priority: medium
Milestone:
Component: cf-conventions
Version:
Keywords:
Cc:

Description (last modified by martin.juckes)

The CF standard name region has the current description "A variable with the standard name of region contains strings which indicate geographical regions. These strings must be chosen from the standard region list." This description implies that the variable should be of character type, but it is often more convenient to have an integer variable and make a clear link to the region names using flag_values and flag_meanings. The proposal is to clarify the definition so that either usage is acceptable and include an example of the latter usage in the convention text. It is also proposed that an appendix be added to the CF Convention text to state clearly any constraints on file meta-data which are implied by the CF Standard Name definitions, so that it is possible to test such constraints in the CF checker.

New descriptions for CF standard names

region

A variable with the standard_name of region contains strings which indicate a geographical region or integers which can be translated to strings using flag_values and flag_meanings attributes. These strings are standardised. Values must be taken from the CF standard region list.

area_type

A variable with the standard_name of area_type contains strings which indicate the nature of the surface e.g. land, sea, sea_ice, or integers which can be translated to strings using flag_values and flag_meanings attributes. These strings are standardised. Values must be taken from the area_type table.

New usage example in CF Convention text

The following should be placed at the end of 6.1.1, after example 6.2

A variable with standard name of region, area_type or any other standard name which requires string-valued values from a defined list may alternatively be of integer type and use flag_values and flag_meanings attributes to record the translation between the integers and the string values, for instance:

int basin(lat, lon) ;
       basin:standard_name = "region" ;
       basin:flag_values = 1, 2, 3 ;
       basin:flag_meanings = "atlantic_arctic_ocean indo_pacific_ocean global_ocean" ;
......
data:
   basin = 1, 1, 1, 1, 2, ..... ;
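
As an illustration of how a reader of such a file might translate the integers back to region names, here is a minimal Python sketch. It assumes the attributes have already been read into plain Python values; the function name decode_flags is hypothetical and is not part of CF or of any library.

```python
def decode_flags(values, flag_values, flag_meanings):
    """Translate integer flags to strings using CF-style flag attributes.

    flag_meanings is a single space-separated string whose words are
    matched by position to the entries of flag_values.
    """
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings differ in length")
    lookup = dict(zip(flag_values, meanings))
    # An integer that is not in flag_values raises KeyError
    # rather than being silently dropped.
    return [lookup[v] for v in values]


# Attribute values from the example above; the data are the five
# values shown there before the elision.
names = decode_flags(
    [1, 1, 1, 1, 2],
    flag_values=[1, 2, 3],
    flag_meanings="atlantic_arctic_ocean indo_pacific_ocean global_ocean",
)
print(names)
```

The lookup is built once from the two attributes, so the same approach would apply to an area_type variable without change.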

New Appendix Section

Change "Appendix B: Standard Name Table Format" to:

Appendix B: Standard Names

B.1: Standard Name Table Format

.....

and

B.2 Constraints for specific standard names

B.2.1: region

Variables with standard name region must be one of:

  • type character, with values taken from the CF standard region list;
  • type integer, with flag_values and flag_meanings attributes. The flag_meanings attribute must be a space-separated list of values from the CF standard region list (see example 6.2).

Variables with standard name area_type must be one of:

  • type character, with values taken from the area type table;
  • type integer, with flag_values and flag_meanings attributes. The flag_meanings attribute must be a space-separated list of values from the area type table (analogous to example 6.2).
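
The constraints above could be tested along the following lines. This is only a sketch under stated assumptions: check_name_list_variable and its arguments are hypothetical, the variable is described by plain Python values rather than read from a NetCDF file, and ALLOWED_REGIONS holds just the three names used in the example in this ticket, standing in for the full CF standard region list.

```python
# Stand-in for the CF standard region list: only the three names used in
# this ticket's example (assumption; the real list is much longer).
ALLOWED_REGIONS = {
    "atlantic_arctic_ocean",
    "indo_pacific_ocean",
    "global_ocean",
}


def check_name_list_variable(dtype, allowed, values=None, attrs=None):
    """Return a list of error strings; an empty list means the checks passed.

    dtype  -- "char", "int" or any other type name (hypothetical labels)
    values -- the string values, used only when dtype is "char"
    attrs  -- dict of variable attributes, used only when dtype is "int"
    """
    errors = []
    attrs = attrs or {}
    if dtype == "char":
        # Character variables: every value must come from the list.
        for v in values or []:
            if v not in allowed:
                errors.append(f"value {v!r} is not in the allowed list")
    elif dtype == "int":
        # Integer variables: both flag attributes must be present, and
        # every word in flag_meanings must come from the list.
        for name in ("flag_values", "flag_meanings"):
            if name not in attrs:
                errors.append(f"missing attribute {name}")
        for m in attrs.get("flag_meanings", "").split():
            if m not in allowed:
                errors.append(f"flag meaning {m!r} is not in the allowed list")
    else:
        errors.append(f"type {dtype!r} is neither character nor integer")
    return errors


ok = check_name_list_variable(
    "int",
    ALLOWED_REGIONS,
    attrs={
        "flag_values": [1, 2, 3],
        "flag_meanings": "atlantic_arctic_ocean indo_pacific_ocean global_ocean",
    },
)
bad_type = check_name_list_variable("float", ALLOWED_REGIONS)
print(ok, bad_type)
```

Passing the allowed set as an argument means the same function could serve area_type by supplying names from the area type table instead.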

Change History (5)

comment:1 Changed 3 years ago by jonathan

Dear Martin

Thanks for making this proposal. As you know I agree with the principle. I'd like to generalise it to other similar cases, so I would suggest modifying your text

A variable with standard name of region may also be of integer type and use flag_values and flag_meanings attributes to express the relationship between the integers and the region names:

to

A variable with standard name of region, area_type or any other standard name which requires string-valued values from a defined list may alternatively be of integer type and use flag_values and flag_meanings attributes to record the translation between the integers and the string values, for instance:

and then give your example as it is. (I think "translate" is more explicit than "relationship" but you may disagree!) This also requires a modified definition for area_type:

A variable with the standard name of area_type contains strings which indicate the nature of the surface e.g. land, sea, sea_ice, or integers which can be translated to strings using flag_values and flag_meanings attributes. These strings are standardised. Values must be taken from the area_type table.

I'm not convinced about modifying Appendix B. I feel that it should be adequate to note the constraints for specific standard names in the table itself. We could also make a note about the existence of constraints on the standard name page. If we were to make a separate list of them, it should be comprehensive. For instance, there are a number which expect or require particular coordinate variables to exist.

Best wishes

Jonathan

comment:2 Changed 3 years ago by martin.juckes

  • Description modified (diff)

comment:3 Changed 3 years ago by martin.juckes

Dear Jonathan,

Thanks. I've added your generalisation and reworded the suggested description for region to match your wording for area_type.

I've also modified basin in the example to be a lat/lon field, following a comment from Karl: in CMIP5 and CMIP6 basin(basin) is a character array used as a dimension, while basin(lat, lon) is an integer array. Aligning the example cleanly with CMIP usage should make it clearer.

On the suggested Appendix: this could be separated off, as the other modifications don't rely on it and, as you say, it would make sense to make a complete list of relevant rules before adding it. I included it because I have the impression that rules which are only recorded in CF standard name descriptions are not picked up in the conformance document or the checker. The suggested Appendix may not be the best way of addressing this problem, but I think it is worth having a paragraph in the convention text about constraints which are expressed in the standard name descriptions. It may be enough to ensure that there are explicit examples for each type of constraint (such as the one proposed above) with relevant standard names listed. A sentence could also be appended to the paragraph about description in section 3.3: The description may define rules on the variable type and attributes (see for example section 6.1.1) which must be complied with by any variable carrying that standard name.

Regards, Martin

comment:4 Changed 3 years ago by jonathan

Dear Martin

Thanks very much. Seeing the change you have made for consistency with CMIP, I realise that this new text is probably not in the right place in the document. Sorry I didn't realise this before. Sect 6 is about coordinates. When basin is an auxiliary coordinate variable, we don't need the flag methods; there is a single dimension with basin names as labels. The example and your concern are about the case when a data variable contains regions or area_types. Therefore I would now suggest that the new text and the example should be at the end of Sect 5.5 instead, or should form a new short Sect 8.3 about string-valued data variables (since this mechanism is a kind of packing), or maybe there's a better place for them, but probably not in Sect 6. What do you think?

I appreciate your point about checking of constraints on data variables with particular standard names. I agree it would be good to note this in Sect 3.3, and a corresponding sentence could be inserted in the conformance document for Sect 3.3. I think that would be a better way than splitting Appendix B. I don't know actually what the cf-checker currently does about this or what it could do, but it would be useful to make the point explicitly.

Best wishes

Jonathan

comment:5 Changed 3 years ago by martin.juckes

  • Description modified (diff)

Dear Jonathan,

In the draft of CF-1.7 section 6 is "Labels and Alternative Coordinates" and 6.1 is "Labels", which looks suitable to me. Example 6.2 has a region variable as a coordinate, but the text is about how to encode geographical regions in a variable. I can't see how this fits into section 5 ("Coordinates"). Am I missing something here?

On the cf-checker: a NetCDF file with a variable:

float basin(index) ;
  basin:standard_name = "region" ;

is passed by the checker as valid, with a warning for the absence of a units attribute on the variable. If the variable is defined as in the example above and invalid region names are used, this is also passed (I've updated the example to change flag_values from a string, which the checker does not allow, to a list of integers). So these details are not currently checked.

Regards, Martin
