Opened 23 months ago

Last modified 44 hours ago

#140 new enhancement

Clarifying the role of attributes on boundary variables.

Reported by: davidhassell
Owned by: cf-conventions@…
Priority: medium
Milestone:
Component: cf-conventions
Version:
Keywords: boundary variable, attribute
Cc:

Description

1. Title

Clarifying the role of attributes on boundary variables.

2. Moderator

TBC (any offer will be gladly accepted).

3. Requirement

To disallow inconsistencies between particular attributes of a boundary variable and its associated coordinate or auxiliary coordinate variable.

For example, it is currently possible for a boundary variable to have a different standard_name attribute to its associated coordinate or auxiliary coordinate variable. This would be unsatisfactory because the user of the data cannot know which of the possibilities is correct.

It is proposed that if a boundary variable has attributes which determine the coordinate type (units, standard_name, axis and positive) or those which affect the interpretation of the boundary variable array values (units, calendar, leap_month, leap_year and month_lengths) then they must always agree exactly with the same attributes of its associated coordinate or auxiliary coordinate variable. In addition, it is recommended that these attributes are not provided to a boundary variable since they are already inherited implicitly.

No restriction is made on any other boundary variable attributes.

This does not affect datasets encoded with previous versions of CF.
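As an illustration only, the agreement rule proposed above could be sketched in Python as follows. Attributes are represented here as plain dictionaries, and the function name `conflicting_attributes` is hypothetical. The sketch also assumes that an attribute present on the boundary variable but absent from its parent counts as a disagreement, which the proposal does not state explicitly.

```python
# Attributes named in the proposal; the grouping follows the ticket text.
TYPE_ATTRS = ("units", "standard_name", "axis", "positive")
VALUE_ATTRS = ("units", "calendar", "leap_month", "leap_year", "month_lengths")
# De-duplicated union of the two groups, keeping first-seen order.
RESTRICTED = tuple(dict.fromkeys(TYPE_ATTRS + VALUE_ATTRS))


def conflicting_attributes(coord_attrs, bounds_attrs):
    """Return the restricted attributes whose values differ.

    Both arguments are plain dicts of attribute name -> value (hypothetical
    stand-ins for netCDF variable attributes, with list-like values such as
    month_lengths held as Python lists). An attribute that is absent from
    the boundary variable is inherited, so it never conflicts.
    """
    return [
        name
        for name in RESTRICTED
        if name in bounds_attrs and bounds_attrs[name] != coord_attrs.get(name)
    ]


lat = {"standard_name": "latitude", "units": "degrees_north"}
lat_bnds = {
    "standard_name": "grid_latitude",  # disagrees with the parent
    "units": "degrees_north",          # agrees, so it is redundant but legal
    "long_name": "cell edges",         # unrestricted attribute, ignored
}
print(conflicting_attributes(lat, lat_bnds))  # ['standard_name']
```

A boundary variable with no attributes at all always passes, which matches the recommendation that these attributes be omitted.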

4. Initial Statement of Technical Proposal

The following changes should be made to section 7.1. Cell Boundaries (additions marked {+like this+}, deletions [-like this-]):

To represent cells we add the attribute bounds to the appropriate coordinate variable(s). The value of bounds is the name of the variable that contains the vertices of the cell boundaries. We refer to this type of variable as a "boundary variable". A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable. The additional dimension should be the most rapidly varying one, and its size is the maximum number of cell vertices. Since a boundary variable is considered to be part of a coordinate variable's metadata, it is not necessary to provide it with attributes [-(such as long_name and units).-] {+and providing no attributes is always acceptable. Boundary variable attributes which determine the coordinate type (units, standard_name, axis and positive) or those which affect the interpretation of the array values (units, calendar, leap_month, leap_year and month_lengths) must always agree exactly with the same attributes of its associated coordinate or auxiliary coordinate variable. To avoid duplication, however, it is recommended that the attributes units, standard_name, axis, positive, calendar, leap_month, leap_year and month_lengths are not provided to a boundary variable.+}

In section 7.1 Cell Boundaries of the conformance document (additions marked {+like this+}, deletions [-like this-]):

  • Requirements:
    • The type of the bounds attribute is a string whose value is a single variable name. The specified variable must exist in the file.
    • A boundary variable must have the same dimensions as its associated variable, plus have a trailing dimension (CDL order) for the maximum number of vertices in a cell.
    • A boundary variable must be a numeric data type.
    • If a boundary variable has [-units or standard_name attributes, they must agree with those of its associated variable.-] {+units, standard_name, axis, positive, calendar, leap_month, leap_year and month_lengths attributes, they must agree exactly with those of its associated variable.+}
  • Recommendations:
    • The points specified by a coordinate or auxiliary coordinate variable should lie within, or on the boundary, of the cells specified by the associated boundary variable.
    • Boundary variables should not have the [-_FillValue or missing_value-] {+_FillValue, missing_value, units, standard_name, axis, positive, calendar, leap_month, leap_year or month_lengths+} attributes.
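To make the split between requirements and recommendations concrete, here is a rough Python sketch of how a checker might report them separately. It covers only the dimension requirement and the attribute recommendation; the function name `check_boundary_variable` and the message wording are hypothetical and are not taken from the CF checker.

```python
# Attributes that the proposed recommendation discourages on boundary variables.
DISCOURAGED = (
    "_FillValue", "missing_value", "units", "standard_name", "axis",
    "positive", "calendar", "leap_month", "leap_year", "month_lengths",
)


def check_boundary_variable(coord_dims, bounds_dims, bounds_attrs):
    """Return (errors, warnings) for one boundary variable.

    coord_dims and bounds_dims are sequences of dimension names in CDL
    order; bounds_attrs is a dict of the boundary variable's attributes.
    """
    errors, warnings = [], []
    # Requirement: same dimensions as the parent plus one trailing dimension.
    same_leading = tuple(bounds_dims[:-1]) == tuple(coord_dims)
    if len(bounds_dims) != len(coord_dims) + 1 or not same_leading:
        errors.append(
            "dimensions must be those of the associated variable "
            "plus one trailing dimension"
        )
    # Recommendation: these attributes should not appear at all.
    for name in DISCOURAGED:
        if name in bounds_attrs:
            warnings.append(
                f"attribute '{name}' should not be set on a boundary variable"
            )
    return errors, warnings


errors, warnings = check_boundary_variable(
    ("time",), ("time", "nv"), {"units": "days since 2000-01-01"}
)
print(errors)    # []
print(warnings)  # one warning, about 'units'
```

Breaking a requirement is reported as an error, whereas ignoring a recommendation only produces a warning, so a file that merely duplicates units on its bounds would remain compliant.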

5. Benefits

Encoding type-determining attributes (units, calendar, standard_name, axis and positive) or array value interpretation attributes (units, calendar, leap_month, leap_year and month_lengths) on a boundary variable would be disallowed if they conflict with those of the associated coordinate or auxiliary coordinate variable.

6. Status Quo

Attributes on a boundary variable may conflict with the associated coordinate or auxiliary coordinate variable, and this is not always checked by the CF checker.

This proposal does not affect datasets encoded under previous versions of CF, other than via the potential for extra warnings being raised by the CF checker.

David Hassell

Change History (9)

comment:1 Changed 23 months ago by jonathan

Thank you for making this proposal, which I support.

Jonathan

comment:2 Changed 3 days ago by taylor13

I support this proposal with the caveat that if we allow formula_terms on parametric coordinate *bounds* (as I've advocated in ticket #147), then we might want to include some mention here that the formula_terms attached to the bounds should be consistent with the formula_terms attached to the parametric coordinate variable itself. By "consistent" I mean that the same parameters must be defined (but of course the parameter values will be stored in different variables from the parameters of the coordinates themselves).

thanks, David, for proposing this change.

best regards, Karl

comment:3 Changed 3 days ago by davidhassell

Karl,

I agree with your note on formula_terms. I would go further to say that if the parent coordinate variable also has formula_terms which refers to a variable with bounds then those bounds must be referred to by the same parameter of the bounds' formula_terms.

I'll draft some text to add to the section 7.1 and conformance changes proposed above...

Thanks, David

comment:4 Changed 3 days ago by davidhassell

Proposed first paragraph of section 7.1. Cell Boundaries (original additions marked {+like this+}, deletions [-like this-], new additions {++like this++}):

To represent cells we add the attribute bounds to the appropriate coordinate variable(s). The value of bounds is the name of the variable that contains the vertices of the cell boundaries. We refer to this type of variable as a "boundary variable". A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable. The additional dimension should be the most rapidly varying one, and its size is the maximum number of cell vertices. Since a boundary variable is considered to be part of a coordinate variable's metadata, it is not necessary to provide it with attributes [-(such as long_name and units).-] {+and providing no attributes is always acceptable. Boundary variable attributes which determine the coordinate type (units, standard_name, axis and positive) or those which affect the interpretation of the array values (units, calendar, leap_month, leap_year and month_lengths) must always agree exactly with the same attributes of its associated coordinate or auxiliary coordinate variable. To avoid duplication, however, it is recommended that the attributes units, standard_name, axis, positive, calendar, leap_month, leap_year and month_lengths are not provided to a boundary variable.+} {++If the associated variable is a parametric coordinate variable with a formula_terms attribute (ref section 4.3.2) then two cases are possible:++}

{++1) if the boundary variable also has a formula_terms attribute then its terms must be the same as those for the parametric coordinate variable, but with different variables named as term values and using, wherever possible, the boundary variables of variables named by the parametric coordinate variable's formula_terms++}

{++2) if the boundary variable does not have a formula_terms attribute then it is assumed that the formula_terms of the parametric coordinate variable applies, substituting a named variable with its boundary variable, wherever possible++}
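For what it's worth, the substitution described in case 2 can be sketched in a few lines of Python. The helper names and the `bounds_of` lookup (a mapping from a variable name to the variable named by its bounds attribute) are hypothetical, and the variable names A, B, PS and P0 are made up for the example.

```python
def parse_formula_terms(text):
    """Split a formula_terms string such as 'a: A b: B' into {'a': 'A', 'b': 'B'}."""
    tokens = text.split()
    return {
        tokens[i].rstrip(":"): tokens[i + 1]
        for i in range(0, len(tokens) - 1, 2)
    }


def implied_bounds_formula_terms(coord_formula_terms, bounds_of):
    """Case 2: derive the terms that apply to the boundary variable.

    Every variable that has a boundary variable is replaced by it; a
    variable without one (absent from bounds_of) is kept unchanged.
    """
    terms = parse_formula_terms(coord_formula_terms)
    return {term: bounds_of.get(var, var) for term, var in terms.items()}


# Only the 1D terms have boundary variables in this made-up example.
bounds_of = {"A": "A_bounds", "B": "B_bounds"}
print(implied_bounds_formula_terms("a: A b: B ps: PS p0: P0", bounds_of))
# {'a': 'A_bounds', 'b': 'B_bounds', 'ps': 'PS', 'p0': 'P0'}
```

The same mapping could serve as the expected value when checking an explicit formula_terms attribute under case 1.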

We'll also need some extra changes to the conformance document ...

comment:5 Changed 3 days ago by taylor13

Just to note: This would be much less complicated if we decide to reject Jonathan's alternative under ticket #147. Then the new text would read:

If the boundary variable is associated with a parametric coordinate variable and both the coordinate variable and the boundary variable have formula_terms (ref section 4.3.2), then the terms in the formula definition must be the same for the coordinates and its bounds, but with different parametric variable names specified for any terms in the definition that depend on the vertical coordinate.

comment:6 Changed 2 days ago by davidhassell

Hi Karl,

I like this approach, but I think we can retain your clarity whilst retaining Jonathan's alternative:

If the boundary variable is associated with a parametric coordinate variable then it is assumed that the formula definition of the parametric coordinate variable also applies to the bounds. The term values are the same except when the named variable depends on the vertical coordinate, in which case the named variable is substituted with its boundary variable, if it exists. Note that a formula_terms attribute may also be provided on a boundary variable provided it adheres to these restrictions.

comment:7 Changed 2 days ago by jonathan

Dear Karl and David

I don't understand why Karl thinks the 1D formula terms (things like sigma values) are not anything like coordinate data. I have the same view as David that they do contain something like coordinate data, even though they're not coordinates by themselves. Evidently they do have bounds; in Karl's preferred arrangement (David's second case in comment 21 of ticket 147), hybrid_sigma:formula_terms points to A_bounds and B_bounds. If you don't call these boundary variables, what are they? If they are boundary variables, why not point to them with a bounds attribute? However since we've said all these things already, and know each other's point of view, it must be some philosophical disagreement. We'll have to arrange a conference about it sometime!

There is an important advantage in Karl's arrangement that you don't have to work out the identities of the formula terms for the bounds, since there's a formula_terms attribute to tell you them explicitly. What if we make it mandatory for the bounds variable of a parametric vertical coordinate to have a formula_terms attribute? This would be a backward-incompatible change, in the sense that data that was compliant with earlier versions of CF might not be compliant with the new version.

That would simplify the text here. Starting from the bold bit, we would have

If a parametric coordinate variable with a formula_terms attribute (ref section 4.3.2) also has a bounds attribute, its boundary variable must have a formula_terms attribute too. Because the same standard_name must describe both variables, the formula must have the same terms (as specified in Appendix D), but a different variable must be named by the two formula_terms attributes for any term which depends on the vertical dimension, because the boundary variables have one more dimension.

Then my preferred arrangement can be permitted by further text

The boundary variables for these formula terms may also be identified by bounds attributes of the formula terms variables. In that case, the formula_terms of the boundary variable and the bounds of the formula terms variables must be consistent.
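As a sketch of what "consistent" could mean in practice, the Python below compares two already-parsed formula_terms attributes. The function name and the `bounds_of` lookup are hypothetical. The sketch does not inspect dimensions, so it cannot verify that a different variable is named for every term which depends on the vertical dimension.

```python
def formula_terms_problems(coord_terms, bounds_terms, bounds_of):
    """List inconsistencies between the two formula_terms attributes.

    coord_terms and bounds_terms map term name -> variable name, already
    parsed from the attribute strings. bounds_of maps a formula terms
    variable to the variable named by its bounds attribute, if it has one.
    """
    problems = []
    # Both attributes must define the same set of terms.
    if set(coord_terms) != set(bounds_terms):
        problems.append("the two formula_terms attributes must define the same terms")
    # Where a formula terms variable has a bounds attribute, the boundary
    # variable's formula_terms must name that same boundary variable.
    for term, var in coord_terms.items():
        expected = bounds_of.get(var)
        if expected is None or term not in bounds_terms:
            continue
        if bounds_terms[term] != expected:
            problems.append(
                f"term '{term}': expected '{expected}', found '{bounds_terms[term]}'"
            )
    return problems


coord_terms = {"a": "A", "b": "B", "ps": "PS"}
bounds_terms = {"a": "A_bounds", "b": "B_bounds", "ps": "PS"}
bounds_of = {"A": "A_bounds", "B": "B_bounds"}
print(formula_terms_problems(coord_terms, bounds_terms, bounds_of))  # []
```

If the formula terms variables carry no bounds attributes, `bounds_of` is empty and only the check that both attributes define the same terms is applied.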

So this permits David's case 2 and the case 3 I wrote down in ticket 147, but not David's case 1, which Karl doesn't like. In Martin's list in comment 20 of ticket 147, I would advocate option 4 - do nothing. We always permit non-standardised attributes in CF. The formula_terms attribute used other than for variables containing coordinate data (in the broad sense in which David and I interpret it) doesn't mean anything to CF, but it's allowed. It may have a meaning to the data-writer. Of course, it might be a mistake as well, but we don't police such mistakes. We have no general prohibition of or recommendation against using attributes from Appendix A in situations where CF doesn't describe their use.

Best wishes

Jonathan

comment:8 Changed 45 hours ago by davidhassell

Hello Karl, Jonathan,

Allowing the term values which span the vertical dimension to not have a bounds attribute would certainly make writing software harder, as the software would have to work out that a formula terms named variable is associated with a boundary variable and then make that connection explicit.

Running with Jonathan's idea of insisting that the boundary variable has a formula_terms attribute, I would take it further and insist that term values which span the vertical dimension must have a bounds attribute which points to the appropriate variable named in the boundary variable's formula_terms for its boundary variable. This is also a backwards-incompatible change:

If a parametric coordinate variable with a formula_terms attribute (ref section 4.3.2) also has a bounds attribute, its boundary variable must have a formula_terms attribute too. Because the same standard_name must describe both variables, the formula must have the same terms (as specified in Appendix D), but a different variable must be named by the two formula_terms attributes for any term which depends on the vertical dimension, because the boundary variables have one more dimension. For these terms, the boundary variable's formula_terms must name the bounds of the variables named by the vertical coordinate variable's formula_terms.

That said, I like to think that we can find some non-confusing wording which allows my case 1, and so no backward-incompatible changes would be necessary.

All the best,

David

comment:9 Changed 44 hours ago by taylor13

Hi David,

Could you expand on why you think software will want to extract the so-called "bounds" values for variables appearing in formula_terms along with the values themselves? I would have thought that for parametric coordinates you would want to primarily associate formula terms with the coordinate values they are used to transform. So for the coordinates themselves you would associate the parameter values in the formula_terms that is attached to the parametric coordinate. For the *bounds* on that coordinate you would associate the parameter values in the formula_terms attached to the parametric coordinate's bounds.

Why is there any need to associate the parameter values used for coordinate bound transformations with the parameter values used for coordinate transformations? I should think these two sets of parameter values will invariably be used independently. I suppose one might want to put into a container all the coordinate and bound information, but I don't think you would ever put the coordinate information and the coordinate and bounds parameters together without also including the coordinate bounds themselves. If this is the case then your code could easily construct such a container without a "bounds" attribute attached to the parameter variables.

I'm sorry if I'm a bit slow on this, but you seem to have a specific use case where "working out" needed relationships is difficult. Could you describe it in a bit more detail? This could help us reach consensus.

Sorry this seems to be taking up your valuable time, but I assure you if there is a compelling use case, then I'll favor including Jonathan's alternative.

best regards, Karl
