Opened 2 years ago

Last modified 5 weeks ago

## #140 new enhancement

# Clarifying the role of attributes on boundary variables.

| Reported by: | davidhassell | Owned by: | cf-conventions@… |
|---|---|---|---|
| Priority: | medium | Milestone: | |
| Component: | cf-conventions | Version: | |
| Keywords: | boundary variable, attribute | Cc: | |

### Description

## 1. Title

Clarifying the role of attributes on boundary variables.

## 2. Moderator

TBC (any offer will be gladly accepted).

## 3. Requirement

To disallow inconsistencies between particular attributes of a boundary variable and its associated coordinate or auxiliary coordinate variable.

For example, it is currently possible for a boundary variable to have a different `standard_name` attribute from that of its associated coordinate or auxiliary coordinate variable. This would be unsatisfactory because the user of the data cannot know which of the possibilities is correct.

It is proposed that if a boundary variable has attributes which determine the coordinate type (`units`, `standard_name`, `axis` and `positive`) or those which affect the interpretation of the boundary variable array values (`units`, `calendar`, `leap_month`, `leap_year` and `month_lengths`) then they must always agree exactly with the same attributes of its associated coordinate or auxiliary coordinate variable. In addition, it is recommended that these attributes are not provided to a boundary variable since they are already inherited implicitly.

No restriction is made on any other boundary variable attributes.

This does not affect datasets encoded with previous versions of CF.

## 4. Initial Statement of Technical Proposal

The following changes should be made to section 7.1. Cell Boundaries (additions marked by *TEXT*, deletions by ~~TEXT~~):

To represent cells we add the attribute `bounds` to the appropriate coordinate variable(s). The value of `bounds` is the name of the variable that contains the vertices of the cell boundaries. We refer to this type of variable as a "boundary variable". A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable. The additional dimension should be the most rapidly varying one, and its size is the maximum number of cell vertices. Since a boundary variable is considered to be part of a coordinate variable's metadata, it is not necessary to provide it with attributes (such as `long_name` and `units`)~~.~~ *and providing no attributes is always acceptable. Boundary variable attributes which determine the coordinate type (`units`, `standard_name`, `axis` and `positive`) or those which affect the interpretation of the array values (`units`, `calendar`, `leap_month`, `leap_year` and `month_lengths`) must always agree exactly with the same attributes of its associated coordinate or auxiliary coordinate variable. To avoid duplication, however, it is recommended that the attributes `units`, `standard_name`, `axis`, `positive`, `calendar`, `leap_month`, `leap_year` and `month_lengths` are not provided to a boundary variable.*

In section 7.1 Cell Boundaries of the conformance document (additions marked by *TEXT*, deletions by ~~TEXT~~):

- Requirements:
  - The type of the `bounds` attribute is a string whose value is a single variable name. The specified variable must exist in the file.
  - A boundary variable must have the same dimensions as its associated variable, plus have a trailing dimension (CDL order) for the maximum number of vertices in a cell.
  - A boundary variable must be a numeric data type.
  - If a boundary variable has ~~`units` or `standard_name` attributes, they must agree with those of its associated variable.~~ *`units`, `standard_name`, `axis`, `positive`, `calendar`, `leap_month`, `leap_year` and `month_lengths` attributes, they must agree exactly with those of its associated variable.*

- Recommendations:
  - The points specified by a coordinate or auxiliary coordinate variable should lie within, or on the boundary, of the cells specified by the associated boundary variable.
  - Boundary variables should not have the ~~`_FillValue` or `missing_value` attributes.~~ *`_FillValue`, `missing_value`, `units`, `standard_name`, `axis`, `positive`, `calendar`, `leap_month`, `leap_year` or `month_lengths` attributes.*
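As an illustration only, the requirement and recommendation above could be checked along the following lines. This is a minimal sketch, not the CF checker's implementation: the function name `check_boundary_attributes` is hypothetical, attributes are assumed to be available as plain Python dicts, and a boundary attribute whose counterpart is missing on the parent variable is assumed to count as a disagreement.

```python
# Attribute names are taken from the proposed conformance text.
MUST_AGREE = ("units", "standard_name", "axis", "positive",
              "calendar", "leap_month", "leap_year", "month_lengths")
DISCOURAGED = ("_FillValue", "missing_value") + MUST_AGREE


def check_boundary_attributes(coord_attrs, bounds_attrs):
    """Return (errors, warnings) for one coordinate/boundary variable pair.

    Both arguments are plain dicts of attribute name -> value; this is a
    hypothetical stand-in for whatever a netCDF library would return.
    """
    errors = []    # requirement violations: attribute present but different
    warnings = []  # recommendation violations: attribute present at all
    for name in MUST_AGREE:
        # Assumption: an attribute missing on the parent counts as a conflict.
        if name in bounds_attrs and bounds_attrs[name] != coord_attrs.get(name):
            errors.append(name)
    for name in DISCOURAGED:
        if name in bounds_attrs:
            warnings.append(name)
    return errors, warnings


coord = {"units": "degrees_north", "standard_name": "latitude"}
bounds = {"units": "degrees_north", "standard_name": "longitude"}
errors, warnings = check_boundary_attributes(coord, bounds)
print(errors)    # attributes that conflict with the parent variable
print(warnings)  # attributes that are merely redundant or discouraged
```

A boundary variable with no attributes at all yields two empty lists, which matches the statement that providing no attributes is always acceptable.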

## 5. Benefits

It would no longer be permitted to encode type-determining attributes (`units`, `calendar`, `standard_name`, `axis` and `positive`) or array value interpretation attributes (`units`, `calendar`, `leap_month`, `leap_year` and `month_lengths`) on a boundary variable if they conflict with the associated coordinate or auxiliary coordinate variable.

## 6. Status Quo

Attributes on a boundary variable may conflict with the associated coordinate or auxiliary coordinate variable, and this is not always checked by the CF checker.

This proposal does not affect datasets encoded under previous versions of CF, other than via the potential for extra warnings being raised by the CF checker.

David Hassell

### Change History (9)

### comment:1 Changed 2 years ago by jonathan

### comment:2 Changed 5 weeks ago by taylor13

I support this proposal with the caveat that if we allow formula_terms on parametric coordinate *bounds* (as I've advocated in ticket #147), then we might want to include some mention here that the formula_terms attached to the bounds should be consistent with the formula_terms attached to the parametric coordinate variable itself. By "consistent" I mean that the same parameters must be defined (but of course the parameter values will be stored in different variables from the parameters of the coordinates themselves).

thanks, David, for proposing this change.

best regards, Karl

### comment:3 Changed 5 weeks ago by davidhassell

Karl,

I agree with your note on `formula_terms`. I would go further to say that if the parent coordinate variable also has `formula_terms` which refers to a variable with bounds then those bounds must be referred to by the same parameter of the bounds' `formula_terms`.

I'll draft some text to add to the section 7.1 and conformance changes proposed above...

Thanks, David

### comment:4 Changed 5 weeks ago by davidhassell

Proposed first paragraph of section 7.1. Cell Boundaries (original additions marked by *TEXT*, deletions by ~~TEXT~~, new additions in **TEXT**):

To represent cells we add the attribute `bounds` to the appropriate coordinate variable(s). The value of `bounds` is the name of the variable that contains the vertices of the cell boundaries. We refer to this type of variable as a "boundary variable". A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable. The additional dimension should be the most rapidly varying one, and its size is the maximum number of cell vertices. Since a boundary variable is considered to be part of a coordinate variable's metadata, it is not necessary to provide it with attributes (such as `long_name` and `units`)~~.~~ *and providing no attributes is always acceptable. Boundary variable attributes which determine the coordinate type (`units`, `standard_name`, `axis` and `positive`) or those which affect the interpretation of the array values (`units`, `calendar`, `leap_month`, `leap_year` and `month_lengths`) must always agree exactly with the same attributes of its associated coordinate or auxiliary coordinate variable. To avoid duplication, however, it is recommended that the attributes `units`, `standard_name`, `axis`, `positive`, `calendar`, `leap_month`, `leap_year` and `month_lengths` are not provided to a boundary variable.* **If the associated variable is a parametric coordinate variable with a `formula_terms` attribute (ref section 4.3.2) then two cases are possible:**

1) **if the boundary variable also has a `formula_terms` attribute then its terms must be the same as those for the parametric coordinate variable, but with different variables named as term values and using, wherever possible, the boundary variables of variables named by the parametric coordinate variable's `formula_terms`;**

2) **if the boundary variable does not have a `formula_terms` attribute then it is assumed that the `formula_terms` of the parametric coordinate variable applies, substituting a named variable with its boundary variable, wherever possible.**
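To make the two cases concrete, here is a rough sketch of how software might resolve the terms that apply to a boundary variable. The function names and the `bounds_of` mapping (variable name to the name given by its `bounds` attribute) are hypothetical, and `formula_terms` is assumed to have been parsed into a dict of term name to variable name already.

```python
def implied_bounds_terms(coord_terms, bounds_of):
    """Case 2: reuse the parent's terms, swapping in boundary variables.

    coord_terms: dict of term name -> variable name (parent formula_terms).
    bounds_of:   dict of variable name -> its boundary variable name, for
                 those term variables that carry a bounds attribute.
    """
    # "Wherever possible": variables without bounds are kept unchanged.
    return {term: bounds_of.get(var, var) for term, var in coord_terms.items()}


def effective_bounds_terms(coord_terms, bounds_terms, bounds_of):
    """Return the terms to use for the boundary variable.

    bounds_terms is the boundary variable's own parsed formula_terms,
    or None if it has no such attribute.
    """
    if bounds_terms is not None:
        # Case 1: the term names must match those of the parent.
        if set(bounds_terms) != set(coord_terms):
            raise ValueError("boundary formula_terms defines different terms")
        return dict(bounds_terms)
    return implied_bounds_terms(coord_terms, bounds_of)


coord_terms = {"a": "A", "b": "B", "ps": "PS", "p0": "P0"}
bounds_of = {"A": "A_bounds", "B": "B_bounds"}
print(effective_bounds_terms(coord_terms, None, bounds_of))
```

In this example `PS` and `P0` have no boundary variables, so they are carried over as they are.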

We'll also need some extra changes to the conformance document ...

### comment:5 Changed 5 weeks ago by taylor13

Just to note: This would be much less complicated if we decide to reject Jonathan's alternative under ticket #147. Then the new text would read:

If the boundary variable is associated with a parametric coordinate variable and both the coordinate variable and the boundary variable have formula_terms (ref section 4.3.2), then the terms in the formula definition must be the same for the coordinates and its bounds, but with different parametric variable names specified for any terms in the definition that depend on the vertical coordinate.
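A minimal sketch of this simpler rule, for illustration only: parse both `formula_terms` strings (written as `term: variable` pairs separated by spaces) and compare them. The function names are hypothetical, and which terms depend on the vertical coordinate is assumed to be supplied by the caller rather than looked up in Appendix D.

```python
def parse_formula_terms(text):
    """Parse 'a: A b: B ps: PS' into {'a': 'A', 'b': 'B', 'ps': 'PS'}."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("formula_terms must be 'term: variable' pairs")
    terms = {}
    for key, var in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected 'term:' but found {key!r}")
        terms[key[:-1]] = var
    return terms


def compare_formula_terms(coord_text, bounds_text, vertical_terms):
    """Return a list of problems; an empty list means the two are consistent.

    vertical_terms: set of term names assumed to depend on the vertical
    coordinate (caller-supplied in this sketch).
    """
    coord = parse_formula_terms(coord_text)
    bounds = parse_formula_terms(bounds_text)
    problems = []
    if set(coord) != set(bounds):
        problems.append("term names differ")
    for term in sorted(vertical_terms):
        # Vertically dependent terms must name different variables.
        if term in coord and term in bounds and coord[term] == bounds[term]:
            problems.append(f"term '{term}' names the same variable in both")
    return problems


print(compare_formula_terms("a: A b: B ps: PS p0: P0",
                            "a: A_bounds b: B_bounds ps: PS p0: P0",
                            {"a", "b"}))
```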

### comment:6 Changed 5 weeks ago by davidhassell

Hi Karl,

I like this approach, but I think we can retain your clarity whilst retaining Jonathan's alternative:

If the boundary variable is associated with a parametric coordinate variable then it is assumed that the formula definition of the parametric coordinate variable also applies to the bounds. The term values are the same except when the named variable depends on the vertical coordinate, in which case the named variable is substituted with its boundary variable, if it exists. Note that a `formula_terms` attribute may also be provided on a boundary variable provided it adheres to these restrictions.

### comment:7 Changed 5 weeks ago by jonathan

Dear Karl and David

I don't understand why Karl thinks the 1D formula terms (things like sigma values) are not anything like coordinate data. I have the same view as David that they do contain something like coordinate data, even though they're not coordinates by themselves. Evidently they *do* have bounds; in Karl's preferred arrangement (David's second case in comment 21 of ticket 147), `hybrid_sigma:formula_terms` points to `A_bounds` and `B_bounds`. If you don't call these boundary variables, what are they? If they are boundary variables, why not point to them with a `bounds` attribute? However since we've said all these things already, and know each other's point of view, it must be some philosophical disagreement. We'll have to arrange a conference about it sometime!

There is an important advantage in Karl's arrangement that you don't have to work out the identities of the formula terms for the bounds, since there's a `formula_terms` attribute to tell you them explicitly. What if we make it *mandatory* for the bounds variable of a parametric vertical coordinate to have a `formula_terms` attribute? This would be a backward-incompatible change, in the sense that data that was compliant with earlier versions of CF might not be compliant with the new version.

That would simplify the text here. Starting from the bold bit, we would have

If a parametric coordinate variable with a `formula_terms` attribute (ref section 4.3.2) also has a `bounds` attribute, its boundary variable must have a `formula_terms` attribute too. Because the same `standard_name` must describe both variables, the formula must have the same terms (as specified in Appendix D), but a different variable must be named by the two `formula_terms` attributes for any term which depends on the vertical dimension, because the boundary variables have one more dimension.

Then my preferred arrangement can be permitted by further text

The boundary variables for these formula terms may also be identified by `bounds` attributes of the formula terms variables. In that case, the `formula_terms` of the boundary variable and the `bounds` of the formula terms variables must be consistent.

So this permits David's case 2 and the case 3 I wrote down in ticket 147, but not David's case 1, which Karl doesn't like. In Martin's list in comment 20 of ticket 147, I would advocate option 4 - do nothing. We always permit non-standardised attributes in CF. The `formula_terms` attribute used other than for variables containing coordinate data (in the broad sense in which David and I interpret it) doesn't mean anything to CF, but it's allowed. It may have a meaning to the data-writer. Of course, it might be a mistake as well, but we don't police such mistakes. We have no general prohibition of or recommendation against using attributes from Appendix A in situations where CF doesn't describe their use.

Best wishes

Jonathan

### comment:8 Changed 5 weeks ago by davidhassell

Hello Karl, Jonathan,

Allowing the term values which span the vertical dimension to *not* have a `bounds` attribute would certainly make writing software harder, as the software would have to work out that a formula terms named variable is associated with a boundary variable and then make that connection explicit.

Running with Jonathan's idea of insisting that the boundary variable has a `formula_terms` attribute, I would take it further and insist that term values which span the vertical dimension *must* have a `bounds` attribute which points to the appropriate variable named in the boundary variable's `formula_terms` for its boundary variable. This is also a backwards-incompatible change:

If a parametric coordinate variable with a `formula_terms` attribute (ref section 4.3.2) also has a `bounds` attribute, its boundary variable must have a `formula_terms` attribute too. Because the same `standard_name` must describe both variables, the formula must have the same terms (as specified in Appendix D), but a different variable must be named by the two `formula_terms` attributes for any term which depends on the vertical dimension, because the boundary variables have one more dimension. For these terms, the boundary variable's `formula_terms` must name the bounds of the variables named by the vertical coordinate variable's `formula_terms`.
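For what it's worth, the last sentence of that text could be verified roughly as follows. This is only a sketch under stated assumptions: the function name, the pre-parsed term dicts, the `bounds_of` mapping (term variable name to the name given by its `bounds` attribute) and the caller-supplied set of vertically dependent terms are all hypothetical.

```python
def check_strict_bounds_terms(coord_terms, bounds_terms, bounds_of, vertical_terms):
    """Return a list of problems under the stricter wording.

    coord_terms / bounds_terms: dicts of term name -> variable name, parsed
        from the formula_terms of the coordinate and of its boundary variable.
    bounds_of: dict of variable name -> name given by its bounds attribute.
    vertical_terms: term names assumed to depend on the vertical dimension.
    """
    problems = []
    for term in sorted(vertical_terms):
        if term not in coord_terms:
            continue  # nothing to check for a term the parent does not define
        var = coord_terms[term]
        expected = bounds_of.get(var)
        found = bounds_terms.get(term)
        if expected is None:
            problems.append(f"{var} has no bounds attribute")
        elif found != expected:
            problems.append(f"term '{term}': expected {expected}, found {found}")
    return problems


coord_terms = {"a": "A", "b": "B", "ps": "PS", "p0": "P0"}
bounds_terms = {"a": "A_bounds", "b": "B_bounds", "ps": "PS", "p0": "P0"}
bounds_of = {"A": "A_bounds", "B": "B_bounds"}
print(check_strict_bounds_terms(coord_terms, bounds_terms, bounds_of, {"a", "b"}))
```

Only the vertically dependent terms are examined here; whether the remaining terms must name identical variables is left open in this sketch.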

That said, I like to think that we can find some non-confusing wording which allows my case 1, and so no backward-incompatible changes would be necessary.

All the best,

David

### comment:9 Changed 5 weeks ago by taylor13

Hi David,

Could you expand on why you think software will want to extract the so-called "bounds" values for variables appearing in formula_terms along with the values themselves? I would have thought that for parametric coordinates you would want to primarily associate formula terms with the coordinate values they are used to transform. So for the coordinates themselves you would associate the parameter values in the formula_terms that is attached to the parametric coordinate. For the *bounds* on that coordinate you would associate the parameter values in the formula_terms attached to the parametric coordinate's bounds.

Why is there any need to associate the parameter values used for coordinate bound transformations with the parameter values used for coordinate transformations? I should think these two sets of parameter values will invariably be used independently. I suppose one might want to put into a container all the coordinate and bound information, but I don't think you would ever put the coordinate information and the coordinate and bounds parameters together without also including the coordinate bounds themselves. If this is the case then your code could easily construct such a container without a "bounds" attribute attached to the parameter variables.

I'm sorry if I'm a bit slow on this, but you seem to have a specific use case where "working out" needed relationships is difficult. Could you describe it in a bit more detail? This could help us reach consensus.

Sorry this seems to be taking up your valuable time, but I assure you if there is a compelling use case, then I'll favor including Jonathan's alternative.

best regards, Karl

Thank you for making this proposal, which I support.

Jonathan