Opened 18 months ago

Last modified 2 months ago

#147 new defect

clarification of standard and correction of conformance doc: formula_terms

Reported by: taylor13 Owned by: cf-conventions@…
Priority: high Milestone:
Component: cf-conventions Version:
Keywords: Cc:

Description

Based on a statement appearing in the conformance document, the CF checker raises an error when the formula_terms attribute is attached to a variable other than a coordinate variable. It turns out that formula_terms are essential for interpreting the bounds on dimensionless vertical coordinates, so formula_terms should be provided whenever a variable with one of the standard_names listed in Appendix C appears in a file. The formula terms associated with bounds are needed, for example, to compute the pressure-thickness of model atmospheric layers (needed to compute the grid cell mass).

To correct this defect (and remove any impression that the formula_terms can *only* be attached to coordinate variables, as formally defined by the NUG), I propose:

  1. In section 1.3 Overview, the standard states: "The definitions are associated with a coordinate variable via the standard_name and formula_terms attributes." I would revise the sentence to read: "The definitions are associated with variables containing dimensionless coordinate values via the standard_name and formula_terms attributes."
  2. Similarly in section 1.4 Relationship to the COARDS Conventions, the standard states: "But we recommend that the standard_name and formula_terms attributes be used to identify the appropriate definition of the dimensionless vertical coordinate ...". I would revise the sentence to read: "But we recommend that the standard_name and formula_terms attributes be used to identify the appropriate definition of a variable storing dimensionless vertical coordinate values ...".
  3. In Chapter 4 Coordinate Variables, the standard states: "The definitions are associated with a coordinate variable via the standard_name and formula_terms attributes." I would revise the sentence to read: "The definitions are associated with variables containing dimensionless vertical coordinate values via the standard_name and formula_terms attributes."
  4. In section 4.3.2. Dimensionless Vertical Coordinates, the standard states: "A new attribute, formula_terms, is used to associate terms in the definitions with variables in a netCDF file." I would add to this sentence the phrase ", as described in Appendix D".
  5. In Appendix D. Dimensionless Vertical Coordinates, the standard states: "A coordinate variable is associated with its definition by the value of the standard_name attribute. The terms in the definition are associated with file variables by the formula_terms attribute. The formula_terms attribute takes a string value, the string being comprised of blank-separated elements of the form "term: variable", where term is a keyword that represents one of the terms in the definition, and variable is the name of the variable in a netCDF file that contains the values for that term. The order of elements is not significant." I would add to this paragraph the sentence: "Use of the standard_names and formula_terms defined in this Appendix is not limited to coordinate variables (as defined in section 2.3.1 of the NUG); it is recommended that they be included for any variable storing dimensionless vertical coordinate values."
  6. The CF Conformance Requirements and Recommendations document currently states: "The formula_terms attribute is only allowed on a coordinate variable which has a standard_name listed in Appendix C." In addition to correcting "Appendix C" to read "Appendix D", I suggest modifying the sentence as follows: "The formula_terms attribute is only allowed on a variable that has a standard_name listed in Appendix D or contains coordinate bounds for a coordinate variable defined in Appendix D." [Note that this means we can have formula_terms associated with *any* variable that has a standard_name listed in the Appendix.]
  7. The current CF checker(s) should be revised to be consistent with the above change in the conformance document.

A final note: In CMIP5 (and as planned for CMIP6), formula_terms are defined for vertical dimensionless coordinates and also for the bounds on those coordinates. If the CF checker is modified as suggested above, it will no longer raise an error in checking CMIP output files.
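The revised conformance rule proposed in the list above could be expressed roughly as follows. This is only a sketch, not the code of any existing CF checker: the function name `formula_terms_allowed`, the dictionary layout for `variables`, and the three-entry `PARAMETRIC_STANDARD_NAMES` set (a small stand-in for the full Appendix D list) are all assumptions made for illustration.

```python
# Hypothetical stand-in for the Appendix D list; a real checker would load every name.
PARAMETRIC_STANDARD_NAMES = {
    "atmosphere_sigma_coordinate",
    "atmosphere_hybrid_sigma_pressure_coordinate",
    "atmosphere_hybrid_height_coordinate",
}


def formula_terms_allowed(name, variables):
    """Return True if `name` may carry formula_terms under the proposed rule.

    `variables` maps each variable name to a dict of its attributes
    (an assumed in-memory layout, not a netCDF library API).
    """
    # Case 1: the variable itself has an Appendix D standard_name.
    if variables[name].get("standard_name") in PARAMETRIC_STANDARD_NAMES:
        return True
    # Case 2: the variable is the bounds of a variable with such a standard_name.
    for attrs in variables.values():
        if (attrs.get("bounds") == name
                and attrs.get("standard_name") in PARAMETRIC_STANDARD_NAMES):
            return True
    return False


example = {
    "lev": {"standard_name": "atmosphere_sigma_coordinate", "bounds": "levbnds"},
    "levbnds": {},
    "T": {"standard_name": "air_temperature"},
}
print(formula_terms_allowed("levbnds", example))
```

Under this reading, a bounds variable such as `levbnds` qualifies through its parent, while an ordinary data variable such as `T` does not.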

Change History (25)

comment:1 Changed 18 months ago by jonathan

Dear Karl

Thanks for this proposal. I have a few comments.

  • In your preamble, you mean appendix D, I think.
  • In ticket 143, we have agreed to rename "dimensionless vertical coordinate variables" as "parametric vertical coordinate variables", because they are not necessarily dimensionless. That is a simple substitution which affects your proposed text.
  • On point 5, perhaps it would be simpler to replace "A coordinate variable" at the start with "A variable containing coordinate data". In Appendix A, we already have this approach. There we put "C" for "variables which contain coordinate data", and formula_terms is marked "C". I think we used that phrase in order to include Unidata coordinate variables, auxiliary coordinate variables and scalar variables, and you would argue that bounds variables should also be included in it.
  • This also relates to ticket 140, in which David Hassell proposed that boundary variables should generally not have attributes which can be inherited from their coordinate variables, or if they do, they must be exactly the same. However, he didn't include formula_terms in that, so it doesn't conflict with yours. The formula_terms attribute does have to be different for bounds, because the variables it names are different from the formula_terms of its parent variable.
  • If we can express it, we ought to have some requirement that the formula_terms of the bounds and the coordinate variable are consistent. They must be the same formula, or it wouldn't make sense. I think that means they must have the same standard_name, or if the bounds don't have one, it can be inherited (ticket 140). See also next point.
  • I don't follow your final remark in point 6. Surely the standard_name of the bounds variable must be the same as of the coordinate variable (see also ticket 140)? I don't think we can allow a variable to have formula_terms unless it's one of those in Appendix D, because it's only those which have a defined formula.
  • We could make a requirement of consistency that the variables named by the bounds of the formula_terms of the vertical coordinate variable must be the same ones as named by the formula_terms of the bounds of the vertical coordinate variable! That is, you can get to the same place by two routes.

Best wishes

Jonathan

comment:2 Changed 17 months ago by davidhassell

Hello Karl,

Thanks for thinking about this. In the spirit of #140, I prefer a different emphasis, one which recommends against putting formula_terms on bounds variables, but defines how a formula may be assumed to apply to a bounds variable if the attribute is omitted. This is to reduce redundancy. I propose that (incorporating Jonathan's last point, probably less well put ...):

If a coordinate variable has a formula_terms attribute then by default it is assumed that the same formula applies to the coordinate variable's bounds, but each term with a coordinate variable value is replaced with the corresponding bounds variable. If a bounds variable has a formula_terms attribute then it must be the same as for the parent coordinate variable but with term values of bounds variables instead of the corresponding coordinate variables, and this would be checked by the checker.

For example, in CDL, this:

float atmosphere_hybrid_height_coordinate(atmosphere_hybrid_height_coordinate);
    atmosphere_hybrid_height_coordinate:standard_name = "atmosphere_hybrid_height_coordinate";
    atmosphere_hybrid_height_coordinate:formula_terms = "a: a b: b orog: orog";
    atmosphere_hybrid_height_coordinate:bounds = "atmosphere_hybrid_height_coordinate_bounds";

float atmosphere_hybrid_height_coordinate_bounds(atmosphere_hybrid_height_coordinate, 2);

would be exactly equivalent to:

float atmosphere_hybrid_height_coordinate(atmosphere_hybrid_height_coordinate);
    atmosphere_hybrid_height_coordinate:standard_name = "atmosphere_hybrid_height_coordinate";
    atmosphere_hybrid_height_coordinate:formula_terms = "a: a b: b orog: orog";
    atmosphere_hybrid_height_coordinate:bounds = "atmosphere_hybrid_height_coordinate_bounds";

float atmosphere_hybrid_height_coordinate_bounds(atmosphere_hybrid_height_coordinate, 2);
    atmosphere_hybrid_height_coordinate_bounds:formula_terms = "a: a_bounds b: b_bounds orog: orog";

assuming that coordinate variables a and b have bounds variables a_bounds and b_bounds respectively.
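The default described above, where the formula_terms of a bounds variable is derived from that of its parent, might be sketched like this. The helper names `parse_formula_terms` and `default_bounds_formula_terms`, and the `bounds_of` mapping from a variable name to the value of its bounds attribute, are illustrative assumptions.

```python
def parse_formula_terms(value):
    """Split a 'term: variable term: variable' string into {term: variable}."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("formula_terms must consist of 'term: variable' pairs")
    terms = {}
    for key, variable in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected 'term:' but found {key!r}")
        terms[key[:-1]] = variable
    return terms


def default_bounds_formula_terms(coord_formula_terms, bounds_of):
    """Derive the formula_terms implied for the bounds of a coordinate variable.

    Each term variable that has a bounds attribute is replaced by its bounds
    variable; terms without bounds (such as orog) are kept unchanged.
    """
    derived = {
        term: bounds_of.get(variable, variable)
        for term, variable in parse_formula_terms(coord_formula_terms).items()
    }
    return " ".join(f"{term}: {variable}" for term, variable in derived.items())


print(default_bounds_formula_terms(
    "a: a b: b orog: orog",
    {"a": "a_bounds", "b": "b_bounds"},
))
```

With the inputs from the CDL example, this yields the same string as the explicit attribute in the second listing.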

What do you think?

All the best,

David

comment:3 follow-up: Changed 17 months ago by jonathan

Just to clarify this, David, I think that when you say "assuming that coordinate variables a and b have bounds variables a_bounds and b_bounds respectively" you mean that they do explicitly have those bounds. That is, in both cases, the CDL also includes

   float a(atmosphere_hybrid_height_coordinate);
     a:bounds="a_bounds";
   float a_bounds(atmosphere_hybrid_height_coordinate,2);
   float b(atmosphere_hybrid_height_coordinate);
     b:bounds="b_bounds";
   float b_bounds(atmosphere_hybrid_height_coordinate,2);

Is that right? The point is that you don't need the formula_terms on the bounds of the coordinate variable, because you can use the formula_terms on the coordinate variable to find the bounds of the terms.

Jonathan

comment:4 in reply to: ↑ 3 Changed 17 months ago by davidhassell

Jonathan,

That is indeed what I meant. Much better to spell it out to avoid confusion - thank you!

All the best,

David

comment:5 Changed 17 months ago by taylor13

Dear all,

Thanks for thinking about this.

Responses and further comments to the previous discussion:

I agree with Jonathan’s first 5 bullet points.

Concerning Jonathan’s point 6: Would it be clearer if this simply read: “The formula_terms attribute may be attached to any variable that has a standard_name listed in Appendix D (or to a variable directly associated with another variable with that standard_name, as might be the case with cell bounds variables).”

Concerning Jonathan’s point 7, and subsequent discussion: For consistency with the rest of the convention, there would be no restrictions placed on actual variable names (e.g., a_bounds and b_bounds would not have to start with “a” and “b” or end in the suffix “bounds”). Thus, it would be perfectly acceptable to have CDL like the following (paralleling previous examples):

float h(h);
  h:standard_name = "atmosphere_hybrid_height_coordinate";
  h:formula_terms = "a: x b: y orog: orog";
  h:bounds = "hbnds";
float hbnds(h, 2);
  hbnds:formula_terms = "a: xxx b: yyy orog: orog";
float x(h);
  x:bounds="xxx";
float xxx(h,2);
float y(h);
  y:bounds="yyy";
float yyy(h,2);

In my original proposal, I omitted the bounds attached to x and y because they seem redundant, and because I think the bounds attribute should only be allowed for variables containing coordinate values (n.b. in the above example “x” and “y” are not coordinate values). The formula_terms are needed to translate some (usually) dimensionless value to an actual geophysical vertical position. In this sense they are needed to interpret the bounds positions just as they are needed to interpret the coordinate positions. It seems natural then to attach them to the bounds, as well as to the coordinate. I don’t see any added value in attaching a bounds attribute to any of the variables appearing in formula_terms. So I favor eliminating the following two lines from the above example:

x:bounds="xxx";

y:bounds="yyy";

Thanks for quickly providing input on this,

Karl

comment:6 Changed 17 months ago by jonathan

Dear Karl

I think that any variable containing coordinate data should be allowed to have bounds (as currently indicated by Appendix A), which I understand to include the formula terms of a coordinate variable. I think this makes them more independently interpretable, and the convention therefore more robust. I understand your point, that if a variable is used only as a formula term of a coordinate variable, you could find its bounds by looking at the formula_terms of the bounds of the coordinate variable. But this is rather indirect. There is nothing that points "up" from a formula term variable to the coordinate variable. You can only see this link from the top down. A general feature of CF is that things are self-describing. Consistent with this idea, I would like to be able to find the bounds of any variable by inspecting that variable alone.

Another argument, which is weaker but I think valid, is that a formula terms variable might also serve as a coordinate variable in its own right. It is contrived, but possible that pressure might be both a coordinate variable, and a formula term for a hybrid sigma-pressure coordinate variable, for example. Allowing this possibility means that we can't prevent formula terms from having bounds.

You prefer that formula terms should not have bounds. David and I prefer that bounds should not have formula terms. I suggest, therefore, that the best solution is to allow both, but insist that they must be consistent if both present. This was the last point of my comment above. It is a rule we can write down in the conformance document to be implemented in the CF checker. (By the way, I've been aware that we disagreed about this for at least a decade. I thought it might come to a head one day. :-)

On the other point, perhaps this would be even plainer, although repetitive: "The formula_terms attribute may be attached to any variable that has a standard_name listed in Appendix D, or which is the bounds variable of a coordinate variable with a standard_name listed in Appendix D."

I agree, there are no implied names for bounds. They have to be named explicitly by a bounds attribute.

Best wishes

Jonathan

comment:7 Changed 17 months ago by martin.juckes

Hello All,

While there is some value in Jonathan's compromise, I'm concerned that it will make life very difficult for the struggling software developer. When my software encounters a variable with formula terms and bounds, it should, perhaps, start looking for the bounds associated with the formula terms. In Karl's approach the formula terms attribute on the bounds variable explicitly states that one component of the formula shares the same variable as the parent coordinate variable (i.e. orog), but in David's approach this information is implicit through the absence of a bounds attribute on orog. While the information is complete in theory, I don't like the idea of constructing the formula based on the presence or absence of an attribute which is optional (i.e. the bounds attribute of the formula terms). I probably have not fully understood the spirit of #140, but perhaps that spirit could be better supported by making a clear statement that the formula_terms attribute is an exception to the general rule because there will generally be good reasons for it to have different values.

Regards, Martin

comment:8 Changed 17 months ago by jonathan

Dear Martin

My aim is to make things easy for users of the data and writers of software for analysis, by providing everyone with the information they need under their noses. If you want the formula terms of a variable, whether it is a coordinate variable or a bounds variable, you can use its formula_terms attribute (Karl's preference). If you want the bounds of a variable, whether it is a coordinate variable or a formula terms variable, you can use its bounds attribute (David's preference and mine). Nothing is implicit or indirect. I think that it should be mandatory to provide both, so that users of data can rely on it. The disadvantage of this proposal is that it is redundant, because there are two routes from the coordinate variable to the bounds of its formula terms. But it would be easy to write down a test for consistency that can be put in the CF checker.
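The consistency test mentioned above, comparing the two routes from a coordinate variable to the bounds of its formula terms, could look something like the following. It is a sketch only: `routes_agree`, `terms`, and the plain-dictionary representation of file attributes are assumptions, and the variable names are borrowed from the CDL example in comment 5.

```python
def terms(value):
    """Parse 'term: variable ...' into a dict (no validation, for brevity)."""
    tokens = value.split()
    return {key.rstrip(":"): var for key, var in zip(tokens[0::2], tokens[1::2])}


def routes_agree(coord, variables):
    """Check that both routes lead to the same bounds for every formula term.

    Route 1: coordinate -> formula_terms -> bounds attribute of each term.
    Route 2: coordinate -> bounds attribute -> formula_terms of the bounds.
    `variables` maps variable name -> dict of attributes (assumed layout).
    """
    coord_attrs = variables[coord]
    bounds_attrs = variables[coord_attrs["bounds"]]
    via_terms = {
        term: variables.get(var, {}).get("bounds", var)
        for term, var in terms(coord_attrs["formula_terms"]).items()
    }
    via_bounds = terms(bounds_attrs["formula_terms"])
    return via_terms == via_bounds


consistent = {
    "h": {"formula_terms": "a: x b: y orog: orog", "bounds": "hbnds"},
    "hbnds": {"formula_terms": "a: xxx b: yyy orog: orog"},
    "x": {"bounds": "xxx"},
    "y": {"bounds": "yyy"},
}
print(routes_agree("h", consistent))
```

A term with no bounds attribute (here `orog`) is expected to be named identically on both routes.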

Best wishes

Jonathan

comment:9 Changed 17 months ago by taylor13

Dear Jonathan, Martin, and all,

My fundamental objection to defining bounds for formula terms is that it extends what is meant by “cell bounds” to a more abstract concept. I think “cell bounds” should mean the location (either in physical space, or some model representation of physical space) of the edges (or vertices) of grid cells. In the case of vertical coordinates, the values should monotonically change and the cell "centers" should fall between the bounds. The difference between two cell bounds should be some measure of the cell “thickness”. I would note that the formula terms do not have to change monotonically, so if we allowed the terms to have bounds (as you and Mark favor), the “thickness” could be 0 or negative. The terms are in no way analogous to coordinates so should not have bounds.

Also, I think it would be extraordinarily rare that someone would want to obtain the formula terms for the bounds unless they were interested in interpreting the bounds themselves. Consider someone analyzing temperature (T) that has been saved on model levels (say a sigma coordinate system). If you wanted to compute the (pressure-weighted) mean temperature over some layer (say from 400 to 500 hPa), you would need to recover the pressure on each of the sigma cell bounds. I think the natural way to proceed would be:

  1. Look at T and see that sigma is the vertical coordinate.
  2. Look at sigma and see that it has cell bounds stored in the variable pointed to by the “bounds” attribute.
  3. Extract those sigma bounds.
  4. Convert the sigma values to pressure values using the coefficients (and surface pressure) pointed to by the bounds attribute.

For the above straightforward approach to work, it seems essential that formula_terms be attached to the coordinate bounds variable (and it isn’t essential that they be attached to the formula terms associated with the coordinate values):

float lev(lev);
  lev:bounds = "levbnds";
float levbnds(lev,2);
  levbnds:formula_terms = "sigma: levbnds ps: PS ptop: PTOP" ;

Note also, that sometimes there are several formula term coefficients that are a function of the vertical coordinate. Wouldn’t it always be more trouble to go to each coefficient to learn what the formula_terms are for the coordinate bounds? Why not just go to the cell bounds and find all the formula terms you need to interpret your cell bounds?
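The final step of the procedure above, converting sigma bounds to pressure, can be sketched using the Appendix D definition of the atmosphere sigma coordinate, p = ptop + sigma * (ps - ptop). The function names and the sample numbers below are illustrative assumptions, and a single surface pressure value is used where a real file would hold a field.

```python
def sigma_to_pressure(sigma, ps, ptop):
    """Appendix D atmosphere_sigma_coordinate: p = ptop + sigma * (ps - ptop)."""
    return ptop + sigma * (ps - ptop)


def layer_pressure_thickness(sigma_bounds, ps, ptop):
    """Pressure thickness of each layer from its pair of sigma bounds.

    `sigma_bounds` is a list of (first, second) pairs, mirroring a
    levbnds(lev, 2) variable; `ps` and `ptop` are in Pa.
    """
    return [
        abs(sigma_to_pressure(second, ps, ptop) - sigma_to_pressure(first, ps, ptop))
        for first, second in sigma_bounds
    ]


# Two equal layers under an assumed surface pressure of 1000 hPa and ptop = 0.
print(layer_pressure_thickness([(0.0, 0.5), (0.5, 1.0)], ps=100000.0, ptop=0.0))
```

These thicknesses are what a pressure-weighted layer mean, or a grid cell mass calculation, would use as weights.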

In summary, I don’t think there are any good use cases where one would prefer the formula_terms for bounds be found by looking at each of the formula_terms for the coordinate values. Furthermore, placing bounds on variables that are not actually coordinates is confusing (especially since this can lead to weird interpretations like zero “thicknesses” for the coefficients).

best wishes, Karl

comment:10 Changed 17 months ago by jonathan

Dear Karl

CF cell bounds aren't only about physical space. Every coordinate can have bounds to indicate its range, and all sorts of quantities can be coordinate variables e.g. temperature, density, wavelength, precipitation amount. So I don't see any inconsistency with allowing formula terms to have bounds, although they are not coordinate variables. However, you agree with that actually. I agree that often you go via the route you describe and that therefore it is useful for the coordinate variable bounds to have a formula_terms attribute. I don't object to that. I think that it may be useful sometimes to have a bounds attribute on the formula terms as an alternative. I don't think we have to choose one approach or the other. We can follow both, since it's easy to test whether they're inconsistent.

If I understand correctly, you are referring to an extension of Example 4.3:

float lev(lev) ;
  lev:long_name = "sigma at layer midpoints" ;
  lev:positive = "down" ;
  lev:standard_name = "atmosphere_sigma_coordinate" ;
  lev:formula_terms = "sigma: lev ps: PS ptop: PTOP" ;
  lev:bounds="levbnds";
float levbnds(lev,2);
  levbnds:formula_terms = "sigma: levbnds ps: PS ptop: PTOP" ;
float T(lev,lat,lon);
  T:standard_name="air_temperature";
  T:units="K";

In your procedure above, you don't look at the formula terms of the bounds. You find levbnds by looking at the bounds attribute of lev. This is the attribute I am recommending should be included, but I think it's also OK to include formula_terms on levbnds, as you advocate (I think). In the case of the sigma coordinate, the formula terms are self-referential, and there is only one route you can follow. In other cases e.g. the hybrid sigma-pressure coordinate, the coordinate variable itself is not one of the terms, and then you have two routes. I don't see why we should exclude the possibility of examining the sigma part or the pressure part separately to see what their bounds are.

Best wishes

Jonathan

comment:11 Changed 17 months ago by taylor13

Dear Jonathan,

The example you provide is for the special case where one of the variables appearing in the formula_terms (“lev”) is also a needed coordinate variable to define the air_temperature variable. I think here it is clearly appropriate for bounds to appear with “lev” because it is a coordinate variable (as well as a formula_term).

In the case of atmosphere hybrid-sigma pressure coordinates: formula_terms = "a: var1 b: var2 ps: var3 p0: var4" and the example corresponding to yours would be:

float eta(eta) ;
   eta:long_name = "eta at layer midpoints" ;
   eta:positive = "down" ;
   eta:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
   eta:formula_terms = "a: a b: b ps: ps p0: p0" ;
   eta:bounds = "etabnds" ;
float etabnds(eta,2);
   etabnds:formula_terms = "a: abnds b: bbnds ps: ps p0: p0" ;
float T(eta,lat,lon);
   T:standard_name = "air_temperature";
   T:units = "K";
float a(eta);
   a:long_name = "'a' coefficient for vertical coordinate (at full levels)";
   a:units = "Pa";
   a:bounds = "abnds";    // ***** I don't think this should be allowed.
float b(eta);
   b:long_name = "'b' coefficient for vertical coordinate (at full levels)";
   b:units = "Pa Pa-1";
   b:bounds = "bbnds";    // ***** I don't think this should be allowed.
float abnds(eta,2);
   abnds:long_name = "'a' coefficient for vertical coordinate (at half-levels)";
float bbnds(eta,2);
   bbnds:long_name = "'b' coefficient for vertical coordinate (at half-levels)";

Although lots of variables can be used as a vertical coordinate, ‘a’ and ‘b’ cannot because they are not monotonic (and they would not ever appear in a “coordinates” attribute). For example, here are the values of ‘a’ for the 60-level ERA40 reanalysis model:

a (Pa)
------- 
 10.
 28.
 49.
 78.
 113.
 .
 .
 .
 .

 19991.
 20330.
 20412.
 20247.
 19847. 
 19231.
 18420.
 .
 .
 .
 
 661.
 339.
 138.
 36.
 3.
 0.

The values do not change monotonically. Although not indicated by the above, I also note that the value of ‘a’ for the bound between the grid-cells with ‘a’ values of 20412 and 20247 is 20429, so the bound value doesn’t even lie in between the values at the cell “centers”, which to me doesn’t seem right for a reasonably defined set of values and bounds.
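Both observations can be checked mechanically. The sketch below uses only the numbers quoted above (the middle excerpt of the 'a' column and the stated bound of 20429); the helper names are made up for illustration.

```python
def is_strictly_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(x < y for x, y in pairs) or all(x > y for x, y in pairs)


def lies_between(bound, first_center, second_center):
    """True if `bound` falls within the closed interval spanned by two centers."""
    low, high = sorted((first_center, second_center))
    return low <= bound <= high


# Middle excerpt of the 'a' values listed above, in Pa.
a_excerpt = [19991.0, 20330.0, 20412.0, 20247.0, 19847.0, 19231.0, 18420.0]

print(is_strictly_monotonic(a_excerpt))          # the values rise, then fall
print(lies_between(20429.0, 20412.0, 20247.0))   # the bound exceeds both centers
```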

I think our convention should stipulate that bounds be normally reserved for use with a variable that is a coordinate in the file, or which by itself might be used as a coordinate in geo-locating data. Any one-dimensional variable that varied monotonically could then have bounds (because it would qualify as a coordinate variable), and a variable such as lat(i) for an unstructured grid could also have bounds even though it would not in general vary monotonically (and could not qualify as at least a potential coordinate variable).

[Aside: Note that the zonal mean surface northward wind component is (normally) carried by a latxlon gridded model on cell bounds, and it might be reported at cell centers, but I don’t think we would want to ever see the following file construction:

dimensions:
   lat = 180;

variables:
   float lat(lat);
      lat:bounds = "latbnds";
   float latbnds(lat,2);
   float vs(lat);
      vs:bounds = "vbnds";
   float vbnds(lat,2);
      vbnds:coordinates = "latbnds";

This construction would clearly not extend to a 2-d field (latxlon), and I think it really confuses what we intend “bounds” to represent. I think allowing bounds on formula terms (in general) would be like allowing bounds on “vs” in the above example. If we wanted to store the values of vs at both cell centers and cell bounds, wouldn’t we just construct a file along the following lines?

dimensions:
   lat = 180;
   latbnds = 181;

variables:
   float lat(lat);
   float latbnds(latbnds);
   float vs_at_centers(lat);
   float vs_at_bounds(latbnds);

If we allow bounds to be applied to any variable in a file, then the first way of representing the “vs” field at cell centers and at cell bounds would be allowed, which I wouldn’t like. Similarly, I don’t like allowing cell bounds on the “a’s” and “b’s” in formula terms.

end aside]

I am arguing then to reserve bounds for any variable that is used (or the user intends to use) as a coordinate. That would rule out use of “bounds” with many (but not all) formula terms.

Best regards, Karl

P.S.

I note in the introduction to chapter 7 of the standard we state: “When gridded data does not represent the point values of a field but instead represents some characteristic of the field within cells of finite "volume," a complete description of the variable should include metadata that describes the domain or extent of each cell, and the characteristic of the field that the cell values represent.” I think this is well put. And I think that when gridded data does represent point values, then the concept of “cell volume” makes no sense, and such things as grid bounds make no sense. In the case of the “a” and “b” coefficients of hybrid sigma pressure coordinates, these apply at points and do not represent some characteristic of a grid cell, so I don’t think “a” and “b” should have bounds.

comment:12 Changed 17 months ago by jonathan

Dear Karl

I agree that a and b of hybrid sigma-pressure can't be used as coordinates in general (because they are not generally monotonic). However, they do have bounds. You point at their bounds from the formula_terms of etabnds in your example. Your objection is not to their having bounds, which are meaningful because they jointly define the extent of the cell in physical space, but to pointing at these bounds using the bounds attributes you have marked with *****.

I can't present any new argument. As I said before, I don't see the harm in having this attribute (provided we can check for consistency, which is easy), and I think it's useful. Although a is not a coordinate, you might wish to do coordinate-like things with it. If I give a its bounds, it makes it self-contained, which I feel is a naturally CF way to go. For instance, I could hand the varid of a to a subroutine with the request to compute the width of the intervals in a, in just the same way as I might do with eta in your example, or with sigma in the previous example. Under your scheme, the subroutine won't be able to process a, however, because there is no pointer from a to eta, without which you can't find the bounds of a. Are there other cases where CF forbids the data-writer to provide potentially useful information? It seems unnecessarily obstructive to me. So I think we should continue to allow bounds on formula terms, with a consistency check.

I agree that it would not be right to use the bounds of a coordinate variable as a coordinate variable in their own right. I think it would almost be illegal anyway, and certainly very peculiar, since the dimension sized 2 (in your aside) would need a coordinate variable to describe it; if a data variable is multidimensional, we expect each of its dimensions to correspond to a physical axis.

Best wishes

Jonathan

comment:13 Changed 17 months ago by taylor13

Dear Jonathan,

Christmas is nigh, so I'll try to be brief; one more try at explaining my position.

In Section "7.1 Cell Boundaries", we state: "To represent cells we add the attribute bounds to the appropriate coordinate variable(s)." I think the important part of this sentence is that the bounds attribute is used to provide information about the *cell*; this is more important than mentioning that they get attached to coordinates.

Currently, the standard clearly limits its discussion of "cells" to grids (of 1 or more dimensions). I would argue we should keep it that way because it keeps the concept simple.

In short my argument is:

  1. keep the current definition of "cells"
  2. continue to limit the use of "bounds" to describing cells.

I see no compelling use case for modifying (and complicating) this beautifully-constructed aspect of CF.

If we generally allowed bounds to be attached to variables appearing in formula terms (that are not themselves coordinate variables), we would have to modify our definition of a "cell" or define "bounds" in a way that does not relate them to cells.

As we both agree, we can find the terms needed to interpret parametrically defined coordinates through the formula_terms, which under my proposal can be attached to the coordinate values themselves and to their bounds. Why complicate things?

If you want to complicate things, I think you will have to rewrite at least section 7.1 and the introduction to section 7. Do you propose to do that? (I hope not.)

best wishes for the holidays, Karl

comment:14 Changed 17 months ago by jonathan

Dear Karl

I've always understood the formula terms to be part of the definition of the cells. In hybrid sigma-pressure, for example, the eta coordinate has bounds, which are useful for plotting, but you can't use them for many purposes. To do calculations, you need the bounds of the sigma and pressure terms separately. We agree that the formula terms have bounds and that these are needed for the definition of the cell. We disagree only about exposing this link in a bounds attribute. In CF-netCDF files, I have always written a bounds attribute for the formula terms because it's useful, as I've already said, to have a direct link from the formula terms to its bounds for some calculations, because otherwise you have to search the file to find it. I think it's natural; you think it's objectionable!

In fact not all formula terms have bounds. For instance, it wouldn't make sense for the p* field in hybrid sigma-pressure. I think that the ones which have bounds are those which have the vertical dimension - do you think that's right? The p* variable will be the same one in the formula_terms of the coordinate variable and of its boundary variable.

My interpretation of 7.1 implicitly allows bounds of formula terms when they contain coordinate data. On the other hand, unlike you, I didn't interpret 4.3.2 or 7.1 as indicating that boundary variables could have formula_terms, since this is not mentioned, but I don't object to it.

In view of the above two paragraphs, I think both our points of view require further changes to the existing text to allow them explicitly. I suggest:

  • In 4.3.2, you propose a modified sentence, "A new attribute, formula_terms, is used to associate terms in the definitions with variables in a netCDF file, as described in Appendix D." I would insert a further sentence, "The formula_terms attribute is also permitted for boundary variables of variables containing parametric vertical coordinate data (see Section 7.1, Cell boundaries). For a boundary variable, those terms which do not have the vertical coordinate dimension must be identical to the corresponding terms of the coordinate variable, while those terms which do have the vertical coordinate dimension are boundary variables for the formula terms concerned." We can write a check for this in the conformance document.
  • By the way, since we are modifying this sentence, I suggest we change "A new attribute, formula_terms" to "The formula_terms attribute." It is new in CF compared with COARDS, but it's no longer new in absolute terms!
  • The bounds of sigma should be inserted in Example 4.3.
  • I would append the following to 7.1 as a new paragraph. "Variables which are named by formula_terms of a parametric coordinate variable (see Section 4.3.2, Parametric vertical coordinate) and which have the vertical coordinate dimension may have a bounds attribute. If the parametric coordinate variable has a boundary variable with a formula_terms attribute, its terms should be consistent with the bounds of the terms of the coordinate variable." We can write a check for that too. This paragraph should be followed by a new example to illustrate the situation when the coordinate variable is not also one of the formula terms. We could use the hybrid sigma-pressure example which you wrote down in comment 11 (including the starred lines which you don't like).
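The two checks suggested in the list above could be sketched roughly as follows. This is only an illustration, not the CF checker's implementation: the in-memory layout (a dict mapping each variable name to its `dims` and `attrs`) and the function names `parse_formula_terms` and `check_bounds_terms` are assumptions made for the example, as is taking the coordinate variable's first dimension as the vertical one.

```python
def parse_formula_terms(text):
    """Turn 'a: A b: B ps: PS' into {'a': 'A', 'b': 'B', 'ps': 'PS'}."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("formula_terms must be 'term: variable' pairs")
    terms = {}
    for key, name in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError("expected 'term:' but found %r" % key)
        terms[key[:-1]] = name
    return terms


def check_bounds_terms(variables, coord_name):
    """Return a list of inconsistencies between the formula_terms of a
    parametric coordinate variable and those of its boundary variable."""
    coord = variables[coord_name]
    vertical_dim = coord["dims"][0]  # assumption: 1-D coordinate variable
    bounds_name = coord["attrs"].get("bounds")
    if bounds_name is None:
        return []
    bounds_attrs = variables[bounds_name]["attrs"]
    if "formula_terms" not in bounds_attrs:
        return []  # nothing to compare
    coord_terms = parse_formula_terms(coord["attrs"]["formula_terms"])
    bounds_terms = parse_formula_terms(bounds_attrs["formula_terms"])

    problems = []
    for key, term_name in coord_terms.items():
        bounds_term = bounds_terms.get(key)
        if bounds_term is None:
            problems.append("term %r missing on %s" % (key, bounds_name))
        elif vertical_dim not in variables[term_name]["dims"]:
            # No vertical dimension: must be the identical variable.
            if bounds_term != term_name:
                problems.append("term %r should be %s" % (key, term_name))
        else:
            # Vertical dimension: if the term declares bounds, the boundary
            # variable's term must be that same variable.
            declared = variables[term_name]["attrs"].get("bounds")
            if declared is not None and declared != bounds_term:
                problems.append("term %r should be %s" % (key, declared))
    return problems
```

An empty list means the two sets of terms agree; each string in a non-empty list names one term that a checker might report.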

Making these changes would reconcile our views, I think, by allowing both methods.

Best wishes

Jonathan

comment:15 Changed 17 months ago by davidhassell

Hello,

A minor point, which relates to the checker: if a formula term is not a coordinate variable and has a bounds attribute but is not an auxiliary coordinate variable (i.e. is not referenced by the parent data variable's coordinates attribute), then a warning should be issued, because the bounds attribute has no reserved CF meaning on variables which are not auxiliary coordinate variables or coordinate variables.

It has to be a warning and not an error because there's nothing wrong, of course, in setting an attribute called "bounds", with any value, on an arbitrary variable.

All the best,

David

comment:16 Changed 17 months ago by jonathan

Dear David

In my last point above, I'm suggesting that bounds on formula terms will be explicitly allowed by CF, so no warning would be needed. Are you happy with that?

Best wishes

Jonathan

comment:17 Changed 17 months ago by davidhassell

Dear Jonathan,

Ah, OK, I think that I misinterpreted what you had written. However, I'm not sure about that. How is the bounds attribute to be interpreted when such a formula terms variable is used outside of the conversion formula?

All the best,

David

comment:18 Changed 2 months ago by taylor13

Dear Jonathan and all,

As you know, we are about to write petabytes of hopefully CF-compliant CMIP6 data. There is an urgent need to agree on how to proceed on this ticket. If possible, I would like to squeeze it into CF 1.7.

To summarize this ticket: Data stored on model levels for CMIP5 was non-compliant with the standard because formula_terms was attached to the variable providing bounds for the vertical parametric coordinate, and currently CF forbids this. (A formula_terms can only be attached to a coordinate variable.) We plan to include formula_terms for bounds in CMIP6 too, so it will also be non-compliant unless we change the standard.

I proposed that formula_terms should be allowed to be attached to variables containing bounds of coordinates as well as being attached to variables containing the coordinates themselves.

You thought that was a good idea, but also wanted to go further and allow the bounds to be attached to the (parameter) variables pointed to by formula_terms, even though these variables cannot in general be considered coordinates. You thought this was “implicitly allowed by section 7.1”. But that section is introduced with:

“To represent cells we add the attribute bounds to the appropriate coordinate variable(s). The value of bounds is the name of the variable that contains the vertices of the cell boundaries. We refer to this type of variable as a "boundary variable.” A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable.”

This would seem to explicitly rule out use of bounds with any variable other than a coordinate variable (and the parameters appearing in formula_terms aren’t generally coordinate variables).

So, I think that however we record the values of the parameters needed to convert the bounds of a parametric coordinate to a vertical location in physical space, we will have to modify the current convention.

You have also argued that “it's useful to have a direct link from the formula terms to its bounds for some calculations, because otherwise you have to search the file to find it.” Earlier you expanded on this:

“… I think it's useful. Although "a" [one of the parameters needed to define hybrid sigma coordinates] is not a coordinate, you might wish to do coordinate-like things with it. If I give "a" its bounds, it makes it self-contained, which I feel is a naturally CF way to go. For instance, I could hand the varid of "a" to a subroutine with the request to compute the width of the intervals in "a", in just the same way as I might do with eta in your example, or with sigma in the previous example. Under your scheme, the subroutine won't be able to process "a", however, because there is no pointer from "a" to eta, without which you can't find the bounds of "a".”

You say you might want to compute the width of “a”, but I can’t think of any reason to do that (I noted earlier that the so-called “width” can turn out to be 0 for some parameters.) I can’t think of any use for operating on the values of parameters at cell bounds other than to compute the position in space of the vertical coordinate. I would note that both eta and sigma are actual parametric coordinates, and it clearly is sometimes useful to compute the width of coordinate cells.

In any case, I can see no added convenience in attaching a bounds attribute to the parameters themselves, rather than attaching formula_terms to the variable containing the bounds of the coordinate variable. When you come across a variable that is a function of a parametric vertical coordinate, you would presumably look at the formula_terms to determine what “containers” needed defining. At that time you could note whether or not there were bounds defined for that parametric vertical coordinate, and if there were, you could easily extract and associate the variables containing the parameter values at the bounds of the vertical coordinate with the variables containing the parameter values at the coordinate nodes. This would be quite straightforward, I should think.

Note that I don’t think we can interpret the “value of the parameter at the bounds of a parametric vertical coordinate’s grid cell” as the “bounds of the parameter” because cells can’t intrinsically be defined by the parameter. The cells are defined by the parametric vertical coordinate (which therefore has bounds). Like other variables (e.g., temperature, humidity, etc.) that can be defined both at the coordinate locations and the cell bounds, the parameter values can be defined at both places. But the cell (along with its bounds) is defined by the coordinate, not the variables that are a function of that coordinate. Do you agree?

The reason I have been so forceful in arguing against your position is that I think it requires us to redefine what we’ve meant by a “cell”. Up to now, a cell has been defined by the bounds attached to a variable used as a coordinate variable. This meant that the grid cell bounds would always have values between the values of the two cells they separated. The concept of a physical cell (like the intervals on the number line taught to us in elementary school) is easy to grasp. If we modify this simple concept and allow bounds for the parameters associated with parametric vertical coordinates, I think we make it much harder for novices to understand what we’re talking about. How can the bounds defining contiguous cells in 1 dimension not be monotonic? That is what would be required if we allowed bounds to be attached to parameters rather than limiting their use to coordinates.
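To make the monotonicity point concrete, here is a small sketch with made-up numbers. It assumes the hybrid sigma-pressure form p = a*p0 + b*ps; the values of `a_iface`, `b_iface`, `P0` and `PS` are hypothetical and chosen only for illustration, not taken from any model.

```python
P0 = 100000.0  # reference pressure in Pa (assumed)
PS = 100000.0  # surface pressure in Pa (assumed)

# Hypothetical parameter values at the 5 interfaces of 4 layers,
# ordered from the surface upwards.
a_iface = [0.0, 0.1, 0.2, 0.2, 0.05]
b_iface = [1.0, 0.7, 0.3, 0.0, 0.0]


def widths(values):
    """Difference between the upper and lower interface of each layer."""
    return [upper - lower for lower, upper in zip(values[:-1], values[1:])]


# Pressure at each interface from the parameter values at that interface.
p_iface = [a * P0 + b * PS for a, b in zip(a_iface, b_iface)]

print("layer pressure thickness:", [-w for w in widths(p_iface)])
print("'width' of a:", widths(a_iface))
print("'width' of b:", widths(b_iface))
```

With these numbers the interface pressure decreases steadily with height, so every layer has a positive pressure thickness, whereas the "width" of a is positive, zero and negative in different layers and the "width" of b is zero in the top layer.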

I guess if you still don’t see why I’m so opposed to allowing both options, and there are no other opinions expressed, we have two choices:

1) We allow both options

2) We remain unable to reach consensus, and CMIP6, like CMIP5, will produce non-CF-compliant files

I anxiously await your thoughts.

best wishes, Karl

comment:19 Changed 2 months ago by taylor13

Dear Jonathan and all,

[THE FOLLOWING PARAGRAPH WAS INSERTED 1-DAY AFTER THE ORIGINAL COMMENT WAS POSTED. PLEASE READ THIS PARAGRAPH AND COMMENT ON IT EVEN IF YOU DON'T HAVE TIME TO READ THE REST OF THE COMMENT:

It occurs to me that there is a stop-gap measure we should take immediately. Modify the conformance document such that it won't raise an error if a formula_terms is attached to a variable that is not a coordinate variable. I have reread the standard, and I can't see any place where it specifically forbids using formula_terms outside the usage discussed. Like other attributes, I would think this means it might also be used in unorthodox ways without making a file inconsistent with the standard. (For example, 'bounds' can only be expected to be interpreted by software when attached to a coordinate variable, but, as David noted and as Jonathan has done in practice, this does not forbid its use elsewhere.) Similarly, I think we should be able to attach the formula_terms attribute to a cell bounds variable without raising an error. So, I propose that the CF checker *not* raise an error in this case. This could be done in time for the CF 1.7 release, I think, and would make CMIP5 and CMIP6 data pass the CF checker's checks. I might note that no one has complained about CMIP5 files being out of compliance with CF, so I don't think there is any software out there that relies on a restriction of formula_terms to coordinate variables. We didn't discover the problem until late last year when we ran the CF checker on some CMIP5 files.

NOW BACK TO THE ORIGINAL POST:]

As you know, we are about to write petabytes of hopefully CF-compliant CMIP6 data. There is an urgent need to agree on how to proceed on this ticket. If possible, I would like to squeeze it into CF 1.7.

To summarize this ticket: Data stored on model levels for CMIP5 was non-compliant with the standard because formula_terms was attached to the variable providing bounds for the vertical parametric coordinate, and currently CF forbids this. (A formula_terms can only be attached to a coordinate variable.) We plan to include formula_terms for bounds in CMIP6 too, so it will also be non-compliant unless we change the standard.

I proposed that formula_terms should be allowed to be attached to variables containing bounds of coordinates as well as being attached to variables containing the coordinates themselves.

You thought that was a good idea, but also wanted to go further and allow the bounds to be attached to the (parameter) variables pointed to by formula_terms, even though these variables cannot in general be considered coordinates. You thought this was “implicitly allowed by section 7.1”. But that section is introduced with:

“To represent cells we add the attribute bounds to the appropriate coordinate variable(s). The value of bounds is the name of the variable that contains the vertices of the cell boundaries. We refer to this type of variable as a "boundary variable.” A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable.”

This would seem to explicitly rule out use of bounds with any variable other than a coordinate variable (and the parameters appearing in formula_terms aren’t generally coordinate variables).

So, I think that however we record the values of the parameters needed to convert the bounds of a parametric coordinate to a vertical location in physical space, we will have to modify the current convention.

You have also argued that “it's useful to have a direct link from the formula terms to its bounds for some calculations, because otherwise you have to search the file to find it.” Earlier you expanded on this:

“… I think it's useful. Although "a" [one of the parameters needed to define hybrid sigma coordinates] is not a coordinate, you might wish to do coordinate-like things with it. If I give "a" its bounds, it makes it self-contained, which I feel is a naturally CF way to go. For instance, I could hand the varid of "a" to a subroutine with the request to compute the width of the intervals in "a", in just the same way as I might do with eta in your example, or with sigma in the previous example. Under your scheme, the subroutine won't be able to process "a", however, because there is no pointer from "a" to eta, without which you can't find the bounds of "a".”

You say you might want to compute the width of “a”, but I can’t think of any reason to do that (I noted earlier that the so-called “width” can turn out to be 0 for some parameters.) I can’t think of any use for operating on the values of parameters at cell bounds other than to compute the position in space of the vertical coordinate. I would note that both eta and sigma are actual parametric coordinates, and it clearly is sometimes useful to compute the width of coordinate cells.

In any case, I can see no added convenience in attaching a bounds attribute to the parameters themselves, rather than attaching formula_terms to the variable containing the bounds of the coordinate variable. When you come across a variable that is a function of a parametric vertical coordinate, you would presumably look at the formula_terms to determine what “containers” needed defining. At that time you could note whether or not there were bounds defined for that parametric vertical coordinate, and if there were, you could easily extract and associate the variables containing the parameter values at the bounds of the vertical coordinate with the variables containing the parameter values at the coordinate nodes. This would be quite straightforward, I should think.

Note that I don’t think we can interpret the “value of the parameter at the bounds of a parametric vertical coordinate’s grid cell” as the “bounds of the parameter” because cells can’t intrinsically be defined by the parameter. The cells are defined by the parametric vertical coordinate (which therefore has bounds). Like other variables (e.g., temperature, humidity, etc.) that can be defined both at the coordinate locations and the cell bounds, the parameter values can be defined at both places. But the cell (along with its bounds) is defined by the coordinate, not the variables that are a function of that coordinate. Do you agree?

The reason I have been so forceful in arguing against your position is that I think it requires us to redefine what we’ve meant by a “cell”. Up to now, a cell has been defined by the bounds attached to a variable used as a coordinate variable. This meant that the grid cell bounds would always have values between the values of the two cells they separated. The concept of a physical cell (like the intervals on the number line taught to us in elementary school) is easy to grasp. If we modify this simple concept and allow bounds for the parameters associated with parametric vertical coordinates, I think we make it much harder for novices to understand what we’re talking about. How can the bounds defining contiguous cells in 1 dimension not be monotonic? That is what would be required if we allowed bounds to be attached to parameters rather than limiting their use to coordinates.

I guess if you still don’t see why I’m so opposed to allowing both options, and there are no other opinions expressed, we have two choices:

1) We allow both options

2) We remain unable to reach consensus, and CMIP6, like CMIP5, will produce non-CF-compliant files

I anxiously await your thoughts.

best wishes, Karl

comment:20 Changed 2 months ago by martin.juckes

Hello All,

I think there are 4 ways that the CF-Checker could deal with this: when the CF-checker encounters formula_terms on a variable which is not a coordinate, it could:

  1. issue an error,
  2. issue a warning that the attribute has no meaning in this context,
  3. issue an information message that the attribute has no meaning in this context,
  4. do nothing.

Option 1 is what currently happens, but I would support Karl's arguments for moving away from this. The main problem which I can see with the other extreme, option 4, is related to the point David raised, that the attribute is being used with no clear meaning: this may mean that the author of the file has tried to put information in and failed.

How about option 3, which would at least give the author a hint that the attribute is not where it is expected?
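The four options could be expressed as a configurable policy, roughly as in the sketch below. The names `Severity` and `report_formula_terms`, and the exact message wording, are hypothetical and are not taken from the CF checker.

```python
from enum import Enum


class Severity(Enum):
    ERROR = 1    # option 1: current behaviour
    WARNING = 2  # option 2
    INFO = 3     # option 3
    IGNORE = 4   # option 4


def report_formula_terms(var_name, is_coordinate, policy=Severity.INFO):
    """Return a (severity, message) pair, or None when nothing is reported."""
    if is_coordinate or policy is Severity.IGNORE:
        return None
    if policy is Severity.ERROR:
        text = "formula_terms is only allowed on a coordinate variable"
    else:
        text = "formula_terms has no meaning in this context"
    return policy, "%s: %s: %s" % (policy.name, var_name, text)
```

The default here is option 3, so a bounds variable carrying formula_terms would produce an information message but would not fail the check.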

regards, Martin

comment:21 Changed 2 months ago by davidhassell

Hi Karl,

Do these sum up the two cases?

If a coordinate variable has bounds and formula terms but its bounds variable does not have formula_terms:

dimensions:
    z = 19 ;
    y = 73 ;
    x = 96 ;
    bounds = 2 ;
variables:
    double hybrid_sigma(z) ;
           hybrid_sigma.standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
           hybrid_sigma.formula_terms = "a: A b: B ps: PS" ;
           hybrid_sigma.bounds = "hybrid_sigma_bounds" ;
    double A(z) ;                      // HAS bounds
           A.bounds = "A_bounds" ;
    double B(z) ;                      // HAS bounds
           B.bounds = "B_bounds" ;
    double PS(y, x) ;
    double hybrid_sigma_bounds(z, bounds) ;
    double A_bounds(z, bounds) ;
    double B_bounds(z, bounds) ;

If a coordinate variable has bounds and formula terms and its bounds variable also has formula_terms:

dimensions:
    z = 19 ;
    y = 73 ;
    x = 96 ;
    bounds = 2 ;
variables:
    double hybrid_sigma(z) ;
           hybrid_sigma.standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
           hybrid_sigma.formula_terms = "a: A b: B ps: PS" ;
           hybrid_sigma.bounds = "hybrid_sigma_bounds" ;
    double A(z) ;                // DOES NOT HAVE bounds 
    double B(z) ;                // DOES NOT HAVE bounds 
    double PS(y, x) ;
    double hybrid_sigma_bounds(z, bounds) ;
           hybrid_sigma.formula_terms = "a: A_bounds b: B_bounds ps: PS" ;
    double A_bounds(z, bounds) ; // NOT explicitly attached to A 
    double B_bounds(z, bounds) ; // NOT explicitly attached to B

The checker has a bit of work to do, in each case, checking whether or not bounds exist on certain of the terms.

Thanks,

David

comment:22 Changed 2 months ago by jonathan

Dear Karl

At the start of this ticket, you said, "Based on a statement appearing in the conformance document, the CF checker raises an error when the formula_terms attribute is attached to a variable other than a coordinate variable." Which statement is that? Is it this one?

The formula_terms attribute is only allowed on a coordinate variable which has a standard_name listed in Appendix C.

If so, perhaps this is a misunderstanding. I think the point of that statement is not that the variable is a coordinate variable, but that it has a standard_name from App C, because without that you can't know what formula applies. So I would suggest amending the statement to

The formula_terms attribute is only allowed on a variable which has a standard_name listed in Appendix C.

To make CMIP6 data compliant, you'd have to add a standard_name to bounds variables. That would be legal, but it would be unnecessary if David's ticket 140 was accepted. Could you support ticket 140? If both this ticket and ticket 140 were accepted now and put in CF 1.7 (which is still under construction - we agreed a deadline of end of Jan for tickets, but that was to meet the needs of CMIP6), I would suggest modifying the conformance document to satisfy both this ticket and that ticket thus:

The formula_terms attribute is only allowed on a variable which has a standard_name listed in Appendix C, or on a bounds variable of a variable which has a standard_name listed in Appendix C.
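A check for this combined statement might look like the sketch below. The set `PARAMETRIC_STANDARD_NAMES` is only a small illustrative subset of the standard names concerned, and the function name and the dict-of-attributes layout are assumptions made for the example.

```python
# Illustrative subset only; a real check would use the full list from the
# appendix that the conformance statement cites.
PARAMETRIC_STANDARD_NAMES = {
    "atmosphere_sigma_coordinate",
    "atmosphere_hybrid_sigma_pressure_coordinate",
    "ocean_sigma_coordinate",
}


def formula_terms_allowed(variables, name):
    """True if `name` may carry formula_terms under the combined statement."""

    def has_listed_name(var_name):
        std = variables[var_name]["attrs"].get("standard_name")
        return std in PARAMETRIC_STANDARD_NAMES

    if has_listed_name(name):
        return True
    # Otherwise `name` must be the bounds variable of a listed variable.
    return any(
        var["attrs"].get("bounds") == name and has_listed_name(parent)
        for parent, var in variables.items()
    )
```

Under this rule a bounds variable needs no standard_name of its own: it qualifies through the variable whose bounds attribute names it.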

In that case, formula_terms on bounds will not cause an error. The CF document does not prohibit bounds on formula_terms at present, and I don't think it should, as we've discussed earlier in this ticket. You're not convinced about that, I know. One of your objections is monotonicity, but I would point out that we allow bounds on auxiliary coordinate variables, which aren't necessarily monotonic. For example, a trajectory in 1D might reverse direction at one of its cell boundaries; this boundary would then not lie between the points that it separates. Formula terms aren't as bad as that anyway: they might not be strictly monotonic (i.e. they might have repeated values) but I expect they're monotonic.

I continue to think we should allow bounds on formula terms or formula terms on bounds, because it's convenient. This is not replication of metadata, but it allows two possible routes to the metadata. Since both would need to be supported, they may as well be allowed together. In that case, the CF checker could detect an inconsistency quite easily, starting from the coordinate variable. That's a third case to add to David's two.

dimensions:
    z = 19 ;
    y = 73 ;
    x = 96 ;
    bounds = 2 ;
variables:
    double hybrid_sigma(z) ;
           hybrid_sigma.standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
           hybrid_sigma.formula_terms = "a: A b: B ps: PS" ;
           hybrid_sigma.bounds = "hybrid_sigma_bounds" ;
    double A(z) ;                      // HAS bounds
           A.bounds = "A_bounds" ;
    double B(z) ;                      // HAS bounds
           B.bounds = "B_bounds" ;
    double PS(y, x) ;
    double hybrid_sigma_bounds(z, bounds) ;
           hybrid_sigma_bounds.formula_terms = "a: A_bounds b: B_bounds ps: PS" ;
    double A_bounds(z, bounds) ;
    double B_bounds(z, bounds) ;

Here, starting from hybrid_sigma, you can reach A_bounds via hybrid_sigma.formula_terms and A.bounds, or via hybrid_sigma.bounds and hybrid_sigma_bounds.formula_terms. (NB David's second example should have hybrid_sigma_bounds.formula_terms in the third line from the end.)
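As a rough sketch of how a checker might follow both routes and compare the results, assuming the attributes of each variable are held in a plain dict keyed by variable name (the function names here are hypothetical):

```python
def parse_formula_terms(text):
    """Turn 'a: A b: B ps: PS' into {'a': 'A', 'b': 'B', 'ps': 'PS'}."""
    tokens = text.split()
    return {key.rstrip(":"): name for key, name in zip(tokens[0::2], tokens[1::2])}


def routes_to_term_bounds(attrs, coord_name, key):
    """Return the bounds of term `key` as found by each of the two routes."""
    coord = attrs[coord_name]
    # Route 1: coordinate -> formula_terms -> term -> bounds
    term = parse_formula_terms(coord["formula_terms"])[key]
    via_term = attrs[term].get("bounds")
    # Route 2: coordinate -> bounds -> formula_terms -> term
    bounds_var = attrs[coord["bounds"]]
    via_bounds = parse_formula_terms(bounds_var.get("formula_terms", "")).get(key)
    return via_term, via_bounds


attrs = {
    "hybrid_sigma": {"formula_terms": "a: A b: B ps: PS",
                     "bounds": "hybrid_sigma_bounds"},
    "A": {"bounds": "A_bounds"},
    "B": {"bounds": "B_bounds"},
    "PS": {},
    "hybrid_sigma_bounds": {"formula_terms": "a: A_bounds b: B_bounds ps: PS"},
}
print(routes_to_term_bounds(attrs, "hybrid_sigma", "a"))
```

If both routes return a name and the names differ, the file is inconsistent; if only one route returns a name, the file is using just one of the two methods.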

Cheers

Jonathan

comment:23 Changed 2 months ago by davidhassell

Hello Karl, Jonathan,

I'm happy with all three cases described in CDL in the previous two posts being allowed. I'm also fine with Jonathan's proposed text changes, and (naturally, as I proposed it!) #140 being accepted.

There is one further detail - is it allowed for a term variable to have bounds but for that variable to not be listed in the coordinates attribute of a data variable which uses the formula terms?

dimensions:
    z = 19 ;
    y = 73 ;
    x = 96 ;
    bounds = 2 ;
variables:
    double hybrid_sigma(z) ;
           hybrid_sigma.standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
           hybrid_sigma.formula_terms = "a: A b: B ps: PS" ;
           hybrid_sigma.bounds = "hybrid_sigma_bounds" ;
    double A(z) ;                      // HAS bounds
           A.bounds = "A_bounds" ;
    double B(z) ;                      // HAS bounds
           B.bounds = "B_bounds" ;
    double PS(y, x) ;
    double lat(y, x) ;
    double lon(y, x) ;
    double hybrid_sigma_bounds(z, bounds) ;
           hybrid_sigma_bounds.formula_terms = "a: A_bounds b: B_bounds ps: PS" ;
    double A_bounds(z, bounds) ;
    double B_bounds(z, bounds) ;
    double air_temperature(z, y, x) ;
           air_temperature.coordinates = "lat lon" ; // no A nor B, here

I think that this would be OK, as even though A is not identified as an auxiliary coordinate variable (and so on the face of it the bounds attribute is non-standardised), its reference from a formula terms means that it may be viewed as containing coordinate data (and so its bounds attribute is, in fact, properly meaningful).

Thanks,

David

comment:24 Changed 2 months ago by taylor13

Hi all,

There seems to be agreement (among the 4 of us who have expressed an opinion) that attaching formula_terms to a parametric coordinate's "boundary variable" should not raise an error (i.e., not be considered out of conformance). I would be o.k. with either option 2 or 3 suggested by Martin, but would favor 3: issue an information message that the attribute has no meaning in this context. We could do this immediately. It is not absolutely essential at this time to agree to the rest of the proposal. [Note: I originally submitted this ticket to correct what I considered a defect in the conformance document.]

That being said, I think we also agree that attaching formula_terms to a "boundary variable" should be sanctioned by CF, so perhaps that also could be implemented now. What we haven't yet reached consensus on is whether Jonathan's alternative approach should also be allowed. As I've said I’m against this because it would require redefinition of a "boundary variable" in the conventions. A "boundary variable" is defined in section 1.2 ("Terminology"):

A boundary variable is associated with a variable that contains coordinate data. When a data value provides information about conditions in a cell occupying a region of space/time or some other dimension, the boundary variable provides a description of cell extent.

I don’t think the parameters defined in formula terms are anything like coordinate data; many of them are just coefficients (which do not have to vary with location). They don’t define meaningful intervals (or cell extents). I don’t think any compelling use case has been proposed which would warrant complicating the very straightforward meaning of a “boundary variable” in the current conventions.

When we come upon a parametric coordinate variable, we might want to find out the vertical location pointed to by that coordinate, and we would consult the formula_terms to extract the parameters needed to do that. Similarly, when we come upon a variable containing the bounds of a parametric coordinate variable, we might want to find out the vertical locations associated with these bounds, and we would consult the formula_terms attached to it. Codes operating on the data could treat both the coordinates and their bounds in exactly the same way. I can see no need for an alternative pathway.
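A minimal sketch of that idea follows, assuming the form p = a*p0 + b*ps, a single column with scalar surface pressure, and a hypothetical in-memory layout in which each variable has `attrs` and a nested-list `data` entry. The function names are made up for the example.

```python
def parse_formula_terms(text):
    """Turn 'a: A b: B' into {'a': 'A', 'b': 'B'}."""
    tokens = text.split()
    return {key.rstrip(":"): name for key, name in zip(tokens[0::2], tokens[1::2])}


def combine(a, b, p0, ps):
    """Apply p = a*p0 + b*ps element by element to (nested) lists."""
    if isinstance(a, list):
        return [combine(x, y, p0, ps) for x, y in zip(a, b)]
    return a * p0 + b * ps


def pressure(variables, name, p0, ps):
    """Pressure for `name`, which may be a coordinate or its bounds variable."""
    terms = parse_formula_terms(variables[name]["attrs"]["formula_terms"])
    a = variables[terms["a"]]["data"]
    b = variables[terms["b"]]["data"]
    return combine(a, b, p0, ps)


# Hypothetical two-level column; values chosen only for illustration.
variables = {
    "lev": {"attrs": {"formula_terms": "a: A b: B", "bounds": "lev_bnds"}},
    "lev_bnds": {"attrs": {"formula_terms": "a: A_bnds b: B_bnds"}},
    "A": {"attrs": {}, "data": [0.0, 0.25]},
    "B": {"attrs": {}, "data": [1.0, 0.5]},
    "A_bnds": {"attrs": {}, "data": [[0.0, 0.0], [0.0, 0.5]]},
    "B_bnds": {"attrs": {}, "data": [[1.0, 1.0], [1.0, 0.0]]},
}
levels = pressure(variables, "lev", p0=1000.0, ps=1000.0)
interfaces = pressure(variables, "lev_bnds", p0=1000.0, ps=1000.0)
```

The same `pressure` call serves both the coordinate and its bounds; only the shape of the result differs.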

So, I would advocate further discussion before possibly implementing Jonathan’s alternative.

In the meantime I hope this won’t hold up correcting the defect in the CF conformance document and possibly also agreeing to use of formula_terms with a parametric coordinate’s boundary variable.

best regards, Karl
