Opened 3 years ago

Last modified 2 years ago

#107 new task

CF Data Model 1.7

Reported by: markh Owned by: cf-conventions@…
Priority: medium Milestone:
Component: cf-conventions Version:
Keywords: Cc:

Description

CF Data Model version 1.7

The latest draft of the data model is:

The Terms of Reference for the model are agreed.

This ticket is a follow-up activity to #95 and #68, which both focussed on CF 1.5 but did not reach a conclusion.

Change History (85)

comment:1 Changed 3 years ago by jonathan

Dear Mark

I agree that it would be good to move things along, but I don't think we've finished the data model for CF 1.5 yet, which we agreed is where we would start (point 4 of your terms of reference). We began that in ticket 68 and continued in 95. Ticket 95 hasn't been abandoned, though. Because of our irreconcilable views on scalar coordinates, we digressed to tickets 104 and 105 to address that one point. If we can agree ticket 104, we can go back to the discussion of the data model in ticket 95! I hope we will be able to do that soon. In ticket 95 we have agreed a lot of text describing the data model, but there are still some aspects of the model that we haven't discussed, especially transforms.

The outcome of ticket 95, presuming we succeed in agreeing it, will be a data model document for CF 1.5. Once we have that, we can update it for CF 1.6, in which the main issue will be discrete sampling geometries. I haven't thought about it carefully, but I suspect that the changes in the data model to update it further to CF 1.7 will be minor, or even nil. There are lots of tickets making useful changes, but I don't immediately see any conceptual changes. However, I don't want to anticipate that discussion!

Best wishes

Jonathan

comment:2 Changed 3 years ago by jonathan

Dear all

Mark argues that we should start work on the data model for CF 1.7 (the next version i.e. CF 1.6 plus all the agreed tickets), instead of concluding the data model for CF 1.5. I'm keen that we should build on what we already agreed for CF 1.5, since it will be mostly unchanged, and Mark agrees with that. I have therefore posted several documents on the wiki. I've put them there as annexes to this posting. I suggest that those wiki documents should not be edited; instead, as we debate changes to the wording of the data model, we can post new wiki documents. Keeping lengthy text out of the ticket itself should make it more readable; that was one of Mark's concerns.

The documents I have posted are:

I suggest that the first one should not need any discussion, unless anyone can see something in it which is different in CF 1.6 or CF 1.7. Following John Caron's comments in ticket 95, I have added text stating that dimension/auxiliary coordinates are independent/dependent. All of the text in that document comes verbatim from ticket 95, except for the changes marked.

The proposal for transforms was posted by David at the end of ticket 95. The other three documents contain text from the data model that David and I proposed for CF 1.5.

I must admit that I haven't yet thought in detail about what CF 1.6 or CF 1.7 might require but, as I said above, I don't think there should be major changes needed.

Best wishes

Jonathan

comment:3 Changed 3 years ago by markh

Thank you for all the reference information Jonathan, that is most helpful.

I suggest we start with some of the smaller, easily defined aspects of the model (to get our collective eye in).

I suggest that cell_measures are a good example of such a well-constrained, limited-scope type; perhaps we could address this first.

There is a text proposed here to define cell_measures within the scope of the data model: https://cf-pcmdi.llnl.gov/trac/wiki/Ticket107Text9Nov13CellMeasures

The relevant CF NetCDF section is here: http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/ch07s02.html

I invite comments on the definition of cell_measures within the scope of the data model.
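
As a minimal sketch (not part of the proposed text), the CF-netCDF section linked above encodes cell measures as a cell_measures attribute holding blank-separated "measure: name" pairs, which could be read like this. The function name parse_cell_measures and the variable names cell_area and cell_volume are assumptions made for illustration only.

```python
def parse_cell_measures(attribute):
    """Split a CF-netCDF cell_measures string into {measure: variable_name}."""
    tokens = attribute.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected 'measure: name' pairs, got {attribute!r}")
    measures = {}
    # Tokens alternate between "measure:" keys and variable names.
    for key, name in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"measure {key!r} must end with a colon")
        measures[key[:-1]] = name
    return measures


# Hypothetical attribute value naming two measure variables.
parsed = parse_cell_measures("area: cell_area volume: cell_volume")
print(parsed)
```

Whether the data model itself should list the allowed measures (area, volume) or leave them to a vocabulary is exactly the open question, so the sketch deliberately does not validate the measure names.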

comment:4 Changed 3 years ago by markh

I suggest we adopt a commonly used naming convention for types or constructs in a data model: each use in the descriptive text is capitalised, with camel case, to help readers identify where a reference is to an explicit entity in our model;

e.g. CellMeasure

I think that the suggested text would benefit from some more specific details, particularly on what is allowed and what is not. So, 'allowed: area, volume' rather than 'e.g. area'.

I have put my suggestions in a text block here to illustrate my points. I think this is a little better than the text proposed above.

comment:5 follow-ups: Changed 3 years ago by jonathan

Dear Mark

I am sorry if I appear to be awkward, but having compared our versions I prefer ours to yours in most respects. These are the differences:

  • You prefer "CellMeasure" to "cell measure construct". Maybe "construct" isn't the best word, but we've used it in all the previously agreed text. I appreciate that your nomenclature would be more compact, but we should be consistent throughout the document. Could we postpone this until we've agreed all the sections?
  • Layout. I formatted the text in the same way as the text we have already agreed. Your style is different. I think we ought to return to the question of form when we've finished the contents.
  • I gave just one example of the measure property and units, whereas you have listed the allowed values. I think that for the data model we do not need to be exhaustive, because the allowed values are a matter for vocabulary; they don't affect the concept. Actually, I would rather remove the single example than list all possibilities in the data model document!
  • Purpose of cell measures. I agree we need this. It's not in my version because I moved it to the Field part of the document. That's because you suggested, and we agreed, that it was better in the case of coordinates to put the description of purpose at the level of the Field itself. Hence for consistency I did the same for all the other components of the Field. My proposed text is "A cell measure construct provides information about the size or shape of the cells defined by an ordered list of one or more domain axes of the field."
  • I say "contain", you say "define". This is not a big difference, but we say "contain" elsewhere. These are the components of the construct.
  • About the dimensions, I wrote, "[Its] shape is determined by the domain axes in the order listed, optionally omitting any domain axes of size one. ... It is assumed that the metric does not depend on any of the domain axes of the field which are not specified, along which the values are implicitly propagated," and you wrote, "[Its] shape is consistent with the domain axes mediating the references from the containing Field." To my mind, your version is less explicit and less clear. I think we could omit "optionally omitting ... size one" from my version, because I suppose a dependence on a size-one dimension is in effect not a dependence, since there is no variation, so it's covered by the sentence about implicit propagation. That sentence is important. It is not required that a cell measures variable has all the dimensions of the data variable.
  • I don't think the point about controlled vocabularies is needed here. However, we could mention that when we deal with properties.
  • You have omitted the final part, which relates the data model to CF-netCDF files. I think the user is likely to find that information helpful, so I feel that we should keep it.
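
To illustrate the point about implicit propagation, here is a rough sketch, assuming a field with domain axes (time, lat, lon) and an area measure that spans only (lat, lon). The class name CellMeasureConstruct, its attributes and the numbers are hypothetical and chosen only for this example; nothing here is agreed text.

```python
from dataclasses import dataclass


@dataclass
class CellMeasureConstruct:
    measure: str   # e.g. "area"
    units: str     # e.g. "m2"
    axes: tuple    # ordered names of the domain axes the array spans
    values: list   # nested lists, one nesting level per entry in `axes`

    def value_at(self, field_axes, field_index):
        """Return the metric for one cell of the field.

        Field axes that the construct does not span are ignored, so the
        same value is implicitly propagated along them.
        """
        position = dict(zip(field_axes, field_index))
        cell = self.values
        for axis in self.axes:
            cell = cell[position[axis]]
        return cell


area = CellMeasureConstruct(
    measure="area",
    units="m2",
    axes=("lat", "lon"),
    values=[[10.0, 12.0], [11.0, 13.0]],
)
field_axes = ("time", "lat", "lon")

# The time index does not affect the result, because area has no time axis.
print(area.value_at(field_axes, (0, 1, 0)))
print(area.value_at(field_axes, (5, 1, 0)))
```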

Cheers

Jonathan

comment:6 in reply to: ↑ 5 Changed 3 years ago by markh

Replying to jonathan:

Thank you for the comments; responses to some of them are here:

  • I say "contain", you say "define". This is not a big difference, but we say "contain" elsewhere. These are the components of the construct.

This makes sense, I have updated the draft accordingly.

  • I gave just one example of the measure property and units, whereas you have listed the allowed values. I think that for the data model we do not need to be exhaustive, because the allowed values are a matter for vocabulary; they don't affect the concept. Actually, I would rather remove the single example than list all possibilities in the data model document!
  • I don't think the point about controlled vocabularies is needed here. However, we could mention that when we deal with properties.

I wonder if these are two sides of the same coin. If we have a 'Properties' section of the model with reference to appropriate controlled vocabularies, we could capture all of the information about controlled vocabularies and scope there.

I think we must be exhaustive somewhere, defining how to comprehend the correct vocabulary: we have to define what is allowed. I am content to put this information under a Properties section, rather than in this section, if that is preferred. I suggest we get onto Properties next, to get the referencing and scope in shape.

  • You have omitted the final part, which relates the data model to CF-netCDF files. I think the user is likely to find that information helpful, so I feel that we should keep it.

I understand. The first sentence seems the most relevant so I have added this to the draft; does the second sentence really add value here? I am not so sure.

to be continued... mark

comment:7 in reply to: ↑ 5 ; follow-up: Changed 3 years ago by markh

Replying to jonathan:

  • You prefer "CellMeasure" to "cell measure construct". Maybe "construct" isn't the best word, but we've used it in all the previously agreed text. I appreciate that your nomenclature would be more compact, but we should be consistent throughout the document. Could we postpone this until we've agreed all the sections?

I would rather not postpone this, I think it makes a significant difference to readability, giving a visual clue each time a 'first class citizen' in the CF data model is mentioned. I think it would be a helpful approach to adopt.

  • Layout. I formatted the text in the same way as the text we have already agreed. Your style is different. I think we ought to return to the question of form when we've finished the contents.

I think that the form aids readability, which I think we should aim to deliver. I think we should aim for clarity and conciseness at each step.

That said, the layout of the draft now looks quite similar to the layout of Ticket107Text9Nov13CellMeasures so perhaps there isn't much of an issue here.

  • Purpose of cell measures. I agree we need this. It's not in my version because I moved it to the Field part of the document. That's because you suggested, and we agreed, that it was better in the case of coordinates to put the description of purpose at the level of the Field itself. Hence for consistency I did the same for all the other components of the Field. My proposed text is "A cell measure construct provides information about the size or shape of the cells defined by an ordered list of one or more domain axes of the field."

I am not overly concerned about where a description of purpose resides; this seems like an easy thing to change at a later date. I wonder whether we can carry each one around with the construct for now and decide to collate them later.

Having re-read the text I put and your text, I feel that neither has quite captured the essence.

I think that the part

defined by an ordered list of one or more domain axes of the field

is not necessary, not adding much information to the description.

I think that the use of the term

shape

is unhelpful here, as shape may be used when discussing array shapes, or to refer to a geometric spatial shape, neither of which is in scope here.

I don't think that

A CellMeasure describes a measurement or parameter for cells within a domain; one is commonly used where the defined measure is not inferable from coordinates.

captures what we want either. The measurement is with reference to spatial aspects of the cell, area and volume.

Perhaps:

A CellMeasure provides information about the spatial size of cells within a domain.

As this information is often inferable and CellMeasures are used in cases where it is not, we could add a statement:

The presence of a CellMeasure indicates that the defined spatial size of the cells should not be inferred.

  • About the dimensions, I wrote, "[Its] shape is determined by the domain axes in the order listed, optionally omitting any domain axes of size one. ... It is assumed that the metric does not depend on any of the domain axes of the field which are not specified, along which the values are implicitly propagated," and you wrote, "[Its] shape is consistent with the domain axes mediating the references from the containing Field." To my mind, your version is less explicit and less clear. I think we could omit "optionally omitting ... size one" from my version, because I suppose a dependence on a size-one dimension is in effect not a dependence, since there is no variation, so it's covered by the sentence about implicit propagation. That sentence is important. It is not required that a cell measures variable has all the dimensions of the data variable.

I am slightly concerned by the sentence

It is assumed that the metric does not depend on any of the domain axes of the field which are not specified, along which the values are implicitly propagated.

I had to think long and hard about this to understand its meaning: I don't think it gives enough clarity.

I think that the definition of these relationships will be fairly consistent for AuxiliaryCoordinate, DimensionCoordinate and CellMeasure instances and will need to reference the description of DomainAxis clearly and consistently.

The text I propose in the draft is an attempt to use a consistent set of terms which will be clearly defined. We haven't got this far yet, so it's hard to judge the suitability of this phrasing.

Perhaps we can put off this discussion until we get back to the wording of the Field - DomainAxis relationship and the mediation of other relationships. Once we have a nice set of words we can agree how to reference them in the relevant sections.

Alternatively, I think we know what condition we are trying to describe, that: the values array can be unambiguously mapped to the containing Field's data array and relevant coordinates. As this is within the scope of the Field, perhaps we can define the relation there, and for the CellMeasure, simply state that:

A CellMeasure instance must contain:

* A typed numeric array of metric values

It is up to a Field to decide whether a CellMeasure is consistent and able to be used. In principle, any values array is valid for an orphaned CellMeasure, only a Field cares about consistency.
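
A small sketch of what I mean by leaving consistency to the Field, assuming values are held as regular nested lists. The names Field, CellMeasure, shape_of and add_cell_measure are hypothetical and are not taken from any agreed text or existing library.

```python
class CellMeasure:
    """Holds only its own measure name and values; it has no link to a Field."""

    def __init__(self, measure, values):
        self.measure = measure
        self.values = values  # nested lists of numbers


def shape_of(nested):
    """Return the shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


class Field:
    def __init__(self, axis_sizes):
        self.axis_sizes = dict(axis_sizes)  # domain axis name -> size
        self.cell_measures = []             # (axes, CellMeasure) pairs

    def add_cell_measure(self, axes, cell_measure):
        """Accept the measure only if its array matches the named domain axes."""
        expected = tuple(self.axis_sizes[axis] for axis in axes)
        actual = shape_of(cell_measure.values)
        if actual != expected:
            raise ValueError(
                f"shape {actual} does not match axes {axes} with sizes {expected}"
            )
        self.cell_measures.append((tuple(axes), cell_measure))


field = Field({"time": 3, "lat": 2, "lon": 2})
field.add_cell_measure(("lat", "lon"), CellMeasure("area", [[1.0, 2.0], [3.0, 4.0]]))
```

The design choice being illustrated is only the scoping: the check lives in the Field, and the CellMeasure carries no reference back to it.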

The draft is now quite similar to the proposal, although not the same. I wonder whether it captures the essence of the proposal whilst adding a little clarity for the readers.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 3 years ago by davidhassell

Replying to markh:

Dear Mark,

Some thoughts on your very useful comments:

Replying to jonathan:

  • You prefer "CellMeasure" to "cell measure construct". Maybe "construct" isn't the best word, but we've used it in all the previously agreed text. I appreciate that your nomenclature would be more compact, but we should be consistent throughout the document. Could we postpone this until we've agreed all the sections?

I would rather not postpone this, I think it makes a significant difference to readability, giving a visual clue each time a 'first class citizen' in the CF data model is mentioned. I think it would be a helpful approach to adopt.

I'm happy not to postpone this, but would argue for the programming language neutral, plain English version ("cell measure construct") rather than, as I see it, the opposite ("CellMeasure instance")

Alternatively, I think we know what condition we are trying to describe, that: the values array can be unambiguously mapped to the containing Field's data array and relevant coordinates. As this is within the scope of the Field, perhaps we can define the relation there, and for the CellMeasure, simply state that:

A CellMeasure instance must contain:

* A typed numeric array of metric values

OK. So long as we say for each construct that they contain an ordered subset of the domain axes (which I think was dropped from Mark's latest draft), I wonder if we need to mention the propagation along missing axes at all, for any construct. Given that the axes are orthogonal, can it be taken as read?

I'm not sure what is being added by the word "typed", here. Is "numeric" not sufficient?

It is up to a Field to decide whether a CellMeasure is consistent and able to be used. In principle, any values array is valid for an orphaned CellMeasure, only a Field cares about consistency.

There can be no such thing as an orphaned cell measure construct. A cell measure construct can only exist as part of a field construct, I believe.

All the best,

David

comment:9 in reply to: ↑ 8 Changed 3 years ago by markh

Replying to davidhassell:

Alternatively, I think we know what condition we are trying to describe, that: the values array can be unambiguously mapped to the containing Field's data array and relevant coordinates. As this is within the scope of the Field, perhaps we can define the relation there, and for the CellMeasure, simply state that:

A CellMeasure instance must contain:

* A typed numeric array of metric values

OK. So long as we say for each construct that they contain an ordered subset of the domain axes (which I think was dropped from Mark's latest draft), I wonder if we need to mention the propagation along missing axes at all, for any construct. Given that the axes are orthogonal, can it be taken as read?

We need to capture the information on DomainAxis relations with cell measures, no doubt. My intent with this suggestion is that the necessary details may be captured in one place, perhaps in the Field or DomainAxis description, to avoid repeating text in multiple sections and possible inconsistency.

Where I said 'It is up to a Field to decide whether a CellMeasure is consistent and able to be used.' I was suggesting that the scope of this discussion is the responsibility of the Field.

There can be no such thing as an orphaned cell measure construct. A cell measure construct can only exist as part of a field construct, I believe.

The term orphaned was not intended to indicate anything important here, I am not advocating change, only trying to indicate the correct scope for this information may lie elsewhere (encapsulation: a CellMeasure does not know about the Field which contains it).

I am saying that we will need this definition for CellMeasures, DimCoords, AuxCoords and maybe other things, so let's do it once, do it right, and not repeat ourselves.

(A minor detail point: these constructs do not 'contain an ordered set of domain axes'; they are referenced by them. I don't think we should use that particular wording, as it hints at a relation which I don't think exists.)

I'm not sure what is being added by the word "typed", here. Is "numeric" not sufficient?

"typed" indicates a specific type of array, for example, Int, Float. I think it is expected for one data type to exist across the whole values array:

'A numeric array of metric values .... The array must all be of the same data type.'

I thought this information could be fully captured by stating:

'A typed numeric array of metric values.'

I am content with either wording.
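
To make the distinction concrete, here is a short sketch using Python's standard array module, which is only one possible way to picture a "typed numeric array"; the data model itself would not prescribe any such implementation.

```python
from array import array

# A plain list can mix numeric types freely.
mixed = [10, 12.5, 11]
print(sorted(type(v).__name__ for v in mixed))

# A typed array has one element type for the whole array.
# Typecode "d" is a C double, so the integers are stored as floats.
areas = array("d", [10, 12.5, 11])
print(areas.typecode, list(areas))

# Typecode "i" is a C int, so a float element is rejected.
try:
    array("i", [10, 12.5])
except TypeError as error:
    print("rejected:", error)
```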

comment:10 follow-up: Changed 3 years ago by jonathan

Dear Mark

  • I agree with David in preferring "cell measure construct" to "CellMeasure", because it's just ordinary words. It should also be borne in mind that the data model is something we will talk about, not just read about, and you can't use visual markup when speaking. In your current draft you use the word "instance". I'm not happy with this because it's a programming term. One reason for proposing the word "construct" is to make this document independent of programming.
  • I think the definition should go in the field construct section. That was your suggestion originally and I agree with it, so I moved it.
  • As regards the purpose, I mentioned "shape" because it is possible or likely that cell measures might be defined for the geometrical form of the grid, such as angles between gridlines. That hasn't been done yet, although it's been discussed. We could foresee the possibility by including "shape" now, or we could modify the data model later. If we adopt the latter, I propose "A cell measure construct provides information about the size of the cells defined by an ordered list of one or more domain axes of the field." The last part is included because it's important to make clear that the construct does not have to refer to all the domain axes; if you just mention "domain", as in your draft, that is not made clear. I don't think it's right to say "one is commonly used where the defined measure is not inferable from coordinates" because there is no prohibition or recommendation against supplying cell_measures even when they are inferable.
  • OK to replace "metric of the space" with "metric of the domain". Thanks.
  • I don't think we need references to properties, because I continue to think that we should not list possible values in the data model. That is a matter for vocabulary, not the conceptual model. I gave one example of the measure property and the units just to make clear what we meant.
  • For the last part, I would say, "A numeric array of metric values whose shape is determined by the relevant subset of the domain axes in the order listed." The relevant subset are the ones mentioned in the description of the purpose of the construct. If we say "numeric", "typed" is implied, isn't it. I think it's OK to omit the implicit propagation, as you and David say.

Best wishes

Jonathan

comment:11 in reply to: ↑ 10 ; follow-up: Changed 3 years ago by markh

Replying to jonathan:

Dear Mark

  • I think the definition should go in the field construct section. That was your suggestion originally and I agree with it, so I moved it.

I'm not sure about this, but I'm happy to consider it further. Please may we leave the text here for now and decide its proper home when we come to the field construct section?

  • As regards the purpose, I mentioned "shape" because it is possible or likely that cell measures might be defined for the geometrical form of the grid, such as angles between gridlines. That hasn't been done yet, although it's been discussed. We could foresee the possibility by including "shape" now, or we could modify the data model later.

I think we should leave this alone for now

If we adopt the latter, I propose "A cell measure construct provides information about the size of the cells defined by an ordered list of one or more domain axes of the field." The last part is included because it's important to make clear that the construct does not have to refer to all the domain axes; if you just mention "domain", as in your draft, that is not made clear.

ok

I don't think it's right to say "one is commonly used where the defined measure is not inferable from coordinates" because there is no prohibition or recommendation against supplying cell_measures even when they are inferable.

My point is not about whether it is prohibited in cases where inference may be made. Instead I thought to indicate that where a cell measure is supplied, it should be used in preference to any inferred or calculated quantity. I have strengthened this statement, to be clear.

Where provided, the measure should be used in preference to calculating such a measure from other information.

I think this is the message we want to provide to data interpreters.
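
As a sketch of the behaviour this statement would ask of a data interpreter (a suggestion only, not an agreed rule), assuming rectangular cells in a planar coordinate system so that an area can be calculated from bounds. The function name cell_sizes and its arguments are hypothetical.

```python
def cell_sizes(supplied_measure, x_bounds, y_bounds):
    """Return cell areas, preferring a supplied measure over a calculated one.

    supplied_measure: nested list of areas ordered (y, x), or None if absent.
    x_bounds, y_bounds: lists of (lower, upper) pairs, one per cell.
    """
    if supplied_measure is not None:
        # The data producer provided the measure, so it is used as given.
        return supplied_measure
    # Fallback: planar rectangles, area = width * height (an assumption).
    return [
        [(x1 - x0) * (y1 - y0) for (x0, x1) in x_bounds]
        for (y0, y1) in y_bounds
    ]


x_bounds = [(0, 1), (1, 3)]
y_bounds = [(0, 2)]
print(cell_sizes(None, x_bounds, y_bounds))
print(cell_sizes([[2.1, 3.9]], x_bounds, y_bounds))
```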

  • I don't think we need references to properties, because I continue to think that we should not list possible values in the data model. That is a matter for vocabulary, not the conceptual model. I gave one example of the measure property and the units just to make clear what we meant.

I think we have to reference the controlled vocabularies somehow from within the model, they are the keystone of CF.

Are you concerned about listing the recognised names of the properties (e.g. measure) or the allowable values (e.g. area)?

Perhaps we can leave these references in brackets just for now, and move onto properties, which I believe should help us shake out most of these details.

  • For the last part, I would say, "A numeric array of metric values whose shape is determined by the relevant subset of the domain axes in the order listed." The relevant subset are the ones mentioned in the description of the purpose of the construct.

I have put this in; I will continue to ponder whether we will be able to remove this later as a result of carefully chosen text for domain axes, fields etc.

If we say "numeric", "typed" is implied, isn't it.

I don't think it is, there are multiple types of numeric, such as int, float (and more in some implementations). The array must have a specified type and not have it vary across its elements, I believe.

mark

comment:12 in reply to: ↑ 11 ; follow-up: Changed 3 years ago by davidhassell

Replying to markh:

Dear Mark

  • As regards the purpose, I mentioned "shape" because it is possible or likely that cell measures might be defined for the geometrical form of the grid, such as angles between gridlines. That hasn't been done yet, although it's been discussed. We could foresee the possibility by including "shape" now, or we could modify the data model later.

I think we should leave this alone for now

I don't think we can. The conventions (7.2) explicitly say that cell measures are for "size, shape or location of the cells" and that what is recognised is a matter for a controlled vocabulary. This should be reflected in the data model.

My point is not about whether it is prohibited in cases where inference may be made. Instead I thought to indicate that where a cell measure is supplied, it should be used in preference to any inferred or calculated quantity. I have strengthened this statement, to be clear.

Where provided, the measure should be used in preference to calculating such a measure from other information.

I think this is the message we want to provide to data interpreters.

No rules of precedence are stated in the conventions, so I don't think that we should create some for the data model.

If we say "numeric", "typed" is implied, isn't it.

I don't think it is, there are multiple types of numeric, such as int, float (and more in some implementations). The array must have a specified type and not have it vary across its elements, I believe.

This is a bit too implementation-specific for me. Why can't an array be conceptually a mixture of floats and integers?

All the best,

David

comment:13 in reply to: ↑ 12 Changed 3 years ago by markh

Replying to davidhassell:

  • As regards the purpose, I mentioned "shape" because it is possible or likely that cell measures might be defined for the geometrical form of the grid, such as angles between gridlines. That hasn't been done yet, although it's been discussed. We could foresee the possibility by including "shape" now, or we could modify the data model later.

I think we should leave this alone for now

I don't think we can. The conventions (7.2) explicitly say that cell measures are for "size, shape or location of the cells" and that what is recognised is a matter for a controlled vocabulary. This should be reflected in the data model.

OK, let's leave it in then.

My point is not about whether it is prohibited in cases where inference may be made. Instead I thought to indicate that where a cell measure is supplied, it should be used in preference to any inferred or calculated quantity. I have strengthened this statement, to be clear.

Where provided, the measure should be used in preference to calculating such a measure from other information.

I think this is the message we want to provide to data interpreters.

No rules of precedence are stated in the conventions, so I don't think that we should create some for the data model.

They are pretty strongly hinted at. 7.2 (cf-netcdf) has a number of uses of language such as 'that cannot be deduced from the coordinates and bounds without special knowledge' and 'In many cases the areas can be calculated from the cell bounds, but there are exceptions'.

This gives a strong indication that cell metrics may be inferred, but cell_measures are used where inference is a bad idea, not recommended by data producers. I assumed we wanted to say this more clearly, but I think it is said. The term 'precedence' is not used, and perhaps it shouldn't be, but I think some statement about use of cell_measures to replace potentially calculated quantities is required: it is what they are for, it seems to me.

If we say "numeric", "typed" is implied, isn't it.

I don't think it is, there are multiple types of numeric, such as int, float (and more in some implementations). The array must have a specified type and not have it vary across its elements, I believe.

This is a bit too implementation-specific for me. Why can't an array be conceptually a mixture of floats and integers?

If you are all happy with this, then we can manage with it, but type was explicitly mentioned in all the previous text and seems important to me.

I would say that numeric does not imply typed. I would also say that there are two statements here: 1. the array should be of one type; 2. that type must be numeric. I haven't advocated mentioning lists of types in the model text, so it doesn't feel implementation specific to me.

I had thought that this was the intent of the initial proposal:

A numeric array of metric values ... The array must all be of the same data type.

comment:14 Changed 3 years ago by markh

I feel that the conversation about CellMeasures is nearly concluded.

I'd like to suggest we move onto properties. There is suggested text proposed.

I have a starting question, linked to the statement:

I don't think we need references to properties, because I continue to think that we should not list possible values in the data model. That is a matter for vocabulary, not the conceptual model.

I would like to understand this statement some more. The data model must reference the controlled vocabularies in some way, it is key to the data model that it uses vocabularies correctly.

I can understand that listing possible values leads to far too much text, so I am interested in how we may effectively make the assertion that a particular controlled vocabulary must be used correctly within the data model and how we should reference that vocabulary.

comment:15 follow-up: Changed 3 years ago by markh

I would like to raise a concern regarding the proposed text for properties.

There is explicit mention of:

  • Conventions
  • history
  • title
  • featureType

as property names valid for Field instances.

The CF-NetCDF conventions describe these as file attributes, not valid for individual data variables, in Appendix A.

I can see how featureType and Conventions could be treated as spanning across data variables, and the 'file only' constraint for NetCDF files being a logical extension of how CF instances are encoded in NetCDF. As such I am not concerned by these two.

However, the use of history and title is inherited from the NUG and their use is heavily linked to files, collections of Fields.

As the data model does not have a sense of 'a collection of fields' I suggest that these are removed from the data model and put into the list of reserved property names, not to be used except in NetCDF files.
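
A rough sketch of how this suggestion might look when a Field's properties are built from a NetCDF file, assuming global and variable attributes are available as plain dictionaries. The names RESERVED_FILE_LEVEL and field_properties, and the attribute values, are hypothetical.

```python
# Hypothetical reserved list: file-level names kept out of Field properties.
RESERVED_FILE_LEVEL = {"history", "title"}


def field_properties(global_attributes, variable_attributes):
    """Combine attributes into Field properties, skipping reserved file-level names."""
    properties = {
        name: value
        for name, value in global_attributes.items()
        if name not in RESERVED_FILE_LEVEL
    }
    # Attributes set on the data variable itself take priority.
    properties.update(variable_attributes)
    return properties


properties = field_properties(
    {"Conventions": "CF-1.6", "title": "example file", "history": "created"},
    {"standard_name": "air_temperature", "units": "K"},
)
print(properties)
```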

comment:16 in reply to: ↑ 15 Changed 3 years ago by davidhassell

Replying to markh:

After the discussions in #95, I think it was agreed that these attributes do apply to every data variable in the file, although how they apply had varying interpretations and was not resolved.

So in that light, I think that they are ok to be mentioned.

All the best,

David

comment:17 follow-up: Changed 3 years ago by markh

I have a set of questions regarding the proposed text for transforms

#70 contains an approved change which will be incorporated into CF1.7 enabling coordinates to be explicitly linked to particular grid_mapping attributes in a file. This facility is not supported by the description of transforms proposed above.

I think that we have two types of information represented here.

  1. We have coordinate reference systems which individual coordinates may be defined with respect to.
  2. We have coordinates which may be derived from other coordinates and additional information.

I wonder whether we can address case 2 by providing a derived coordinate type: rather than focusing on the action and having a transform type, we focus on the goal, the coordinate.

Case 1 may then be addressed by providing a coordinate reference system type whose sole job is to give context to the horizontal spatial coordinate, just like a calendar definition does for a time coordinate.

grid_mapping attributes from CF1.5 and before were provided to describe the relationship between coordinate variables and true latitude and longitude coordinates. As CF-NetCDF also mandates that latitude and longitude coordinates must be supplied, this information is being duplicated (for good reason, I am not questioning the utility of this approach).

In this context singular grid_mapping attributes, as in CF1.5, are transforms, or derived coordinates, depending on your point of view, providing latitude and longitude coordinate derivations; these coordinates happen to exist explicitly in netCDF files, for convenience. The singular grid_mapping attribute is also a coordinate reference system definition for the axis=X and axis=Y coordinates of the data variable.

CF1.7 introduces a richer syntax, but it does not change the core semantics, that we would like to define horizontal spatial coordinates with respect to a coordinate reference system.

In my opinion, use of the CF1.7 (#70) syntax does not imply derived coordinates; it is a clear signal that only case 1 applies.

Does this approach make logical sense?

comment:18 in reply to: ↑ 17 ; follow-up: Changed 3 years ago by davidhassell

Replying to markh:

Dear Mark,

I have a set of questions regarding the proposed text for transforms

#70 contains an approved change which will be incorporated into CF1.7 enabling coordinates to be explicitly linked to particular grid_mapping attributes in a file. This facility is not supported by the description of transforms proposed above.

It would be useful to know why you think this facility is not supported, as I think that it is. I have edited the description of transforms to include the first line of the description which didn't make it across from ticket #95, namely:

A transform construct defines a mapping from one set of coordinates which can not geo-locate the field construct's data to another set of coordinates that can geo-locate the field construct's data.

Here is the CDL example from ticket #70 (multiple grid mappings), with a description of how it would be stored by the proposed Field construct:

    double x(x) ;
      x:standard_name = "projection_x_coordinate" ;
    double y(y) ;
      y:standard_name = "projection_y_coordinate" ;
    double z(z) ;
      z:standard_name = "height_above_reference_ellipsoid" ;
    double lat(y, x) ;
      lat:standard_name = "latitude" ;
    double lon(y, x) ;
      lon:standard_name = "longitude" ;
    float temp(z, y, x) ;
      temp:standard_name = "air_temperature" ;
      temp:coordinates = "lat lon" ;
      temp:grid_mapping = "crsOSGB: x y crsWGS84: lat lon" ;
    int crsOSGB ;
      crsOSGB:grid_mapping_name = "transverse_mercator";
      crsOSGB:semi_major_axis = 6377563.396 ;
      crsOSGB:inverse_flattening = 299.3249646 ;
      crsOSGB:longitude_of_prime_meridian = 0.0 ;
      crsOSGB:latitude_of_projection_origin = 49.0 ;
      crsOSGB:longitude_of_central_meridian = -2.0 ;
      crsOSGB:scale_factor_at_central_meridian = 0.9996012717 ;
      crsOSGB:false_easting = 400000.0 ;
      crsOSGB:false_northing = -100000.0 ;
      crsOSGB:unit = "metre" ;
    int crsWGS84 ;
      crsWGS84:grid_mapping_name = "latitude_longitude";
      crsWGS84:longitude_of_prime_meridian = 0.0 ;
      crsWGS84:semi_major_axis = 6378137.0 ;
      crsWGS84:inverse_flattening = 298.257223563 ;

Field contains:

  • Dimension Coordinates: x, y
  • Auxiliary Coordinates: lat, lon
  • Transforms : transform_crsOSGB, transform_crsWGS84

where the transforms contain:

transform_crsOSGB:

  • NAME: "transverse_mercator"
  • semi_major_axis: 6377563.396
  • inverse_flattening: 299.3249646
  • longitude_of_prime_meridian: 0.0
  • latitude_of_projection_origin: 49.0
  • longitude_of_central_meridian: -2.0
  • scale_factor_at_central_meridian: 0.9996012717
  • false_easting: 400000.0
  • false_northing: -100000.0
  • unit: "metre"
  • INPUT COORDINATES: x, y

transform_crsWGS84:

  • NAME: "latitude_longitude"
  • longitude_of_prime_meridian: 0
  • semi_major_axis: 6378137.0
  • inverse_flattening: 298.257223563
  • INPUT COORDINATES: lat, lon

So, transform_crsOSGB defines a mapping from coordinates x and y to latitude-longitude coordinates on a particularly shaped earth; and transform_crsWGS84 defines a mapping from coordinates lat and lon to latitude-longitude coordinates on a different particularly shaped earth.
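As a rough illustration only, the containment described above could be sketched in Python as follows. The class names `Transform` and `Field` and the helper `transforms_for` are hypothetical, invented for this sketch; they are not part of the proposed text or of any CF library. Only the parameter names and values are taken from the CDL above.

```python
from dataclasses import dataclass, field


@dataclass
class Transform:
    """Hypothetical stand-in for the proposed transform construct."""
    name: str
    parameters: dict
    input_coordinates: tuple


@dataclass
class Field:
    """Hypothetical, heavily reduced field construct (no data array)."""
    dimension_coordinates: tuple
    auxiliary_coordinates: tuple
    transforms: dict = field(default_factory=dict)

    def transforms_for(self, coordinate):
        """Return the keys of the transforms that list `coordinate` as an input."""
        return sorted(
            key for key, transform in self.transforms.items()
            if coordinate in transform.input_coordinates
        )


temp = Field(
    dimension_coordinates=("x", "y"),      # as listed in the description above
    auxiliary_coordinates=("lat", "lon"),
    transforms={
        "transform_crsOSGB": Transform(
            name="transverse_mercator",
            parameters={
                "semi_major_axis": 6377563.396,
                "inverse_flattening": 299.3249646,
                "longitude_of_prime_meridian": 0.0,
                "latitude_of_projection_origin": 49.0,
                "longitude_of_central_meridian": -2.0,
                "scale_factor_at_central_meridian": 0.9996012717,
                "false_easting": 400000.0,
                "false_northing": -100000.0,
                "unit": "metre",
            },
            input_coordinates=("x", "y"),
        ),
        "transform_crsWGS84": Transform(
            name="latitude_longitude",
            parameters={
                "longitude_of_prime_meridian": 0.0,
                "semi_major_axis": 6378137.0,
                "inverse_flattening": 298.257223563,
            },
            input_coordinates=("lat", "lon"),
        ),
    },
)
```

With this layout, asking which transform applies to a given coordinate is a simple lookup, for example `temp.transforms_for("x")` or `temp.transforms_for("lat")`.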

All the best,

David

comment:19 in reply to: ↑ 18 ; follow-up: Changed 3 years ago by markh

Replying to davidhassell:

So, transform_crsOSGB defines a mapping from coordinates x and y to latitude-longitude coordinates on a particularly shaped earth; and transform_crsWGS84 defines a mapping from coordinates lat and lon to latitude-longitude coordinates on a different particularly shaped earth.

crsOSGB and crsWGS84 do a lot more than define a transform. They define a coordinate reference system that coordinates are defined with respect to. This is fundamental and far more powerful than defining a transformation; it provides a well recognised definition of what the coordinate means.

A transformation, a derived coordinate, is a useful by-product of the definition of a coordinate reference system, but it is only a by-product.

I think this has been implicit in CF for a long time, but with CF-NetCDF1.7 it becomes clear and explicit. I would like it clearly represented in the data model.

comment:20 in reply to: ↑ 19 ; follow-up: Changed 3 years ago by davidhassell

Replying to markh:

Dear Mark,

Replying to davidhassell:

So, transform_crsOSGB defines a mapping from coordinates x and y to latitude-longitude coordinates on a particularly shaped earth; and transform_crsWGS84 defines a mapping from coordinates lat and lon to latitude-longitude coordinates on a different particularly shaped earth.

crsOSGB and crsWGS84 do a lot more than define a transform. They define a coordinate reference system that coordinates are defined with respect to. This is fundamental and far more powerful than defining a transformation; it provides a well recognised definition of what the coordinate means.

The way I see it, the purpose of a coordinate construct is to geo-locate the data. If it contains enough metadata for its array values to do this, then that is fine. If not then a transform construct provides the extra information needed by the coordinate construct to do that geo-location. Is this different to your coordinate reference system?

Perhaps the name "transform construct" is troublesome. If I recall correctly (?), we chose it because its practical purpose in the conventions is to record how one set of coordinates may be transformed into another set of (possibly more meaningful) coordinates. Note that whether the latter set exists, or not, is immaterial to the data model, as is the domain of the transformation (horizontal (grid_mapping), vertical (formula_terms), or anything else).

All the best,

David

comment:21 in reply to: ↑ 20 Changed 3 years ago by markh

Replying to davidhassell:

The way I see it, the purpose of a coordinate construct is to geo-locate the data. If it contains enough metadata for its array values to do this, then that is fine.

I don't think a horizontal spatial coordinate will ever contain enough metadata to geo-locate itself accurately without a coordinate reference system. It is the CRS that provides the geolocation for a coordinate value.

If not then a transform construct provides the extra information needed by the coordinate construct to do that geo-location. Is this different to your coordinate reference system?

The coordinate reference system is for geolocation, yes.

Perhaps the name "transform construct" is troublesome. If I recall correctly (?), we chose it because its practical purpose in the conventions is to record how one set of coordinates may be transformed into another set of (possibly more meaningful) coordinates. Note that whether the latter set exists, or not, is immaterial to the data model, as is the domain of the transformation (horizontal (grid_mapping), vertical (formula_terms), or anything else).

To me transformation is one of many practical purposes of accurate geolocation. Another example might be calculating the distance between locations on the geoid.

Early CF focussed on one particular use case, that of providing explicit latitude and longitude coordinates where the coordinate variables are not latitude and longitude. This is useful, no doubt, but it is one of many cases.

So, whilst I can see the similarity between this case and the parameterised vertical coordinate, with both providing a derived coordinate (or two) as an output, I am not comfortable with all coordinate reference system use cases being handled this way.

I think it helps clarity to have the CRS as an explicit type in the data model and to handle derived coordinates, be they vertical or horizontal, as a separate type.

There's no problem with CF-NetCDF having a particular method of encoding this which is sensitive to its historical development.

comment:22 Changed 3 years ago by jonblower

I haven't followed this conversation in detail so hopefully I'm not repeating something. In general, I think it's a good idea to separate the concept of a transform from the concept of a coordinate reference system. I think CF should allow the serialization of CRSs, but not necessarily transforms.

My view is coloured by how the Java library Geotoolkit addresses this, which I think works well. A transform is just a function that does not need to know anything about georeferencing - it just takes input values and produces output values. It can be derived from two CRSs (e.g. like this) or by other means.

The point is that a transform between two CRSs may not always exist in a readily-serializable mathematical form. Different applications may be happy with different levels of precision in transforms (e.g. sometimes the datum shift is ignored).

In short: I would suggest recording the CRS in CF-NetCDF files, but letting applications find the transformation between CRSs. In fact, I think the examples above do describe a CRS, not a transform.

(The Javadoc for MathTransformFactory has some more info.)

comment:23 Changed 3 years ago by biard

Jon said, "...a transform between two CRSs may not always exist in a readily-serializable mathematical form." In fact, sometimes no transform exists. CRS transforms are a complicated mess.

comment:24 Changed 3 years ago by jonathan

Dear all

The purpose of the CF data model is to describe the CF convention in logical terms. In this case, that means describing what grid_mapping does in CF. CF does not have a concept of coordinate reference system as such, but it does have some rather well-defined metadata for grid mapping. So I think we should focus on describing that.

I would be interested to know who would be willing to discuss the remaining aspects of the text of the CF data model by teleconference(s). That's not to hide the discussion from everyone else, but because I think it would be a lot more efficient to talk rather than write. The exchanges above (still not concluded) about cell_measures show that it's quite detailed. If we could decide most things by talking and then write down the result on the ticket, others could comment if they disagree, so it would not exclude broader participation in reaching a conclusion.

Jonathan

comment:25 Changed 3 years ago by jonblower

Dear Jonathan,

What in your view is the difference between a grid_mapping and a CRS? I've always been a bit confused by the term "mapping" because, in my view, it doesn't describe a mapping but a reference system (what is the "mapping" supposed to be from and to?). I'm not suggesting that the CF data model should describe something that is not already in CF, but I might suggest that a more accurate name might be helpful, and may save some of the confusion between transformations and references.

(I might be wrong on the above - I can't check the CF document at the moment because the website is down, so I'm acting on my memory of what the grid_mapping is.)

Jon

comment:26 follow-up: Changed 3 years ago by jonathan

Dear Jon

The grid_mapping is so-called because of its original purpose, which is still its main purpose, of providing the mapping between 2D curvilinear (longitude, latitude) coordinates and the two horizontal dimensions of the grid (map projection coordinates or rotated lon-lat). We don't have to call it grid_mapping in the CF data model and indeed David and I proposed to call it transform instead, partly because we want to include formula_terms too.

Best wishes

Jonathan

comment:27 Changed 3 years ago by jonathan

Perhaps I could reduce the scope of my last question:

Is there anyone, apart from Mark (I presume), David and me, who would be interested to discuss the remaining issues about the CF data model except transforms, by teleconference, and (if we can) bring an agreed text back to this ticket? Transforms (grid_mapping and formula_terms) are more complicated than the other parts, and it would be nice to have agreed the text for everything else, I feel.

Jonathan

comment:28 in reply to: ↑ 26 ; follow-up: Changed 3 years ago by jonblower

Replying to jonathan:

The grid_mapping is so-called because of its original purpose, which is still its main purpose, of providing the mapping between 2D curvilinear (longitude, latitude) coordinates and the two horizontal dimensions of the grid (map projection coordinates or rotated lon-lat).

Hi Jonathan - To my understanding it is the "coordinates" attribute that points to the curvilinear coordinates. The "grid_mapping" attribute points to something that looks like the definition of a projection (roughly speaking, the mathematical origins of the curvilinear coordinates, not the coordinates themselves), although we also use it for things that aren't projections, like rotated-pole CRSs.

I wasn't keen on the idea of using "transform" because mostly one is not describing the transformation but a reference system. E.g. See the British National Grid example in the CF document. This (correctly) just describes the basis of the coordinate system but does not describe how to transform it to other systems.

My view is that CRS is the most generic name for these descriptions and might be the most appropriate term to use in a data model.

(This does not preclude, of course, the cases in which you really do have a transformation, but these are generally fairly complicated and hard to serialize and, as Jim says above, may not exist at all.)

Best wishes, on

comment:29 in reply to: ↑ 28 Changed 3 years ago by jonblower

Replying to jonblower:

Best wishes, on

Er, *Jon* ;-)

comment:30 Changed 3 years ago by markh

I have drafted some text which we may be able to use or adapt to describe CRSs and derived coordinates for the data model.

I have tried to differentiate between the two different uses for grid_mapping within netCDF.

Does this aid clarity?

mark

comment:31 follow-up: Changed 3 years ago by biard

Mark,

I may have missed a lot as I have followed this stream. I don't see where the grid_mapping variable ever comes into play in relation to derived coordinates.

Jim

comment:32 in reply to: ↑ 31 Changed 3 years ago by markh

Replying to biard:

Mark,

I may have missed a lot as I have followed this stream. I don't see where the grid_mapping variable ever comes into play in relation to derived coordinates.

Jim

I am not sure about this either.

The place where I thought it might be relevant is linked to the description of the grid_mapping attribute's aim: that the true latitude and longitude coordinates be supplied.

This requirement can be delivered in some situations by a pair of derived coordinates linked to the coordinate variables of the data. This isn't so helpful for NetCDF files but could be useful in the more general description.

I think it would be plausible to remove all mention of grid_mapping from a derived coordinate description; perhaps it would be clearer, my explanation of this case seems a little contrived to me now.

thank you mark

comment:33 Changed 3 years ago by jonblower

Hi Mark, Jim,

My reading of the standard is that grid_mapping points to something that looks like a CRS definition, and the "true latitude and longitude" coordinates are supplied by the coordinates attribute. Even though it's redundant, both of these may be used for the convenience of clients.

I'm not sure what is meant in Mark's document by a "derived" coordinate. This seems to be an association of coordinates in one CRS with a defined formula to transform into another one. I would argue that vertical sigma coordinates do fall into this definition, but horizontal coordinates do not, because CF does not (currently) supply the formula to translate to another system (it describes the CRS instead, which is different, leaving the application to figure out the transformation).
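To make the vertical case concrete, here is a minimal sketch of a derived coordinate computed from a defined formula. It assumes the atmosphere sigma coordinate formula given in the CF conventions, p(k) = ptop + sigma(k) * (ps - ptop), reduced to a single column; the function name `pressure_from_sigma` and the sample values are hypothetical.

```python
def pressure_from_sigma(sigma, ps, ptop):
    """Derive pressure at each level: p(k) = ptop + sigma(k) * (ps - ptop).

    sigma : dimensionless sigma value at each level
    ps    : surface pressure for this column
    ptop  : pressure at the top of the model
    """
    return [ptop + s * (ps - ptop) for s in sigma]


# Hypothetical single-column example, pressures in Pa.
levels = pressure_from_sigma([0.0, 0.5, 1.0], ps=100000.0, ptop=1000.0)
```

Everything needed to produce the derived values is carried by the formula and its terms, which is what distinguishes this case from a grid_mapping, where only the reference system is described.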

Cheers, Jon

comment:34 Changed 3 years ago by markh

I think Jon and Jim's perspectives are clear and helpful.

I have updated the draft crossing out these sections

comment:35 follow-up: Changed 3 years ago by jonblower

Thanks Mark, that looks good to me. A minor semantic point that I wasn't sure about - you say "Where the grid_mapping attribute does not define explicit references between coordinates and a CRS..." do you mean "While"?

I'm trying to get whether the explicit references may or may not appear, or whether they never appear.

comment:36 in reply to: ↑ 35 Changed 3 years ago by markh

Replying to jonblower:

Thanks Mark, that looks good to me. A minor semantic point that I wasn't sure about - you say "Where the grid_mapping attribute does not define explicit references between coordinates and a CRS..." do you mean "While"?

In CF NetCDF files there are two allowable syntax approaches for the grid_mapping attribute.

  1. One syntax explicitly links a coordinate to a CRS:
    • e.g. temp:grid_mapping = "crsOSGB: x y crsWGS84: lat lon" ;
  2. The other syntax does not provide explicit links:
    • e.g. temp:grid_mapping = "crsOSGB" ;

My aim for this sentence is to make it certain that where the second syntax is used, there is still an explicit reference, and that the X and Y coordinate variables are defined with respect to this CRS.

In other words, it provides backwards compatibility to the syntax from older versions of CF whilst maintaining the same semantics.

I'm trying to get whether the explicit references may or may not appear, or whether they never appear.

So, explicit references may or may not appear in a CF NetCDF file, but, if they don't appear explicitly, they are unambiguously implicit.
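As a sketch of how both syntaxes could resolve to the same kind of result, the following Python function reads either form of the attribute value. The function name `parse_grid_mapping` and the `default_coordinates` argument are hypothetical; the sketch assumes the caller already knows which coordinate variables are the X and Y axes of the data variable, so that the short form can be given its implicit links.

```python
def parse_grid_mapping(value, default_coordinates=()):
    """Map each grid mapping variable name to the coordinates defined against it.

    Handles the explicit form ("crsOSGB: x y crsWGS84: lat lon") and the
    short form ("crsOSGB"), for which the caller supplies the coordinates
    that are implicitly linked (assumed here to be the X and Y axes).
    """
    tokens = value.split()

    # Short form: a single variable name with no trailing colon.
    if len(tokens) == 1 and not tokens[0].endswith(":"):
        return {tokens[0]: list(default_coordinates)}

    links = {}
    current = None
    for token in tokens:
        if token.endswith(":"):
            # A name followed by a colon starts a new grid mapping entry.
            current = token[:-1]
            links[current] = []
        elif current is None:
            raise ValueError("coordinate name found before any grid mapping name")
        else:
            links[current].append(token)
    return links


explicit = parse_grid_mapping("crsOSGB: x y crsWGS84: lat lon")
implicit = parse_grid_mapping("crsOSGB", default_coordinates=("x", "y"))
```

Either way the result is a set of coordinate-to-CRS links; the only difference is whether the file spelled them out.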

comment:37 Changed 3 years ago by jonblower

Ah, OK, I'd forgotten about the first syntax, sorry. Makes sense, thanks!

comment:38 Changed 3 years ago by davidhassell

Hello, all,

Whilst it is great that the data model is going through a phase of being discussed, before alternative models are presented, I think that it would be sensible to agree that the proposed model is not correct, if that is the case. We should also keep in mind that the data model is intended to be a minimal, logical representation of CF as it currently stands, and so shouldn't get tied up with CF-netCDF syntax and organization.

Some good points have been raised about the name of the construct ("transform"). What if it were called a "Coordinate reference system construct" instead? That seems reasonable to me.

Earlier in this thread, I posted an example of how the transform (CRS?) construct was easily able to encapsulate the case of multiple grid_mappings. If there are counter examples where the proposed construct can not store the information and relationships, that would, I think, be very useful.

Many thanks and all the best,

David

comment:39 Changed 3 years ago by jonathan

Dear David

It's a good idea to consider what would be a better name than transform for this concept, but I have concerns about calling it coordinate reference system. The main purpose of grid_mapping is to specify how 2D lon and lat coordinates are related to 1D projected or rotated coordinates. That doesn't sound like a function which I would call "coordinate reference system", though it may involve a CRS. The relationship between dimensional and dimensionless vertical coordinates, which we also include in transform, does not usually involve a CRS.

Perhaps we should consider distinguishing the reference and transformation aspects of grid_mapping. We could decide to regard them as two different parts of the logical model, even though they are joined in one netCDF mechanism.

Cheers

Jonathan

comment:40 Changed 3 years ago by jonblower

Dear Jonathan,

David and I are discussing this off-line and will probably post back later. It's possible that we have different views of what a CRS actually is...

By the way, the relationship between dimensional and dimensionless vertical coordinates does involve a CRS, in fact it involves two! One for the dimensional coordinates (e.g. "depth in metres below sea level") and one for the dimensionless ones (e.g. a particular sigma coordinate system).

As an aside, the phrase "depth in metres below sea level" pretty much completely defines a (vertical) CRS. Horizontal CRSs are more complicated, hence all the varieties and parameters. There may be no single identifiable object in CF that looks like a CRS in all cases (sometimes the information is split between the coordinate variable and some other variable, sometimes the information is implicit) but logically it is there all the same.

Happy to involve you in the offline discussions if you have time!

Cheers, Jon

comment:41 Changed 3 years ago by biard

Hi.

While you guys are discussing offline, I'll pitch in some thoughts.

The contents of the variable referenced by the grid_mapping attribute are a declaration of a Coordinate Reference System (CRS). A CRS is a set of equations, parameters, and measurements that define a 2-D or 3-D coordinate system relative to the body of the Earth. In the simplest case, it is a definition of the ellipsoid to be used to approximate the shape of the Earth and of the (0,0,0) (lon,lat,elevation) point. In more complicated cases, it is a definition of a geoid surface (usually specified with a grid of points) relative to an ellipsoid. In still more complicated cases, it is a definition of a "warped cartesian" map projection coordinate system relative to a geoid or ellipsoid. Transforms may be constructed using the contents of a CRS to convert XYZ coordinates in a map projection to longitude, latitude, and Z (either relative to the geoid or the ellipsoid). It's interesting to note that the longitude and latitude of a point with a large Z value will be different depending on whether the Z is relative to a geoid or an ellipsoid. Transforms can also (usually) be constructed between two different CRSs.

This is actually an overly simple statement. There are map projection CRSs that declare a longitude,latitude anchor point, but don't connect the region they were designed for to any global vertical datum, which means you can't fully transform points measured in that CRS to any other CRS. This is all due to the history of CRSs, where they started as means to connect survey measurements in different regions to one another, and grew incrementally from there to the current state where we have GPS and a global scope.

Jon, regarding your aside above, I don't think the phrase "depth in metres below sea level" defines a vertical datum (the surface that defines your zero point relative to the body of the Earth in a CRS). It may define the Z axis of a coordinate system, but it is not a CRS.

Grace and peace,

Jim

comment:42 Changed 3 years ago by jonblower

Hi Jim,

Thanks very much for the useful input. Regarding my example of "depth in metres below sea level", in this case "sea level" is the datum. I know this isn't a very precise datum, but sometimes it's all you have!

Best wishes, Jon

comment:43 Changed 3 years ago by jonblower

P.S. Here's the formal CRS definition for "depth in metres below sea level": http://www.spatialreference.org/ref/epsg/5715/.

comment:44 Changed 3 years ago by biard

Jon,

The CRS you reference is "depth below Mean Sea Level (MSL)". MSL is a vertical datum that defines the mean sea level at numerous points all over the globe relative to the WGS84 ellipsoid, and is precise. Sea level is a less precise term, and could reference something other than MSL. Perhaps what you are really referring to is MSL.

Jim

comment:45 Changed 3 years ago by jonblower

Hi Jim,

You're absolutely right, I was a bit loose in my terms. But sometimes you don't know where you are with respect to MSL - a local sea level may be all you have. In this case "local sea level" is still the datum, albeit not one that you can easily convert to another CRS. It's still the zero point of your referencing system.

(Sorry, we are probably getting a bit off topic from the original discussion.)

Cheers, Jon

comment:46 Changed 3 years ago by biard

Jon,

I guess this is the ultimate question. Local sea level can define the zero point for your depth measurements, and as such it defines a coordinate system, but does it qualify as a Coordinate Reference System vertical datum? My strong tendency is to say no. It doesn't mean the depth measurement isn't highly useful, but isn't something that can be located geographically/geodetically. Now, if you also had the distance from local sea level to the WGS84 ellipsoid for each depth measured, then you would have defined a custom CRS vertical datum that would be unique for that dataset.

Grace and peace,

Jim

comment:47 Changed 3 years ago by jonblower

Hi Jim,

I think you are using a stronger definition of CRS than I am. I'm using the ISO definition, which basically says that a CRS needs a reference (datum), the direction away from this reference and the unit of measure (hence "depth below local sea level" is a fully-fledged CRS). I think you are referring to a fully geographic/geodetic CRS, which is clearly more precise of course (and more useful).

(I think we will find that not all CRSs in CF are fully geographic/geodetic, for example a vertical CRS based on pressure. It's still useful though and it doesn't mean there is no CRS at all. Any coordinate without a CRS is pretty much useless.)

The GeoAPI documentation is based on ISO and is instructive, as is an inspection of the subclasses (types) of CRS: http://www.geoapi.org/3.0/javadoc/org/opengis/referencing/crs/CoordinateReferenceSystem.html. (The GeoAPI list is not exhaustive of course.)

Best wishes, Jon

comment:48 Changed 3 years ago by graybeal

I just lost a longer response that basically amounts to the same thing Jon Blower just said. Stupid TRAC. I include a few references in case they help others.

In short, it should be possible to convert from a 'depth wrt local sea level' to some other, more algorithmic CRS (based on earth geoid or ellipsoid), but the error may be high, because local sea levels are more time variant than available sea level measurements or estimates. So while often there is not a precise 'distance from local sea level to the WGS84 ellipsoid for each depth measured', there is a broader value (and sometimes there is even a precise value, as when moorings measure their GPS altitude).

Definitions based on ISO:

  • coordinate reference system (CRS): coordinate system + datum; types include geodetic and vertical
  • coordinate system (CS): mathematical rules specifying how coordinates are assigned to points; composed of coordinate system axes; types of interest to us are ellipsoidal (earth shaped, 2D or 3D) and vertical, and the compound type
  • datum: parameters to specify the relationship of a coordinate system to an object (Earth, in all our geospatial cases); defines position of origin, scale, and orientation of the CS (note that [13] references two definitions for datum, we are using ISO [7] for our reference definition)

References:

  • [7] Coordinate systems standards | ISO 19111 | ISO 19111.2007(E); for sale by ISO
  • [13] Datums | geodetic datums FAQ | http://www.ngs.noaa.gov/faq

This material is taken from previous work to consolidate CRS issues for OOI: https://confluence.oceanobservatories.org/display/CIDev/Coordinate+Systems+and+Coordinate+Transformations

comment:49 Changed 3 years ago by jonblower

Thanks John, that's a great reference. Jon

comment:50 Changed 3 years ago by biard

Jon, John,

That's quite interesting! I hadn't ever run across EPSG 5113 before. (All my work in this area has been land-based.) I stand corrected. I've got questions still, but I'll take them offline and stop cluttering up this thread.

Grace and peace,

Jim

comment:51 Changed 3 years ago by markh

My reading of these comments suggests that it is a good idea to separate the concepts of

coordinate reference system

and

derived coordinate

within the model.

Does this concur with other people's feelings on the matter?

If so, I think we can handle the fine detail of what it means to define a particular CRS instance outside of the scope of this ticket.

We can simply state that:

A coordinate may reference one coordinate reference system, the coordinate is then defined with respect to this CRS.

Derived coordinates are just examples of coordinates, which can all be defined with respect to a CRS as required.
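A minimal sketch of these two statements, with entirely hypothetical class and function names, might look like this in Python. It assumes only what is stated above: a coordinate references at most one CRS, and a derived coordinate is an ordinary coordinate that can reference a CRS in the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoordinateReferenceSystem:
    """Hypothetical CRS type; the detail of its definition is out of scope here."""
    name: str


@dataclass
class Coordinate:
    """Hypothetical coordinate type referencing one CRS or none."""
    name: str
    crs: Optional[CoordinateReferenceSystem] = None
    derived: bool = False  # a derived coordinate is still just a coordinate


def group_by_crs(coordinates):
    """Group coordinate names by the name of the CRS they reference (None if absent)."""
    groups = {}
    for coordinate in coordinates:
        key = coordinate.crs.name if coordinate.crs is not None else None
        groups.setdefault(key, []).append(coordinate.name)
    return groups


osgb = CoordinateReferenceSystem("crsOSGB")
wgs84 = CoordinateReferenceSystem("crsWGS84")

coordinates = [
    Coordinate("x", crs=osgb),
    Coordinate("y", crs=osgb),
    Coordinate("lat", crs=wgs84),
    Coordinate("lon", crs=wgs84),
    Coordinate("time"),  # hypothetical coordinate with no CRS reference
]
groups = group_by_crs(coordinates)
```

In this sketch the CRS carries no transformation logic at all; it only gives context to the coordinates that reference it.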

is this a useful interpretation?

mark

comment:52 follow-up: Changed 3 years ago by biard

Mark,

Isn't it true that, as it currently stands in CF, coordinates do not reference any coordinate systems? I think it would be superior to have the coordinates directly reference their coordinate systems, but I think that in the current arrangement, a data variable declares an association between coordinate variables and CRSs via the grid_mapping attribute (either implicitly or explicitly, depending on the syntax chosen).

Grace and peace,

Jim

comment:53 in reply to: ↑ 52 ; follow-up: Changed 3 years ago by markh

Replying to biard:

Isn't it true that, as it currently stands in CF, coordinates do not reference any coordinate systems? I think it would be superior to have the coordinates directly reference their coordinate systems, but I think that in the current arrangement, a data variable declares an association between coordinate variables and CRSs via the grid_mapping attribute (either implicitly or explicitly, depending on the syntax chosen).

Hello Jim

In CF 1.6 this appears to be the case. However, there are a number of implications that the coordinate variables which are the Axis=='X' and Axis=='Y' coordinates are defined with respect to this coordinate reference system.

In order to make this implication more explicit and support multiple coordinate reference systems within the scope of a data variable, the ticket #70 was raised, discussed and is now approved. This will appear in CF 1.7.

This ticket makes it explicit that the semantics of all grid_mapping relationships are of coordinates defined with respect to coordinate reference systems. The association is controlled by the data variable and the syntax is defined to maintain backwards compatibility but the semantics are explicit: each coordinate is defined with respect to one coordinate reference system or None.

The fact that the declaration is made by the data variable is an encoding detail, not key semantic information.

By keeping the backwards compatibility we firm up the interpretation of all grid_mapping attributes on data variables and maintain consistency for both syntax options.
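As a rough illustration of the two syntax options, the Python sketch below reads a grid_mapping attribute value into a coordinate-to-CRS mapping. It assumes the extended form is written as "crs_variable: coord coord ..." repeated for each CRS, and the single-name form is just one variable name; the function name and the use of None as a key are hypothetical choices, not part of the convention.

```python
def parse_grid_mapping(attr: str) -> dict:
    """Return {coordinate variable name: grid mapping variable name}.

    For the single-name form the coordinates are implied rather than listed,
    so the result is {None: name}.
    """
    tokens = attr.split()
    if len(tokens) == 1 and not tokens[0].endswith(":"):
        return {None: tokens[0]}

    mapping = {}
    current = None
    for token in tokens:
        if token.endswith(":"):
            current = token[:-1]  # start of a new "crs: coords" group
        else:
            if current is None:
                raise ValueError(f"coordinate {token!r} appears before any CRS name")
            if token in mapping and mapping[token] != current:
                # Each coordinate is defined with respect to one CRS or none.
                raise ValueError(f"{token!r} is assigned to two CRSs")
            mapping[token] = current
    return mapping


# Variable names below are made up for the example.
explicit = parse_grid_mapping("crs_osgb: x y crs_wgs84: lat lon")
implicit = parse_grid_mapping("rotated_pole")
```

Either form ends up as the same kind of statement, which is the point made above: the encoding differs, the semantics do not.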

Are you content that this interpretation provides your superior approach: that coordinates directly reference their coordinate systems?

mark

comment:54 follow-up: Changed 3 years ago by markh

I have considered a proposed text for cell methods. I think the intent is correct in this text but I think it is a little too difficult to read.

With this in mind, I have prepared a draft for the cell methods text which I think conveys the same intent with a little more clarity.

I am still not sure that the opening sentence:

The cell methods construct describes the methods by which the data values of the field construct represent variation within their cells.

gives a good enough statement of purpose, but I am not sure how else to put it. Are there further suggestions for how to explain the purpose of cell methods?

The CF conventions for NetCDF state that cell methods are:

to describe the characteristic of a field that is represented by cell values

I wonder whether introducing terms such as values aggregated over cells or statistically aggregated values might help?

mark

comment:55 follow-up: Changed 3 years ago by davidhassell

Hello,

Jonathan, Jon and I have taken advantage of our geographic colocation and have been discussing the Transform construct offline. As a result, we'd like to propose replacing it with a "Geolocation construct", described below. The geolocation construct is similar (but not identical) to the transform construct, and we hope that its definition is much clearer.

We'll be very interested in your comments.

All the best,

David


Georeference construct


A georeference construct provides information needed to locate spatial dimension and auxiliary coordinates within a frame of reference relative to the planet.

Whilst the spatial coordinate constructs themselves serve to locate the data within the domain of the field, some applications may require more information about the relationship of the domain to a planetary reference frame. Moreover, some of the information may be identical for multiple constructs and therefore needs to be defined only once.

A georeference construct contains

  • An unordered collection of the field's dimension and auxiliary coordinate constructs to which the georeference construct applies.
  • Scalar parameters (which may include descriptive strings), other dimension or auxiliary coordinate constructs of the field, or other field constructs; all of which provide information about the frame of reference.

The most common purposes of a georeference construct are

  • To specify the relationship between vertical coordinates which are not geolocated and coordinates of height (with respect to some geophysically located surface) or pressure (which is a proxy for height).
  • To specify the relationship between horizontal coordinates which are not geolocated and a longitude-latitude coordinate system with a particular reference ellipsoid for the shape of the planet.

The functions of the CF-netCDF attributes formula_terms and grid_mapping, which describe the locations of spatial coordinate variables (CF Appendices D and F), correspond to georeference constructs.
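One possible reading of the definition above, sketched in Python. The class and field names are hypothetical, and the parameter values are invented for illustration; the parameter keys borrow CF-netCDF attribute names only to show how formula_terms and grid_mapping could both map onto the same shape.

```python
from dataclasses import dataclass, field


@dataclass
class Georeference:
    """Sketch of a georeference construct as described in the proposal."""
    # Unordered collection of the coordinate constructs it applies to (by name).
    coordinates: frozenset
    # Scalar parameters, or names of other coordinate/field constructs.
    parameters: dict = field(default_factory=dict)


# A grid_mapping-like use: horizontal coordinates plus scalar parameters.
horizontal = Georeference(
    coordinates=frozenset({"grid_latitude", "grid_longitude"}),
    parameters={
        "grid_mapping_name": "rotated_latitude_longitude",
        "grid_north_pole_latitude": 37.5,
        "grid_north_pole_longitude": 177.5,
    },
)

# A formula_terms-like use: one vertical coordinate plus references to other constructs.
vertical = Georeference(
    coordinates=frozenset({"sigma"}),
    parameters={"ps": "surface_pressure_field", "ptop": "top_pressure"},
)
```

Using a frozenset for the coordinates reflects the "unordered collection" wording; whether one class should cover both uses is exactly what is under discussion.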


comment:56 follow-ups: Changed 3 years ago by biard

Hi.

I find the phrases "vertical coordinates which are not geolocated" and "horizontal coordinates which are not geolocated" to be confusing. It seems to me that the georeference construct is documenting the geolocation, not creating it. In addition, latitude/longitude coordinate systems must also be georeferenced. There are, in fact, differences between the latitude/longitude coordinates defined by different coordinate reference systems (CRSs). The geographic community has pretty much unified on global latitude/longitude systems and ellipsoids in recent times, but you can't make that assumption with older CRSs.

It is possible to define a projected CRS without reference to latitude/longitude at all. It could be done with a Cartesian CRS such as Earth Centered Fixed (ECF). All the ones I know of are layered on a latitude/longitude CRS, but I don't think you should assume layering implies that a latitude/longitude CRS has special status.

I get the feeling that your effort to put formula_terms and grid_mapping together in one construct definition may be what's making things awkward.

I'd write more, but I can't get to the CF pages to check on something I needed to look at in order to proceed. More later.

comment:57 in reply to: ↑ 53 Changed 3 years ago by biard

Replying to markh:

Replying to biard:

... Are you content that this interpretation provides your superior approach: that coordinates directly reference their coordinate systems?

mark

Mark,

Sorry I didn't get back to this sooner. I agree that the suggested new syntax for the contents of the grid_mapping attribute addresses the problem of how to have different CRSs for different coordinate variables, but I don't think it's true that the current highly indirect way of connecting coordinate variables to coordinate systems is semantically equivalent to having a coordinate variable directly reference its CRS. It's easy to see that this is true when you ask the question, "Can I associate multiple CRSs with a single coordinate variable using this scheme?"

Let's say I have two data variables that both use the same coordinate variable, and thus two statements of the association between the coordinate variable and a CRS. With the current solution, the answer is "Yes". We must rely on the developer / CF checker to avoid this condition. If a coordinate variable directly references its CRS, the answer is "No". The two solutions are not equivalent.
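The situation just described can be sketched as a small check, in Python. The function name and the triple layout are hypothetical; this is only an illustration of what a developer or a CF checker would have to verify under the current arrangement.

```python
def conflicting_crs_associations(declarations):
    """Find coordinates that different data variables associate with different CRSs.

    declarations: iterable of (data_variable, coordinate, crs) name triples.
    Returns {coordinate: set of CRS names} for every coordinate with more than one CRS.
    """
    seen = {}
    for _data_var, coord, crs in declarations:
        seen.setdefault(coord, set()).add(crs)
    return {coord: crs_set for coord, crs_set in seen.items() if len(crs_set) > 1}


# Two data variables share coordinate "x" but declare different CRSs for it.
declarations = [
    ("temp", "x", "crs_a"),
    ("wind", "x", "crs_b"),
    ("temp", "y", "crs_a"),
    ("wind", "y", "crs_a"),
]
conflicts = conflicting_crs_associations(declarations)
```

If the coordinate held the reference itself, there would be nothing to check, since a second CRS could not be expressed at all.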

If you are trying to be rigorous in your description of the data model, then I don't think you can say that a coordinate variable references a CRS. You can say:

A coordinate must be associated with not more than one CRS, and all data variables referencing a given coordinate must declare the same association.

Or something along those lines.

Grace and peace,

Jim

comment:58 in reply to: ↑ 56 ; follow-up: Changed 3 years ago by davidhassell

Replying to biard:

Hi Jim,

Thanks for the points. I'll try to clarify.

I find the phrases "vertical coordinates which are not geolocated" and "horizontal coordinates which are not geolocated" to be confusing. It seems to me that the georeference construct is documenting the geolocation, not creating it. In addition, latitude/longitude coordinate systems must also be georeferenced. There are, in fact, differences between the latitude/longitude coordinates defined by different coordinate reference systems (CRSs). The geographic community has pretty much unified on global latitude/longitude systems and ellipsoids in recent times, but you can't make that assumption with older CRSs.

I think that the construct is neither documenting nor creating the geolocation. The information needed to geolocate the domain is split - some of it is contained by the coordinate constructs themselves (typically by the standard name and units) and some of it is contained elsewhere (typically by grid_mapping or formula_terms). It is this latter information which we are encapsulating in the georeference construct.

It is possible to define a projected CRS without reference to latitude/longitude at all. It could be done with a Cartesian CRS such as Earth Centered Fixed (ECF). All the ones I know of are layered on a latitude/longitude CRS, but I don't think you should assume layering implies that a latitude/longitude CRS has special status.

The geolocation construct does not prescribe what "geolocation" means, i.e. it doesn't necessarily mean latitude-longitude. It could mean "height" or ECF, or anything. The example "To specify the relationship between horizontal coordinates which are not geolocated and a longitude-latitude ..." is merely highlighting the common case arising from many CF-netCDF grid_mappings.

All the best,

David

comment:59 in reply to: ↑ 56 ; follow-up: Changed 3 years ago by markh

Replying to biard:

I get the feeling that your effort to put formula_terms and grid_mapping together in one construct definition may be what's making things awkward.

I agree with you Jim, I think this is awkward.

The functionality of CF-NetCDF formula_terms and grid_mappings may be simpler and clearer in the model if they are defined separately; they are quite different.

Could we limit our scope in this case to the definition of frames of reference which coordinates may be defined with respect to? I feel it would be clearer and easier for others to use this way.

I think coordinate reference system is a commonly used term for this in other communities which may aid communication, but if georeference is a preferred label, then I think it is usable.

mark

comment:60 in reply to: ↑ 59 ; follow-up: Changed 3 years ago by davidhassell

Replying to markh:

Dear Mark, Jim,

Replying to biard:

I get the feeling that your effort to put formula_terms and grid_mapping together in one construct definition may be what's making things awkward.

I agree with you Jim, I think this is awkward.

The functionality of CF-NetCDF formula_terms and grid_mappings may be simpler and clearer in the model if they are defined separately; they are quite different.

Ok. We should discuss this! I think that fundamentally they are the same, which is why they should not be separated in the logical data model.

CF-netCDF grid_mapping and formula_terms both provide information that indicate the location of the data, hence they can be described by the same construct.

Could we limit our scope in this case to the definition of frames of reference which coordinates may be defined with respect to? I feel it would be clearer and easier for others to use this way.

Apologies, I'm not sure what you mean, here.

I think coordinate reference system is a commonly used term for this in other communities which may aid communication, but if georeference is a preferred label, then I think it is usable.

One reason that we didn't use the phrase "coordinate reference system (CRS)" is because it means different things to different people - it is not a well defined term across all communities. I'm glad that you find "georeference" ok.

Another is that a georeference construct is not, according to some definitions (such as ISO 19123, Geographic information - Schema for coverage geometry and functions), a CRS. (An ISO 19123 CRS is actually defined by a subset of the information contained in a georeference construct taken with the coordinate constructs to which it relates.)

All the best,

David

comment:61 in reply to: ↑ 54 Changed 3 years ago by biard

Replying to markh:

I have considered a proposed text for cell methods. I think the intent is correct in this text but I think it is a little too difficult to read.

With this in mind, I have prepared a draft for the cell methods text which I think conveys the same intent with a little more clarity.

I am still not sure that the opening sentence:

The cell methods construct describes the methods by which the data values of the field construct represent variation within their cells.

...

mark

Mark,

First a question, then some comments. Is the last line in your draft supposed to say "The cell methods construct" instead of "The cell measures construct"?

Now for the comments. As I look at it, the cell methods construct isn't so much a description of how the field construct represents variation as it is a description of how the values of the field construct were obtained from a source data field that may or may not be present within the data space. That is, the field construct under consideration may have been produced from another field construct found within the file, or file set, or cloud of field constructs (however you want to think of it) using the methods described within the cell methods construct, or it may have been produced from a field that was not preserved within whatever boundary exists for your problem space. I haven't said that succinctly, but I think that recognition of the derived nature of the field construct values is important. There's a further wrinkle in this that isn't being captured by the draft text, which is the ability to use the comment syntax to describe "fuzzy" domains for the action of the cell methods. I just went through a learning exercise about this in relation to a data set that has data values which are mins, means, and maxs over the domain of a weather system. There are no cells (in the regular grid way of thinking about them), and the '( ... comment: ...)' syntax is used to capture the specifics of the domain for the cell method.

So, what about a statement along the lines of:

The cell methods construct describes the methods by which the data values of the
field construct are derived from a source measurement field (which may or may
not be represented by an existing field construct).

This covers both the case where one field construct holds (for example) the standard deviations of the values in another field construct and the case where the field construct holds values derived from values outside the scope of the data set.

Grace and peace,

Jim

Last edited 3 years ago by biard

comment:62 in reply to: ↑ 60 ; follow-up: Changed 3 years ago by biard

Replying to davidhassell:

Replying to markh:

Dear Mark, Jim,

Replying to biard:

I get the feeling that your effort to put formula_terms and grid_mapping together in one construct definition may be what's making things awkward.

I agree with you Jim, I think this is awkward.

The functionality of CF-NetCDF formula_terms and grid_mappings may be simpler and clearer in the model if they are defined separately; they are quite different.

Ok. We should discuss this! I think that fundamentally they are the same, which is why they should not be separated in the logical data model ...

All the best,

David

As I read the definition and usage of formula_terms, I find that it is entirely and only a description of how to combine the values of a unitless coordinate variable with values from other variables to produce coordinate values that have units. It is possible that this "resultant coordinate" could be georeferenced, but that is something else. It is true that it is meaningless to associate a CRS with a unitless coordinate variable if you don't provide a description of how to get to values that can be georeferenced, but this doesn't mean that a mathematical transform equates on any level with a declaration of a CRS.
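To make that concrete, here is a minimal Python sketch using the atmosphere sigma coordinate from CF Appendix D, where p = ptop + sigma * (ps - ptop). The function name and the sample numbers are made up, and a single surface pressure value is used instead of a full field to keep the example short.

```python
def sigma_to_pressure(sigma, ps, ptop):
    """Combine unitless sigma values with ps and ptop (both in Pa) to get pressure in Pa."""
    return [ptop + s * (ps - ptop) for s in sigma]


sigma = [1.0, 0.5, 0.0]   # unitless coordinate values
ps = 100000.0             # surface pressure in Pa (illustrative)
ptop = 1000.0             # pressure at the model top in Pa (illustrative)

pressure = sigma_to_pressure(sigma, ps, ptop)
# pressure is [100000.0, 50500.0, 1000.0], now with units of Pa
```

Nothing in this calculation says which reference surface or datum the result relates to; it only produces values that have units.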

Grace and peace,

Jim

comment:63 in reply to: ↑ 58 Changed 3 years ago by biard

Replying to davidhassell:

Replying to biard:

Hi Jim,

Thanks for the points. I'll try to clarify. ... I think that the construct is neither documenting nor creating the geolocation. The information needed to geolocate the domain is split - some of it is contained by the coordinate constructs themselves (typically by the standard name and units) and some of it is contained elsewhere (typically by grid_mapping or formula_terms). It is this latter information which we are encapsulating in the georeference construct. ...

All the best,

David

David,

I'm trying to figure out if we have an actual disagreement about the purpose and meaning of the grid_mapping variable / construct, or if I am misunderstanding what you are saying. Do you agree that a grid_mapping variable declares the Geographic / Geodetic Coordinate Reference System (CRS) that one or more coordinate variables are defined with respect to?

Grace and peace,

Jim

comment:64 in reply to: ↑ 62 Changed 3 years ago by markh

Replying to biard:

As I read the definition and usage of formula_terms, I find that it is entirely and only a description of how to combine the values of a unitless coordinate variable with values from other variables to produce coordinate values that have units. It is possible that this "resultant coordinate" could be georeferenced, but that is something else. It is true that it is meaningless to associate a CRS with a unitless coordinate variable if you don't provide a description of how to get to values that can be georeferenced, but this doesn't mean that a mathematical transform equates on any level with a declaration of a CRS.

I agree with this perspective. I think two simple, flexible types for the data model will be a better long term solution here than one, more complicated, combined type.

I have written more on my reasoning, but this turned out far too long to post as a comment on the ticket, and Jim has put it far more succinctly than I have been able to. I have dropped my additional thoughts on a wiki page instead:

http://kitt.llnl.gov/trac/wiki/markhDataModelNotesCRS

comment:65 Changed 3 years ago by markh

Replying to biard:

First a question, then some comments. Is the last line in your draft supposed to say "The cell methods construct" instead of "The cell measures construct"?

Yes, my copy-paste-replace operation failed; thank you; I've updated this.

Now for the comments. ...

So, what about a statement along the lines of:

The cell methods construct describes the methods by which the data values of the
field construct are derived from a source measurement field (which may or may
not be represented by an existing field construct).

This covers both the case where one field construct holds (for example) the standard deviations of the values in another field construct and the case where the field construct holds values derived from values outside the scope of the data set.

I like this; I think it expresses the use of cell methods very nicely.

I wonder whether

source measurement field (which may or may not be represented by an existing field construct).

could be replaced by:

the core concept defined by the standard_name, long_name and units of the Field.

making the description more self-contained, without losing the semantic intent?

many thanks

mark

comment:66 follow-up: Changed 3 years ago by jonathan

Dear Jim and Mark

grid_mapping and formula_terms look rather different in CF-netCDF, but they do have a similar purpose. formula_terms tells you how to translate the vertical coordinate variable into coordinate values which "indicate the location of the data" (as it says in 4.3) in the vertical dimension, in practice meaning height or depth with respect to a geophysically defined reference surface, or pressure, which is (in some cases) a reasonable proxy for height or depth. grid_mapping tells you how to translate the horizontal coordinate variables into longitude and latitude, which have a special status in CF for providing horizontal location; as you know, we insist that it must be possible to locate the data in lat-lon if it has horizontal dimensions.

After grid_mapping was introduced, it was expanded to allow the ellipsoid to be defined, and as a result of recent discussion I've proposed it should allow the geoid to be identified too. At the moment, the ellipsoid and geoid definition cannot be applied to vertical coordinates, but it seems very likely this will become desirable, also as a result of recent discussions. For example, we can specify a vertical coordinate as height above the geoid, but we cannot specify precisely what the geoid is for the vertical coordinate, unless we allow grid_mapping to do that. So, in this respect too, vertical and horizontal coordinates will require similar treatment.

Given these two sorts of similarity, it seems logical to us that we should describe the two CF-netCDF mechanisms as different applications of the same logical construct. It doesn't matter that they are formally different. The formal difference arises partly because formula_terms applies to only one dimension but grid_mapping to two dimensions, and partly because they were designed at different times without seeing the larger picture. In writing down the logical model, we can take a step back to see that larger picture, and make it simpler as a result.

Cheers

Jonathan

comment:67 follow-up: Changed 3 years ago by jonathan

Dear Jim and Mark

The text David and I proposed was

The cell methods construct describes the methods by which the data values of the field construct represent variation within their cells.

and Jim proposes

The cell methods construct describes the methods by which the data values of the field construct are derived from a source measurement field (which may or may not be represented by an existing field construct).

With respect to the differences between these, I would comment:

  1. CF doesn't only deal with measurements (which sounds like obs), so I don't think the word "measurement" should appear.
  2. I think "derived" is too general. The reason for writing "variation" in our definition is because all of the cell methods do that, by calculating a statistic from the variation, or by indicating it has no relevant variation because it is a point (intensive) or a sum (extensive). I think this needs to be indicated by the wording. "Derived" could mean all sorts of mathematical operations which are not encompassed by cell_methods.
  3. I don't see the parenthesis as really necessary, and I'm not sure it's true. The source field may never have existed. If it's a point or sum, then probably there was no source field. This field is the data, and cell_methods tells us something about what it means.

Best wishes

Jonathan

comment:68 in reply to: ↑ 66 ; follow-up: Changed 3 years ago by biard

Replying to jonathan:

Dear Jim and Mark

grid_mapping and formula_terms look rather different in CF-netCDF, but they do have a similar purpose. formula_terms tells you how to translate the vertical coordinate variable into coordinate values which "indicate the location of the data" (as it says in 4.3) in the vertical dimension, in practice meaning height or depth with respect to a geophysically defined reference surface, or pressure, which is (in some cases) a reasonable proxy for height or depth. grid_mapping tells you how to translate the horizontal coordinate variables into longitude and latitude, which have a special status in CF for providing horizontal location; as you know, we insist that it must be possible to locate the data in lat-lon if it has horizontal dimensions.

After grid_mapping was introduced, it was expanded to allow the ellipsoid to be defined, and as a result of recent discussion I've proposed it should allow the geoid to be identified too. At the moment, the ellipsoid and geoid definition cannot be applied to vertical coordinates, but it seems very likely this will become desirable, also as a result of recent discussions. For example, we can specify a vertical coordinate as height above the geoid, but we cannot specify precisely what the geoid is for the vertical coordinate, unless we allow grid_mapping to do that. So, in this respect too, vertical and horizontal coordinates will require similar treatment.

Given these two sorts of similarity, it seems logical to us that we should describe the two CF-netCDF mechanisms as different applications of the same logical construct. It doesn't matter that they are formally different. The formal difference arises partly because formula_terms applies to only one dimension but grid_mapping to two dimensions, and partly because they were designed at different times without seeing the larger picture. In writing down the logical model, we can take a step back to see that larger picture, and make it simpler as a result.

Cheers

Jonathan

Jonathan,

The two elements have almost nothing whatsoever to do with each other, except in that they both deal (on entirely different levels) with coordinate data. I understand that you guys thought that there was back in the day, but I am convinced that this was an example of misunderstanding the problem space.

I see no damage being done by properly separating these two constructs in the data model, and great potential for better understanding and future growth. If we force the two concepts into a single construct, we will be making an "ad hoc aggregation" within the data model that is confusing and will cause headaches moving forward.

Grace and peace,

Jim

comment:69 in reply to: ↑ 67 Changed 3 years ago by biard

Replying to jonathan:

Dear Jim and Mark

The text David and I proposed was

The cell methods construct describes the methods by which the data values of the field construct represent variation within their cells.

and Jim proposes

The cell methods construct describes the methods by which the data values of the field construct are derived from a source measurement field (which may or may not be represented by an existing field construct).

With respect to the differences between these, I would comment:

  1. CF doesn't only deal with measurements (which sounds like obs), so I don't think the word "measurement" should appear.
  2. I think "derived" is too general. The reason for writing "variation" in our definition is because all of the cell methods do that, by calculating a statistic from the variation, or by indicating it has no relevant variation because it is a point (intensive) or a sum (extensive). I think this needs to be indicated by the wording. "Derived" could mean all sorts of mathematical operations which are not encompassed by cell_methods.
  3. I don't see the parenthesis as really necessary, and I'm not sure it's true. The source field may never have existed. If it's a point or sum, then probably there was no source field. This field is the data, and cell_methods tells us something about what it means.

Best wishes

Jonathan

Jonathan,

I'm not overly concerned with the specific wording used. The thing I was trying to get at was the question that the cell_methods attribute is answering, which seems to me to be, "What algorithm did I apply to get from a set of input values to the set of output values stored within this variable?" If the cell_method is "point", then the algorithm is a null transform. If the answer is sum, then there was a sum; if mean, an average; etc. In some cases the input values are found in another variable within the file. In other cases the input values are not found within the file, and may not have been preserved anywhere.
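A small Python sketch of that input-output view follows. The table and function names are hypothetical; only the method names (point, sum, mean, maximum, minimum) are taken from CF, and the input values are invented.

```python
import statistics


def _point(values):
    """Null transform: a point value is passed through unchanged."""
    if len(values) != 1:
        raise ValueError("a point cell method takes exactly one input value")
    return values[0]


# Hypothetical lookup from a cell method name to the algorithm it names.
CELL_METHOD_ALGORITHMS = {
    "point": _point,
    "sum": sum,
    "mean": statistics.mean,
    "maximum": max,
    "minimum": min,
}


def apply_cell_method(method, inputs):
    """Produce one output value for a cell from the input values within that cell."""
    return CELL_METHOD_ALGORITHMS[method](list(inputs))


inputs = [1.0, 2.0, 3.0, 6.0]  # may or may not be stored anywhere in the file
cell_mean = apply_cell_method("mean", inputs)
cell_sum = apply_cell_method("sum", inputs)
```

The inputs appear only as an argument here, which mirrors the case where the source values were never preserved alongside the stored variable.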

Describing cell_methods in entirely self-referential terms comes across to me as quite awkward, as you are trying to talk about an input-output process without mentioning the input.

Grace and peace,

Jim

comment:70 in reply to: ↑ 68 ; follow-up: Changed 3 years ago by jonathan

Replying to biard:

The two elements have almost nothing whatsoever to do with each other, except in that they both deal (on entirely different levels) with coordinate data. I understand that you guys thought that there was back in the day, but I am convinced that this was an example of misunderstanding the problem space.

I see no damage being done by properly separating these two constructs in the data model, and great potential for better understanding and future growth. If we force the two concepts into a single construct, we will be making an "ad hoc aggregation" within the data model that is confusing and will cause headaches moving forward.

Dear Jim

I don't think we misunderstand the problem. We have spent a long time thinking about it. The applications of grid_mapping and formula_terms are quite varied already, but (to restate my last posting), they have two things in common: (1) They relate one set of coordinates to another set of coordinates which CF/COARDS regards as geolocated (longitude, latitude and height or pressure - pressure has a special status in CF/COARDS), (2) They may need to refer to the reference ellipsoid or geoid (your contributions on the email list helped to clarify that this need applies to both vertical and horizontal coordinates).

We could have devised a convention to do the job of formula_terms with a netCDF construction involving a dummy variable, like grid_mapping, but we hadn't thought of that idea then (in the beginning of CF). If we had chosen to do so then, these two constructions would already look formally quite similar, I think, and then the idea of combining them might not appear so surprising. I agree of course that we do not have to represent these two parts of CF with a single logical construct, but I see it as a simplification which would avoid some redundancy e.g. from (2) above. Please could you explain what headache or confusion it will cause you?

Cheers

Jonathan

comment:71 in reply to: ↑ 70 ; follow-ups: Changed 3 years ago by biard

Replying to jonathan:

Replying to biard:

The two elements have almost nothing whatsoever to do with each other, except in that they both deal (on entirely different levels) with coordinate data. I understand that you guys thought that there was back in the day, but I am convinced that this was an example of misunderstanding the problem space.

I see no damage being done by properly separating these two constructs in the data model, and great potential for better understanding and future growth. If we force the two concepts into a single construct, we will be making an "ad hoc aggregation" within the data model that is confusing and will cause headaches moving forward.

Dear Jim

I don't think we misunderstand the problem. We have spent a long time thinking about it. The applications of grid_mapping and formula_terms are quite varied already, but (to restate my last posting), they have two things in common: (1) They relate one set of coordinates to another set of coordinates which CF/COARDS regards as geolocated (longitude, latitude and height or pressure - pressure has a special status in CF/COARDS), (2) They may need to refer to the reference ellipsoid or geoid (your contributions on the email list helped to clarify that this need applies to both vertical and horizontal coordinates).

We could have devised a convention to do the job of formula_terms with a netCDF construction involving a dummy variable, like grid_mapping, but we hadn't thought of that idea then (in the beginning of CF). If we had chosen to do so then, these two constructions would already look formally quite similar, I think, and then the idea of combining them might not appear so surprising. I agree of course that we do not have to represent these two parts of CF with a single logical construct, but I see it as a simplification which would avoid some redundancy e.g. from (2) above. Please could you explain what headache or confusion it will cause you?

Cheers

Jonathan

Jonathan,

I think you've put your finger directly on the disagreement I have with your approach. As I look at it, grid_mapping does not do the thing you describe in point 1, and formula_terms is not involved with the thing you describe in point 2.

X/Y coordinates and lat/lon coordinates are just two types of coordinates. Both need to be georeferenced. The grid_mapping doesn't describe a transform. It declares the CRS that your coordinates are defined with reference to - both X/Y and lat/lon. Formula_terms declares a transform - one that takes unitless values as input (along with other inputs) and outputs georeferenceable coordinate values. Continuing to insist that the grid_mapping is a transform obfuscates what it is really needed for (which isn't what you first thought), leading to confusion.

As for the headache, I have often found that when I model a system, if I lump elements together that should be separate it comes back to cause trouble later, forcing me to go back and refactor the model in order to move forward. That's a headache.

Grace and peace,

Jim

comment:72 in reply to: ↑ 71 ; follow-up: Changed 3 years ago by davidhassell

Replying to biard:

Dear Jim,

X/Y coordinates and lat/lon coordinates are just two types of coordinates. Both need to be georeferenced. The grid_mapping doesn't describe a transform. It declares the CRS that your coordinates are defined with reference to - both X/Y and lat/lon. Formula_terms declares a transform - one that takes unitless values as input (along with other inputs) and outputs georeferenceable coordinate values. Continuing to insist that the grid_mapping is a transform obfuscates what it is really needed for (which isn't what you first thought), leading to confusion.

I'd just like to make sure that the language is not obfuscating things. Our new georeference construct is neither a transform nor a coordinate reference system. Neither phrase appears in its definition, by design. It is, as it states, merely something which can provide information needed to locate spatial dimension and auxiliary coordinates. It does not prescribe what that information has to be.

As for the headache, I have often found that when I model a system, if I lump elements together that should be separate it comes back to cause trouble later, forcing me to go back and refactor the model in order to move forward. That's a headache.

That is indeed sound advice, but in this case I have found that the georeference construct is practicable in the cf-python library that I work on.

All the best,

David

comment:73 in reply to: ↑ 72 Changed 3 years ago by biard

Replying to davidhassell:

Replying to biard:

Dear Jim,

X/Y coordinates and lat/lon coordinates are just two types of coordinates. Both need to be georeferenced. The grid_mapping doesn't describe a transform. It declares the CRS that your coordinates are defined with reference to - both X/Y and lat/lon. Formula_terms declares a transform - one that takes unitless values as input (along with other inputs) and outputs georeferenceable coordinate values. Continuing to insist that the grid_mapping is a transform obfuscates what it is really needed for (which isn't what you first thought), leading to confusion.

I'd just like to make sure that the language is not obfuscating things. Our new georeference construct is neither a transform nor a coordinate reference system. Neither phrase appears in its definition, by design. It is, as it states, merely something which can provide information needed to locate spatial dimension and auxiliary coordinates. It does not prescribe what that information has to be.

As for the headache, I have often found that when I model a system, if I lump elements together that should be separate it comes back to cause trouble later, forcing me to go back and refactor the model in order to move forward. That's a headache.

That is indeed sound advice, but in this case I have found that the georeference construct is practicable in the cf-python library that I work on.

All the best,

David

David,

The problem is, as far as I can see, the new georeference construct that you are proposing is creating an ad-hoc aggregation of two completely different constructs, one of which does have something to do with georeferencing, and one of which does not.

Grace and peace,

Jim

comment:74 in reply to: ↑ 71 ; follow-up: Changed 3 years ago by jonathan

Dear Jim

Replying to biard:

I think you've put your finger directly on the disagreement I have with your approach. As I look at it, grid_mapping does not do the thing you describe in point 1 [i.e. georeferencing], and formula_terms is not involved with the thing you describe in point 2 [i.e. defining a geophysical surface].

  1. Latitude and longitude are implicitly georeferenced - not precisely, without stating the ellipsoid, but with sufficient definition for many purposes, especially in the spherical GCM world. Projection coordinates on a Cartesian plane, however, are not at all georeferenced without the projection information. The grid_mapping provides that information.
  2. Vertical coordinates in CF, identified by standard_names, are referred to a geophysically defined surface e.g. height_above_reference_ellipsoid. For some purposes, it may be necessary to specify precisely what that surface is. We cannot currently do this in CF for vertical coordinates, but I think it is highly likely that we will want to do it, since the issue has already been raised. We could do it using a grid_mapping, and then a vertical coordinate would require both formula_terms and grid_mapping for precise georeferencing.

In summary, both grid_mapping and formula_terms have function (1), for different kinds of coordinate. grid_mapping has function (2), which is currently allowed only for horizontal coordinates. This mismatch between purposes and CF-netCDF constructs arises from the history of the convention. If we were starting from scratch, I think we would have a netCDF construct like grid_mapping, but extended, for both horizontal and vertical coordinates, and that would be simpler. Essentially, that is what we are proposing for the logical data model.

Cheers

Jonathan

comment:75 follow-up: Changed 3 years ago by markh

A posting on behalf of my colleague:

Jim, I am not able to post directly to the mailing list and Trac system, but I actually liked the word 'measurement' even though this invokes the idea of observations.

You all are talking about an abstract conceptual model, and the distinction between an 'ob' and a NWP field value is not as great as Jonathan seems to imply. There are many NWP products for customers that are 'virtual forecast obs'. There are two distinct aspects: the generating process and the topological structure of the data. It seems to me that you are discussing the value and its relationship to the immediately surrounding grid points, which is surely part of the second aspect.

In the underlying conceptual model, METCE, created by WMO and OGC, led by Jeremy Tandy, all forecasts are considered estimates (measurements!) of an underlying theoretical value.

Also NetCDF is used to represent observations on a regular grid - these are usually called images, though this is sometimes misleading too.

Can I suggest the wording:

The 'cell methods construct' describes the methods by which the data values of the 'field construct' are derived from a source estimation field (which may or may not be represented by an existing 'field construct').

HTH, Chris

comment:76 in reply to: ↑ 75 Changed 3 years ago by jonathan

As Mark and Jim know, I sent the following reply to Chris by email:

Thanks for your email. I am pretty sure that "measurement" would be a confusing word to use for model data, because people never describe simulated values as measurements. That's why we chose the more agnostic "data value". I understand your point but I imagine that many users would not really understand what was meant by "source estimation field".

In more words, the idea of cell_methods is this: Each cell has a data value. However, if the quantity itself is continuously varying as a function of the independent variables, that data value is somehow representing all the values within the cell. The cell methods says what statistic of the variation is being used. It could be a point value, meaning it's just one value from all those within the cell. It could be a sum, meaning it's an integral over all the cell (like rainfall accumulation over time). It could be some other statistic, such as mean or maximum. Maybe we should just spend more sentences on it, instead of trying to get away with a single sentence.
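To make that concrete, here is a minimal Python sketch of one cell whose data value is derived from the values varying within it. The names `CELL_METHODS` and `cell_value` are hypothetical and are not part of CF or of any library, the rainfall numbers are made up, and taking the first sample as the "point" value is just an assumption for illustration.

```python
from statistics import mean

# Hypothetical mapping from a cell method name to the statistic it denotes.
CELL_METHODS = {
    "point": lambda values: values[0],  # assumption: any single value within the cell
    "sum": sum,                         # e.g. accumulation over the cell
    "mean": mean,
    "maximum": max,
}


def cell_value(samples, method):
    """Return the single data value that represents all samples within one cell."""
    return CELL_METHODS[method](samples)


# Made-up rainfall amounts within one time cell.
rain_within_cell = [0.0, 1.5, 3.0, 0.5]

for method in CELL_METHODS:
    print(method, cell_value(rain_within_cell, method))
```

The same samples give a different data value for each method, which is why the cell method has to be recorded alongside the data.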

comment:77 in reply to: ↑ 74 ; follow-up: Changed 3 years ago by biard

Replying to jonathan:

Dear Jim

Replying to biard:

I think you've put your finger directly on the disagreement I have with your approach. As I look at it, grid_mapping does not do the thing you describe in point 1 [i.e. georeferencing], and formula_terms is not involved with the thing you describe in point 2 [i.e. defining a geophysical surface].

  1. Latitude and longitude are implicitly georeferenced - not precisely, without stating the ellipsoid, but with sufficient definition for many purposes, especially in the spherical GCM world. Projection coordinates on a Cartesian plane, however, are not at all georeferenced without the projection information. The grid_mapping provides that information.
  2. Vertical coordinates in CF, identified by standard_names, are referred to a geophysically defined surface e.g. height_above_reference_ellipsoid. For some purposes, it may be necessary to specify precisely what that surface is. We cannot currently do this in CF for vertical coordinates, but I think it is highly likely that we will want to do it, since the issue has already been raised. We could do it using a grid_mapping, and then a vertical coordinate would require both formula_terms and grid_mapping for precise georeferencing.

In summary, both grid_mapping and formula_terms have function (1), for different kinds of coordinate. grid_mapping has function (2), which is currently allowed only for horizontal coordinates. This mismatch between purposes and CF-netCDF constructs arises from the history of the convention. If we were starting from scratch, I think we would have a netCDF construct like grid_mapping, but extended, for both horizontal and vertical coordinates, and that would be simpler. Essentially, that is what we are proposing for the logical data model.

Cheers

Jonathan

Jonathan,

I can see your point about latitude and longitude being coarsely georeferenced (assuming they aren't rotated), and it's quite clear that in the history of all this, it was assumed that lat, lon, and height were de facto georeferenced. This was, clearly, good enough for the GCM world. Working from this background, it's easy to see why grid_mapping was thought of as a transform. I don't, however, think it's a good idea to continue to think of it this way. I'm in full agreement that grid_mapping needs to be extended to allow definition of a fully 3D CRS.

Regarding your inclusions "[i.e. georeferencing]" and "[i.e. defining a geophysical surface]" in your reply to my reply (to your reply...):

Grid_mapping does accomplish georeferencing. It georeferences all the georeferenceable coordinate variables, both X/Y and lon/lat, assuming it declares a projected or Cartesian CRS. (If it's a lon/lat CRS, then there won't be any X/Y coordinate variables.)

Formula_terms doesn't define a geophysical surface. As stated in the CF standard, formula_terms provides "a mapping between the dimensionless coordinate values and dimensional values that can be uniquely located with respect to a point on the earth's surface." (Emphasis mine.) The output of applying formula_terms is georeferenceable. The input isn't.

This is what it boils down to in my view:

The contents of a variable that are dimensionless vertical coordinate values are not, themselves, georeferenceable (sigma coordinates, etc). They can be used as one of the inputs to the transform/function declared by the formula_terms attribute, and the outputs are georeferenceable. The outputs require a declaration of the CRS used in order to fix their locations relative to the body of the Earth. The historical view of the output values was that they were implicitly georeferenced, which was considered good enough for many purposes, and often was/is.

The contents of a variable that are horizontal projection coordinate values are, themselves, georeferenceable. They require a declaration of the CRS used in order to fix their locations relative to the body of the Earth. Longitudes and latitudes within the declared CRS may be obtained from the contents of a pair of X and Y coordinate variables using transforms that are associated with, but not provided by the grid_mapping associated with the X and Y coordinate variables.

The contents of a variable that are latitude or longitude values are, themselves, georeferenceable. They require a declaration of the CRS used in order to fix their locations relative to the body of the Earth. The historical view of the values was that they were implicitly georeferenced, which was considered good enough for many purposes, and often was/is.

If you don't assume that georeferenceable equates to georeferenced, then I think it's clear that formula_terms does not equate to grid_mapping in functionality. CRSs are a complicated topic, and there are subtleties and variations that the above descriptions gloss over (as we've seen from the CRS discussions), but I don't think any of those variations negate the point that formula_terms transforms non-georeferenceable values into georeferenceable values, while grid_mapping (if I, for the sake of argument, accept the premise) bidirectionally transforms one set of georeferenceable values into another set of georeferenceable values.

Even if you had used the dummy variable formalism for formula_terms (which would have been great!), it wouldn't make formula_terms any more equivalent logically to grid_mapping.

Grace and peace,

Jim

comment:78 in reply to: ↑ 77 Changed 3 years ago by markh

Replying to biard:

Hello Jim

I think your use of the terms georeferenceable and georeferenced is very helpful in this discussion. It helpfully illustrates the difference that I agree should exist in the data model.

Coordinates, collections of values, are georeferenceable, in principle. To explicitly georeference a coordinate requires a coordinate reference system. These are separate concepts.

all the best mark

comment:79 in reply to: ↑ 55 ; follow-ups: Changed 3 years ago by jonathan

Dear Jim

Both grid_mapping and formula_terms can be described logically by the text which David posted. For instance,

  • a vertical georeference construct might apply to an atmosphere hybrid height coordinate, in which case it would contain the a (dimensions of height) and b (dimensionless) auxiliary coordinate constructs, a field of surface height above the geoid, the auxiliary coordinate construct of height above the geoid (if it exists in the dataset), and optionally the name of the geoid (if we permit this extension). Its purpose is to relate the a and b, which by themselves do not geolocate the data, to the height above the geoid, which is a geophysically defined surface whose precise realisation might be specified as well.
  • a horizontal georeference construct might apply to a rotated latitude-longitude coordinate system, in which case it would contain the rotated latitude and longitude coordinate constructs, the (unrotated, geographical) latitude and longitude of the rotated pole, the auxiliary coordinate constructs of latitude and longitude, and optionally the definition of the reference ellipsoid. Its purpose is to relate the rotated latitude and longitude, which by themselves do not geolocate the data, to geographical latitude and longitude, which do geolocate the data, optionally with the ellipsoid to make their definition more precise.
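As a minimal sketch of the first example, the relation between a, b, the surface height and the height above the geoid can be written out in Python. It assumes the atmosphere hybrid height formula from Appendix D of the conventions, z(k,j,i) = a(k) + b(k)*orog(j,i), with the time dimension omitted. The function name and all the numbers are hypothetical, chosen only for illustration.

```python
def hybrid_height_to_altitude(a, b, orog):
    """Apply z[k][j][i] = a[k] + b[k] * orog[j][i].

    a    : per-level values with dimensions of height
    b    : per-level dimensionless values
    orog : 2-D field of surface height above the geoid
    """
    return [
        [[a_k + b_k * surface for surface in row] for row in orog]
        for a_k, b_k in zip(a, b)
    ]


# Made-up values: three model levels over a 2 x 2 horizontal grid.
a = [10.0, 1000.0, 20000.0]
b = [1.0, 0.9, 0.0]
orog = [[0.0, 500.0], [1500.0, 3000.0]]

z = hybrid_height_to_altitude(a, b, orog)
```

By themselves a and b do not locate the data; only once they are combined with the surface height field does each grid point get a height above the geoid.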

What is to be gained in clarity or simplicity by regarding these two things as distinct concepts? We don't talk about transforms or coordinate reference systems because these terms have technical and sometimes restrictive meanings, and therefore may make the description less clear.

Best wishes

Jonathan

comment:80 Changed 3 years ago by edavis

I'm not remembering or finding any mention in this discussion of the OGC CF-netCDF implementation specification (11-165r2). Stefano Nativi and Ben Domenico have put a lot of work into developing this mapping of CF-netCDF into the various OGC/ISO abstract models. Seems like it might be useful especially in terms of the recent grid mapping / dimensionless vertical coordinates / CRS discussions given that OGC has a lot of experience with CRS and is more recently trying to include concepts to cover dimensionless vertical coordinates (sorry, not finding that specification).

Hope that's helpful,

Ethan

comment:81 in reply to: ↑ 79 Changed 3 years ago by biard

Replying to jonathan:

Dear Jim

Both grid_mapping and formula_terms can be described logically by the text which David posted. For instance,

  • a vertical georeference construct might apply to an atmosphere hybrid height coordinate, in which case it would contain the a (dimensions of height) and b (dimensionless) auxiliary coordinate constructs, a field of surface height above the geoid, the auxiliary coordinate construct of height above the geoid (if it exists in the dataset), and optionally the name of the geoid (if we permit this extension). Its purpose is to relate the a and b, which by themselves do not geolocate the data, to the height above the geoid, which is a geophysically defined surface whose precise realisation might be specified as well.
  • a horizontal georeference construct might apply to a rotated latitude-longitude coordinate system, in which case it would contain the rotated latitude and longitude coordinate constructs, the (unrotated, geographical) latitude and longitude of the rotated pole, the auxiliary coordinate constructs of latitude and longitude, and optionally the definition of the reference ellipsoid. Its purpose is to relate the rotated latitude and longitude, which by themselves do not geolocate the data, to geographical latitude and longitude, which do geolocate the data, optionally with the ellipsoid to make their definition more precise.

What is to be gained in clarity or simplicity by regarding these two things as distinct concepts? We don't talk about transforms or coordinate reference systems because these terms have technical and sometimes restrictive meanings, and therefore may make the description less clear.

Best wishes

Jonathan

Jonathan,

It's clear that we are not going to come to an agreement on this, and I'm not the one bearing the burden of trying to write the model, so this will be the last posting I'll make about the georeferencing construct. I think that clarity and simplicity are exactly what you gain by keeping formula_terms and grid_mapping as distinct concepts. Your avoidance of talking about coordinate reference systems just makes the whole thing even more unclear.

The formula_terms attribute, if not restricted to producing georeferenceable values, could be easily used for non-georeferencing purposes. It has no intrinsic affinity with georeferencing. Grid_mapping, on the other hand, is clearly a georeferencing construct, and doesn't lend itself to other uses. What greater clarity or simplicity is obtained by coming up with a construct that attempts to put these two simple yet different concepts into a single genericized one?

Grace and peace,

Jim

comment:82 in reply to: ↑ 79 Changed 3 years ago by markh

Replying to jonathan: Hello Jonathan

Both grid_mapping and formula_terms can be described logically by the text which David posted. For instance,

I can see that this can be done; the important question for me is whether it should be done.

Where are the benefits and where are the costs?

I do not perceive the benefits of this conflation to form a new geolocation type, based on the comments to date.

This conflation is quite a jump from the current CF conventions terminology, which concerns me; I think it is difficult to explain.

What is to be gained in clarity or simplicity by regarding these two things as distinct concepts? We don't talk about transforms or coordinate reference systems because these terms have technical and sometimes restrictive meanings, and therefore may make the description less clear.

A particular benefit I see in having two separate concepts in the CF data model is interoperability with other domains and concepts.

If a concept in the CF data model cleanly and clearly relates to a concept in another domain then this is a big supporting factor for interoperability.

I think this is the case here: the georeferencing capability provided by a coordinate reference system type of thing in CF is very closely aligned to the ISO19111 definition of a coordinate reference system. This brings huge benefits in providing interoperability with other ISO aware communities, such as the OGC, as highlighted by edavis. I think this is the approach Stefano has taken with the OGC to date. The documents he has written link grid_mapping variables to OGC CRS instances.

The derivation of coordinate values, perhaps for georeferencing purposes, based on bespoke defined algorithms and reference data sets is a much more specialised field, which is not widely used in the ISO and OGC communities (for example).

Conflating the well known georeferencing function of coordinate reference systems with this derived coordinate functionality in a single data model type makes it very hard for that type to be understood and used effectively by other communities.

It also makes it much harder for CF to adopt useful building blocks from other communities, as the interfaces are all in different places.

So, the cost of the approach of conflating these concepts is it isolates the CF community from other communities, just when we and they stand to benefit so much from better interoperability.

The benefit has to significantly outweigh this cost, and I am afraid I cannot see this.

With this in mind I strongly favour the separation of these concepts within the data model. Whilst they can be logically conflated, I think the cost is prohibitive and the benefit minimal.

all the best mark

comment:83 Changed 3 years ago by jonblower

Hi all,

Just to add a vote to this debate having talked off-line a while ago with Jonathan and David, and having read Mark and Jim's comments above. I'm persuaded that formula_terms and grid_mapping are doing different jobs. Formula_terms turns dimensionless coordinate values into dimensional ones; grid_mapping adds information about CRSs that is not already implicitly or explicitly present. You might need both, for example to turn dimensionless vertical coordinates into heights (using formula_terms) and then provide the ellipsoid (using grid_mapping).

Therefore I don't see much benefit in conflating these orthogonal concepts.

Cheers, Jon

comment:84 Changed 3 years ago by jonathan

Dear Jon

Thanks for thinking about it.

Although formula_terms was introduced to turn vertical dimensionless coordinates into dimensional ones, there are cases (e.g. hybrid height) where some of the input has physical dimensions, and perhaps others will be added too. Hence its title is out of date and Appendix D should be renamed.

The purpose of formula_terms is stated in 4.3.2 as to "provide a mapping between the dimensionless coordinate values and dimensional values that can positively and uniquely indicate the location of the data". But dealing with "dimensionless" coordinates is not the main purpose of it, so this statement is not quite right and not clear enough, I would say; it could for instance be modified to read "provide a mapping between the values of the one-dimensional variables containing vertical coordinate values (which may be dimensionless) and dimensional values that can uniquely indicate the vertical location of the data" to describe the current situation.

The purpose of grid_mapping is described in 5.6 as "to describe the mapping between the given [horizontal] coordinate variables and the true latitude and longitude coordinates". Would you not agree there is a similarity of purpose? In both cases, the 1D coordinate values are not georeferenced, and the intention is to make a link to other values which are georeferenced (at least approximately). I think putting together two constructs which have a similar logical purpose, although they are achieved with different netCDF mechanisms, is a simplification in logical terms. What do you or others see as the logical distinction between these purposes, apart from one being vertical and the other horizontal?

The reference ellipsoid has two distinct kinds of purpose. (1) You need it to convert between projection coordinates and lat-lon coordinates. (2) You need it (as you have explained) to assign the height, latitude and longitude of any point which is not actually on the ellipsoid, since this is done by dropping a perpendicular to the surface. We could put purpose (1) together with formula_terms and the projection part of grid_mapping, and identify purpose (2) as a different logical construct, which will probably also be needed for vertical coordinate variables (that hasn't been proposed yet). But that separation would be inconvenient because it's the same information needed for both purposes, and the purposes are not distinguished in CRS descriptions.
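Purpose (2) can be illustrated with a minimal Python sketch using the standard conversion from geodetic latitude, longitude and height to Earth-centred Cartesian coordinates. The function name is hypothetical, the WGS84 parameters are the commonly published ones, and the spherical radius is simply an assumed example of a spherical Earth.

```python
import math


def geodetic_to_ecef(lat_deg, lon_deg, height, semi_major_axis, inverse_flattening=None):
    """Convert geodetic (lat, lon, height) to Earth-centred Cartesian (x, y, z).

    The height is measured along the perpendicular to the ellipsoid surface.
    With inverse_flattening=None the ellipsoid is a sphere.
    """
    f = 0.0 if inverse_flattening is None else 1.0 / inverse_flattening
    e2 = f * (2.0 - f)  # first eccentricity squared
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Radius of curvature in the prime vertical
    n = semi_major_axis / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + height) * math.cos(lat) * math.cos(lon)
    y = (n + height) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height) * math.sin(lat)
    return x, y, z


WGS84 = {"semi_major_axis": 6378137.0, "inverse_flattening": 298.257223563}
SPHERE = {"semi_major_axis": 6371229.0}  # assumed spherical Earth radius, for illustration

# The same (latitude, longitude, height) triple referred to two different ellipsoids.
on_wgs84 = geodetic_to_ecef(52.0, -1.0, 100.0, **WGS84)
on_sphere = geodetic_to_ecef(52.0, -1.0, 100.0, **SPHERE)
offset = math.dist(on_wgs84, on_sphere)  # separation in metres between the two points
```

The identical coordinate values correspond to different points in space depending on the ellipsoid, so the ellipsoid definition is needed to fix the location precisely.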

I don't think that merging formula_terms and grid_mapping into one logical construct gives any problem for interoperability with other standards. Relevant elements of this construct, including ellipsoid and map projection definitions, map onto CRS descriptions quite straightforwardly, as Etienne Tourigny showed. I agree that interoperability is important. What specific problems can other people see with this approach?

Cheers

Jonathan

comment:85 Changed 2 years ago by davidhassell

Hello,

To those who were at the GO-ESSP meeting this week, many thanks for listening to my talk on the CF data model and for your constructive feedback.

The latest, full description of the data model that I presented can be found at http://www.met.reading.ac.uk/~david/cfdm_0.8.html

All the best,

David
