Opened 5 years ago

Closed 9 months ago

Last modified 9 months ago

#88 closed task (fixed)

Terms of Reference for the CF Data Model

Reported by: markh Owned by: davidhassell
Priority: medium Milestone: Unstructured Grid Data Model
Component: cf-conventions Version:
Keywords: data model, scope Cc: d.c.hassell@…, edward.campbell

Description

Objective

The purpose of this ticket is to agree the scope and terms of reference for the CF data model.

Proposal

Scope, Terms and Conditions

  1. The CF community will adopt a data model as part of the CF Metadata Project.
  2. The data model will be a complementary resource to the:
    • CF Conventions Document
    • CF Standard Name Table
    • CF Conformance Requirements & Recommendations
    • Guidelines for Construction of CF Standard Names
  3. The data model will be maintained by the community, using the same mechanisms as are used for the conventions, conformance and standard_name documents.
  4. The data model, once it has reached v1.0, will be consistent with the CF Conventions Document.
    • This consistency will be maintained.
      • Changes to the specification should be evaluated to determine whether they are consistent with the data model: if inconsistencies exist, these should be addressed, either by altering the specification change proposal or by proposing a change to the data model.
  5. The scope of the data model is to define the concepts of CF and the relationships that exist between these concepts.
  6. The data model provides a logical abstraction of the concepts defined by CF, independent of implementation details.
  7. The data model does not define the interface to CF.

Benefits

The data model is believed to offer the following benefits, providing:

  • an orientation guide to the CF Conventions Document
  • a guide to the development of software compatible with CF
  • a reference point for gap analysis and conflict analysis of the CF specification
  • a communication tool for discussing CF concepts and proposals for changes to the CF specification

Change History (25)

comment:1 Changed 5 years ago by davidhassell

  • Cc d.c.hassell@… added

comment:2 follow-up: Changed 5 years ago by davidhassell

Hello Mark,

Thanks for putting this together. I support this proposal's scope and benefits as you have described them, but I will, of course, continue to think about it, particularly the implications on governance and consistency with the CF conventions.

From my personal experience as a software developer, I have found the proposed CF data model very useful when designing the cf-python package. I hope that it will facilitate the creation of an API which 'behaves/feels like CF', and therefore should be, at some level, intuitive to use.

All the best,

David

comment:3 follow-up: Changed 5 years ago by jonathan

Dear Mark

Thank you for taking the initiative in this ticket and ticket 68. I agree with the aims you have outlined for having a data model. While I agree that the data model does not determine the API, the two are linked, as David suggests. At least through having a common data model different APIs would probably be easier to relate.

On point 4, I am unclear how the data model and its consistency with the standards document would be maintained. I suspect that that task would have to be delegated by the CF conventions committee to a permanent subcommittee, because I do not think that all proposers of changes to the convention in trac tickets would feel able or willing to address this aspect. To require it would be too onerous, I think, and would impede changes to the convention. However, all proposers of changes should be encouraged to bear the data model in mind, and it would be useful in the discussions about their proposals.

Another question of principle is whether the data model is for CF, or for CF-netCDF. I think it should be for the former, as in ticket 68. That is, we mention in the data model how the concepts are embodied in netCDF, but the concepts themselves are independent of netCDF and could sometimes be more general. This question is relevant to how this data model relates to the OGC CF-netCDF data model of Domenico and Nativi, and to the Unidata CDM.

Best wishes

Jonathan

comment:4 in reply to: ↑ 2 Changed 5 years ago by markh

Replying to davidhassell:

From my personal experience as a software developer, I have found the proposed CF data model very useful when designing the cf-python package. I hope that it will facilitate the creation of an API which 'behaves/feels like CF', and therefore should be, at some level, intuitive to use.

This is a useful point. I wonder whether to capture this in the benefits section:

Benefits

The data model is believed to offer the following benefits:

  • Providing an orientation guide to the CF Conventions Document
  • Guiding the development of software compatible with CF
  • Facilitating the creation of an Application Programming Interface which 'behaves/feels like CF' and is intuitive to use.
  • Providing a reference point for gap analysis and conflict analysis of the CF specification
  • Providing a communication tool for discussing CF concepts and proposals for changes to the CF specification

comment:5 in reply to: ↑ 3 Changed 5 years ago by markh

Replying to jonathan:

Thank you for taking the initiative in this ticket and ticket 68.

My pleasure, I think there is real value to be added here.

I agree with the aims you have outlined for having a data model. While I agree that the data model does not determine the API, the two are linked, as David suggests. At least through having a common data model different APIs would probably be easier to relate.

I agree. I think it is helpful to be clear that there are links, while stating that agreement on the data model details is not the final word on the API.

On point 4, I am unclear how the data model and its consistency with the standards document would be maintained. I suspect that that task would have to be delegated by the CF conventions committee to a permanent subcommittee, because I do not think that all proposers of changes to the convention in trac tickets would feel able or willing to address this aspect. To require it would be too onerous, I think, and would impede changes to the convention. However, all proposers of changes should be encouraged to bear the data model in mind, and it would be useful in the discussions about their proposals.

I think a permanent subcommittee would be a useful approach to this. I agree that there is a risk that if it is left to proposers then it might impede convention changes; I think there is also a risk that convention changes would occur, inconsistent with the model, which would not fit with the terms of reference as they have been stated here.

I think listing approaches to mitigate identified risks would be a useful addition to the terms of reference: a subcommittee to manage the relationship between the data model and the specification would be a good entry on this list.

Another question of principle is whether the data model is for CF, or for CF-netCDF. I think it should be for the former, as in ticket 68. That is, we mention in the data model how the concepts are embodied in netCDF, but the concepts themselves are independent of netCDF and could sometimes be more general. This question is relevant to how this data model relates to the OGC CF-netCDF data model of Domenico and Nativi, and to the Unidata CDM.

We need to be very clear on this. I think the particular added value is in abstracting away from the CF-NetCDF implementation details to a logical model. I am very keen to define the data model for CF.

The relationship between this CF data model and the OGC CF-NetCDF data model should be made clear, perhaps in due course rather than up front; we may want to mull this over some more.

If this model is agreed to be 'implementation agnostic CF' then the relationship to the OGC's CF-NetCDF data model can be developed and agreed between the two groups. Perhaps this relationship should also be the responsibility of the same subcommittee.

comment:6 Changed 5 years ago by markh

I feel it is worth adding the statement to the terms of reference (as a sub point of point 4):

  • The work to develop the CF logical data model will be based on the CF Specification, version 1.6.

We may want to maintain the same versioning scheme as the standard, such that the agreed end of the development work is the CF data model version 1.6 (rather than 1.0).

This may aid clarity.

comment:7 Changed 5 years ago by markh

I received the following comment via email:

Just a quick aside comment - I think it would be good to have the phrase 'implementation neutral' (or similar) somewhere in the description of the CF data model.

I think you say something like "without implementation details", but neutrality is even more explicit.

Dom

comment:8 in reply to: ↑ description ; follow-up: Changed 5 years ago by markh

To restate the proposal, including amendments from comments:

Objective

The purpose of this ticket is to agree the scope and terms of reference for the CF data model.

Proposal

Scope, Terms and Conditions

  1. The CF community will adopt a data model as part of the CF Metadata Project.
    • This will be administered by a new subcommittee
  2. The data model will be a complementary resource to the:
    • CF Conventions Document
    • CF Standard Name Table
    • CF Conformance Requirements & Recommendations
    • Guidelines for Construction of CF Standard Names
  3. The data model will be maintained by the community, using the same mechanisms as are used for the conventions, conformance and standard_name documents.
  4. The data model, once it has reached v1.0, will be consistent with the CF Conventions Document.
    • The work to develop the CF logical data model will be based on the CF Specification, version 1.6
    • This consistency will be maintained for future versions of the CF specification:
      • Changes to the specification should be evaluated to determine whether they are consistent with the data model: if inconsistencies exist, these should be addressed, either by altering the specification change proposal or by proposing a change to the data model;
      • This consistency is the responsibility of the data model subcommittee.
  5. The scope of the data model is to define the concepts of CF and the relationships that exist between these concepts.
  6. The data model provides a logical, implementation-neutral abstraction of the concepts defined by CF.
  7. The data model does not define the interface to CF.

Benefits

The data model is believed to offer the following benefits:

  • Providing an orientation guide to the CF Conventions Document
  • Guiding the development of software compatible with CF
  • Facilitating the creation of an Application Programming Interface which 'behaves/feels like CF' and is intuitive to use.
  • Providing a reference point for gap analysis and conflict analysis of the CF specification
  • Providing a communication tool for discussing CF concepts and proposals for changes to the CF specification

comment:9 Changed 5 years ago by markh

Comments received via email:

Ben Domenico

Hi Jonathan, David, Mark, John,

As you might have guessed, Stefano and I are very much committed to publishing a CF data model and have drafted one in the OGC Standards Working Group (SWG). I'll give you my reaction to your effort and Stefano can add to it or correct it.

Philosophically, our approach in the CF-netCDF SWG has been to take what's been done in the netCDF and CF world and cast it into a form required by the OGC. For some time, I had hoped that we could just use the existing netCDF specifications or those formalized by the NASA SPG, but that shortcut simply has not worked out, so we end up attempting to convey precisely the same information in OGC form. So far, it has worked out well, but it turns out to be more time consuming than our optimistic early estimates. The main point is that we are trying to take what the netCDF and CF communities have created and cast it in a form that can be adopted as a set of OGC standards. The thought is that this approach will ensure that the two do not diverge.

As it stands now, I agree with the comments David made in his comparison, namely, the main differences between the proposed CF data model and the draft OGC CF Conventions extension to the netCDF core are:

-- the OGC work defines the CF data model as an extension to the netCDF data model whereas the CF proposal defines it as an independent abstract spec.

-- the OGC work is based on CF 1.6 whereas the CF proposal is based on 1.5 so the former includes the sampling geometries and the latter does not.

So there are differences between the two specs but no obvious "conflicts" between them. For our objectives at the moment, that seems fine to me.

At some time in the future, if your work on the independent CF data model spec is adopted by the CF community, it might be worth considering bringing it to the OGC as well. Obviously now is not the time to take that route. However it could eventually become even more important as groups like OPeNDAP become more active in the OGC realm. They might appreciate having a free-standing CF data model. Alas we do not always get to choose the order in which these things happen, but I see no major difficulties with the two efforts continuing in parallel -- especially if we continue to keep each other informed.

Stefano, please clarify or correct my comments as you see fit. You have been delving much more into the details (where the devils tend to lurk) than I.

Many thanks for calling this to our attention again.

-- Ben

Stefano Nativi

Hi all,

Indeed, our OGC work extends the netCDF data model specification by adding the semantics introduced by the CF 1.6 conventions, especially to encode the ISO/OGC required information.

Naturally, I agree with you about the utility of keeping each other informed about the ongoing efforts.

Thank you for having raised that.

Stefano

comment:10 Changed 5 years ago by edward.campbell

  • Cc edward.campbell added

comment:11 Changed 5 years ago by markh

Additional Benefit, suggested by Karl Taylor:

  • documenting the data model will set the ground work to expand CF beyond netCDF files.

comment:12 Changed 5 years ago by davidhassell

Comments from Karl

Dear all,

I have just commented on ticket 68, and will repeat a couple of things here.

I generally support going forward with development of a data model and these terms of reference seem to cover most of what's needed.

I'm worried about multiple data models.

Also, what good is a data model without an accompanying set of controlled vocabularies? If everyone can use different vocabularies, then how can software be developed to interpret the data ingested, even if it conforms to the data model? I'm not talking about the names of the attributes, but the allowed values of the attributes that are specified in the conventions and standard names. I'm sure I must be missing something fundamental and obvious here, but if not, I think the terms of reference should indicate something about how the controlled vocabularies (for things like cell methods, for example) get specified.

thanks,

Karl

comment:13 Changed 5 years ago by davidhassell

Another benefit of a data model is that it facilitates the development of general procedures for manipulating CF fields.

A good example of this is the CF aggregation rules (ticket #78). Since these are based on the draft CF data model, they apply equally to CF data in any file format and may be included easily in any API which is based on the CF data model.

All the best,

David
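To illustrate the point about format independence, here is a minimal Python sketch. It is not the aggregation rules of ticket #78 and not the draft data model itself: the `Field` class, the two constructor functions, the dict layout and the `can_combine` rule are all hypothetical names and simplifications introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical format-neutral field: properties plus a flat list of values."""
    properties: Dict[str, str]
    data: List[float]


def field_from_netcdf_like(variable: dict) -> Field:
    """Build a Field from a dict that mimics a netCDF variable (assumed layout)."""
    return Field(properties=dict(variable["attributes"]), data=list(variable["values"]))


def field_from_records(name: str, records: List[float]) -> Field:
    """Build a Field from a second, made-up in-memory source."""
    return Field(properties={"standard_name": name}, data=list(records))


def can_combine(a: Field, b: Field) -> bool:
    """Toy rule: two fields are candidates for combination if their properties match."""
    return a.properties == b.properties
```

Because `can_combine` only looks at the `Field` objects, it behaves the same whichever source a field was built from, which is the kind of reuse described above.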

comment:14 Changed 5 years ago by jonathan

Dear Karl

Thanks for your comments and support for the principle.

I agree that controlled vocabulary is necessary for the netCDF convention. In the data model I do not think we need to prescribe how attribute values are specified. We could simply indicate the need for it, where appropriate. For example, we might wish to have multi-language support, in which case there could be more than one standard_name table, but this would not make any logical difference to the data model. By not being too prescriptive, we can keep the data model general.

Cheers

Jonathan
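The separation suggested here can be sketched roughly in Python as follows. This is only an illustration under stated assumptions: the `Field` class, the `is_valid` function and the `VOCABULARIES` table are hypothetical, and the "fr" entry is a made-up translation, not a real standard name table.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Field:
    """Hypothetical data model construct: it only records a standard name."""
    standard_name: str


def is_valid(field: Field, vocabulary: FrozenSet[str]) -> bool:
    """Check the field's name against whichever vocabulary the caller supplies."""
    return field.standard_name in vocabulary


# Two illustrative vocabularies; the "fr" entry is invented for this example
VOCABULARIES: Dict[str, FrozenSet[str]] = {
    "en": frozenset({"air_temperature"}),
    "fr": frozenset({"temperature_de_l_air"}),
}
```

The `Field` definition is the same whichever table is used; only the argument passed to `is_valid` changes, so swapping vocabularies makes no logical difference to the model.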

comment:15 follow-up: Changed 5 years ago by jonathan

Dear all

Since this ticket currently has no moderator, I will do that, by agreement with Mark.

The proposal is to define an abstract CF data model, which will be an additional document in the CF convention, maintained in parallel to the CF standard document, standard name table and conformance document. Various benefits have been outlined.

It appears from the discussion that the proposal is generally acceptable in the form as revised by Mark at https://cf-pcmdi.llnl.gov/trac/ticket/88#comment:8, noting Karl's comment on the need to consider the role of controlled vocabulary, and the additional benefit that Karl pointed out at https://cf-pcmdi.llnl.gov/trac/ticket/88#comment:11. That comment and a couple of others have drawn attention to the fact that the CF data model does not refer to netCDF or any other file format. Independence of file format distinguishes it from the CF-netCDF data model and the CDM.

Enough support has been given already for the proposal to be accepted, and there are no outstanding objections. Since this is a significant change to the way the CF convention is conceived, it would be useful to see any other expressions of support or concerns. If there are no more concerns expressed, the proposal will be accepted in three weeks, according to the rules.

Jonathan

comment:16 follow-up: Changed 5 years ago by mgschultz

I give my support to this proposal.

However, one comment related to point #4: It currently reads: "The data model, once it has reached v1.0, will be consistent with the CF Conventions Document.

This consistency will be maintained.

Changes to the specification should be evaluated to determine whether they are consistent with the data model: if inconsistencies exist, these should be addressed, either by altering the specification change proposal or by proposing a change to the data model."

Further, it has been proposed to establish a separate sub-committee to maintain the data model. This, I think, is not ideal: in my view there should be the possibility that the data model actually influences the convention document (see also trac ticket #90). If we identify situations where the data model can be kept much simpler but this would require a change in the convention (e.g. make some rules stricter), then I advocate that a clean and easy data model should take precedence over the convention text (after suitable discussion, of course). If a sub-committee is established, it may be difficult to ensure the consistency. I'd rather see the data model being discussed alongside the convention changes. If this is done via trac tickets, it should be easy enough to pick your discussion of interest.

comment:17 in reply to: ↑ 16 Changed 5 years ago by markh

Replying to mgschultz:

I give my support to this proposal.

Thank you.

However, one comment related to point #4: It currently reads: "The data model, once it has reached v1.0, will be consistent with the CF Conventions Document.

This consistency will be maintained.

Changes to the specification should be evaluated to determine whether they are consistent with the data model: if inconsistencies exist, these should be addressed, either by altering the specification change proposal or by proposing a change to the data model."

Further, it has been proposed to establish a separate sub-committee to maintain the data model. This, I think, is not ideal: in my view there should be the possibility that the data model actually influences the convention document (see also trac ticket #90). If we identify situations where the data model can be kept much simpler but this would require a change in the convention (e.g. make some rules stricter), then I advocate that a clean and easy data model should take precedence over the convention text (after suitable discussion, of course).

You make a very valid point here. I think that the data model will influence the specification and I do not think it should be implied otherwise.

I suggest an additional point under '4.', to read:

The data model may drive changes to the conventions specification. In this case proposed changes to the specification will be discussed using the current mechanisms and ratified by the conventions subcommittee alongside a change to the data model.

Any such changes will happen in step, with the data model and specification being published together to ensure ongoing consistency.

If a sub-committee is established, it may be difficult to ensure the consistency. I'd rather see the data model being discussed alongside the convention changes. If this is done via trac tickets, it should be easy enough to pick your discussion of interest.

I think it is very important that discussions around the data model take place alongside the conventions and vocabulary discussions; no artificial barriers should be erected that divide the community.

In my view the committee is useful to provide the impetus to make this activity happen and maintain active involvement in the development of the model. The key to all these activities is the CF community and discussions need to take place where any interested party may get involved. The wide range of interests in the community is a real asset and we must do all we can to involve everyone in this work.

I think that providing some distinct responsibilities by having subcommittees is helpful, but it is more of an administrative function; participation must be open to all. I think your concerns should be highlighted; they describe a particular set of risks which need to be mitigated.

Perhaps we could add some statements to the Terms of Reference, such as:

It is the responsibility of the subcommittee to involve the community in all discussions and liaise closely with the conventions subcommittee to ensure that the model and conventions are two facets of the same entity.

comment:18 Changed 5 years ago by markh

I have also considered the numbering of the data model versions. I suggest that the data model is versioned as a parallel document to the conventions document, to clearly show which version of the conventions the data model is consistent with.

As such, I suggest that the subcommittee should aim to continue the work of #68, with the whole community, and publish a

v1.5 CF Data Model

which is consistent with this version of the conventions.

Once this is complete, work can continue to publish a

v1.6 CF Data Model

which reflects the changes made to the conventions documents at 1.6.
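The parallel numbering suggested above can be sketched in a few lines of Python. This is a minimal illustration, assuming (as proposed in this comment) that v1.5 is the first data model version and that version strings have the form "major.minor"; the function and constant names are hypothetical.

```python
from typing import Optional, Tuple

# Assumption taken from this comment: the data model starts at v1.5
FIRST_DATA_MODEL_VERSION = (1, 5)


def parse_version(text: str) -> Tuple[int, int]:
    """Turn a string such as '1.6' into a comparable tuple (1, 6)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def data_model_version_for(conventions_version: str) -> Optional[str]:
    """Return the data model version paired with a conventions version.

    Returns None for conventions versions that predate the first data model.
    """
    if parse_version(conventions_version) < FIRST_DATA_MODEL_VERSION:
        return None
    # Same number as the conventions version it is consistent with
    return conventions_version
```

Comparing parsed tuples rather than raw strings keeps the ordering correct if a two-digit minor version ever appears.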

comment:19 in reply to: ↑ 15 Changed 5 years ago by markh

Replying to jonathan:

Dear all

Since this ticket currently has no moderator, I will do that, by agreement with Mark.

That would be very helpful, thank you.

comment:20 Changed 5 years ago by jonathan

We've got several changes to point 4 now, and it could be split into several points. Following the above postings, I would like to suggest some rewording to these points, thus:

  • A version of the data model will be published at the same time as or as soon as possible after each version of the CF conventions, consistent with that version and having the same version number, beginning from version 1.5.
  • Discussions of proposed changes to the CF conventions should consider consistency with the data model. If inconsistencies exist, these should be addressed, either by altering the proposal or by proposing a change to the data model.
  • Equally, consideration of the data model may motivate changes to the CF conventions. In this case proposed changes to the conventions will be discussed and agreed using the current mechanisms.
  • The responsibility for maintaining the data model and for its consistency with the CF conventions will belong to a new committee, but anyone may propose changes to the data model in the same way as changes to the CF conventions.

I've added the last bit myself. I think this should be an open process, just like everything else. Do others agree? Like Mark, I think the committee would be useful to provide an impetus and ensure that the data model is maintained, but the data model shouldn't be the property of the committee. Would you be happy with this, Martin?

Having mentioned the committee in my last point above, I think it could be omitted from Mark's point 1.

Since this ticket is not a proposal to change the conventions, the text we agree won't go into the conventions document. However, some of it could be used as a preamble to the data model document.

Jonathan

comment:21 in reply to: ↑ 8 Changed 5 years ago by jonathan

The current version of this proposal, including some rewording as I suggested in the last update to the ticket and the extra benefit which Karl earlier identified, is:

Objective

The purpose of this ticket is to agree the scope and terms of reference for the CF data model.

Scope, Terms and Conditions

  1. The CF community will adopt a data model as part of the CF Metadata Project.
  2. The data model will be a complementary resource to the:
    • CF Conventions Document
    • CF Standard Name Table
    • CF Conformance Requirements & Recommendations
    • Guidelines for Construction of CF Standard Names
  3. The data model will be maintained by the community, using the same mechanisms as are used for the conventions, conformance and standard_name documents.
  4. A version of the data model will be published at the same time as or as soon as possible after each version of the CF conventions, consistent with that version and having the same version number, beginning from version 1.5.
  5. Discussions of proposed changes to the CF conventions should consider consistency with the data model. If inconsistencies exist, these should be addressed, either by altering the proposal or by proposing a change to the data model.
  6. Equally, consideration of the data model may motivate changes to the CF conventions. In this case proposed changes to the conventions will be discussed and agreed using the current mechanisms.
  7. The responsibility for maintaining the data model and for its consistency with the CF conventions will belong to a new committee, but anyone may propose changes to the data model in the same way as changes to the CF conventions.
  8. The scope of the data model is to define the concepts of CF and the relationships that exist between these concepts.
  9. The data model provides a logical, implementation-neutral abstraction of the concepts defined by CF.
  10. The data model does not define the interface to CF.

Benefits

The data model is believed to offer the following benefits:

  • Providing an orientation guide to the CF Conventions Document
  • Guiding the development of software compatible with CF
  • Facilitating the creation of an Application Programming Interface which 'behaves/feels like CF' and is intuitive to use.
  • Providing a reference point for gap analysis and conflict analysis of the CF specification
  • Providing a communication tool for discussing CF concepts and proposals for changes to the CF specification
  • Setting the ground work to expand CF beyond netCDF files.

No comments have been made since my last posting and summary on 12th July. The ticket will be accepted on 2nd August if no-one has further objections or comments to make before then.

Jonathan

comment:22 Changed 5 years ago by jonathan

As no-one has commented further, this ticket should be regarded as accepted. I will leave it open until the first version of the data model is in place, but it does not require any change to the CF conventions document. However, some of what is above will belong in a procedural document about how the data model is to be maintained, and some will belong in the preamble to the data model document.

Thanks for proposing the ticket, Mark.

Jonathan

comment:23 Changed 10 months ago by davidhassell

  • Owner changed from cf-conventions@… to davidhassell
  • Status changed from new to accepted

comment:24 Changed 9 months ago by davidhassell

I'm re-opening this ticket since the CF data model has not been resolved, and it has no impact on the current conventions (and nor will it at the upcoming v1.7)

David

But I'm not sure how ...!

Last edited 9 months ago by davidhassell

comment:25 Changed 9 months ago by davidhassell

  • Resolution set to fixed
  • Status changed from accepted to closed