Opened 3 years ago

Last modified 3 years ago

#113 assigned enhancement

Review of CF feature types

Reported by: mgschultz Owned by:
Priority: medium Milestone:
Component: cf-conventions Version:
Keywords: featureType, Grid, Point, TimeSeries, Profile Cc:

Description

While comparing the CF 1.6 document with the INSPIRE data specifications for atmospheric conditions and meteorological features and with the Unidata THREDDS code and web pages, I found large overlap, but also a few inconsistencies and ambiguities. The convention also promises that future versions will deal with "difficult" feature types such as time series of regionally aggregated information, the description of fronts, etc. Maybe this is a good time to initiate this discussion?

This post has two sections: 1) some specific suggestions for changing the contents of the (1.6) conventions document, 2) an overview of feature types including their definitions and representations that tries to harmonize across the three different sources. I have not yet dared to tackle the new feature types mentioned above.

1) Specific changes recommended to the conventions document:

  • The text in section 9.1 just below Table 9.1 says "The space-time coordinates that are indicated for each feature are mandatory." However, section 9.2 contains a seemingly contradictory statement: "If there is only a single feature to be stored in a data variable, there is no need for an instance dimension and it is permitted to omit it." -- I propose to clearly label the optional indices as optional in Table 9.1 and to describe features versus collections before Table 9.1 (i.e. interchange sections 9.1 and 9.2 with potential minor rewriting).
  • Would it make sense to specify the maximum allowed dimensions or is this too restrictive? As far as I can see there may be a small set for each feature type and it could thus be helpful to have these listed explicitly rather than stating that extra dimensions can be added at will.

2) Collection of feature types from CF, THREDDS, and INSPIRE:

Note that definitions of terms further down are often incomplete.

  • RectifiedGridCoverage: coverage whose domain consists of a rectified grid – a grid for which there is an affine transformation between the grid coordinates and the coordinates of a coordinate reference system. Basically, this means that the 2-dimensional grid can be described by two 1-dimensional vectors (examples: longitude and latitude or UTM easting and UTM northing). The grid can have additional dimensions such as a vertical dimension z or an ensemble coordinate e. Thus the representation is data(m, i, j, k, o) e(m) x(i) y(j) z(k) t(o) with m, k, o optional.
  • ReferenceableGridCoverage: coverage whose domain consists of a referenceable grid – a grid associated with a transformation that can be used to convert grid coordinate values to values of coordinates referenced to a coordinate reference system. Thus the representation is data(m, i, k, o) e(m) x(i) y(i) z(k) t(o) with m, k, o optional.
  • Point: one or more parameters measured at a set of points in time and space (having no implied coordinate relationship to other points). Thus the representation is data(i) x(i) y(i) t(i) with i optional.

  • TimeSeries: a series of data points at the same spatial location with monotonically increasing times. Thus the representation is data(i, o) x(i) y(i) t(i, o) with i optional.
  • Trajectory: A series of data points along a path through space with monotonically increasing times (example: backtrajectory model calculation; aircraft flight path). Thus the representation is data(i, o) x(i, o) y(i, o) z(i, o) t(i, o) with i, z optional.
  • Profile: an ordered set of data points along a vertical line at a fixed horizontal position and fixed time (example: one balloon sounding or aircraft takeoff or landing). Thus the representation is data(i, k) x(i) y(i) z(i, k) t(i) with i optional.
  • TimeSeriesProfile: a series of profile features at the same horizontal position with monotonically increasing times (example: balloon soundings from a station). Thus the representation is data(i, k, o) x(i) y(i) z(i, k, o) t(i, o) with i optional.
  • TrajectoryProfile: a series of profile features located at points ordered along a trajectory (example: aircraft lidar profiles). Thus the representation is data(i, k, o) x(i, o) y(i, o) z(i, k, o) t(i, o) with i optional.
  • ForecastModelRunCollection: a collection of forecast model runs, each of which can be uniquely identified by the start of the model run, called the model run time, also called the analysis time or generating time. Each model run has a series of forecast times. A collection of these runs therefore has two time coordinates, the run time and the forecast time. An FMRC creates a 2D time collection dataset, and then creates various 1D time subsets out of it.
  • Image: Image data which may consist of several layers or channels. Thus the representation is data(i, j, h) c(h) with h optional. (?? Need optional geo-rectification of images, time series of images etc. ??)
  • RADIAL: Radial data
  • SECTION: Section data
  • STATION: Station data (alias for TimeSeries) -- should aliases be allowed?
  • STATION_PROFILE: Stations of profiles (alias for TimeSeriesProfile) -- should aliases be allowed?
  • SWATH: Swath Data
  • ANY: No specific type (?? Shall this be allowed??)
  • ANY_POINT: Any of the point types (?? What is this needed for ??)
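The representations in the list above can be collected into a small lookup table, which also bears on the question in section 1 of whether the permitted dimensions could be listed explicitly. The Python sketch below is only an illustration: the names `SIGNATURES` and `allowed_data_dims`, and the choice to encode the optional indices as a set, are assumptions made here and are not part of CF, THREDDS, or INSPIRE. The index letters are copied from the list; only the indices of the data variable are recorded, so optional coordinates such as z for Trajectory are not covered.

```python
from itertools import combinations

# Index signatures of the data variable, copied from the list above:
# (ordered indices of data(...), indices described there as optional).
SIGNATURES = {
    "RectifiedGridCoverage": (("m", "i", "j", "k", "o"), {"m", "k", "o"}),
    "ReferenceableGridCoverage": (("m", "i", "k", "o"), {"m", "k", "o"}),
    "Point": (("i",), {"i"}),
    "TimeSeries": (("i", "o"), {"i"}),
    "Trajectory": (("i", "o"), {"i"}),
    "Profile": (("i", "k"), {"i"}),
    "TimeSeriesProfile": (("i", "k", "o"), {"i"}),
    "TrajectoryProfile": (("i", "k", "o"), {"i"}),
    "Image": (("i", "j", "h"), {"h"}),
}


def allowed_data_dims(feature_type):
    """Return every index tuple obtained by dropping a subset of the optional indices."""
    dims, optional = SIGNATURES[feature_type]
    droppable = [d for d in dims if d in optional]
    result = set()
    for n in range(len(droppable) + 1):
        for dropped in combinations(droppable, n):
            result.add(tuple(d for d in dims if d not in dropped))
    return result


# A time series may be stored as data(i, o) or, for a single feature, as data(o).
print(sorted(allowed_data_dims("TimeSeries")))
```

With such a table, a checker could compare the dimensions of a data variable against the finite set of permitted index tuples instead of accepting arbitrary extra dimensions.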

Best regards, Martin

Change History (25)

comment:1 Changed 3 years ago by graybeal

Yes to both.

I would really like to see item (2), though maybe not the most far-reaching cases yet. I've spent a lot of time trying to reconcile the CF and THREDDS feature types, and the guidance in the CF examples and NODC template examples. Would be great to get more authoritative.

Thoughts:
a) Include OGC features too, if any are different.
b) Swath needs to be defined. This might be of two forms: (1) A series of images following a (linear? well-defined?) path, like from satellite or aircraft (2) A sequence of {measurements in a line}, such as an AUV with a scanning sonar might make (uh, sorry, if that's not exactly the right example).
c) It might be best to order the work, so that the lowest hanging fruit is accomplished before the next major revision.

comment:2 Changed 3 years ago by ngalbraith

Thanks, Martin, I'm glad you're taking this on!

I'd like to see a summary/description of feature types used by THREDDS; even collecting the URLs in one place would be helpful.

Not sure what John means by 'OGC features' - John, is there a reference for this?

comment:3 Changed 3 years ago by mgschultz

Thanks for the supporting comments. Here are the reference links for the "original" feature types I have been looking at:

  • OGC: I am also not quite sure what is meant here. The "Web feature service" is essentially "unbound", meaning that a special request "GetFeatureTypes" shall return all feature types from a specific service. These can be anything. Examples are "road", "bridge", "inWater_A", etc. If you dig deep enough you may come up with some useful definitions which could perhaps shed new light on the CF feature types. For example the WaterML standard (http://www.opengeospatial.org/standards/waterml) has various time series classes...

comment:4 Changed 3 years ago by graybeal

Re OGC features, what Martin outlined is what I was thinking -- I knew they had a concept of feature, and was thinking somewhere in SWE there might be something more concrete that related. Fishing, really, but the example of WaterML was a good one.

comment:5 Changed 3 years ago by graybeal

Steve H suggests including the cf_satellite discussion progress when talking about swath.

comment:6 Changed 3 years ago by ngalbraith

With regards to item 1, I'm not sure I see a contradiction between stating that "space-time coordinates ...are mandatory" and "If there is only a single feature to be stored in a data variable, there is no need for an instance dimension." I do agree that the instance dimension should be shown as an optional field in the table, and the fact that an instance variable is still required should be noted (thanks to Bob S for clarifying this point).

My question on this is whether, in the absence of an instance variable (actually, now, dimension) the featureType attribute still should be used, and, if not, whether such a file is CF1.6 compliant. This probably has been answered somewhere in the email discussion, but should be clarified in the docs.

On item 2, I'm glad you're referring to the Java source; I just did a quick search on my email archive, looking for feature type in subject lines, and find this cropping up in the netcdf-java and thredds lists as well as in cf-metadata.

I have one additional item that's been in 4 or 5 email threads, on which I admit to having dropped the ball every time. It's the question of describing 2D data from surface moorings as timeSeriesProfile features. Is this within the scope of this ticket? I'm sure it's a closed issue for most CF folks, but I have lingering questions; don't want to divert the effort going on here, though.

Thanks - Nan

Last edited 3 years ago by ngalbraith

comment:7 Changed 3 years ago by ngalbraith

Also in regards to item 1, modifications to the actual text of the convention document: the description of the "array subscripts" immediately following table 9.1 (http://cf-convention.github.io/1.6.html#idp6280704) does not describe "o", although it's used in the table. That info is in the next section, and might be a little confusing (or I might just need more coffee).

And, table 9.1 should be in the list of tables - this might be a site transition issue, and not need to be part of the document change discussion.

comment:8 Changed 3 years ago by jonathan

Dear Martin

Thanks for this initiative.

I don't think the statements are contradictory in 9.1 that certain space-time coordinates are mandatory and in 9.2 that a size-one instance dimension can be omitted, because the instance dimension is not a spatiotemporal dimension - it is just an index running over the collection of features. The mandatory space-time coordinates are for the individual single features in Table 9.1. The idea is that the feature types are distinguished by which sort of spatiotemporal coordinate they must have and are allowed to have. However, if the document is unclear, it should certainly be clarified.
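The distinction between the instance dimension and the space-time coordinates can be illustrated with array shapes. In the Python sketch below, plain lists stand in for netCDF variables; the function names `add_instance_dimension` and `drop_instance_dimension` and all numeric values are hypothetical and serve only as an illustration.

```python
def add_instance_dimension(feature):
    """Wrap a single feature, e.g. data(o), as a collection of size one, data(i=1, o)."""
    return [feature]


def drop_instance_dimension(collection):
    """Return the single feature of a size-one collection; reject larger collections."""
    if len(collection) != 1:
        raise ValueError("the instance dimension can only be omitted for a single feature")
    return collection[0]


# A single time series (made-up values). The time coordinate t(o) is still
# present; only the index i running over features is left out.
time = [0.0, 3600.0, 7200.0]
temperature = [281.2, 281.9, 282.4]

as_collection = add_instance_dimension(temperature)  # shape (1, 3)
single = drop_instance_dimension(as_collection)      # shape (3,)
print(len(as_collection), len(single), len(time))
```

In both forms the time coordinate has the same length as the feature, which is the sense in which the space-time coordinates remain mandatory while the instance index is optional.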

It would be good to add new feature types that are distinguishable in these terms from the existing ones, provided that these new types are required in practice i.e. there are present use-cases for them. This would follow our general practice that we only add to the convention when there is a need, rather than anticipating possible needs.

Best wishes

Jonathan

comment:9 Changed 3 years ago by ngalbraith

The THREDDS link above (https://github.com/Unidata/thredds/blob/master/cdm/src/main/java/ucar/nc2/constants/FeatureType.java) has disappeared.

Do we have another link? I've seen different versions of the list of terms used by THREDDS to describe the 'shape' of a dataset, and would really like to know where the 'definitive' list lives.

comment:10 Changed 3 years ago by graybeal

Yeah, Nan. The one pointed to by ACDD Working is the best I could find (at that time), and it still works: http://www.unidata.ucar.edu/software/thredds/current/tds/catalog/InvCatalogSpec.html#dataType

Since it's code I figure it's authoritative. But of course there could be multiple code bases -- your link seems to be an entirely separate development code base. So maybe there are 2 authorities....

Last edited 3 years ago by graybeal

comment:11 Changed 3 years ago by graybeal

Searching in the Unidata repository (yay for open source!), I found what may be the current version of Nan's disappeared link:

https://github.com/Unidata/thredds/blob/target-4.3.22/cdm/src/main/java/ucar/nc2/constants/FeatureType.java

Last year, I was under the impression this was not a released stable code base, so we couldn't use it in ACDD at this stage. Perhaps that has changed, and cdm is the currently maintained code base and Unidata's vision for the future.

This code lists feature types ANY, GRID, RADIAL, SWATH, IMAGE, ANY_POINT, POINT, PROFILE, SECTION, STATION, STATION_PROFILE, TRAJECTORY, STATION_RADIAL, FMRC, UGRID.

If that's the case, since the Unidata cdm FeatureType.java list is fairly different from the THREDDS/cdm_data_type list (point, profile, section, station, station_profile, trajectory, grid, image, swath), it may be that 3 attributes would be needed (current/future CF DSG; future Unidata cdm FeatureType; and existing/historical THREDDS data types).
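The difference between the two lists quoted in this comment can be made explicit with a set comparison. The Python sketch below copies both lists from the comment; treating names that differ only in case as the same type is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Feature types listed for the Unidata cdm FeatureType.java (from the comment above).
cdm_feature_types = {
    "ANY", "GRID", "RADIAL", "SWATH", "IMAGE", "ANY_POINT", "POINT", "PROFILE",
    "SECTION", "STATION", "STATION_PROFILE", "TRAJECTORY", "STATION_RADIAL",
    "FMRC", "UGRID",
}

# THREDDS/cdm_data_type values (from the comment above).
thredds_data_types = {
    "point", "profile", "section", "station", "station_profile",
    "trajectory", "grid", "image", "swath",
}

# Assumption: names are matched case-insensitively.
thredds_upper = {name.upper() for name in thredds_data_types}

only_in_cdm = sorted(cdm_feature_types - thredds_upper)
only_in_thredds = sorted(thredds_upper - cdm_feature_types)

print("only in cdm FeatureType:", only_in_cdm)
print("only in THREDDS data types:", only_in_thredds)
```

Under that matching assumption, every THREDDS data type also appears in the cdm list, and the cdm list adds ANY, ANY_POINT, FMRC, RADIAL, STATION_RADIAL and UGRID.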

Last edited 3 years ago by graybeal

comment:12 Changed 3 years ago by mgschultz

Dear all,

I have now attempted to come up with a systematic overview of *coverage types* (to be distinguished from *feature types*) based on analysis of dimensions and coordinates that seem appropriate and useful. Please see http://redmine.iek.fz-juelich.de/projects/julich_wcs_interface/wiki/Coordinates_and_coverage_types for descriptions and illustrated examples. Unless we receive strong criticism, we plan to implement these coverage types in our python adaptation of the community data model (CDM). Please let me know if I have missed important other ways of organizing earth system science data.

It may turn out to be useful to add further subclasses for specific geometries (e.g. radar scans), but these would be special cases of the given types.

Perhaps somewhat naively I would advocate that these coverage types should find their way into future CF versions - perhaps with a mapping to the existing (incomplete) ones?

Martin

comment:13 Changed 3 years ago by biard

Martin,

I checked out the list of definitions, and they look good. But. (You knew there had to be a but, right?) Aren't you being overly restrictive by explicitly calling out latitude and longitude instead of more generalized X and Y, especially in the coverage type names? I'm coming from the satellite and aerial sensor community, where the natural coordinates may not be latitude and longitude, but this is also a concern for polar grids, where using latitude and longitude can become problematic and a polar stereographic coordinate system is the more natural fit. That's the only issue I see, and perhaps in your specific domain it's not an issue, but I feel like your work would be more generally applicable if your namings were less tightly tied to lat and lon.

Grace and peace,

Jim

comment:14 follow-up: Changed 3 years ago by jonathan

Dear Martin

Thanks for this list. I think I must be missing something basic, because I don't understand the purpose of the rectified coverages. They look like gridded 1D-, 2D- and 3D data variables. Chapter 9 of CF was introduced to enable efficient and convenient storage of collections of 0D- and 1D data variables which have a variety of dimensions and coordinates, because to store each feature in its own data variable (which is logically equivalent) would require a very large number of netCDF dimensions and coordinate variables and would produce a rather cumbersome file. When you say that you hope the rectified coverage types could be added to CF, do you mean that this same storage problem arises with gridded fields as well?
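The storage argument can be sketched in a few lines. The Python example below packs features of different lengths into one flat array plus a per-feature count, in the spirit of the contiguous ragged array representation of chapter 9. The function names, the variable name `counts`, and the sample values are assumptions for illustration, and no netCDF library is involved.

```python
def pack_ragged(features):
    """Concatenate variable-length features into one flat list plus per-feature counts."""
    flat = [value for feature in features for value in feature]
    counts = [len(feature) for feature in features]
    return flat, counts


def unpack_ragged(flat, counts):
    """Split the flat list back into the original features using the counts."""
    features, start = [], 0
    for n in counts:
        features.append(flat[start:start + n])
        start += n
    return features


# Three profiles with 3, 1 and 2 levels (made-up values).
profiles = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
flat, counts = pack_ragged(profiles)
print(flat, counts)
print(unpack_ragged(flat, counts) == profiles)
```

Stored this way, the collection needs two dimensions (one over samples, one over features) regardless of how many features it holds, whereas one data variable per feature would need a separate dimension and coordinate variable for each feature.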

Best wishes and thanks

Jonathan

comment:15 follow-up: Changed 3 years ago by graybeal

Hi Martin,

I appreciate this analysis; every time I see a new one I learn more about the fundamental organizations of data and the existing models. I especially appreciate the drawings, which are extremely helpful in the discussion.

Some of this may duplicate Jonathan's and Jim's comments, but I'm not 100% sure we are saying the same thing. (Some of it may reflect that I have not mastered the dimensional representations of data yet, too.)

You've implicitly asked for two kinds of review: (a) is this a good and complete system for JOIN, and (b) is this a good system for CF to adopt? There are different criteria for those two, because a 'closed system' may handle all the difficulties and niceties of interfacing the model to users; whereas an open specification has to be reasonably transparent to naive users such as myself.

The focus on lat/lon/z systems for JOIN may be suitable; for a generic specification, I think not so much. Not only are lat/lon/z systems based on different references (so not automatically interoperable; not sure if this is an issue for JOIN), but in an open spec they are better expressed as a special subset of the general case, imho.

Some more detailed thoughts:

In the 1d_nonrectified_grid diagram you just took a 1D rectified grid and angled it. Then for the 2d_nonrectified_grid you took the 2D rectified grid and transformed it, so that points were no longer regularly spaced. This seemed inconsistent, so I went to the definitions to understand where the distinction lay. 1d_rectified_grid says "the coordinate values need not be regularly spaced", so I recommend this have a diagrammatic example too.

The fact that regular grids are not a special category precludes handling that common case efficiently, and points to the question of what is the mapping between CF's grid definitions and yours. (Put another way, how/where would you fit your definitions into the existing CF standard?)

The use of 'scan/swath' as the label for 2d_nonrectified_grid diagram is a little confusing, just because I thought many scans and swaths are regular, which the diagram isn't.

1d_nonrectified_grid says "Any x, y, z coordinate may be a 1-dimensional array, a scalar, or missing." (Suggest adding a written example of the last.) I'm afraid the brevity confused me. I assume you are saying that any coordinate axis may be represented by an array of values, not that any individual coordinate may be an array.

Is the 't' axis for point_timeseries_collection the same for every 'i'? (Perhaps that would be a point_collection_timeseries...)

Do you want to represent a single trajectory differently than multiple trajectories? (I thought trajectory_timeseries should be the latter, personally. But this is murky ground.)

I think I'll leave it there -- some of the more advanced ones are a little hard to parse out in the time and expertise I have, and I may already be too picky for your needs.

comment:16 in reply to: ↑ 14 ; follow-up: Changed 3 years ago by mgschultz

Replying to jonathan:

> Dear Martin
>
> Thanks for this list. I think I must be missing something basic, because I don't understand the purpose of the rectified coverages. They look like gridded 1D-, 2D- and 3D data variables. Chapter 9 of CF was introduced to enable efficient and convenient storage of collections of 0D- and 1D data variables which have a variety of dimensions and coordinates, because to store each feature in its own data variable (which is logically equivalent) would require a very large number of netCDF dimensions and coordinate variables and would produce a rather cumbersome file. When you say that you hope the rectified coverage types could be added to CF, do you mean that this same storage problem arises with gridded fields as well?
>
> Best wishes and thanks
>
> Jonathan

Hi Jonathan,

thanks for your comment. This distinction between "rectified" and "non-rectified" grid coverages reflects the alignment of the gridded data along the coordinate axes. A global model output will usually consist of a (more or less) regular grid, so you will find, for example temperature(time, lev, lat, lon). This is a rectified grid, because you can describe the lon and lat values by 1d-coordinate variables lon(lon) and lat(lat). If, on the other hand, you have a rotated grid, the variable will be temperature(time, lev, y, x), and the coordinate variables will be lon(y,x) and lat(y,x). This requires different processing when you want to put the data onto a map or extract a subset, etc. This is why I distinguish "nonrectified" from "rectified" (grid) coverages.
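The rule described in this paragraph can be written as a check on the dimensions of the coordinate variables. The Python sketch below is an assumption-laden illustration: the function name `classify_horizontal_grid`, the dictionary layout, and the requirement that the coordinates be named `lon` and `lat` are all hypothetical, and the labels follow the wording of this comment, which is refined later in the thread.

```python
def classify_horizontal_grid(coord_dims):
    """Classify a grid from the dimension names of its lon and lat coordinate variables.

    coord_dims maps a coordinate variable name to the tuple of its dimension names,
    e.g. {"lon": ("lon",), "lat": ("lat",)}.
    """
    lon, lat = coord_dims["lon"], coord_dims["lat"]
    # 1D coordinate variables on two different dimensions: lon(lon), lat(lat).
    if len(lon) == 1 and len(lat) == 1 and lon != lat:
        return "rectified"
    # 2D coordinate variables sharing the same dimensions: lon(y, x), lat(y, x).
    if len(lon) == 2 and lon == lat:
        return "nonrectified"
    return "unknown"


# temperature(time, lev, lat, lon) with lon(lon) and lat(lat)
global_model = {"lon": ("lon",), "lat": ("lat",)}
# temperature(time, lev, y, x) with lon(y, x) and lat(y, x)
rotated_grid = {"lon": ("y", "x"), "lat": ("y", "x")}

print(classify_horizontal_grid(global_model))
print(classify_horizontal_grid(rotated_grid))
```

A client could use such a check to decide whether plain index slicing along lon and lat is enough for subsetting, or whether the 2D coordinate arrays have to be searched.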

Cheers,

Martin

comment:17 in reply to: ↑ 15 ; follow-up: Changed 3 years ago by mgschultz

Replying to graybeal:

Dear John,

thanks for your message and sorry for taking so long to get back to you. I admit that I had to read this twice to see the value of your comments ;-) I have now updated the wiki page to take some of your comments and concerns into account. I am not really willing to exchange the figure for now, though - maybe some imagination should be left to the reader? I agree with most of your statements and particularly liked the "correction" (clarification?) of "point_timeseries_collection" versus "point_collection_timeseries". This tiny but important distinction had escaped my notice.

Concerning your meta-point about the use or application of this model, I would still argue that it may be useful beyond the specific implementation in JOIN - even though I concede that here or there one may want to phrase things in a somewhat more general way, especially concerning the longitude/latitude coordinates. Nevertheless, I believe it is an important step from a completely abstract model to a model that can actually be implemented in practice. Obviously, a myriad of other combinations of x, y, z, t, i coordinates are possible in theory - but I believe it is useful to map those which actually occur in practice (this has always been a good CF principle) to a set of named coverage types. What has been in CF so far has been rather incomplete I would argue (and this was the starting point of this Trac ticket). I would even go as far as to suggest that the current CF types should be renamed according to this new scheme -- in this context one should of course attempt a mapping.

Happy to hear more critical comments,

best regards,

Martin

PS: I updated the description on http://redmine.iek.fz-juelich.de/projects/julich_wcs_interface/wiki/Coordinates_and_coverage_types to accommodate most comments until now.

Last edited 3 years ago by mgschultz

comment:18 follow-up: Changed 3 years ago by jonblower

Hi Martin,

You may or may not be aware that in the ISO models, a "rectified grid" is quite specific and requires there to be an affine transformation between grid space and coordinate space. Essentially this means that the grid points need to be equally spaced in external (e.g. lat-lon) coordinates.
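The requirement can be illustrated numerically. In the Python sketch below, an affine transformation maps grid indices to coordinates, and a helper checks whether a 1D axis is equally spaced. The function names, the 2.5 by 2.0 degree spacing, and the origin are assumptions chosen only for illustration and are not taken from any ISO document.

```python
def affine_grid_to_coords(i, j, origin, matrix):
    """Map grid indices (i, j) to coordinates: x = x0 + a*i + b*j, y = y0 + c*i + d*j."""
    x0, y0 = origin
    (a, b), (c, d) = matrix
    return x0 + a * i + b * j, y0 + c * i + d * j


def is_equally_spaced(values, tol=1e-9):
    """Return True if consecutive differences of a 1D axis are all equal within tol."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return all(abs(step - steps[0]) <= tol for step in steps)


# Hypothetical axis-aligned grid with 2.5 degree spacing in x and 2.0 in y.
origin = (-180.0, -90.0)
matrix = ((2.5, 0.0), (0.0, 2.0))

lons = [affine_grid_to_coords(i, 0, origin, matrix)[0] for i in range(5)]
print(lons, is_equally_spaced(lons))

# An axis with uneven spacing cannot come from an affine transformation of the indices.
print(is_equally_spaced([0.0, 1.0, 3.0, 6.0]))
```

Non-zero off-diagonal entries in `matrix` would rotate or skew the grid, in which case the points stay equally spaced along each grid line but the grid lines no longer follow the coordinate axes.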

This may be what you mean, but if this is more restrictive than you intended, you could consider the term "rectilinear grid", which is not an official term in any standard that I know of, but I use it in my own work to indicate essentially that all coordinate axes are 1D (as opposed to a curvilinear grid).

Hope this helps, Jon

comment:19 in reply to: ↑ 18 Changed 3 years ago by mgschultz

Replying to jonblower:

Hi Jon,

excellent point! I finally took the pains to understand what they mean by "affine transformation", and I see now that this is actually quite different from what I had intended "rectified" to mean (partly because it indeed requires equidistant points, and partly because it allows rotation and skewing, which changes the representation of coordinates - at least if these are restricted to lon/lat).

Yes, I have heard your term "rectilinear" before. I am not sure I am 100% pleased with that term, though, because it contains "linear", which to some means the same as "affine" and/or may also suggest that points are equally spaced. "Aligned" may be a better word? I'd be happy to learn more about this. And if eventually an Englishman, an American, and an Australian (plus a Scot?) can agree on one term, I'd be very happy to accept this ;-)

Cheers,

Martin

comment:20 in reply to: ↑ 16 Changed 3 years ago by jonathan

Dear Martin

> This distinction between "rectified" and "non-rectified" grid coverages reflects the alignment of the gridded data along the coordinate axes. A global model output will usually consist of a (more or less) regular grid, so you will find, for example temperature(time, lev, lat, lon). This is a rectified grid, because you can describe the lon and lat values by 1d-coordinate variables lon(lon) and lat(lat). If, on the other hand, you have a rotated grid, the variable will be temperature(time, lev, y, x), and the coordinate variables will be lon(y,x) and lat(y,x). This requires different processing when you want to put the data onto a map or extract a subset, etc. This is why I distinguish "nonrectified" from "rectified" (grid) coverages.

Yes, I agree, there is a difference in the way the information is stored. But we already have means to store both rectified and non-rectified grids in CF. That is, CF can already deal with "coverage" data in efficient ways, can't it? The situation with features was different: chapter 9 was introduced to provide more efficient ways to store collections of features in netCDF. Do you see new requirements for CF-netCDF arising from your classification?

Best wishes and thanks

Jonathan

comment:21 Changed 3 years ago by jonblower

Hi Martin,

The term "rectilinear" is not intended to indicate that there is a linear relationship between grid indices and coordinate values. Instead, it indicates that the grid lines form straight lines in coordinate space ("rectilinear" means "relating to straight lines", roughly). This is to distinguish from "curvilinear" grids, whose grid lines form curved lines in coordinate space.

So personally I'm happy with this term. "Aligned grid" could work if others prefer it, but I do tend to prefer "rectilinear" since we have already accepted "curvilinear" as a common term.

Cheers, Jon

comment:22 in reply to: ↑ 17 Changed 3 years ago by graybeal

Replying to mgschultz:

> I admit that I had to read this twice to see the value of your comments ;-)

Yeah, uh oh, my bad. :-) It's always hard for me to come back into this material and understand even my own details.

> I am not really willing to exchange the figure for now, though - maybe some imagination should be left to the reader?

Depends on whether the emphasis is narrative or precision. Because these are subtle and important distinctions, I lean toward complete and consistent materials, so the user can easily create a mental model that exactly matches the one you want them to have.

> What has been in CF so far has been rather incomplete I would argue (and this was the starting point of this Trac ticket).

Yes, completeness has been a problem with many of these models, IMHO. I found the increase in set size helpful.

> I would even go as far as to suggest that the current CF types should be renamed according to this new scheme -- in this context one should of course attempt a mapping.

Possible yes to renaming, though I'd want to review the names attentively with the mappings in hand.

As an example of my naming views, I concur that reusing 'rectified' with different meaning is not OK; this contributed to my confusion with the drawings. I agree rectilinear is not perfect, but it's OK for me. I think 'aligned' is much better but still carries a hint of linearity (more so if 'aligned grid'). Other possibilities could be explored (oriented, parallel, …).

Finally, re meta usefulness for CF, I think the key question is: Do the enhancements in your model address the goal of the CF section 9, or go beyond? (And if the latter, should the goal of the section be expanded?) The mapping would help me assess that. The fact that CF is storage-oriented, perhaps downplays the value of the semantically richer descriptions in your model. That is, if two feature models have different conceptual origin, but are stored exactly the same way, how much does CF care, and how much should it care?

comment:23 Changed 3 years ago by graybeal

  • Owner changed from cf-conventions@… to graybeal
  • Status changed from new to assigned

comment:24 Changed 3 years ago by graybeal

  • Owner graybeal deleted

Sorry, made the last (assignment) change to the wrong ticket. Can't quite see what the full default name is (cf-conventions@...), so am leaving blank.

comment:25 Changed 3 years ago by mgschultz

Thanks to your critical comments, I have now done more research and significantly rewritten the "coverage type" description on our wiki. To help clarify the various concepts involved, I also wrote a "coverage primer", which I hope you find useful. One thing that came out of this is a much clearer view on rectified coverages, which I hope you can agree on (for the moment I am pretty convinced that I am correct ;-) ). The links are:

Concerning the relevance for CF, I admit that I may have strayed a little. Yet, after all of the work I put into this, I am convinced that we should make an attempt to improve CF in this regard. I don't think that I will find the time to rewrite section 6, but perhaps the Annex H with the incomplete listing of "Annotated Examples of Discrete Geometries" can benefit from the more systematic and complete listing on the MetOcean data types wiki page.
