Opened 12 years ago

Closed 12 years ago

#9 closed enhancement (wontfix)

Extensions to CF grid mapping attributes to support coordinate reference system properties

Reported by: pbentley Owned by: cf-conventions@…
Priority: low Milestone:
Component: cf-conventions Version: 1.0
Keywords: coordinates datums projections Cc:

Description

1. Title

Proposed Extensions to CF Grid Mapping Attributes to support CRS Properties

2. Moderator

Jonathan Gregory

3. Requirement

Previous posts to the CF mailing list have identified the requirement for additional attributes that could be used to provide a fuller definition of the characteristics of the coordinate reference system (CRS) used by spatial coordinates within a netCDF file. This proposal attempts to define attributes for several commonly used CRS properties.

4. Initial Statement of Technical Proposal

Owing to the length of this proposal, the full specification text is included in the attached PDF document.

(NB: If it is considered more convenient, e.g. for discussion purposes, to upload the full text of this proposal into this Trac ticket, then the author is happy to do so.)

5. Benefits

Scope: potentially all producers and end-users of netCDF datasets could exploit the proposed new attributes.

New capabilities: the proposed new attributes would enable data producers to more accurately record the specific characteristics of the coordinate reference system (or systems) used to define spatial coordinates within netCDF files.

Example use-case 1: A data producer has collected meteorological observations using a sensor platform which uses, for example, a particular geodetic datum (e.g. WGS 84, NAD 83, OSGB 36) to record spatial coordinates. It is desirable for this piece of CRS metadata, and others like it, to be recorded in appropriate netCDF CF attributes.

Example use-case 2: A climate data center wishes to convert a legacy dataset to netCDF format and make it available over the internet. The legacy dataset is based upon an unusual or customised coordinate reference system (e.g. transverse mercator projection using, say, the Clarke 1880 ellipsoid). As before, these CRS details need to be encoded in agreed, standardised CF attributes.

6. Status Quo

The author is not aware of alternative CF attributes or mechanisms that could be used to encode the desired additional CRS properties.

Attachments (1)

cf_grid_mapping_attributes_v1-0.pdf (167.2 KB) - added by pbentley 12 years ago.
Proposal for extensions to CF grid mapping attributes


Change History (28)

Changed 12 years ago by pbentley

Proposal for extensions to CF grid mapping attributes

comment:1 follow-up: Changed 12 years ago by jonathan

Dear Phil via CF trac ticket #9

Thanks for your proposal on trac concerning extensions to CF grid mapping attributes to support coordinate reference system properties. You made the proposal on 12 July but there hasn't been any following discussion. As moderator, I'd like to try to provoke some! (Sorry I haven't had time to do so before now.)

The proposal as it stands is quite long. It might stimulate discussion if you could paste into the trac ticket the essential details of the proposal and the examples, so that everyone sees them. In any case, to decide on a change we do need explicit specifications of the textual changes proposed to the standard document and the conformance document.

You are right in saying there have been previous email discussions on this issue, so I hope that others may have views on your proposal. I think we do need to develop the CF standard in this area, but I would suggest that we could restrict the changes to the particular areas where needs have been identified in the past (following our usual principle of doing things when required), which are for definitions of the reference ellipsoid and the vertical datum.

Would it be correct to say that we could describe the reference ellipsoid (the figure of the Earth) using your attributes ellipsoid_id, ellipsoid_name, inverse_flattening, semi_major_axis and semi_minor_axis? I am not sure that the definition of the ellipsoid should be part of a grid_mapping. There are quantities whose value depends on the reference ellipsoid (e.g. to do with satellite altimetry), regardless of the coordinate system you are using, and up to now grid_mapping has only been used to define what you would call non-geographic coordinate systems. Perhaps we should introduce a separate dummy variable (like grid_mapping in intent, pointed to from the data variable) to carry the geodetic information.

The vertical datum is identified using your attributes vertical_datum_id and vertical_datum_name, which could also be attached to a geodetic dummy variable. However I am not clear how they actually work. What does it mean to say that we are using the OSGB 1936 datum? Somehow it must pin down the vertical axis - is it something like at a particular (lat,lon) location on the ellipsoid specified the value of altitude is zero? Is that the kind of info implied by the vertical_datum_id?

I think your projection_name (your example of "British National Grid") is potentially useful as extra description, and that does belong in the grid_mapping, where the projection itself is defined as already specified by the standard. I'm not convinced, however, that additional identification of the overall coordinate reference system (the crs attributes) is useful. That info is redundant, if we are separately specifying the things it implies (reference ellipsoid, projection, vertical datum). Any redundancy is liable to lead to inconsistency, and it's not obvious to me why it would be valuable.

perspective_point_height would be needed for specifying a projection from a satellite view but so far no-one has asked for that projection, so perhaps we should leave it until they do.

Thanks for your work on this. Best wishes

Jonathan

comment:2 follow-up: Changed 12 years ago by tomgross

CF grid mapping attributes comments:

The need to provide reference information for spatial coordinates seems to me to be inescapable. So CF should include methods for this.

My comments involve only the vertical coordinate. In the world of sea level data the datum is usually referred to as the reference height. As a tide gauge is tied into a surveyed system the references will usually include several steps: the tide gauge to the dock benchmark, the dock benchmark to a land benchmark, and the land benchmark to one of your CRS coordinate systems. In addition we alter these with references to Mean Sea Level, Mean Lower Low Water etc. etc. So any vertical data point must have an attribute stating the reference. I would request an attribute "reference" which would be set equal to the crs local name in your system, or it could be set equal to the local name of another variable with the same horizontal dimension as the original data, or perhaps the chain of references will end with reference="MLLW" or some such named local convention.

float Z(lat, lon)
    Z:reference="MLLW";

float Z(lat, lon)
    Z:reference="zmllw";
float zmllw(lat, lon)
    zmllw:reference="WGS 84";

float Z(lat, lon)
    Z:reference="zmllw";
float zmllw(lat, lon)
    zmllw:reference="modelzero";
float modelzero(lat, lon)
    modelzero:reference="WGS 84";
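
As a rough illustration of how software might follow such a chain, here is a minimal Python sketch. It assumes the "reference" attribute suggested in this comment (it is not part of CF), and it stands in for a real netCDF file with a plain dictionary; the names REFERENCES and resolve_reference_chain are hypothetical.

```python
# Hypothetical in-memory stand-in for the variables in a netCDF file:
# variable name -> value of its proposed "reference" attribute.
REFERENCES = {
    "Z": "zmllw",
    "zmllw": "modelzero",
    "modelzero": "WGS 84",
}


def resolve_reference_chain(variable, references):
    """Follow "reference" attributes from one variable to the final datum name.

    Returns the list of names visited, ending with the first value that is
    not itself a variable in the file (e.g. "MLLW" or "WGS 84").
    """
    chain = [variable]
    seen = {variable}
    current = variable
    while current in references:
        current = references[current]
        if current in seen:
            raise ValueError(f"circular reference chain at {current!r}")
        chain.append(current)
        seen.add(current)
    return chain


print(" -> ".join(resolve_reference_chain("Z", REFERENCES)))
```

The loop stops at the first name that is not a variable, which is treated as the named datum or local convention. The cycle check is there because nothing in the suggested attribute would prevent two variables from referring to each other.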

-- Thomas F Gross IOC/UNESCO Ocean Observation and Services

comment:3 follow-up: Changed 12 years ago by caron

Hi Philip:

Thanks for getting started on this. We certainly need this.

The main issue here I think is controlled vocabularies.

Many of the elements are described as "Well-known name of the XXX ...", and the ids are "Identifier of the XXX as defined by a controlled vocabulary or by an external authority". Are the "well known names" controlled or just human readable?

Your example has, eg:

crs:crs_id = "urn:ogc:def:crs:EPSG:6.3:4326" ;

I find these kinds of ids rather opaque. It's unlikely that many in our community will know what to put in there (I certainly don't). So one possibility is that we list the ids and names of the ones actually in use in the CF community, thus acting to educate and document these for ourselves.

Other misc comments:

  • cleaning up the projection parameter names I think is good, though we end up having to support both names forever, I think.
  • "scale_factor" will conflict with existing attribute convention.
  • the dummy variable for me acts as a collector of info on a coordinate transformation, of which the projection (or grid_mapping) is one part. So I think it's ok to put other transformation info there, like ellipsoid and datum.
  • the Java library already has coded a perspective_point_height / satellite perspective view mapping (for eumetsat data I think), and I've been intending to propose adding it.

comment:4 follow-ups: Changed 12 years ago by lowry

Dear All,

Read Phil's document and it encapsulates all the steer I've been getting from CRS experts. Use of URNs is coming highly recommended. Linking to EPSG is a no-brainer. Is there a resolver for OGC URNs that leads to an XML document describing the resource?

Tom makes a good point. There's a lot of data around expressed to local references and these are much more valuable if that is specified. The solution we've adopted in SeaDataNet is to set up a controlled vocabulary of terms used to describe local datums. This may also be referenced by URN (e.g. SDN:L111::D02 is Highest Astronomical Tide) that will (when we've finished building in a month or so) resolve to a SKOS document describing the concept. These will be used in place of EPSG references for local datums.

Populating these attributes won't be easy and if we adopt them - as I think we should - some guidance notes would be a good idea.

Roy.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 12 years ago by rsignell

CF Folks,

ArcGIS 9.2 now writes NetCDF files with coordinate reference system properties. I tried writing a 2D topography grid with Geographic, UTM and Miller projections to NetCDF. All three netCDF files output by Arc have a variable attribute called "esri_pe_string" which completely specifies the coordinate reference system using ESRI WKT (which I gather is similar to OGC WKT). The UTM file also has a "grid_mapping" variable, since UTM is a projection included in the CF specification.

Is the specification we are proposing for CF compatible with these example NetCDF files produced by ArcGIS?

These NetCDF files, of course, provide complete coordinate system to ArcGIS when read back in.

Whatever we do, it would be nice if our CF compliant files were able to be read by the big GIS packages like ArcGIS. If we do something different, I guess they will have to add the new conventions, but keep their "esri_pe_string" for backwards compatibility, introducing redundancy?

-Rich Signell

comment:6 follow-up: Changed 12 years ago by jonathan

Dear all

In reply to Rich: "Is the specification we are proposing for CF compatible with these example NetCDF files produced by ArcGIS?" That's a good question - do you know? I agree it would be nice if ArcGIS could read our files, but not if that comes at the cost of making our metadata less self-describing and more opaque to humans. What does theirs look like?

In reply to Roy and John: I also am a bit unhappy about the redundancy between the familiar names and the precise but opaque IDs. If it were possible to omit the IDs and use the names only, having them externally mapped onto the EPSG or other IDs, that would be a good solution. It imposes a maintenance requirement, of course.

In reply to Tom: I still don't understand how vertical datums work. Is it (as I guessed in my first comment on this ticket) something like at a particular (lat,lon) location on the ellipsoid specified the value of altitude is zero? Is that the kind of info implied by the vertical_datum_id? From your example I gather that the height specified as zero might not be the altitude i.e. the datum is not on the geoid, and there is a further step involved that relates whatever it is, such as mean lower low water, to the geoid. That makes it harder to use. Can we bypass this step, or combine them into one specification that says both what the datum is, and what its height is above the geoid?

Cheers

Jonathan

comment:7 in reply to: ↑ 6 Changed 12 years ago by rsignell

Replying to jonathan:

Jonathan,

Three example NetCDF files with fully specified coordinate systems as generated by ArcGIS 9.2 are at: http://cf-pcmdi.llnl.gov/trac/wiki/ArcNetCDF_examples (Note: sometimes I get an empty screen when I navigate to the CF Trac pages and I have to hit "refresh" on Firefox 3 or 4 times to get the page to come up -- I think others have had this problem too). The link was there in my previous Trac ticket comment, but of course that didn't come through in the e-mail version. Perhaps in future Trac comments we should try to use full URL links for the benefit of e-mail readers.

If you take a look at these files, I think the ArcGIS-produced NetCDF files meet existing CF conventions for the Geographic and UTM conventions. But the additional information needed to fully specify the coordinate system, such as the datums, are all encoded in the "esri_pe_string", a single string containing ESRI Well Known Text, and is an attribute of the data variable, not the grid_mapping container variable. And they don't write lon,lat variables for projected coordinates.

I imagine they don't really care where the esri_pe_string is placed, but I'm sure this is what they would like to read and write. I find WKT pretty readable, but I'm not sure about what others think.

-Rich

comment:8 Changed 12 years ago by lowry

The issue being raised by Rich comes down to whether we want to follow the de facto practice established by a single commercial company or the standards set by strong, accessible governance in the form of OGC. We use ESRI software, which is why my vote goes to OGC.....

Roy.

comment:9 Changed 12 years ago by rsignell

Roy,

I agree that OGC is the way to go.

So is this a future scenario?:

  • We accept OGC WKT as a CF Standard, or at least grid_mapping attributes that could be translated to a complete OGC WKT.
  • If ESRI ArcGIS wanted to read these, they could modify their reader, and if they didn't want to write CF coordinate conventions, we could convert files that ESRI produced into valid CF using something like the tools described in

http://www.gdal.org/ogr/osr_tutorial.html

that can apparently translate between different coordinate specifications?

Do we need to develop a demonstration implementation of the new CF spatial coordinate conventions?

If so, perhaps we could try to use a modified version of "gdalwarp" from the GDAL tools (conveniently bundled for Windows & Linux at http://fwtools.maptools.org).

In the current GDAL tools, if you select "NetCDF output" you get something like this, which is closer than the ESRI NetCDF:

[rsignell@ricsigdtlx cf]$ ncdump -h test36_miller_gdal.nc
netcdf test36_miller_gdal {
dimensions:
        x = 144 ;
        y = 92 ;
variables:
        char miller_cylindrical ;
                miller_cylindrical:Northernmost_Northing = 4911333.35025183 ;
                miller_cylindrical:Southernmost_Northing = 4804259.39790056 ;
                miller_cylindrical:Easternmost_Easting = 33603.9747430358 ;
                miller_cylindrical:Westernmost_Easting = -133990.037632865 ;
                miller_cylindrical:spatial_ref = "PROJCS[\"unnamed\",GEOGCS[\"NAD83\",DATUM[\"North_American_Datum_1983\",SPHEROID[\"GRS 1980\",6378137,298.2572221010042,AUTHORITY[\"EPSG\",\"7019\"]],AUTHORITY[\"EPSG\",\"6269\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433],AUTHORITY[\"EPSG\",\"4269\"]],PROJECTION[\"Miller_Cylindrical\"],PARAMETER[\"latitude_of_center\",0],PARAMETER[\"longitude_of_center\",-70],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]]]" ;
                miller_cylindrical:GeoTransform = "-133990 1163.85 0 4.91133e+06 0 -1163.85 " ;
                miller_cylindrical:grid_mapping_name = "miller_cylindrical" ;
                miller_cylindrical:longitude_of_central_meridian = -70.f ;
                miller_cylindrical:false_easting = 0.f ;
                miller_cylindrical:false_northing = 0.f ;
        float Band1(y, x) ;
                Band1:_FillValue = -9999.f ;
                Band1:grid_mapping = "miller_cylindrical" ;
                Band1:long_name = "GDAL Band Number 1" ;

// global attributes:
                :Conventions = "CF-1.0" ;
                :AREA_OR_POINT = "Area" ;
}

Note that there are no "coordinates" defined, and no lon,lat (or even x,y) variables, but at least there is a CF "grid_mapping" attribute for the variable that points to a container variable containing human readable spatial coordinate attributes as well as an OGC WKT string that make it convenient to work with the GDAL tools.
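
Since the file carries no x,y variables, a reader would have to rebuild them from the GeoTransform attribute. The Python sketch below shows one way that might be done. It assumes the usual GDAL ordering of the six coefficients (x origin, pixel width, row rotation, y origin, column rotation, pixel height) and that the origin is the outer corner of the upper-left cell; the function names are hypothetical.

```python
def parse_geotransform(text):
    """Parse the six space-separated GeoTransform coefficients."""
    values = [float(v) for v in text.split()]
    if len(values) != 6:
        raise ValueError("a GeoTransform needs exactly six numbers")
    return values


def cell_centres(geotransform, nx, ny):
    """Return (x, y) coordinate lists for cell centres of an unrotated grid."""
    x0, dx, rot_x, y0, rot_y, dy = geotransform
    if rot_x != 0 or rot_y != 0:
        # A rotated grid cannot be described by two 1-D coordinate variables.
        raise ValueError("rotated grids need 2-D coordinate variables")
    x = [x0 + (i + 0.5) * dx for i in range(nx)]
    y = [y0 + (j + 0.5) * dy for j in range(ny)]
    return x, y


# GeoTransform string and dimension sizes taken from the ncdump output above.
gt = parse_geotransform("-133990 1163.85 0 4.91133e+06 0 -1163.85 ")
x, y = cell_centres(gt, nx=144, ny=92)
print(x[0], x[-1], y[0], y[-1])
```

The outer edges implied by these numbers agree with the Easternmost_Easting and Southernmost_Northing attributes to within the rounding of the printed GeoTransform string, which suggests the corner-origin assumption is right for this file.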

-Rich

comment:10 Changed 12 years ago by jonathan

As moderator, I think this discussion, while interesting, is straying from Phil's proposal to define attributes to describe ellipsoid, projection and vertical datum. Are there further comments on Phil's proposal? (I certainly don't wish to stop this debate, but I'd suggest that discussion should use the email list, while an alternative definite proposal should be a new trac ticket.)

Thanks

Jonathan

comment:11 Changed 12 years ago by rsignell

I thought that trying to understand how two of the more common geospatial packages (ArcGIS & GDAL) represent CRS specifications in NetCDF was relevant for the discussion of Phil's proposal.

As I understand it, Phil's proposal provides an extended standard vocabulary for CF's grid_mapping so that full CRS specifications can be readable by humans, while at the same time allowing for efficient machine reading via the attributes with an "_id" suffix.

What I don't understand is:

  • Whether the proposed "crs_id" attribute is allowed to be any namespace, or only the OGC.
  • How the proposal allows for Well Known Text descriptions of the CRS. Annex B of the proposal describes and gives an example of WKT, but I don't see any mention of WKT in the actual proposal. It seems that allowing for a single string specification of ESRI or OGC WKT would play the same role as the "_id" variables, providing an optional and supplemental mechanism for identifying CRS properties that will allow intelligent software clients (like Arc or GDAL) to more easily operate.
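
To make the relationship between a WKT string and the separate attributes concrete, here is a small Python sketch that pulls the ellipsoid parameters out of a WKT string like the GDAL spatial_ref shown in comment:9. It is only an illustration: a regular expression is not a real WKT parser, the WKT below is an abbreviated copy of that string, and the attribute names are the ones from Phil's proposal rather than anything currently in CF.

```python
import re

# Abbreviated copy of the spatial_ref string from the GDAL ncdump output.
SPATIAL_REF = (
    'PROJCS["unnamed",GEOGCS["NAD83",DATUM["North_American_Datum_1983",'
    'SPHEROID["GRS 1980",6378137,298.2572221010042,AUTHORITY["EPSG","7019"]],'
    'AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0]],'
    'PROJECTION["Miller_Cylindrical"]]'
)

# SPHEROID["name",semi_major_axis,inverse_flattening,...]
SPHEROID_PATTERN = re.compile(
    r'SPHEROID\["(?P<name>[^"]+)",(?P<a>[-+0-9.eE]+),(?P<invf>[-+0-9.eE]+)'
)


def spheroid_attributes(wkt):
    """Return proposal-style ellipsoid attributes found in a WKT string, or None."""
    match = SPHEROID_PATTERN.search(wkt)
    if match is None:
        return None
    return {
        "ellipsoid_name": match.group("name"),
        "semi_major_axis": float(match.group("a")),
        "inverse_flattening": float(match.group("invf")),
    }


print(spheroid_attributes(SPATIAL_REF))
```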

I also didn't understand whether the "urn:ogc:def:crs:EPSG:6.3:4326" syntax was being invented for this proposal, or was already in practice. Perhaps John Caron had the same question. I now know by reading http://portal.opengeospatial.org/files/?artifact_id=8814 that the URN "urn:ogc:def:objectType:authority:version:code" is a registered OGC namespace, that the “urn”, “ogc”, “def”, and six “:” parts of this URN are fixed, and that OGC officially supports the EPSG.
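
For anyone else who finds these ids opaque, the fixed layout makes them easy to take apart. The following Python sketch splits a URN of the form described above into its four variable parts; the class and function names are hypothetical, and it does nothing to check that the authority or code actually exist.

```python
from typing import NamedTuple


class OgcDefUrn(NamedTuple):
    object_type: str
    authority: str
    version: str
    code: str


def parse_ogc_def_urn(urn):
    """Split "urn:ogc:def:objectType:authority:version:code" into its parts."""
    parts = urn.split(":")
    # Six ":" separators give seven fields, the first three of which are fixed.
    if len(parts) != 7 or [p.lower() for p in parts[:3]] != ["urn", "ogc", "def"]:
        raise ValueError(f"not an OGC definition URN: {urn!r}")
    return OgcDefUrn(*parts[3:])


print(parse_ogc_def_urn("urn:ogc:def:crs:EPSG:6.3:4326"))
```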

-Rich

comment:12 Changed 12 years ago by lowry

I was as surprised as Rich by Jonathan’s comment. I thought we were discussing three different ways of encoding the CRS (which includes ellipsoid and datum) into CF netCDF:

(1) Phil’s proposal (2) The ESRI way (3) The GDAL way

Rich is arguing that if we use ESRI/GDAL encodings then there are ready-made tools that can do something useful (e.g. co-ordinate transforms) based on the netCDF attributes.

My view of Phil’s proposal is that it represents the emerging standard and is therefore more likely to be supported by tooling in the future (such is my faith in standards!!).

Phil’s clarification on Rich’s queries would be useful.

comment:13 Changed 12 years ago by jonathan

Dear Roy and Rich

OK, thanks for the clarification. My comment came because I am struggling to understand this discussion in view of references to other technical standards and software I don't know about. If it leads back to a modified proposal to make specific changes to the CF standard, that's fine, and I'll probably manage to follow it. :-)

My own questions are:

(1) How the equivalence works between the ID strings full of : (URN) and the more comprehensible names (WKT?). That is probably the same as Rich's question. It would be better not to have both if they are equivalent, and best to have only the comprehensible one, for the sake of self-describing files. In fact a string attribute whose meaning can only be decoded by consulting an external table is not consistent with the intentions of CF.

(2) Why do we need the higher-level concept of a CRS, if we provide for separate definition of ellipsoid, projection and datum? It is the separate elements which we have been asked for in the past. Giving an ID to the combination as well strikes me as redundant and therefore unattractive. Rich's example, for instance, looks like a specification of a projection (a grid mapping), not an ellipsoid; the definition of the ellipsoid is, I presume, a separate thing, for which Phil's proposal has attributes. Is it in fact OK to separate projection and ellipsoid?

(3) How do vertical datums work? To repeat: is it something like at a particular (lat,lon) location on the ellipsoid specified the value of altitude (height above the geoid), or some other vertical reference (e.g. mean lower low water) is zero? We have in the past been asked to specify the vertical datum in CF. It would be very helpful if someone could explain what it means.

Please carry on talking! Any comments on the discussion, Phil? Thanks

Jonathan

comment:14 follow-up: Changed 12 years ago by lowry

Hello Jonathan,

(1) I strongly feel that duplication of labelling in both human-readable and machine-readable form is highly desirable. I thought in Paris there was acceptance of the concept that providing we didn’t abandon the concept of making CF files fully usable without external reference then it should be permissible to enrichen CF by including hooks to external resources. URNs are a standardised encoding of these hooks. If Phil were proposing URNs with no human-readable alternative I’d be totally against it, but he’s not.

(2) The relationship between CRS, datum, ellipsoid and projection is something I just about understand, but not to the extent that I could give an explanation that would clarify rather than confuse. All I know is that everyone I work with who does fully understand these concepts describes them in the way that Phil has done and includes all the same elements that he has specified.

(3) Consider a tide gauge installation. These have a physical vertical reference – usually a metal bolt fixed to a wall – referred to as tide-gauge zero. This is levelled through a benchmark system to a standardised zero, in the case of the UK Ordnance Datum Newlyn, which again has a physical representation solidly fixed to the Earth’s surface. These are physical datums. Further datums may be derived from analysis of long-term standardised data from the tide gauge, such as mean sea level over a specified period, mean low water and so on. Sea-level data exist with their zero value set to any one of these. What people have been asking for from CF is a way of labelling sea-level data in CF with the definition of its zero value (its ‘datum’).

Unfortunately ‘datum’ also has a very specific meaning as the zero of the vertical component in a CRS. Sometimes the ‘zero’ of sea-level data has been mapped to a standardised geoid and forms part of a CRS, in which case all that needs to be specified is the 3-D CRS. However, this usually isn’t the case and often data are presented labelled with either a 2-D CRS defining their position (e.g. lat/lon) or a projection of a 2-D CRS (e.g. UK National Grid eastings and northings) plus a ‘datum’ that isn’t part of a CRS. This minor semantic difference in ‘datum’ has caused no end of confusion in conversations between oceanographers and geographers.

One of the things I really like about Phil’s proposal is that it delivers a mechanism for handling both the oceanographers’ datum and the geographers’ datum in a standardised way.

Roy.

comment:15 in reply to: ↑ 14 Changed 12 years ago by jonathan

Dear Roy

Thanks for your useful comments.

(1) I strongly feel that duplication of labelling in both human-readable and machine-readable form is highly desirable. ... it should be permissible to enrichen CF by including hooks to external resources. URNs are a standardised encoding of these hooks.

I don't like duplication. I wonder what, in fact, we gain by making a link to an external resource through a URN. For the netCDF file to be self-describing, it has to contain the definition of the projection and the ellipsoid. We already have ways of doing some of that, and Phil has supplemented them, especially for the ellipsoid. What *extra* information would you get by being able to look up the URN elsewhere?

I do agree that it is useful to include a recognisable name. That is helpful because it means a human will know that this is a particular well-known case, whereas they might not recognise the array of attributes that define the projection or ellipsoid as being that case. But even this amount of redundancy is a pitfall. Will we require the cf-checker, for instance, to verify that the projection is really the one named, if a name is given? If we can't guarantee it is correct, then the name may be misleading, and it would not be safe to compare projections just by comparing their names. That suggests the names should be regarded as informal information. If we include a URN as well, that's definitely formal, and it is a necessity that it must be consistent with the name; but if it is consistent, why give both in the netCDF file? Software could look up the name in an external table to find the URN.
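
To show the kind of check I have in mind, here is a minimal Python sketch that compares an ellipsoid name with the numeric attributes given alongside it. It is not how the cf-checker works today. The lookup table and function name are hypothetical, the attribute names are those of Phil's proposal, and the table holds only a few well-known ellipsoids with their standard defining values.

```python
import math

# Defining parameters of a few well-known ellipsoids:
# name -> (semi-major axis in metres, inverse flattening).
KNOWN_ELLIPSOIDS = {
    "WGS 84": (6378137.0, 298.257223563),
    "GRS 1980": (6378137.0, 298.257222101),
    "Airy 1830": (6377563.396, 299.3249646),
}


def name_matches_parameters(attrs, known=KNOWN_ELLIPSOIDS):
    """Check an ellipsoid_name against the numeric attributes given with it.

    Returns True or False when the name is in the table, and None when the
    name is unknown and therefore cannot be checked.
    """
    expected = known.get(attrs.get("ellipsoid_name"))
    if expected is None:
        return None
    a, invf = expected
    return (
        math.isclose(attrs["semi_major_axis"], a, abs_tol=1e-3)
        and math.isclose(attrs["inverse_flattening"], invf, rel_tol=1e-9)
    )


# A file that says "WGS 84" but carries the GRS 1980 flattening.
print(name_matches_parameters({
    "ellipsoid_name": "WGS 84",
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257222101,
}))
```

Even this small case shows the maintenance cost: the check is only as good as the external table, and any name missing from it simply cannot be verified.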

Your explanation of vertical datums was helpful. Still, I don't think we ought to add to the CF standard without fully understanding (as I don't) what information is needed in practice. I think that we need a use-case for this. We have had requests for vertical datum definition in the past. If someone has a real case that we can study involving the need to tie data to a sea-level-related datum, that would be helpful.

We also have to bear in mind that CF is used for model data too. That is a reason why I suspect that ellipsoid and projection ought to be separated. A model might use a projection but assume a spherical Earth, for example. As I've said before, I am less keen on the "combination" of projection and ellipsoid in a CRS. I think it should be sufficient to describe them separately in standardised ways.

Best wishes

Jonathan

comment:16 in reply to: ↑ 1 Changed 12 years ago by pbentley

Replying to jonathan:

Firstly, apologies for the delay in responding to the various comments on this proposal. I've been on leave for the past couple of weeks. It's going to take me a while to reply to each comment.

The proposal as it stands is quite long. It might stimulate discussion if you could paste into the trac ticket the essential details of the proposal and the examples, so that everyone sees them.

Owing to the length of this particular proposal I thought it better, on this occasion, to attach it as a PDF in order to make it easier to print and read. IMHO the current Trac interface is not well suited to lengthier submissions; perhaps this is something that the committee needs to look into.

You are right in saying there have been previous email discussions on this issue, so I hope that others may have views on your proposal. I think we do need to develop the CF standard in this area, but I would suggest that we could restrict the changes to the particular areas where needs have been identified in the past (following our usual principle of doing things when required), which are for definitions of the reference ellipsoid and the vertical datum.

My understanding (and steer) was that there was a requirement to be able to specify a range of general CRS-related properties, a fact which I intimated in the first two paragraphs of the Scope and Purpose section. It's true, however, that I was seeking a generic solution to this particular requirement, principally because the CRS domain is - in my experience - best treated in a holistic, integrated fashion. If I've missed the point, and all you require is to be able to specify the figure of earth and vertical datum, then I suggest we consider 'parking' this proposal and submitting a much more restricted one. My own view, however, is that we would be better served trying to look ahead and anticipate related CRS definition needs.

Would it be correct to say that we could describe the reference ellipsoid (the figure of the Earth) using your attributes ellipsoid_id, ellipsoid_name, inverse_flattening, semi_major_axis and semi_minor_axis? I am not sure that the definition of the ellipsoid should be part of a grid_mapping. There are quantities whose value depends on the reference ellipsoid (e.g. to do with satellite altimetry), regardless of the coordinate system you are using, and up to now grid_mapping has only been used to define what you would call non-geographic coordinate systems. Perhaps we should introduce a separate dummy variable (like grid_mapping in intent, pointed to from the data variable) to carry the geodetic information.

Sensu stricto, one only needs to specify two of the three parameters semi-major axis, semi-minor axis and inverse flattening to define the ellipsoid. The other pieces of information are supplementary, but variously useful depending on usage/context. As to where to specify this information, yes it would be possible - perhaps even sensible - to define it under a separate variable, e.g.

int ellipsoid ;
   ellipsoid:semi_major_axis = 1287368 ;
   ellipsoid:semi_minor_axis = 3947598 ;
   ...

But by the same token you'd probably then want to do the same for geodetic datum, map projection, and so on. Where would you stop? BTW, the grid_mapping attribute is in fact currently used to define, indirectly, geographic coord system parameters (e.g. longitude_of_central_meridian).

One option might be to collect all CRS definitions under a variable called 'crs' (or maybe 'crs_definition'). This, of course, would assume that all spatial variables in the netCDF file are based on the same CRS.

The fundamental problem seems to be that we need to encode a hierarchical CRS data model into netCDF's flat metadata encoding scheme.
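
Going back to the point that any two of the three ellipsoid parameters determine the third: a minimal Python sketch of that relationship follows. It uses the standard definition of flattening, f = (a - b) / a, with inverse flattening equal to 1 / f. The function name is hypothetical and a spherical Earth (a equal to b) is deliberately not handled.

```python
def complete_ellipsoid(semi_major_axis=None, semi_minor_axis=None,
                       inverse_flattening=None):
    """Fill in whichever one of the three ellipsoid parameters is missing.

    Uses f = (a - b) / a, with inverse_flattening = 1 / f.  A sphere (a == b)
    has no finite inverse flattening and is not handled here.
    """
    a, b, invf = semi_major_axis, semi_minor_axis, inverse_flattening
    if [a, b, invf].count(None) != 1:
        raise ValueError("exactly two of the three parameters are required")
    if b is None:
        b = a * (1.0 - 1.0 / invf)
    elif invf is None:
        invf = a / (a - b)
    else:
        a = b / (1.0 - 1.0 / invf)
    return {"semi_major_axis": a, "semi_minor_axis": b,
            "inverse_flattening": invf}


# GRS 1980 defining values, as they appear in the GDAL WKT in comment:9.
grs80 = complete_ellipsoid(semi_major_axis=6378137.0,
                           inverse_flattening=298.257222101)
print(grs80["semi_minor_axis"])
```

This is also why storing all three is a form of redundancy: a file could carry three values that do not satisfy the relationship, and a reader would have to decide which pair to trust.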

The vertical datum is identified using your attributes vertical_datum_id and vertical_datum_name, which could also be attached to a geodetic dummy variable. However I am not clear how they actually work. What does it mean to say that we are using the OSGB 1936 datum? Somehow it must pin down the vertical axis - is it something like at a particular (lat,lon) location on the ellipsoid specified the value of altitude is zero? Is that the kind of info implied by the vertical_datum_id?

I have limited experience of working with vertical datums. My understanding is that the vertical datum ID/name attributes merely identify the vertical datum against which a series of vertical coordinates are referenced. In the same way that lat/long coordinates are defined in the context of a horizontal (i.e. geodetic) datum, e.g. WGS 1984, we don't usually need to know the definition of that datum. Historically they have been defined empirically by a series of measurements, e.g. the triangulation of the UK in the case of the OSGB 1936 datum, and tidal measurements at Newlyn in Cornwall in the case of Ordnance Datum Newlyn. Today, of course, most new geodetic datums are defined using satellite observations.

I think your projection_name (your example of "British National Grid") is potentially useful as extra description, and that does belong in the grid_mapping, where the projection itself is defined as already specified by the standard. I'm not convinced, however, that additional identification of the overall coordinate reference system (the crs attributes) is useful. That info is redundant, if we are separately specifying the things it implies (reference ellipsoid, projection, vertical datum). Any redundancy is liable to lead to inconsistency, and it's not obvious to me why it would be valuable.

At the end of the day, all these attributes are optional and so it will be up to the data creator to decide if they are worth recording. My own view is that many of these attributes will not be directly meaningful or useful to most end-users, but rather will be particularly useful to software clients, esp. GIS and visualisation applications.

perspective_point_height would be needed for specifying a projection from a satellite view but so far no-one has asked for that projection, so perhaps we should leave it until they do.

IMO, it's easier to add it now rather than wait until someone expressly asks for it, in which case we then have to go through a further revision cycle!

--Phil

comment:17 in reply to: ↑ 2 Changed 12 years ago by pbentley

Replying to tomgross:

Hi Thomas,

So any vertical data point must have an attribute stating the reference. I would request an attribute "reference" which would be set equal to the crs local name in your system, or it could be set equal to the local name of another variable with the same horizontal dimension as the original data, or perhaps the chain of references will end with reference="MLLW" or some such named local convention.

This looks to be a non-trivial problem! Are you saying that different variables in the netCDF file will be based upon (or perhaps derived from) different vertical CRSs that are chained together in some kind of parent-child hierarchy? I'll need to look back at the OGC docs to see if they can provide any guidance in this respect. Any chance you could email me a real-world example to help my understanding?

As a general observation, though, I think the attribute name "reference" is too vague. I'd suggest something more expressive, such as "parent_vertical_datum" or "source_vertical_datum", and that the value of this attribute should always point directly to a vertical CRS definition (rather than indirectly via an intermediate variable).

--Phil

comment:18 in reply to: ↑ 3 Changed 12 years ago by pbentley

Replying to caron:

Hi John,

Many of the elements are described as "Well-known name of the XXX ...", and the ids are "Identifier of the XXX as defined by a controlled vocabulary or by an external authority". Are the "well known names" controlled or just human readable?

Good question! My take on this is that the well-known names can also be considered controlled in that, to the best of my knowledge, every CRS concept identified by a unique ID (e.g. the OGC URN for CRS 4326) has an associated human-readable, well-known name (e.g. "WGS 1984"). However, if software agents do not enforce or, preferably, facilitate the selection of correct well-known names then the name attribute is effectively uncontrolled (but then this applies to any attribute I suppose!)

Your example has, eg:

crs:crs_id = "urn:ogc:def:crs:EPSG:6.3:4326" ;

I find these kinds of ids rather opaque. It's unlikely that many in our community will know what to put in there (I certainly don't). So one possibility is that we list the ids and names of the ones actually in use in the CF community, thus acting to educate and document these for ourselves.

Yes, I agree that's a good idea. And I don't think the core list would be too long.
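
For illustration only, here is a minimal Python sketch of how a client might split a crs_id URN of the form shown above into its parts and look it up in a short community-maintained list. The function names and the KNOWN_CRS_NAMES table are hypothetical and not part of any CF or OGC tooling; the two table entries are examples, not a proposed core list.

```python
from typing import NamedTuple


class OgcUrn(NamedTuple):
    """Components of a URN such as urn:ogc:def:crs:EPSG:6.3:4326."""
    object_type: str  # e.g. "crs"
    authority: str    # e.g. "EPSG"
    version: str      # e.g. "6.3" (may be empty)
    code: str         # e.g. "4326"


def parse_ogc_urn(urn: str) -> OgcUrn:
    """Split an OGC definition URN into its four trailing components."""
    parts = urn.split(":")
    if len(parts) != 7 or [p.lower() for p in parts[:3]] != ["urn", "ogc", "def"]:
        raise ValueError(f"not an OGC definition URN: {urn!r}")
    return OgcUrn(*parts[3:])


# Hypothetical lookup table of ids in use; keyed on (authority, code) so that
# the database version embedded in the URN does not affect the lookup.
KNOWN_CRS_NAMES = {
    ("EPSG", "4326"): "WGS 84",
    ("EPSG", "27700"): "OSGB 1936 / British National Grid",
}


def describe_crs(urn: str) -> str:
    """Return the listed name for a crs_id, or a marker if it is not listed."""
    parsed = parse_ogc_urn(urn)
    name = KNOWN_CRS_NAMES.get((parsed.authority.upper(), parsed.code))
    return name if name is not None else "unknown to this table"
```

Keying the table on authority and code rather than on the full URN string is a design choice made for this sketch; whether the version field should be significant is a separate question.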

  • "scale_factor" will conflict with existing attribute convention.

Yes, but I was hoping that it would be clear from context (i.e. it's an attribute of the grid_mapping variable) what it refers to. In over 20 years of working with map projections, I've never seen it referred to as anything other than 'scale factor'.

  • the Java library already has coded a perspective_point_height / satellite perspective view mapping (for EUMETSAT data, I think), and I've been intending to propose adding it.

Great. I'll consider it seconded!

--Phil

comment:19 in reply to: ↑ 4 Changed 12 years ago by pbentley

Replying to lowry:

Hi Roy,

Read Phil's document and it encapsulates all the steer I've been getting from CRS experts. Use of URNs is coming highly recommended. Linking to EPSG is a no-brainer. Is there a resolver for OGC URNs that leads to an XML document describing the resource?

I've only ever been aware of the EPSG geodetic database as an MS Access or, more recently, a vanilla SQL script database. However, I've just checked the web site today (http://www.epsg.org/Geodetic.html) and I noticed that a link has appeared to an "Online Registry". Unfortunately the link was down when I tried it :-( It would certainly be v useful if the site supported a web service that served up an XML document given a URN.

Populating these attributes won't be easy and if we adopt them - as I think we should - some guidance notes would be a good idea.

Indeed. My original proposal document was in fact even longer because I had included fuller descriptions of the attributes (plus some supplementary sections that I excised completely). In the end, though, I had to do some serious pruning just to get the present document down to its current state of verbosity!

--Phil

comment:20 in reply to: ↑ 5 Changed 12 years ago by pbentley

Replying to rsignell:

Hi Rich,

Is the specification we are proposing for CF compatible with these example NetCDF files produced by ArcGIS?

Whatever we do, it would be nice if our CF compliant files were able to be read by the big GIS packages like ArcGIS. If we do something different, I guess they will have to add the new conventions, but keep their "esri_pe_string" for backwards compatibility, introducing redundancy?

Thanks for producing the ArcGIS netCDF examples - they're very useful. We use ArcGIS here at the Met Office but it hadn't occurred to me to look at their netCDF output. I think I'd assumed that it would be vanilla CF-netCDF !

As it stands at present, this proposal does not conform to the ArcGIS way of recording CRS properties. Although I think we should be prepared to adopt or adapt vendor-devised solutions, where appropriate, we also need to be neutral and not favour a particular vendor or vendor product. (And this from someone who used to work for ESRI!)

Interestingly, in one of my early emails to the mailing list regarding CRS attributes, I recall that I did suggest using the CRS WKT string in a similar manner to the 'esri_pe_string'. However, I was of the understanding that CF attributes are primarily used to encode simple, atomic pieces of data, not compound ones such as the CRS WKT string, which would require fairly complex parsing by client software. Hence the reason for 'unpacking' the CRS definition into separate items in this proposal.

I can see, though, that this does represent a very succinct mechanism for encoding the standard CRS properties. Should we then consider the use of a new attribute called "crs_wkt" (this name being vendor-neutral)? The question then is, do we make this the sole attribute for recording CRS properties, or do we include it as an optional extra attribute?
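
To give a feel for the parsing burden a compound WKT attribute places on client software, here is a rough Python sketch of a recursive-descent reader for the KEYWORD["name",value,...] style of string discussed above. It is an assumption-laden toy, not an implementation of the OGC specification: it handles only double-quoted strings, plain numbers and nested bracketed keywords, and all function names are invented for this example.

```python
import re

# One token per match: a quoted string, a number, a keyword, or punctuation.
_TOKEN = re.compile(
    r'\s*(?:"(?P<string>[^"]*)"'
    r'|(?P<number>[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)'
    r'|(?P<keyword>[A-Za-z_][A-Za-z0-9_]*)'
    r'|(?P<punct>[\[\],]))'
)


def tokenize(text):
    """Turn a WKT string into a list of (kind, value) pairs."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def parse_node(tokens, i):
    """Parse KEYWORD[arg, ...] at tokens[i]; return ((name, args), next_index)."""
    kind, name = tokens[i]
    if kind != "keyword":
        raise ValueError(f"expected a keyword, got {name!r}")
    i += 1
    args = []
    if i < len(tokens) and tokens[i] == ("punct", "["):
        i += 1
        while True:
            kind, value = tokens[i]
            if kind == "string":
                args.append(value)
                i += 1
            elif kind == "number":
                args.append(float(value))
                i += 1
            elif kind == "keyword":
                child, i = parse_node(tokens, i)  # nested element
                args.append(child)
            else:
                raise ValueError(f"unexpected token {value!r}")
            separator = tokens[i]
            i += 1
            if separator == ("punct", "]"):
                break
            if separator != ("punct", ","):
                raise ValueError(f"expected ',' or ']', got {separator[1]!r}")
    return (name, args), i


def parse_wkt(text):
    """Parse a whole WKT string into a nested (name, args) tuple."""
    node, _ = parse_node(tokenize(text), 0)
    return node


def find(node, keyword):
    """Depth-first search for the first element with the given keyword."""
    name, args = node
    if name == keyword:
        return node
    for arg in args:
        if isinstance(arg, tuple):
            hit = find(arg, keyword)
            if hit is not None:
                return hit
    return None
```

With this sketch, find(parse_wkt(wkt), "SPHEROID") on a GEOGCS string returns the spheroid name followed by its semi-major axis and inverse flattening. Truncated input surfaces as an IndexError here; a real client would want friendlier error handling.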

--Phil

comment:21 Changed 12 years ago by pbentley

Hi Folks,

I can see that this discussion thread has become fairly convoluted. Not helped by me! Rather than continue to respond to each comment in turn, I'll try to condense my remaining replies and observations below.

  1. Discussions here and on the CF mailing-list have concluded that the 'crs_id' attribute is definitely desirable. I think a decision needs to be reached as to whether this attribute (and others) should be assigned to data variables or to grid_mapping variables.
  2. The 'crs_id' attribute should encode URNs from the OGC namespace until such time as another authority for CRS definitions emerges. The CF community is not an authority for CRS URNs and therefore should not seek to invent its own.
  3. There is strong steer for including a 'crs_wkt' attribute which can be used to encode a compound CRS well-known text definition based upon the OGC specification of a CRS WKT string. This will enable better interoperability with commercial software packages.
  4. Such a 'crs_wkt' attribute should, I think, be assigned to data variables and NOT to grid_mapping variables since a CRS WKT item is not a sub-element of a grid mapping (in fact I'd say the latter is a sub-element of the larger CRS definition).
  5. For the foreseeable future both URN-style '_id' attributes and human-readable attributes need to be supported.
  6. If we decide to adopt a compound 'crs_wkt' attribute then we could consider dropping the various individual CRS attributes (ellipsoid_id, ellipsoid_name, scale_factor, etc) from the current proposal. This does not preclude them from being reconsidered as part of a subsequent proposal.
  7. If we did decide on 6, and further agree to making 'crs_id' and 'crs_wkt' attributes of data variables, then we do not (I think) need to make any changes to the grid_mapping specification. This would make the current proposal considerably leaner.
  8. It appears that further investigation needs to be undertaken, and examples produced, in connection with derived vertical coordinate systems. However, I believe this could be 'retro-fitted' without impacting on the ideas encapsulated in 1 - 7 above.

Based on the aforementioned, a CDL snippet might look thus:

variables:
   float temp(time, lat, lon) ;
      ...
      temp:crs_id = "urn:ogc:def:crs:EPSG:6.3:4326" ;
      temp:crs_type = "geographic_2d" ; // Not essential, but may be a useful hint to software.
      temp:crs_wkt = "GEOGCS[\"GCS_WGS_1984\",DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_1984\",6378137.0,298.257223563]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]]" ;
      temp:grid_mapping = "geographic" ;
      ...
   int geographic ;
      // Defined as per normal.

One corollary of this scheme is that the 'crs_wkt' attribute would potentially need to be duplicated for each data variable that uses it. Ctrl-C, Ctrl-V is your friend ;-)

Thanks for your insightful contributions thus far. If nothing else, it's clear that this is a large and complex area, but one that the CF community needs to get to grips with!

--Phil

comment:22 follow-up: Changed 12 years ago by rsignell

Phil,

I support the idea of both 'crs_id' and 'crs_wkt' attributes, but assigned to the grid_mapping variable.

I feel that 'crs_id' and 'crs_wkt' just represent an efficient encoding of the other information in the grid_mapping, so belong with the grid_mapping. There is no need for more than one 'crs_id' and 'crs_wkt' attribute per grid_mapping variable. And I think having those long WKT strings assigned to each variable would actually make the results of "ncdump -h" rather harder to read!

I know that ESRI is currently putting the WKT string in each variable, but as previously said, we don't have to bow to a particular vendor. And arguably the most popular open source coordinate conversion software, GDAL, assigns their WKT to the grid_mapping variable.
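
As a rough illustration of why attaching the attributes to the grid_mapping variable avoids duplication, the following Python sketch models a file header as plain dictionaries and follows each data variable's grid_mapping attribute to a single shared definition. The dataset contents, the variable names and the crs_attributes helper are all made up for this example, and 'crs_id' and 'crs_wkt' are the attributes proposed in this ticket rather than established CF attributes.

```python
# Toy in-memory model of a netCDF header: variable name -> attribute dict.
# Both data variables point at the same grid_mapping variable, so the long
# WKT string is stored exactly once.
dataset = {
    "temp": {"units": "K", "grid_mapping": "geographic"},
    "precip": {"units": "kg m-2 s-1", "grid_mapping": "geographic"},
    "geographic": {
        "grid_mapping_name": "latitude_longitude",
        "crs_id": "urn:ogc:def:crs:EPSG:6.3:4326",
        "crs_wkt": (
            'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
            'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
            'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
        ),
    },
}


def crs_attributes(dataset, var_name):
    """Follow var_name's grid_mapping attribute and return the CRS attributes."""
    mapping_name = dataset[var_name].get("grid_mapping")
    if mapping_name is None:
        return {}  # the variable declares no grid mapping
    if mapping_name not in dataset:
        raise KeyError(f"grid_mapping variable {mapping_name!r} not found")
    mapping = dataset[mapping_name]
    return {key: mapping[key] for key in ("crs_id", "crs_wkt") if key in mapping}
```

The cost of this arrangement is one extra lookup per data variable; the benefit is that there is only one copy of each CRS attribute to keep correct.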

-Rich

comment:23 in reply to: ↑ 22 Changed 12 years ago by pbentley

Replying to rsignell:

Rich,

I feel that 'crs_id' and 'crs_wkt' just represent an efficient encoding of the other information in the grid_mapping, so belong with the grid_mapping. There is no need for more than one 'crs_id' and 'crs_wkt' attribute per grid_mapping variable. And I think having those long WKT strings assigned to each variable would actually make the results of "ncdump -h" rather harder to read!

I'd be happy to endorse your suggestions. In the longer term I think that the grid_mapping moniker should perhaps be deprecated in favour of something more descriptive (I suggested 'crs' or 'crs_definition' in one of my previous responses). However, I can see the benefits to current software tooling for putting the new attributes under the existing grid_mapping variable.

--Phil

comment:24 follow-up: Changed 12 years ago by jonathan

Dear Phil et al.

More than three weeks have passed since the last modification to this ticket; unfortunately I haven't had time to work on it. I've now reread the discussion and also read a few other documents to help me understand this subject a bit better. Phil, thank you for your summary on 4 Oct. I echo your thanks to all contributors for thoughtful comments, some of which unfortunately I didn't understand the first time round - sorry about that. The following is a mixture of summary and my own further comments and suggestions.

  1. The proposed crs_id attribute (e.g. urn:ogc:def:crs:EPSG:6.3:4326 for WGS84) is popular, because (a) some software might be able to identify it and thus know what formulae to use without inspecting other attributes, and (b) in the OGC database there might be extra information about the coordinate reference system, not included in other CF attributes. You and Rich Signell agreed that crs_id should be an attribute of the grid_mapping variable to avoid duplication on every relevant data variable.
  2. In the discussion it was proposed also to define an attribute which concatenated information that defines the CRS, for instance in OGC "well-known text" format. This would duplicate information that is currently stated by the projection parameters of the grid_mapping variable and that would be stated by other proposed new attributes that define the ellipsoid. Such redundancy could lead to inconsistency in the file, which I think is not a good idea. It is worth mentioning that we introduced the current format of the grid_mapping variable because a previous discussion concluded (a) that an attribute containing a concatenation of information was awkward for humans to read and inconvenient for software to parse, so it was better to have separate attributes; (b) that we should avoid having to repeat this information as an attribute of every data variable. As a matter of fact, I personally was in favour of an attribute of the data variable of the form "projection_name: transverse_mercator scale_factor: 0.9996 false_easting: 500000 ...", which is more like what you are proposing now. But although I was in favour of that, I would argue now that since we decided then to go down the road of separate attributes and the grid_mapping variable, we should continue in that direction and not repeat this information as an attribute of the data variable. Does anyone have a comment on this?
  3. John Caron suggested and you agreed that it would be a good idea to maintain our own list of the crs_id values in use in CF. Another reason for doing that is that the OGC database is distributed in Microsoft Access format, which is probably not convenient for CF-aware applications in general. I would like to propose that we should maintain an xml file as part of CF that lists the translation of crs_id into crs_name and the parameters which define the CRS. This would enable the CF checker to check that the crs_id was consistent with the other grid_mapping parameters stated in the file. It could also produce a warning or an error if the crs_id was not known to CF.
  4. The crs_id implies (where relevant) the projection, datum, ellipsoid and prime meridian, so I would argue against introducing the proposed attributes of projection_id, datum_id, ellipsoid_id and prime_meridian_id. If relevant, they could be recorded in the xml translation table for crs_id. However, the equivalent names crs_name, ellipsoid_name, geodetic_datum_name and projection_name would make the file more human-readable if they were included, so I'm in favour of them.
  5. A general reason for needing to include the separate parameters of the projection and ellipsoid, even though they may be implied by the crs_id, is that they allow choices to be made which are not in the OGC database. I expect that any CRS we are likely to need for the real world is in the database, but this is not necessarily true for the model world. A model might use some other projection or ellipsoid for which no crs_id exists.
  6. You propose a new grid_mapping parameter perspective_point_height for a new class of perspective projections. John Caron said such projections have been encoded in software. However I don't think we can add this attribute until/unless we also include grid mapping(s) which need it as a parameter in Appendix F, because that is where it would go in the standard. Could you or John include the details of such a projection as part of the present proposal?
  7. You suggest renaming the attributes longitude_of_central_meridian, scale_factor_at_central_meridian and scale_factor_at_projection_origin. I would point out that we introduced these parameter names because they are used by FGDC, which is another authority we were urged to be consistent with. Hence I'd suggest leaving them as they are.
  8. The term "grid mapping" isn't synonymous with "coordinate reference system" as discussed by OGC. It was chosen as a rather more general term that can encompass any relationship between index space and geographical coordinates. It currently includes the rotated-pole transformation, and it could include descriptions of numerical grids constructed by tiling the world with polygons. On these grounds it seems a useful term to me.
  9. If we understand CRS in the sense of your Appendix B, I don't think there is a need for the crs_type attribute. Any software which recognises the crs_id will know the crs_type (from our xml table, for instance), while if it doesn't recognise the crs_id it will be able to deduce what to do from the separate projection and ellipsoid parameters.
  10. From your discussion with Tom Gross it appears to me that we may not yet be in a position to address the issue of vertical coordinates and datums. I now understand (I hope correctly) that to relate an altitude (orthometric height, height above the geoid) to the geodetic datum (the reference ellipsoid) you need a geoid model as well. A geoid model is also included in WGS84, for example, but this is far more complicated than the reference ellipsoid and could not possibly be included as CF metadata, since the geoid has important and irregular variations on small spatial scales. The "Ordnance Datum Newlyn", for example, implies a particular geoid, which is described with reference to an ellipsoid. There may well be a need to state which geoid is in use, and I believe this has been raised in the past on the email list, but it's not the same issue as the reference ellipsoid and projection. We could provide an attribute that named the geoid from a possible list specified as part of the CF standard, for example. Any comments?
  11. Given the above points, I would advocate defining crs_id, crs_name, ellipsoid_name, geodetic_datum_name, inverse_flattening, prime_meridian_longitude, semi_major_axis and semi_minor_axis as attributes of the grid_mapping variable. According to your appendix B, the prime meridian is part of a geodetic datum, which is implied by a CRS. Hence the prime_meridian_longitude is allowed if the crs_id is specified or the ellipsoid defined, but it could default to 0, I suggest. The CF checker could check it, and also that the ellipsoid parameters are consistent if they are all specified, since there is redundancy among them. Giving the parameters of the ellipsoid is useful because it permits conversions between lon-lat and Cartesian coordinates.
  12. If we are to agree such changes to CF, we need a proposal which states exactly and explicitly what textual changes are to be made, and also what changes are required to the conformance document. This is so we can ask PCMDI to make the requisite changes in the documents.
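
The ellipsoid redundancy and the lon-lat to Cartesian conversion mentioned in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 1 mm tolerance are invented here, the height is taken to be height above the ellipsoid (not above a geoid), the prime meridian is assumed to be at longitude 0, and a perfect sphere (infinite inverse flattening) is not handled.

```python
import math


def semi_minor_from_flattening(semi_major_axis, inverse_flattening):
    """b = a * (1 - f), with flattening f = 1 / inverse_flattening."""
    return semi_major_axis * (1.0 - 1.0 / inverse_flattening)


def ellipsoid_is_consistent(semi_major_axis, semi_minor_axis,
                            inverse_flattening, tolerance=1e-3):
    """Check that the three redundant parameters agree to within `tolerance` metres."""
    expected = semi_minor_from_flattening(semi_major_axis, inverse_flattening)
    return abs(expected - semi_minor_axis) <= tolerance


def geodetic_to_cartesian(lat_deg, lon_deg, height,
                          semi_major_axis, inverse_flattening):
    """Convert geodetic latitude/longitude/ellipsoidal height to Earth-centred X, Y, Z."""
    f = 1.0 / inverse_flattening
    e2 = f * (2.0 - f)  # first eccentricity squared
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Radius of curvature in the prime vertical at this latitude.
    n = semi_major_axis / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + height) * math.cos(lat) * math.cos(lon)
    y = (n + height) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height) * math.sin(lat)
    return x, y, z
```

A checker along these lines would only need semi_major_axis plus one of semi_minor_axis or inverse_flattening; when all three are present it can flag a file whose values disagree.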

Best wishes

Jonathan

comment:25 in reply to: ↑ 24 Changed 12 years ago by pbentley

Replying to jonathan:

Hi Jonathan,

Thanks for summarising the key points of agreement w.r.t. this proposal. I shall try to compose a follow-up proposal (as a separate Trac ticket) over the next few weeks. Hopefully this will be (i) more concise; (ii) in a format suitable for updating the CF document; and (iii) acceptable to the CF community with minimal further comment and discussion.

Best wishes

--Phil

comment:26 Changed 12 years ago by jonathan

Thanks for your work on this, Phil, and thanks to all for comments. I will close this ticket when Phil opens a new one, or in three weeks' time if no further comments are made, whichever is sooner. In the meantime I'll leave it open in case anyone wants to record further comments for Phil to consider in reformulating his proposal.

Jonathan

comment:27 Changed 12 years ago by jonathan

  • Resolution set to wontfix
  • Status changed from new to closed

Superseded by ticket 18.
