Opened 6 years ago

Closed 9 months ago

#86 closed enhancement (fixed)

Allow coordinate variables to be scaled integers

Reported by: rhorne@… Owned by: davidhassell
Priority: medium Milestone:
Component: cf-conventions Version:
Keywords: coordinate variables, scaled integers Cc:

Description

Currently, a coordinate variable that makes use of the standard NetCDF scaled integer convention is not CF metadata compliant.

In addition to reducing NetCDF dataset (file) size, allowing this has other advantages, including simplifying product-formatting software and providing an attribute (scale_factor) that gives insight into the spatial resolution of the data variable.

To update the CF metadata conventions standard document requires changing two rows in Table A.1 Attributes in Appendix A.

(1) In the "add_offset" row, the "Use" column needs to change from "D" to "C, D"

(2) In the "scale_factor" row, the "Use" column needs to change from "D" to "C, D"
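The standard NetCDF scaled-integer convention referenced above can be illustrated with a minimal Python sketch. The values here (a hypothetical half-degree longitude grid) and the use of NumPy are illustrative assumptions, not part of the proposal; the convention itself is simply `unpacked = packed * scale_factor + add_offset`.

```python
import numpy as np

# Packed coordinate values as they would be stored on disk as small
# integers (hypothetical half-degree longitude grid for illustration).
packed = np.array([0, 1, 2, 3], dtype=np.int16)

# Standard NetCDF packing attributes.
scale_factor = 0.5
add_offset = 49.0

# NetCDF scaled-integer convention: unpacked = packed * scale_factor + add_offset
unpacked = packed * scale_factor + add_offset
print(unpacked)  # [49.  49.5 50.  50.5]
```

Storing the coordinate as int16 rather than float64 is where the file-size saving comes from; the reader recovers the real coordinate values by applying the two attributes.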

Change History (17)

comment:1 Changed 6 years ago by jonathan

Dear Randy

Thanks for opening this ticket. The email discussion on this subject suggested that this change would be seen by most people not to be a material change to the convention, in which case it's fine to propose correcting it as a defect. If anyone does think this is a material change and they are concerned about it, they should comment on this ticket. Otherwise, silence indicates acceptance - that's the rule for correcting defects!

Cheers

Jonathan

comment:2 follow-up: Changed 5 years ago by rhorne@…

Is there anything left to be done to get this trac item approved?

comment:3 in reply to: ↑ 2 Changed 5 years ago by davidhassell

Replying to rhorne@excaliburlabs.com:

Is there anything left to be done to get this trac item approved?

Hello Randy,

Many apologies for jumping in so late - I've not thought about this for a while, but should have done so.

I support the proposed change which allows for the reduction in netCDF dataset (file) size, but I have a concern about its other suggested purpose.

From the e-mail list discussion, it sounds like you would like to know what the real coordinate values are but not use them, i.e. you would like to not apply the additive offset or scale factor at the time of reading.

If that's right, how would the software reading the data know, when presented with add_offset or scale_factor, whether or not to unpack the values? Section 8.1 of the conventions suggests to me that unpacking is to be expected at the time of reading.

Assuming I'm still holding the right end of the stick, could I re-suggest that using scaled and offset units for this purpose is more appropriate? For example:

        float longitude(longitude) ;
                longitude:units = "0.5 degrees_east @ 98" ;
                longitude:standard_name = "longitude" ;

data:

 longitude = 0, 1, 2, 3 ;

is completely equivalent to:

        float longitude(longitude) ;
                longitude:units = "degrees_east" ;
                longitude:standard_name = "longitude" ;

data:

 longitude = 49.0, 49.5, 50.0, 50.5

By using the units property in this way, it seems to me that the issue of "do I or don't I unpack" goes away and you still have the flexibility you require in your software - because you can choose whether or not to convert the units back to 'degrees_east'. The offset and scale are easily extracted from the units string.
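The extraction David describes can be sketched as follows. This is a simplified, hypothetical parser for illustration only; the function names are invented here, and a real implementation would use a proper udunits library. The assumed udunits semantics is that "S unit @ O" places the origin at O units of (S unit), so base = S * (value + O).

```python
import re

def parse_scaled_units(units):
    """Parse a udunits-style scaled/offset unit string such as
    '0.5 degrees_east @ 98' into (scale, base_unit, origin).
    Simplified, hypothetical parser for illustration only."""
    m = re.fullmatch(r"\s*([\d.]+)\s+(\S+)\s*@\s*([\d.-]+)\s*", units)
    if not m:
        raise ValueError("not a scaled unit string: %r" % units)
    return float(m.group(1)), m.group(2), float(m.group(3))

def to_base_units(value, units):
    """Convert a stored coordinate value to the base unit, assuming
    'S unit @ O' means: base = S * (value + O)."""
    scale, base, origin = parse_scaled_units(units)
    return scale * (value + origin), base

print(to_base_units(0, "0.5 degrees_east @ 98"))  # (49.0, 'degrees_east')
print(to_base_units(3, "0.5 degrees_east @ 98"))  # (50.5, 'degrees_east')
```

Under these assumptions the stored values 0, 1, 2, 3 map back to 49.0, 49.5, 50.0, 50.5 degrees_east, matching the equivalence shown above.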

One of these products spans the 180 degrees_east longitude (i.e. -180 degrees_east longitude) line. The implication is that the software resolving the scale factor and additive offset needs to be cognizant of the longitude coordinate variable's "units".

Using the units property should also solve this, I think.

If I have misunderstood your use case, please let me know!

All the best,

David

comment:4 Changed 5 years ago by Dave.Allured

Randy,

I am concerned that this proposal would require updates and increase complexity for many existing Netcdf readers, including general purpose software as well as my own code. In my case, I read coordinate variables directly through the Netcdf API, not through a general purpose data interface that would automatically unpack. I have never included the unpacking code in my coordinate readers. I acknowledge that the added software complexity would be slight in each individual case, but it adds up.

Also consider that the whole concept of scale/offset packing, just for data arrays alone, is confusing to some Netcdf beginners, generating significant amounts of user list and help desk traffic. The latter is in part from local experience within my work group.

For spatial resolution, I would suggest adding a simple non-CF attribute such as "resolution = 0.5" or "delta_x = 0.5" or "step = 0.5", or perhaps "step_range = 0.44, 0.5" for non-rectilinear coordinates. IMO this is easier for human understanding than "scale_factor = 0.5", for the specific purpose that you cited.

Given that one of your justifications is simplifying writer software, I am left wondering whether the value added is worth the trouble. The status quo for coordinate variables seems to have been working very well for a long time (excepting time coordinates, ouch, different topic).

Do you have a particular situation where the reduction in file size would be significant? Do you know of any general purpose software that already handles packed coordinate variables?

--Dave

comment:5 follow-up: Changed 5 years ago by rhorne@…

Dave:

As it turns out, we have come to conclude there is no need for the user applications (at least in our use cases) to know the "packed" coordinate variable values, but the other rationales still apply.

I would also tend to believe that no user application needs to know the packed coordinate variable values, because of their inherent correspondence with the data variable.

This topic was also taken up on the CF message board (http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2012/055468.html). Note that this capability is provided by the netcdf-Java library.

very respectfully,

randy

comment:6 in reply to: ↑ 5 Changed 5 years ago by Dave.Allured

Randy,

Thank you for the link to the original discussion, that was helpful. My concerns about existing software were already mentioned 11 months ago by Jonathan Gregory and Mike Grant. Common libraries or tools that do not unpack coordinates automatically include netcdf-fortran, NCL, and the NCO utilities, as well as netcdf-C, which was mentioned earlier.

If you could somehow include a caveat that packed coordinates should only be used for particular applications with significant file size improvement, there might be fewer use cases needing code updates. The previously mentioned swath data is an example.

--Dave

comment:7 follow-ups: Changed 5 years ago by rhorne@…

Dave:

Will the following revision to the proposed update to the CF standard alleviate the concern?

To update the CF metadata conventions standard document requires changing two rows in Table A.1 Attributes in Appendix A.

(1) In the "add_offset" row, the "Use" column needs to change from "D" to "C, D"

(2) In the "scale_factor" row, the "Use" column needs to change from "D" to "C, D"

In addition, the following statement will be added to the "Description" cell in Table A.1 Attributes for the attributes scale_factor and add_offset.

"The use of packed coordinate variables is allowed, but not preferred to primarily support use cases where dataset sizing constraints exist."

very respectfully,

randy

comment:8 in reply to: ↑ 7 Changed 5 years ago by davidhassell

Hello Randy,

I'm fine with this.

Many thanks and all the best,

David

comment:9 in reply to: ↑ 7 Changed 5 years ago by Dave.Allured

Replying to rhorne@excaliburlabs.com:

Randy,

That will be sufficient. Thank you for including this caveat.

--Dave

comment:10 Changed 5 years ago by rhorne@…

Folks:

Almost a month has elapsed since the last correspondence. The subject of this trac item captures the intent of this change "Allow coordinate variables to be scaled integers".

To update the CF metadata conventions standard document requires changing two rows in Table A.1 Attributes in Appendix A.

(1) In the "add_offset" row, the "Use" column needs to change from "D" to "C, D"

(2) In the "scale_factor" row, the "Use" column needs to change from "D" to "C, D"

In addition, the following statement will be added to the "Description" cell in Table A.1 Attributes for attributes scale_factor and add_offset.

"The use of packed coordinate variables is allowed, but not preferred to primarily support use cases where dataset sizing constraints exist."

I support this enhancement. Does the conventions committee support this enhancement?

very respectfully,

randy

comment:11 Changed 5 years ago by jonathan

Dear Randy

The rule for defect tickets is that if no-one objects for three weeks, they are accepted by default. That seems to be the case here.

Reading your sentence, though, I find I am not quite sure what it means! "The use of packed coordinate variables is allowed, but not preferred to primarily support use cases where dataset sizing constraints exist." Does it mean, "In cases where there is a strong constraint on dataset size, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general."

Cheers

Jonathan

comment:12 Changed 5 years ago by rhorne@…

Jonathan:

Thanks for the clarification on the rules. I think your wording of the caveat is more clear. Here is the update.

To update the CF metadata conventions standard document requires changing two rows in Table A.1 Attributes in Appendix A.

(1) In the "add_offset" row, the "Use" column needs to change from "D" to "C, D"

(2) In the "scale_factor" row, the "Use" column needs to change from "D" to "C, D"

In addition, the following statement will be added to the "Description" cell in Table A.1 Attributes for attributes scale_factor and add_offset.

"In cases where there is a strong constraint on dataset size, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general."

very respectfully,

randy

comment:13 follow-up: Changed 5 years ago by mggrant

No objections and no intent to delay this, but if you're tinkering with the recommendation line, you could also add a note that activating netcdf-4 internal compression is also preferable as a first option, where this is practical, e.g.

"In cases where there is a strong constraint on dataset size and netcdf-4 internal compression is unavailable or insufficient, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general."

comment:14 in reply to: ↑ 13 ; follow-up: Changed 5 years ago by Dave.Allured

  • Keywords integers added; inetegers removed
  • Type changed from defect to enhancement

Replying to mggrant:

Mike,

It seems to me that the advice to consider netcdf-4 compression is too general to be used in the specific context of coordinate variables. Does this need to be restated for every other space-saving method within CF?

--Dave

comment:15 in reply to: ↑ 14 Changed 5 years ago by mggrant

Replying to Dave.Allured:

I have no issue with that comment not being included - I mentioned it only as packing is a method that breaks "normal" usage and is something that people in my field tend to reach for without thinking about other, perhaps better options first. The general advice not to use packing is ok.

comment:16 Changed 11 months ago by davidhassell

  • Owner changed from cf-conventions@… to davidhassell
  • Status changed from new to accepted

comment:17 Changed 9 months ago by painter1

  • Resolution set to fixed
  • Status changed from accepted to closed