Proposal for standard attributes actual_min and actual_max
|Reported by:||jonblower||Owned by:||cf-conventions@…|
It is very useful for data mining and visualization applications to know the minimum and maximum values of a particular variable in a NetCDF file, without needing to extract the entire variable and calculate this in the application. Here we propose a new pair of standard variables actual_min and actual_max that contain the min and max values of a variable.
The proposed new attributes would prevent misuse of the valid_min and valid_max attributes, which are intended to be used to delimit a valid data range, but in fact are often used to denote the actual range of data in a variable. The latter (mis)use leads to incorrect assessment of missing values by tools.
Supports data mining: It is a quick operation to find, say, all those data files that contain temperature values above 30degC.
Supports visualization: Visualization tools can use the actual_min/max to generate a sensible colour scale range for displaying the contents of a file. (However see caveats below.)
In the context of aggregations (by NcML or otherwise), the actual_min/max could be easily calculated by taking the minimum/maximum of the attributes of the components of the aggregation.
The new attributes represent redundant metadata and could be incorrectly generated or otherwise become inconsistent. Mitigation: Allow the CF-checker to check these attribute values if they are present in a file.
These attribute values would not be correct for a subset of data from the file and so any data subsetting tools must be aware of this and recalculate or remove these attributes from any data product subset.
Data values outside the valid_range would not be counted in the actual_min/max. An alternative nomenclature could be actual_valid_min/max (although personally I find this more confusing).
For visualization, the actual_min/max will not always represent the optimal scale range, particularly if examining a restricted geographical area, or when looking at data from a particular elevation. A more sophisticated solution could involve expressing actual_min/max as an array quantity, with a value pair for each elevation in the data volume. This increases the complexity of the solution and places extra burdens on data providers and tool developers (and does not completly solve the problems described).
Change History (23)
comment:22 Changed 8 months ago by jonathan
- Resolution set to fixed
- Status changed from new to closed