Opened 3 months ago

Last modified 3 months ago

#157 new defect

Clarification to Section 2.3 - Naming Conventions

Reported by: ros Owned by: cf-conventions@…
Priority: medium Milestone:
Component: cf-conventions Version:
Keywords: Cc:

Description

It is a requirement that variable, dimension and attribute names begin with a letter and be composed of letters, digits and underscores; however, the wording in the CF conventions document allows this to be interpreted as only a recommendation.

I therefore propose we change

"Variable, dimension and attribute names should begin with a letter and be composed of letters, digits, and underscores. Note that this is in conformance with the COARDS conventions, but is more restrictive than the netCDF interface which allows use of the hyphen character. The netCDF interface also allows leading underscores in names, but the NUG states that this is reserved for system use."

to read

"Variable, dimension and attribute names must begin with a letter and be composed of letters, digits, and underscores (with the exception of the NUG defined attribute _FillValue). Note that this is in conformance with the COARDS conventions, but is more restrictive than the netCDF interface which allows use of the hyphen character. The netCDF interface also allows leading underscores in names, but the NUG states that this is reserved for system use."
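As a rough illustration, the proposed "must" wording can be expressed as a regular expression with an explicit exception for the `_FillValue` attribute. This is a minimal sketch, assuming "letter" means ASCII `a-zA-Z` (the text does not define it). The function name `is_valid_cf_name` is hypothetical and not part of any CF or netCDF tool.

```python
import re

# Proposed rule (assumption: "letter" = ASCII a-z / A-Z):
# first character a letter, the rest letters, digits or underscores.
_CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Exception named in the proposed text: the NUG-defined attribute _FillValue.
_ATTRIBUTE_EXCEPTIONS = {"_FillValue"}


def is_valid_cf_name(name: str, is_attribute: bool = False) -> bool:
    """Hypothetical check of a variable, dimension or attribute name."""
    if is_attribute and name in _ATTRIBUTE_EXCEPTIONS:
        return True
    # fullmatch requires the whole name to match, so "my-var" is rejected.
    return _CF_NAME.fullmatch(name) is not None


for candidate in ["air_temperature", "133Xe_concentration", "_FillValue", "my-var"]:
    print(candidate, is_valid_cf_name(candidate, is_attribute=True))
```

Under this reading, a name such as "133Xe_concentration" would become invalid, which is the case raised in the first comment.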

Since this is a defect ticket, which aims to clarify the convention rather than alter its meaning, it will be accepted by default unless there is an objection or alternative suggestion.

Regards,
Ros.

Change History (3)

comment:1 Changed 3 months ago by heiko.klein

Hi Ros,

we store several chemical components or isotopes in CF-compliant files. The names of these components often start with digits rather than letters, and I used the freedom of 'should' to name such variables e.g. "133Xe_concentration". I even sent a ticket to the netcdf group when some versions of netcdf (3.6-4.0?) forbade digits as the first character of variable names. Most programs I know have no problems with these variable names (ncap2 does, but I never found out whether variable names can be escaped there).

Most 'should' rules in CF exist for compatibility with COARDS. Changing a 'should' rule to a 'must' rule is definitely a change to CF, not a defect fix, and in this case it would be made just to satisfy the COARDS convention, which I no longer see in active use. CF does not enforce any other restrictions on variable names.

The only 'defect' in the variable-name description is the missing definition of 'letter'. In netcdf this may once have meant a-zA-Z in the ASCII sense, but since netcdf-4 the NUG allows UTF-8 names, so a 'letter' could include umlauts.
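To make the ambiguity concrete, here are two possible readings of "begin with a letter". This is a minimal sketch: the helper names are hypothetical, and Python's `str.isalpha()` is used only as a stand-in for "Unicode letter"; it is not necessarily the same character class the netCDF library applies to UTF-8 names.

```python
def starts_with_letter_ascii(name: str) -> bool:
    # "letter" read as ASCII a-z / A-Z only
    return bool(name) and name[0].isascii() and name[0].isalpha()


def starts_with_letter_unicode(name: str) -> bool:
    # "letter" read as any alphabetic Unicode character (e.g. umlauts)
    return bool(name) and name[0].isalpha()


for candidate in ["temperature", "Übertemperatur", "133Xe_concentration"]:
    print(candidate,
          starts_with_letter_ascii(candidate),
          starts_with_letter_unicode(candidate))
```

The two readings agree on plain ASCII names but disagree on a name beginning with "Ü", so the choice of definition affects which datasets comply.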

Heiko

comment:2 Changed 3 months ago by davidhassell

Hello,

Starting a name with a digit [0-9] (e.g. 133Xe_concentration) is a different case from starting it with an underscore (e.g. _myAttribute), because of the netCDF "system use" rule. I think I'm in favour of disallowing leading underscores, because of the potential conflicts between the netCDF and CF namespaces. Could we allow:

Variable, dimension and attribute names must begin with a letter or digit ...
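The "letter or digit" alternative can be sketched the same way. This assumes ASCII letters and digits, and `allowed_under_variant` is a hypothetical name used only for illustration.

```python
import re

# Hypothetical variant: first character an ASCII letter or digit,
# remaining characters letters, digits or underscores.
_LETTER_OR_DIGIT_FIRST = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*")


def allowed_under_variant(name: str) -> bool:
    return _LETTER_OR_DIGIT_FIRST.fullmatch(name) is not None


for candidate in ["133Xe_concentration", "_myAttribute", "air_temperature"]:
    print(candidate, allowed_under_variant(candidate))
```

As written, this pattern also rejects `_FillValue` and other attributes with a leading underscore, so a rule along these lines would still need an explicit list of exceptions.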

Are there many existing datasets that use leading underscores?

David

comment:3 Changed 3 months ago by heiko.klein

Hi,

Concerning leading underscores: we already have one exception for attributes, _FillValue. Currently we also have to add some other attributes starting with _ because netcdf-java needs them, e.g. _CoordinateAxisType when distributing through THREDDS. I wouldn't like to invalidate these datasets.

Is there any good reason why we would need a 'must' rather than a 'should' for variable, attribute and dimension names? Where does this cause problems or misunderstandings?

Heiko
