Opened 5 years ago

Last modified 5 years ago

#91 new task

review of standard_names for interoperability

Reported by: mgschultz Owned by: cf-standard-names@…
Priority: medium Milestone:
Component: cf-standard-names Version:
Keywords: Cc:

Description

This is related to Trac ticket #90 and goes back to the earlier discussion of a grammar-based definition of standard_names.

In general, the standard_names are a great success story and are already widely used. In some cases, however, they carry legacy conventions that make them easier for humans to read but harder for software to interpret. A good while back, Jonathan constructed a set of grammar rules that could help define new standard names. Unfortunately, this never made it into the mainstream. Furthermore, his rules were derived from the existing standard_names and therefore inherited some small inconsistencies from legacy names.

A simple example is "air_temperature". This is, of course, very well known. Yet, in a more formal system, it describes an intrinsic property of a physical medium. Therefore, the rule should be "<property>_of_<medium>", and the standard_name should be "temperature_of_air". Then it would be (more) consistent with all other standard names describing things "of_air" or "in_air", and it would be much simpler to apply the same rule to other compartments: "temperature_of_land_surface", "temperature_of_ocean_water_at_depth", and so on.

If there is at least some support for this, I propose to review all existing standard_names and construct a new grammar with as few rules as possible. This should of course not start from scratch, but build on the existing grammar rules and standard_names as much as possible. The existing names could become aliases, so that backward compatibility would be ensured. Once the new grammar is adopted, any new standard_name proposal should follow its rules; otherwise a separate discussion would be needed on why the grammar rules should be amended.
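The alias idea can be sketched in a few lines of Python. This is only an illustration: the `ALIASES` table and both helper functions are hypothetical, and the single entry is the example from this ticket, not an agreed CF alias.

```python
# Hypothetical alias table mapping a legacy name to its grammar-conforming form.
# The only entry is the example discussed in this ticket, not an agreed CF alias.
ALIASES = {"air_temperature": "temperature_of_air"}


def property_of_medium(prop: str, medium: str) -> str:
    """Build a name following the proposed "<property>_of_<medium>" rule."""
    return f"{prop}_of_{medium}"


def canonical_name(standard_name: str) -> str:
    """Resolve a legacy alias; names without an alias are returned unchanged."""
    return ALIASES.get(standard_name, standard_name)
```

With this table, existing files that use "air_temperature" would still resolve, while software could work with the regular form internally.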

Side remark: while reviewing the names, an additional column should be added to the standard_name table to indicate whether a comment attribute is needed (recommended?) for this standard_name or not.

Change History (3)

comment:1 Changed 5 years ago by jonathan

Dear Martin

I have some sympathy for and some reservations about this. If people don't remember what I did before (quite likely!) and are interested, it can be found at http://www.met.reading.ac.uk/~jonathan/CF_metadata/14.1/. In that document, I also explained what I thought (and still think) about our approach to standard names.

I would say that the biggest need we have is not for a large overhaul of the standard name table or the philosophy, but easier and quicker systems for proposing and agreeing standard names. That was really my motivation for the earlier work. I think that if we can agree a syntax and lexicon for standard names, it would be relatively easy to set up web facilities to allow people to construct and propose new standard names which

  • Follow an existing pattern and use existing vocabulary, or
  • Follow an existing pattern with a proposed expansion to vocabulary (such as new chemical species)

If these cases could be semi-automated, it could save a lot of effort in searching the existing table for templates, making sure things are in the right order, correcting spelling mistakes, and so on. If this can be done, it does not really matter if the rules are in some cases quite subtle, because proposers won't have to become intimate with them. Quite a lot of proposals could be done like this. We have agreed, in previous CF discussions, that all names have to be agreed explicitly and added to the table, to avoid "green dogs" (in Roy's nomenclature) and to avoid introducing synonyms (some concepts can be described in more than one way). However, proposals of the above two kinds are usually easy to agree.
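A minimal sketch of such a check, assuming a toy lexicon and pattern set: the names `LEXICON`, `PATTERNS` and `classify_proposal` are hypothetical, and the few entries below stand in for the real syntax and vocabulary, which would be far larger.

```python
# Toy lexicon (word -> category) and known patterns (category sequences).
# Both are illustrative stand-ins, not the actual CF syntax or lexicon.
LEXICON = {
    "tropopause": "surface",
    "air": "medium",
    "temperature": "scalar",
    "pressure": "scalar",
}
PATTERNS = {
    ("medium", "scalar"),
    ("surface", "medium", "scalar"),
}


def classify_proposal(words, new_vocab=None):
    """Classify a proposed name given as a list of words.

    new_vocab maps proposed new words to the category the proposer assigns.
    """
    new_vocab = new_vocab or {}
    categories = []
    for word in words:
        if word in LEXICON:
            categories.append(LEXICON[word])
        elif word in new_vocab:
            categories.append(new_vocab[word])
        else:
            return f"unknown word: {word}"
    if tuple(categories) not in PATTERNS:
        return "new pattern"  # needs discussion on the mailing list
    if any(word not in LEXICON for word in words):
        return "existing pattern, new vocabulary"
    return "existing pattern, existing vocabulary"
```

For example, `classify_proposal(["tropopause", "air", "pressure"])` falls into the first case, while `classify_proposal(["soil", "temperature"], {"soil": "medium"})` falls into the second because "soil" is missing from the toy lexicon.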

Proposals for new names which do not follow an existing pattern (like your emission names) are not uncommon. I do not really see how we can avoid thinking hard about such cases. That would remain the main business of the email discussions.

Your simple example is not so simple, I would say. There may be a subtlety here. In terms of my syntax, air_temperature has the pattern "(medium) (scalar)" whereas specific_kinetic_energy_of_air is "(scalar) of_(medium)". I think these patterns are deliberately different. Maybe it is just English: "temperature of air" would be clumsy, while "air specific kinetic energy" would be an awkwardly long phrase. But also they feel different to me. It feels like temperature is more an attribute of air, whereas specific kinetic energy is a function of it. If there is anything in this, it reflects the fact that the standard names began with the aim of being "what people say to describe a quantity" (in English), as far as possible, while also trying to be clear and unambiguous. You have also raised the interesting question of multilingual support. That would be fair. We could imagine having a lexicon and syntax in German too, and mapping the elements. Perfect automatic translation would be possible if we did a good job!

But this is not to say that improvements can't be made. If it would be useful, I could try to find the time to bring my analysis up to date. You could use this as a starting-point, perhaps. I am sure that some useful simplifications could be made, by merging some patterns at least. This would require aliases, as you say. As usual, I would tend to argue that it is useful if it would bring a substantial benefit to existing cases.

Best wishes

Jonathan

comment:2 Changed 5 years ago by mcginnis

I think Jonathan's point:

...standard names began with the aim of being "what people say to describe a quantity" (in English), as far as possible, while also trying to be clear and unambiguous.

is a really important one. Consistency is desirable, but it should yield to usability.

I would actually oppose changing air_temperature to temperature_of_air for this reason. Air temperature is probably the single most commonly recorded geophysical quantity out there, and everyone refers to it using that name. No-one struggles with confusion over what it means. If the standard_name were changed, it wouldn't resolve any major usage problems, and therefore I don't think the new name would be widely adopted; most people would just continue to create data sets using the old name, and the end result would be that we'd have two names in use for a common quantity, which is a net loss in usability compared to having one well-established name that isn't quite regular.

I like the idea of having an automated system to aid in the construction of new names. I'm less certain about the notion of changing well-established names solely to make them more regular. But perhaps the right answer is to make an initial assessment and see what might change; it may be that all the other changes would be clear improvements, and we can just grandfather "air_temperature" and be done with it...

Cheers,

Seth

comment:3 Changed 5 years ago by mgschultz

For the most part I agree with you. Indeed, I chose the "air_temperature" example on purpose in order to provoke such a remark. Meanwhile I have downloaded Jonathan's grammar tools and started playing with those (or rather with the output they generate). As an exercise, I reviewed all terms that contain "tropopause" and tried to sort them into phrasetypes manually, taking into account your requests to use standard English.

Here are the results (do we actually have a wiki where this kind of content could be posted and evolve?):

Existing standard_names and possible aliases:

  • tropopause downwelling longwave_flux == downwelling longwave (radiative) flux at tropopause == downward (?)...
  • tropopause net_downward longwave_flux == net downward longwave (radiative) flux at tropopause
  • tropopause net_downward shortwave_flux == net downward shortwave (radiative) flux at tropopause
  • tropopause upwelling shortwave_flux == upwelling shortwave (radiative) flux at tropopause == upward (?) ...
  • tropopause air pressure == air pressure at tropopause == pressure of air at tropopause
  • tropopause air temperature == air temperature at tropopause == temperature of air at tropopause
  • tropopause adjusted longwave_forcing == adjusted longwave (radiative) forcing at tropopause
  • tropopause adjusted radiative_forcing == adjusted radiative forcing at tropopause
  • tropopause adjusted shortwave_forcing == adjusted shortwave (radiative) forcing at tropopause
  • tropopause instantaneous longwave_forcing == instantaneous longwave (radiative) forcing at tropopause
  • tropopause instantaneous radiative_forcing == instantaneous radiative forcing at tropopause
  • tropopause instantaneous shortwave_forcing == instantaneous shortwave (radiative) forcing at tropopause
  • dynamic_tropopause potential_temperature == potential temperature (of air) at dynamical tropopause
  • tropopause altitude == altitude at tropopause

In my view, this analysis brings up two (minor) inconsistencies in the present definitions:

  1. "shortwave_flux" should be replaced by "shortwave_radiative_flux" and "longwave_flux" by "longwave_radiative_flux"
  2. specify the medium for "potential_temperature": "air_potential_temperature" or "potential_temperature_of_air" (leaving "air" out makes the implicit assumption that "potential_temperature" is always a property of air, and of nothing else. While this could be true, it requires knowledge which cannot easily be tested by computer systems. Here, again, I would advocate adopting the more explicit name as the standard_name and allowing an alias to please the human user).

As the next step, I came up with a new grammar:

  • downwelling (direction)
  • downward (direction)
  • upwelling (direction)
  • upward (direction)
  • net (prefix)
  • shortwave (wavelength-range)
  • longwave (wavelength-range)
  • tropopause (level)
  • radiative_flux (vector)
  • radiative_forcing (scalar)
  • air_temperature (scalar)
  • air_pressure (scalar)
  • [air_]potential_temperature (scalar)
  • altitude (scalar)
  • adjusted (pre-adjective)
  • instantaneous (pre-adjective)

BTW: shouldn't "downward" and "downwelling" also be treated as synonyms here?

Results of new grammar (hand-picked). Terms in [square brackets] are optional elements.

[net|total] (direction) [(wavelength-range)] (vector) at (level) == (level) [net|total] (direction) [(wavelength-range)] (vector)

  • tropopause downwelling longwave_flux == downwelling longwave (radiative) flux at tropopause == downward (?)...
  • tropopause net_downward longwave_flux == net downward longwave (radiative) flux at tropopause
  • tropopause net_downward shortwave_flux == net downward shortwave (radiative) flux at tropopause
  • tropopause upwelling shortwave_flux == upwelling shortwave (radiative) flux at tropopause == upward (?) ...

[(pre-adjective)] [(wavelength-range)] (scalar) at (level) == (level) [(pre-adjective)] [(wavelength-range)] (scalar)

  • tropopause adjusted longwave_forcing == adjusted longwave (radiative) forcing at tropopause
  • tropopause adjusted radiative_forcing == adjusted radiative forcing at tropopause
  • tropopause adjusted shortwave_forcing == adjusted shortwave (radiative) forcing at tropopause
  • tropopause instantaneous longwave_forcing == instantaneous longwave (radiative) forcing at tropopause
  • tropopause instantaneous radiative_forcing == instantaneous radiative forcing at tropopause
  • tropopause instantaneous shortwave_forcing == instantaneous shortwave (radiative) forcing at tropopause

(level) (scalar) == (scalar) at (level)

  • tropopause air pressure == air pressure at tropopause == pressure of air at tropopause
  • tropopause air temperature == air temperature at tropopause == temperature of air at tropopause
  • dynamic_tropopause potential_temperature == potential temperature (of air) at dynamical tropopause
  • tropopause altitude == altitude at tropopause
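The rules above can be tried out with a short Python sketch. It assumes that names have already been split into the lexicon terms listed earlier (with "radiative_flux" in place of "longwave_flux", as suggested); `LEXICON`, `PATTERNS`, `match_pattern` and `level_last` are hypothetical names, and "[net|total]" is reduced to the "net" prefix from the lexicon.

```python
import re

# Lexicon copied from the list in this comment (word -> category).
LEXICON = {
    "downwelling": "direction", "downward": "direction",
    "upwelling": "direction", "upward": "direction",
    "net": "prefix",
    "shortwave": "wavelength-range", "longwave": "wavelength-range",
    "tropopause": "level",
    "radiative_flux": "vector",
    "radiative_forcing": "scalar", "air_temperature": "scalar",
    "air_pressure": "scalar", "potential_temperature": "scalar",
    "altitude": "scalar",
    "adjusted": "pre-adjective", "instantaneous": "pre-adjective",
}

# Level-first form of the rules, written as regexes over category sequences.
PATTERNS = {
    "vector": r"level( prefix)? direction( wavelength-range)? vector",
    "scalar": r"level( pre-adjective)?( wavelength-range)? scalar",
}


def match_pattern(tokens):
    """Return the name of the first matching rule, or None if none matches."""
    cats = " ".join(LEXICON.get(t, "unknown") for t in tokens)
    for name, regex in PATTERNS.items():
        if re.fullmatch(regex, cats):
            return name
    return None


def level_last(tokens):
    """Rewrite '(level) X' into the equivalent 'X at (level)' form."""
    if LEXICON.get(tokens[0]) != "level":
        raise ValueError("phrase does not start with a level")
    return tokens[1:] + ["at", tokens[0]]
```

In this encoding the third rule, "(level) (scalar)", needs no entry of its own: it is the second rule with both optional elements left out, so `match_pattern(["tropopause", "altitude"])` already returns "scalar".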

Observations and guesses:

  1. The equivalences could potentially help to reduce the number of phrase types considerably. This may not be apparent here, because the "tropopause" terms use the same order, but there are several other phrases with "at (surface)" which should be considered equivalent in meaning.
  2. It may be helpful to introduce a concept of "atoms" and "molecules". Example: "air_temperature" is a scalar, but at the same time it is (scalar) of (medium). In a phrase it should always be listed at its highest aggregation level (the "molecule"), but for some applications (automatic processing) it may be helpful to "know" that this molecule ultimately consists of a temperature and a medium. So, it would make physical sense to compare "air_temperature" and "cloud_top_temperature", for example. While this may not be relevant in practice for the time being, this concept could help to define the scalars and vectors more consistently.
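One possible way to represent the "atoms" and "molecules" idea is sketched below in Python. The `Molecule` class and the `comparable` helper are hypothetical, and so is the decomposition of "cloud_top_temperature" into a "level" atom and a "scalar" atom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """A composite term together with the atoms it is built from."""
    name: str
    category: str  # category the term has when used in a phrase
    atoms: tuple   # ((category, word), ...)

    def atom(self, category):
        """Return the word filling the given category, or None if absent."""
        for cat, word in self.atoms:
            if cat == category:
                return word
        return None


def comparable(a, b):
    """Treat two molecules as comparable if they share the same scalar atom."""
    scalar = a.atom("scalar")
    return scalar is not None and scalar == b.atom("scalar")


AIR_TEMPERATURE = Molecule(
    "air_temperature", "scalar", (("medium", "air"), ("scalar", "temperature")))
AIR_PRESSURE = Molecule(
    "air_pressure", "scalar", (("medium", "air"), ("scalar", "pressure")))
# Assumed decomposition, for illustration only.
CLOUD_TOP_TEMPERATURE = Molecule(
    "cloud_top_temperature", "scalar", (("level", "cloud_top"), ("scalar", "temperature")))
```

A phrase-level grammar would only see the `category` of each molecule, while a tool that needs the physics could ask `comparable(AIR_TEMPERATURE, CLOUD_TOP_TEMPERATURE)`.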

Original grammar:

20 (surface) (component) (vector)

  • tropopause downwelling longwave_flux
  • tropopause net_downward longwave_flux
  • tropopause net_downward shortwave_flux
  • tropopause upwelling shortwave_flux

5 (surface) (medium) (scalar)

  • tropopause air pressure
  • tropopause air temperature

24 (surface) (pre_adjective) (scalar)

  • tropopause adjusted longwave_forcing
  • tropopause adjusted radiative_forcing
  • tropopause adjusted shortwave_forcing
  • tropopause instantaneous longwave_forcing
  • tropopause instantaneous radiative_forcing
  • tropopause instantaneous shortwave_forcing

30 (surface) (scalar)

  • dynamic_tropopause potential_temperature
  • tropopause altitude