Changes between Initial Version and Version 3 of Ticket #159

03/02/17 09:09:23


  • Ticket #159 – Description

    I propose that we append this paragraph to the end of CF section 2.2:
    Initial version (replaced in version 3):

      All char and string variables must include a charset attribute to
      identify the character set (encoding) used by the variable. The
      value of the attribute must be the "Preferred MIME Name" or "Name"
      of one of the 8-bit encodings (so not UTF-16 or UTF-32, since CF
      chars are 8 bits) listed at .
      Charset names are case-insensitive.
      The only recommended charset names are "ISO-8859-1" (which is
      useful for European languages and for backwards compatibility
      with 7-bit ASCII characters) and "UTF-8" (which is useful when
      full Unicode is needed). (In older files with variables that
      don't specify a charset, the character set being used remains
      ambiguous.)
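To illustrate the ambiguity that the initial wording addresses, here is a minimal Python sketch that decodes the raw bytes of a char variable according to its charset attribute. It uses Python's codec registry as a stand-in for the list of 8-bit encodings the proposal refers to. `codecs.lookup` matches names case-insensitively, but it also accepts aliases such as "latin1" that are not registered names, so it is more permissive than the proposed rule. The helper name `decode_with_charset` is hypothetical.

```python
import codecs


def decode_with_charset(raw: bytes, charset: str) -> str:
    """Decode the bytes of a CF char variable using its "charset" attribute.

    Hypothetical helper: Python's codec registry stands in for the list of
    8-bit encodings named in the proposal.
    """
    # codecs.lookup is case-insensitive and resolves aliases to a canonical codec.
    info = codecs.lookup(charset)
    # CF chars are 8 bits, so encodings with 16- or 32-bit code units are rejected.
    if info.name.startswith(("utf-16", "utf-32")):
        raise ValueError(f"charset {charset!r} is not an 8-bit encoding")
    return raw.decode(info.name)


# The same text is stored as different bytes under different charsets, so a
# reader cannot decode a variable reliably without knowing which one was used.
latin1_text = decode_with_charset(b"caf\xe9", "iso-8859-1")   # -> 'café'
utf8_text = decode_with_charset(b"caf\xc3\xa9", "UTF-8")      # -> 'café'
```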
    Version 3:

      Each char array variable that is to be interpreted
      as an array of individual characters (not string(s))
      must have a "charset" attribute which
      clarifies that the variable is to be interpreted as
      individual characters (not string(s)) and specifies
      the 8-bit character set used by the chars.
      Currently, the only values allowed for "charset"
      are "ISO-8859-1" and "ISO-8859-15".
      A scalar char variable may also use the "charset"
      attribute, which defaults to "ISO-8859-15" if
      it is not specified.
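As an illustration only, the version 3 rule for char arrays could be applied by a reader as in this Python sketch. The names `resolve_charset`, `decode_chars` and the `is_scalar` flag are hypothetical, and attributes are modeled as a plain dict. Charset names are compared exactly because version 3 does not say whether they are case-insensitive. The byte 0xA4 shows why the choice matters: it is "€" in ISO-8859-15 but "¤" in ISO-8859-1.

```python
# Values allowed for "charset" in the version 3 wording.
ALLOWED_CHARSETS = {"ISO-8859-1", "ISO-8859-15"}
# Default for a scalar char variable with no "charset" attribute.
SCALAR_DEFAULT_CHARSET = "ISO-8859-15"


def resolve_charset(attrs: dict, is_scalar: bool) -> str:
    """Return the charset for a char variable read as individual characters."""
    charset = attrs.get("charset")
    if charset is None:
        if is_scalar:
            return SCALAR_DEFAULT_CHARSET
        # Arrays of individual characters must declare their charset.
        raise ValueError('char array of individual characters needs a "charset" attribute')
    # Exact comparison: case handling is an assumption, not stated in version 3.
    if charset not in ALLOWED_CHARSETS:
        raise ValueError(f"charset {charset!r} is not allowed")
    return charset


def decode_chars(raw: bytes, attrs: dict, is_scalar: bool = False) -> list[str]:
    """Decode each byte as one character (both allowed charsets are single-byte)."""
    charset = resolve_charset(attrs, is_scalar)
    return list(raw.decode(charset))


euro = decode_chars(b"\xa4", {"charset": "ISO-8859-15"})      # -> ['€']
currency = decode_chars(b"\xa4", {"charset": "ISO-8859-1"})   # -> ['¤']
```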

      A string or string array variable (including a char
      array variable that is to be interpreted as a string
      or array of strings) may have an "_Encoding" attribute.
      Alternatively, a file may have a global "_Encoding"
      attribute which applies to all strings (scalar and
      array) in the file. Currently, the only values
      allowed for "_Encoding" are "ISO-8859-1",
      "ISO-8859-15" and "UTF-8". A missing "_Encoding"
      attribute defaults to UTF-8.
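A reader could apply the "_Encoding" rule roughly as in this Python sketch. The helper names are hypothetical and attributes are plain dicts. The sketch also assumes that a variable-level "_Encoding" takes precedence over the global one, which the wording does not state explicitly.

```python
# Values allowed for "_Encoding" in the version 3 wording.
ALLOWED_ENCODINGS = {"ISO-8859-1", "ISO-8859-15", "UTF-8"}
DEFAULT_ENCODING = "UTF-8"


def resolve_encoding(var_attrs: dict, global_attrs: dict) -> str:
    """Pick the encoding for a string variable.

    Assumed precedence: variable attribute, then global attribute, then UTF-8.
    """
    encoding = var_attrs.get("_Encoding", global_attrs.get("_Encoding", DEFAULT_ENCODING))
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"_Encoding {encoding!r} is not allowed")
    return encoding


def decode_string(raw: bytes, var_attrs: dict, global_attrs: dict) -> str:
    """Decode the bytes of one string using the resolved encoding."""
    return raw.decode(resolve_encoding(var_attrs, global_attrs))


# No attributes at all: falls back to UTF-8.
no_attrs = decode_string(b"caf\xc3\xa9", {}, {})
# Variable attribute overrides the global one (assumed precedence).
from_var = decode_string(b"caf\xe9", {"_Encoding": "ISO-8859-1"}, {"_Encoding": "UTF-8"})
```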

    (This 2017-03-02 version is the consensus revised proposal from Chris Barker, Heiko Klein, and Bob Simons. It replaces the original proposed text.)