
libc: Locales

tubo posted @ 2014-09-03 00:07 in Uncategorized, 839 reads


1 Locales

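Internationalizing software means making it behave according to the user's conventions. ISO C supports this through locales.

A machine can support several locales, and the user chooses which one a program uses through environment variables.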

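1.1 What Locales Do

Each locale is a collection of conventions, one for each purpose. These conventions cover:

  • Which multibyte character sequences are valid, and how they are interpreted.
  • How characters are classified.
  • The collating (sorting) sequence for the local language and character set.
  • How numbers are formatted for display.
  • Which language is used for output, including error messages.
  • Which language is used for answers to yes-or-no questions.
  • Which language is used for more complex user input.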
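
To make the character classification point concrete, here is a minimal sketch. The helper name count_alpha is made up for illustration. In the "C" locale only the 52 unaccented Latin letters count as alphabetic; in a single-byte locale such as an ISO-8859-1 one (assuming such a locale is installed), the byte 0xE9 would presumably be counted as well.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes of TEXT that the current LC_CTYPE
   locale classifies as alphabetic.  The cast to unsigned char matters:
   passing a negative char value to isalpha() is undefined behaviour. */
size_t count_alpha(const char *text)
{
    size_t n = 0;

    for (; *text != '\0'; ++text)
        if (isalpha((unsigned char)*text))
            ++n;
    return n;
}
```

The result depends on whichever locale is active when the function is called, so the same input can give different counts after a setlocale() call.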

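1.2 Choosing a Locale

The simplest way to choose (set) a locale is to set the environment variable LANG, which selects that locale for all purposes. The locale command prints the current settings: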

[yyc@localhost ~]$ locale

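A single convention of a locale can also be set on its own. For example, early versions of fcitx (a Chinese input method for Linux) required LC_CTYPE to be a GB2312 locale, which can be set as follows: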

[yyc@localhost ~]$ export LC_CTYPE="zh_CN.GB2312"
[yyc@localhost ~]$ locale

A system does not necessarily support every locale, but every system must support one standard locale, called "C" or "POSIX".

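1.3 Categories of Activities that Locales Affect

The conventions a locale defines fall into several categories, listed below. Each category name can be used both as the name of an environment variable and as a macro passed as an argument to setlocale().

  • LC_COLLATE

    Affects the collation (sorting) of strings, as done by strcoll() and strxfrm().

  • LC_CTYPE

    Affects the classification and conversion of characters, including multibyte and wide characters.

  • LC_MONETARY

    Affects the formatting of monetary values.

  • LC_NUMERIC

    Affects the formatting of non-monetary numeric values.

  • LC_TIME

    Affects the formatting of dates and times.

  • LC_MESSAGES

    Affects the language used for messages in the user interface and the regular expressions used to match answers to yes-or-no questions.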

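  • LC_ALL

    This is not a category. It is a macro for setlocale() that sets all of the categories above at once. When set as an environment variable, it overrides LANG and every other LC_* variable.

  • LANG

    If this environment variable is set, its value applies to all of the categories above, except those the user has explicitly set again through their own LC_* variable.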
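
The lookup order among these variables can be sketched as follows. POSIX specifies that LC_ALL wins, then the variable for the individual category, then LANG. The helper names resolve_locale and env_locale_for are made up for illustration, and the "C" fallback is an assumption about what happens when nothing is set (glibc behaves this way).

```c
#include <stdlib.h>

/* Treat NULL and "" alike: empty locale variables are ignored. */
static int is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Hypothetical helper: pick the locale name for one category from the
   three candidate values, in the order LC_ALL, LC_<category>, LANG. */
const char *resolve_locale(const char *lc_all, const char *lc_category,
                           const char *lang)
{
    if (is_set(lc_all))
        return lc_all;
    if (is_set(lc_category))
        return lc_category;
    if (is_set(lang))
        return lang;
    return "C";   /* assumed default when nothing is set */
}

/* Apply the same rule to the real environment,
   for example env_locale_for("LC_TIME"). */
const char *env_locale_for(const char *category_var)
{
    return resolve_locale(getenv("LC_ALL"), getenv(category_var),
                          getenv("LANG"));
}
```

This only mirrors the name lookup; whether the resulting locale is actually installed is decided later by setlocale().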

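1.4 Setting the Locale

A C program inherits the locale environment variables automatically when it starts. These variables do not, however, automatically control the locale used by the library functions: ISO C requires every program to start in the standard "C" locale.

Calling setlocale() tells the library functions to use the locale specified by the environment variables: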

setlocale(LC_ALL, "");
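
A slightly more defensive version of that call might look like the sketch below. The helper name init_locale and the fallback to "C" are illustrative choices, not something the C standard prescribes.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: adopt the locale named by the environment.
   If the environment names a locale this system does not provide,
   setlocale() returns NULL; we then select "C" explicitly. */
const char *init_locale(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL) {
        fprintf(stderr, "warning: locale from environment not available\n");
        name = setlocale(LC_ALL, "C");   /* "C" is always supported */
    }
    return name;
}
```

A program would typically call this once, early in main(), before any locale-dependent library function runs.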

setlocale() can also set a single category of the locale:

char * setlocale (int CATEGORY, const char *LOCALE);

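This function sets category CATEGORY of the current locale to LOCALE.

  • If LOCALE is NULL, nothing is changed and the function returns the name of the locale currently in use for CATEGORY.
  • If LOCALE is not NULL and is valid, the locale is changed and the function returns the name of the newly selected locale.
  • If LOCALE is not NULL and is not valid, the current locale is left unchanged and the function returns NULL.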
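
The three cases can be exercised with a small sketch. The helper names are made up for illustration, and the locale name used in the usage note is simply a string that is unlikely to exist.

```c
#include <locale.h>
#include <string.h>

/* Copy the name of the current locale for CATEGORY into BUF (SIZE > 0).
   The string returned by setlocale() may be overwritten by a later call,
   so it is copied if it is needed afterwards. */
void current_locale_name(int category, char *buf, size_t size)
{
    const char *name = setlocale(category, NULL);   /* query only */

    strncpy(buf, name != NULL ? name : "", size - 1);
    buf[size - 1] = '\0';
}

/* Try to switch CATEGORY to LOCALE.  Returns 1 on success and 0 if the
   name is not valid here, in which case the locale is left unchanged. */
int try_set_locale(int category, const char *locale)
{
    return setlocale(category, locale) != NULL;
}
```

For instance, after try_set_locale(LC_ALL, "C") succeeds, a failed try_set_locale(LC_ALL, "no_SUCH_locale.invalid") should leave current_locale_name() reporting the same name as before.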

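1.5 Standard Locales

As mentioned above, not every system supports every locale, but every system must support a few standard ones:

  • "C"
    The locale specified by standard C. Its attributes and behavior follow the ISO C standard.
  • "POSIX"
    The POSIX locale. On Linux it is currently identical to "C".
  • ""
    The empty name. A program that selects it uses the locale specified by the environment variables.

Defining and installing locales is normally the job of the system administrator.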

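1.6 Retrieving Locale Information

There are several ways to obtain locale information. The simplest is to let the C library do it: many library functions consult the locale themselves. With strftime(), for example, the same code produces different output depending on the locale.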
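
As a small illustration of letting the library do the work, the sketch below formats a date with strftime(). The helper name and the format string are illustrative. In the "C" locale the date 3 September 2014 comes out as "Wednesday, 03 September 2014"; under another LC_TIME locale the day and month names would follow that locale instead.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format *WHEN with weekday and month names taken
   from the current LC_TIME locale.  Returns the number of bytes written,
   or 0 if BUF is too small. */
size_t format_long_date(char *buf, size_t size, const struct tm *when)
{
    return strftime(buf, size, "%A, %d %B %Y", when);
}
```

The calling code does not change between locales; only the setlocale() call made beforehand does.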

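Sometimes, however, the library cannot do this for the program automatically, and we have to query the information ourselves. Two functions serve this purpose: localeconv() and nl_langinfo(). The former is part of standard C and portable, but its interface is poor. The latter is a Unix interface, available on any system that follows the Unix standard.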

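1.6.1 The Clumsy localeconv

Like setlocale(), localeconv() is part of standard C and therefore portable, but it is expensive to use and hard to extend. It also only gives access to the LC_MONETARY and LC_NUMERIC information of a locale, so it is not general.

The prototype of localeconv() is: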

struct lconv * localeconv (void);

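The function returns a pointer to a struct lconv whose members describe how numbers and monetary amounts are formatted in the current locale. In glibc the structure is defined as follows: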

/* Structure giving information about numeric and monetary notation.  */
struct lconv
{
  /* Numeric (non-monetary) information.  */

  char *decimal_point;      /* Decimal point character.  */
  char *thousands_sep;      /* Thousands separator.  */
  /* Each element is the number of digits in each group;
     elements with higher indices are farther left.
     An element with value CHAR_MAX means that no further grouping is done.
     An element with value 0 means that the previous element is used
     for all groups farther left.  */
  char *grouping;

  /* Monetary information.  */

  /* First three chars are a currency symbol from ISO 4217.
     Fourth char is the separator.  Fifth char is '\0'.  */
  char *int_curr_symbol;
  char *currency_symbol;    /* Local currency symbol.  */
  char *mon_decimal_point;  /* Decimal point character.  */
  char *mon_thousands_sep;  /* Thousands separator.  */
  char *mon_grouping;       /* Like `grouping' element (above).  */
  char *positive_sign;      /* Sign for positive values.  */
  char *negative_sign;      /* Sign for negative values.  */
  char int_frac_digits;     /* Int'l fractional digits.  */
  char frac_digits;     /* Local fractional digits.  */
  /* 1 if currency_symbol precedes a positive value, 0 if succeeds.  */
  char p_cs_precedes;
  /* 1 iff a space separates currency_symbol from a positive value.  */
  char p_sep_by_space;
  /* 1 if currency_symbol precedes a negative value, 0 if succeeds.  */
  char n_cs_precedes;
  /* 1 iff a space separates currency_symbol from a negative value.  */
  char n_sep_by_space;
  /* Positive and negative sign positions:
     0 Parentheses surround the quantity and currency_symbol.
     1 The sign string precedes the quantity and currency_symbol.
     2 The sign string follows the quantity and currency_symbol.
     3 The sign string immediately precedes the currency_symbol.
     4 The sign string immediately follows the currency_symbol.  */
  char p_sign_posn;
  char n_sign_posn;
#ifdef __USE_ISOC99
  /* 1 if int_curr_symbol precedes a positive value, 0 if succeeds.  */
  char int_p_cs_precedes;
  /* 1 iff a space separates int_curr_symbol from a positive value.  */
  char int_p_sep_by_space;
  /* 1 if int_curr_symbol precedes a negative value, 0 if succeeds.  */
  char int_n_cs_precedes;
  /* 1 iff a space separates int_curr_symbol from a negative value.  */
  char int_n_sep_by_space;
  /* Positive and negative sign positions:
     0 Parentheses surround the quantity and int_curr_symbol.
     1 The sign string precedes the quantity and int_curr_symbol.
     2 The sign string follows the quantity and int_curr_symbol.
     3 The sign string immediately precedes the int_curr_symbol.
     4 The sign string immediately follows the int_curr_symbol.  */
  char int_p_sign_posn;
  char int_n_sign_posn;
#else
  char __int_p_cs_precedes;
  char __int_p_sep_by_space;
  char __int_n_cs_precedes;
  char __int_n_sep_by_space;
  char __int_p_sign_posn;
  char __int_n_sign_posn;
#endif
};
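
A short sketch of reading a few of these members. The helper name describe_locale_numbers is made up for illustration. In the "C" locale, decimal_point is "." and thousands_sep and currency_symbol are empty strings.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: write a one-line summary of some numeric and
   monetary conventions of the current locale into BUF.
   Returns the snprintf() result. */
int describe_locale_numbers(char *buf, size_t size)
{
    const struct lconv *lc = localeconv();

    return snprintf(buf, size,
                    "decimal_point=\"%s\" thousands_sep=\"%s\" "
                    "currency_symbol=\"%s\"",
                    lc->decimal_point, lc->thousands_sep,
                    lc->currency_symbol);
}
```

The structure returned by localeconv() may be overwritten by later calls to localeconv() or setlocale(), so values needed for longer should be copied out, as done here.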


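1.6.2 The Elegant and Fast nl_langinfo

char *nl_langinfo(nl_item ITEM);

nl_langinfo() gives fine-grained, fast access to individual pieces of locale information. The possible values of ITEM are defined in the header langinfo.h and are described below: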

    `CODESET'
      `nl_langinfo' returns a string with the name of the coded
      character set used in the selected locale.

    `ABDAY_1' ... `ABDAY_7'
      `nl_langinfo' returns the abbreviated weekday name.  `ABDAY_1'
      corresponds to Sunday.

    `DAY_1' ... `DAY_7'
      Similar to `ABDAY_1' etc., but here the return value is the
      unabbreviated weekday name.

    `ABMON_1' ... `ABMON_12'
      The return value is abbreviated name of the month.  `ABMON_1'
      corresponds to January.

    `MON_1' ... `MON_12'
      Similar to `ABMON_1' etc., but here the month names are not
      abbreviated.  Here the first value `MON_1' also corresponds
      to January.

    `AM_STR', `PM_STR'
      The return values are strings which can be used in the
      representation of time as an hour from 1 to 12 plus an am/pm
      specifier.

      Note that in locales which do not use this time representation
      these strings might be empty, in which case the am/pm format
      cannot be used at all.

    `D_T_FMT'
      The return value can be used as a format string for
      `strftime' to represent time and date in a locale-specific
      way.

    `D_FMT'
      The return value can be used as a format string for
      `strftime' to represent a date in a locale-specific way.

    `T_FMT'
      The return value can be used as a format string for
      `strftime' to represent time in a locale-specific way.

    `T_FMT_AMPM'
      The return value can be used as a format string for
      `strftime' to represent time in the am/pm format.

      Note that if the am/pm format does not make any sense for the
      selected locale, the return value might be the same as the
      one for `T_FMT'.

    `ERA'
      The return value represents the era used in the current
      locale.

      Most locales do not define this value.  An example of a
      locale which does define this value is the Japanese one.  In
      Japan, the traditional representation of dates includes the
      name of the era corresponding to the then-emperor's reign.

      Normally it should not be necessary to use this value
      directly.  Specifying the `E' modifier in their format
      strings causes the `strftime' functions to use this
      information.  The format of the returned string is not
      specified, and therefore you should not assume knowledge of
      it on different systems.

    `ERA_YEAR'
      The return value gives the year in the relevant era of the
      locale.  As for `ERA' it should not be necessary to use this
      value directly.

    `ERA_D_T_FMT'
      This return value can be used as a format string for
      `strftime' to represent dates and times in a locale-specific
      era-based way.

    `ERA_D_FMT'
      This return value can be used as a format string for
      `strftime' to represent a date in a locale-specific era-based
      way.

    `ERA_T_FMT'
      This return value can be used as a format string for
      `strftime' to represent time in a locale-specific era-based
      way.

    `ALT_DIGITS'
      The return value is a representation of up to 100 values used
      to represent the values 0 to 99.  As for `ERA' this value is
      not intended to be used directly, but instead indirectly
      through the `strftime' function.  When the modifier `O' is
      used in a format which would otherwise use numerals to
      represent hours, minutes, seconds, weekdays, months, or
      weeks, the appropriate value for the locale is used instead.

    `INT_CURR_SYMBOL'
      The same as the value returned by `localeconv' in the
      `int_curr_symbol' element of the `struct lconv'.

    `CURRENCY_SYMBOL', `CRNCYSTR'
      The same as the value returned by `localeconv' in the
      `currency_symbol' element of the `struct lconv'.

      `CRNCYSTR' is a deprecated alias still required by Unix98.

    `MON_DECIMAL_POINT'
      The same as the value returned by `localeconv' in the
      `mon_decimal_point' element of the `struct lconv'.

    `MON_THOUSANDS_SEP'
      The same as the value returned by `localeconv' in the
      `mon_thousands_sep' element of the `struct lconv'.

    `MON_GROUPING'
      The same as the value returned by `localeconv' in the
      `mon_grouping' element of the `struct lconv'.

    `POSITIVE_SIGN'
      The same as the value returned by `localeconv' in the
      `positive_sign' element of the `struct lconv'.

    `NEGATIVE_SIGN'
      The same as the value returned by `localeconv' in the
      `negative_sign' element of the `struct lconv'.

    `INT_FRAC_DIGITS'
      The same as the value returned by `localeconv' in the
      `int_frac_digits' element of the `struct lconv'.

    `FRAC_DIGITS'
      The same as the value returned by `localeconv' in the
      `frac_digits' element of the `struct lconv'.

    `P_CS_PRECEDES'
      The same as the value returned by `localeconv' in the
      `p_cs_precedes' element of the `struct lconv'.

    `P_SEP_BY_SPACE'
      The same as the value returned by `localeconv' in the
      `p_sep_by_space' element of the `struct lconv'.

    `N_CS_PRECEDES'
      The same as the value returned by `localeconv' in the
      `n_cs_precedes' element of the `struct lconv'.

    `N_SEP_BY_SPACE'
      The same as the value returned by `localeconv' in the
      `n_sep_by_space' element of the `struct lconv'.

    `P_SIGN_POSN'
      The same as the value returned by `localeconv' in the
      `p_sign_posn' element of the `struct lconv'.

    `N_SIGN_POSN'
      The same as the value returned by `localeconv' in the
      `n_sign_posn' element of the `struct lconv'.

    `INT_P_CS_PRECEDES'
      The same as the value returned by `localeconv' in the
      `int_p_cs_precedes' element of the `struct lconv'.

    `INT_P_SEP_BY_SPACE'
      The same as the value returned by `localeconv' in the
      `int_p_sep_by_space' element of the `struct lconv'.

    `INT_N_CS_PRECEDES'
      The same as the value returned by `localeconv' in the
      `int_n_cs_precedes' element of the `struct lconv'.

    `INT_N_SEP_BY_SPACE'
      The same as the value returned by `localeconv' in the
      `int_n_sep_by_space' element of the `struct lconv'.

    `INT_P_SIGN_POSN'
      The same as the value returned by `localeconv' in the
      `int_p_sign_posn' element of the `struct lconv'.

    `INT_N_SIGN_POSN'
      The same as the value returned by `localeconv' in the
      `int_n_sign_posn' element of the `struct lconv'.

    `DECIMAL_POINT', `RADIXCHAR'
      The same as the value returned by `localeconv' in the
      `decimal_point' element of the `struct lconv'.

      The name `RADIXCHAR' is a deprecated alias still used in
      Unix98.

    `THOUSANDS_SEP', `THOUSEP'
      The same as the value returned by `localeconv' in the
      `thousands_sep' element of the `struct lconv'.

      The name `THOUSEP' is a deprecated alias still used in Unix98.

    `GROUPING'
      The same as the value returned by `localeconv' in the
      `grouping' element of the `struct lconv'.

    `YESEXPR'
      The return value is a regular expression which can be used
      with the `regex' function to recognize a positive response to
      a yes/no question.  The GNU C library provides the `rpmatch'
      function for easier handling in applications.

    `NOEXPR'
      The return value is a regular expression which can be used
      with the `regex' function to recognize a negative response to
      a yes/no question.

    `YESSTR'
      The return value is a locale-specific translation of the
      positive response to a yes/no question.

      Using this value is deprecated since it is a very special
      case of message translation, and is better handled by the
      message translation functions (*note Message Translation::).

      The use of this symbol is deprecated.  Instead message
      translation should be used.

    `NOSTR'
      The return value is a locale-specific translation of the
      negative response to a yes/no question.  What is said for
      `YESSTR' is also true here.

      The use of this symbol is deprecated.  Instead message
      translation should be used.
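
Two of the items above in use, as a minimal sketch. The helper names weekday_name and is_affirmative are made up for illustration. The example uses the POSIX regcomp()/regexec() functions to apply YESEXPR, which POSIX defines as an extended regular expression.

```c
#include <langinfo.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: full name of weekday INDEX (0 = Sunday through
   6 = Saturday) in the current LC_TIME locale, or NULL if INDEX is out
   of range. */
const char *weekday_name(int index)
{
    static const nl_item days[7] = {
        DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7
    };

    if (index < 0 || index > 6)
        return NULL;
    return nl_langinfo(days[index]);
}

/* Hypothetical helper: returns 1 if ANSWER matches the locale's YESEXPR,
   0 if it does not, and -1 if the expression fails to compile. */
int is_affirmative(const char *answer)
{
    regex_t re;
    int matched;

    if (regcomp(&re, nl_langinfo(YESEXPR), REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = (regexec(&re, answer, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

In the "C" locale, weekday_name(0) gives "Sunday" and is_affirmative() accepts answers that start with y or Y. The string returned by nl_langinfo() may be overwritten by later calls, so it should be copied if it has to outlive them.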
