While everyone who programs in
PHP has to learn some English eventually to get a handle on its function names
and language constructs, PHP can create applications that speak just about any
language. Some applications need to be used by speakers of many different
languages. Taking an application written for French speakers and making it
useful for German speakers is made easier by PHP's support for
internationalization and localization.
Internationalization (often abbreviated I18N[1]) is the process of taking an application designed for
just one locale and restructuring it so that it can be used in many different
locales. Localization (often abbreviated L10N[2]) is the process of adding support for a new locale to an
internationalized application.
[1] The word "internationalization" has 18 letters between the first "i" and the last "n."
[2] The word "localization" has 10 letters between the first "l" and the "n."
A locale is a group of settings that
describe text formatting and language customs in a particular area of the world.
The settings are divided into six categories:
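- LC_COLLATE: text sorting order
- LC_CTYPE: character classification and conversion between uppercase and lowercase
- LC_MONETARY: currency formatting
- LC_NUMERIC: number formatting, such as the decimal point and thousands separator
- LC_TIME: date and time formatting
- LC_MESSAGES: the language of system messages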
There is also a metacategory, LC_ALL, that encompasses
all the categories.
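The categories are what you pass as the first argument to PHP's setlocale() function. The following is a minimal sketch, not a complete recipe: the French locale names are illustrative and may not be installed on your system, which is why "C" (the portable default locale) is listed as a fallback. Note that the LC_MESSAGES constant is only defined when PHP is built with libintl, so it is left out here.

```php
<?php
// Query the current locale without changing it: "0" means "just report".
$current = setlocale(LC_ALL, '0');

// Change a single category. "C" is the portable default locale.
$numeric = setlocale(LC_NUMERIC, 'C');

// Try several candidate names for one category. setlocale() returns the
// first name the system accepts, or false if none of them is installed.
// 'fr_FR.UTF-8' and 'fr_FR' are illustrative names, not guaranteed to exist.
$time = setlocale(LC_TIME, ['fr_FR.UTF-8', 'fr_FR', 'C']);

// The LC_ALL metacategory sets every category in one call.
$all = setlocale(LC_ALL, 'C');

printf("numeric: %s, time: %s, all: %s\n", $numeric, $time, $all);
```

Because setlocale() returns false on failure, checking its return value is the only reliable way to know whether a locale was actually applied.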
A locale name generally has three
components. The first, an abbreviation that indicates a language, is mandatory.
For example, "en" for English or "pt" for Portuguese. Next, after an underscore,
comes an optional country specifier, to distinguish between different countries
that speak different versions of the same language. For example, "en_US" for
U.S. English and "en_GB" for British English, or "pt_BR" for Brazilian
Portuguese and "pt_PT" for European Portuguese. Last, after a period, comes an
optional character-set specifier. For example, "zh_TW.Big5" for Chinese as used
in Taiwan with the Big5 character set. While most
locale names follow these conventions, some don't. One difficulty in using
locales is that they can be arbitrarily named. Finding and setting a locale is
discussed in Section 16.2 through Section 16.4.
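To make the three-part naming convention concrete, here is a small sketch that splits a conventional locale name into its components. The function parse_locale_name() and its regular expression are hypothetical helpers written for this illustration, not part of PHP, and they deliberately return null for names that do not follow the convention.

```php
<?php
// Hypothetical helper: split a name such as "zh_TW.Big5" into language,
// country, and character set. Returns null for unconventional names.
function parse_locale_name(string $name): ?array
{
    // language (mandatory), "_COUNTRY" (optional), ".charset" (optional)
    $pattern = '/^([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.([A-Za-z0-9-]+))?$/';
    if (!preg_match($pattern, $name, $m)) {
        return null;
    }
    return [
        'language' => $m[1],
        // Unmatched optional groups are either empty or absent in $m.
        'country'  => ($m[2] ?? '') !== '' ? $m[2] : null,
        'charset'  => ($m[3] ?? '') !== '' ? $m[3] : null,
    ];
}

var_dump(parse_locale_name('zh_TW.Big5'));
var_dump(parse_locale_name('en'));
var_dump(parse_locale_name('C'));
```

A name such as "C" falls outside the pattern, which reflects the point that not every locale name on a real system follows these conventions.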
Different techniques are necessary for correct localization of
plain text, dates and times, and currency. Localization can also be applied to
external entities your program uses, such as images and included files.
Localizing these kinds of content is covered in Section 16.5 through Section 16.9.
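As a small taste of why numbers and currency need their own handling, the sketch below reads a few of the conventions that a locale carries by calling localeconv(). The helper locale_conventions() is a hypothetical name used only for this illustration, and the German locale names are assumptions that may not be installed on your system.

```php
<?php
// Hypothetical helper: return number and currency conventions for the first
// installed locale in $candidates, or null when none of them is available.
function locale_conventions(array $candidates): ?array
{
    if (setlocale(LC_ALL, $candidates) === false) {
        return null;
    }
    $conv = localeconv();
    return [
        'decimal_point'   => $conv['decimal_point'],
        'thousands_sep'   => $conv['thousands_sep'],
        'currency_symbol' => $conv['currency_symbol'],
    ];
}

// The portable "C" locale is always available.
$c = locale_conventions(['C']);

// These names are illustrative; the result is null if neither is installed.
$de = locale_conventions(['de_DE.UTF-8', 'de_DE']);

// Restore the default so later code is not affected.
setlocale(LC_ALL, 'C');

var_dump($c, $de);
```

Where a German locale is installed, the second result typically differs from the first in its decimal point and currency symbol, which is exactly the kind of difference the later sections deal with.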
Systems for dealing with large amounts of localization data are
discussed in Section 16.10 and Section 16.11. Section 16.10 shows some simple
ways to manage the data, and Section 16.11 introduces GNU gettext, a
full-featured set of tools that provide localization support.
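To give a rough idea of what managing localization data can look like, here is a minimal sketch of a message catalog kept in a PHP array. The $messages structure, the translate() function, and the fallback to "en_US" are all assumptions made for this illustration; they are not necessarily the approach those sections take, and they are much simpler than gettext.

```php
<?php
// Hypothetical catalog: one array of messages per locale name.
$messages = [
    'en_US' => ['greeting' => 'Hello', 'farewell' => 'Goodbye'],
    'fr_FR' => ['greeting' => 'Bonjour'],
];

// Look up $key for $locale, fall back to the default locale, and finally
// fall back to the key itself so a missing message is still visible.
function translate(array $messages, string $locale, string $key, string $default = 'en_US'): string
{
    return $messages[$locale][$key] ?? $messages[$default][$key] ?? $key;
}

echo translate($messages, 'fr_FR', 'greeting'), "\n";
echo translate($messages, 'fr_FR', 'farewell'), "\n";
```

Falling back to a default locale keeps a partly translated application usable while translations are still being added.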