Internationalization Topics

Externalization Strategies

This section describes how to externalize text from source code by means of resource bundles. Java, and the IBM ICU libraries for Java, C, and C++, offer localization support for externalizing text into locale-based resource files.

Java

The following discussion of resource-bundle defaulting is based on the Java design:

Resource Bundle Naming Conventions: A resource bundle name consists of a base name such as ResBnd, an underscore, and a suffix indicating the locale to which the resource bundle applies, e.g.:

ResBnd_en_US
ResBnd_en_CA
ResBnd_fr_FR
ResBnd_fr_CA

Design of the Resource-Bundle Directory Structure: Typically there is one directory in which all resource bundles for an application reside. A program can locate its resources by concatenating the path name of the resource-bundle directory, the base name, and the locale, e.g.:

<Resource Directory Path Name> + 'ResBnd_' + <locale>

There is a root resource bundle named simply ResBnd (the base name with no locale suffix) that normally is a duplicate of the en_US resource bundle. If French is supported, the fr_FR resource bundle is duplicated as the fr resource bundle, while the en_US resource bundle is duplicated as the en resource bundle. As a result the en_US resource bundle will be present as the root, en, and en_US resource bundles. The reason for this will become clear shortly.

The root resource bundle constitutes the base localization kit that is given to localizers for translation.

Resource-Bundle Lookup: With the preceding directory structure and naming conventions in place, an application can use the Java resource-bundle lookup services, which ensure that users in both supported and unsupported locales see screens in the most appropriate available language.

Java supports a default lookup mechanism for resource files. The rules, as given in the Sun Java documentation, are:

The resource bundle lookup searches for classes with various suffixes on the basis of (1) the desired locale, (2) the current default locale as returned by Locale.getDefault(), and (3) the root resource bundle (baseclass), in the following order from lower-level (more specific) to parent-level (less specific). Here language1, country1, and variant1 belong to the desired locale, while language2, country2, and variant2 belong to the default locale:

baseclass + "_" + language1 + "_" + country1 + "_" + variant1
baseclass + "_" + language1 + "_" + country1
baseclass + "_" + language1
baseclass + "_" + language2 + "_" + country2 + "_" + variant2
baseclass + "_" + language2 + "_" + country2
baseclass + "_" + language2
baseclass

For example, if the current default locale is en_US, the locale the caller is interested in is fr_CH, and the resource bundle name is MyResources, resource bundle lookup will search for the following classes, in order:

MyResources_fr_CH
MyResources_fr
MyResources_en_US
MyResources_en
MyResources
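The search order above can be sketched in a few lines of Java. This is a simplified illustration of the documented order only, not the actual ResourceBundle implementation (which also handles details such as scripts and caching). The class name BundleCandidates and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BundleCandidates {

    /** Appends the names for one locale, most specific first, skipping duplicates. */
    private static void addChain(List<String> names, String baseName, Locale locale) {
        String language = locale.getLanguage();
        String country = locale.getCountry();
        String variant = locale.getVariant();
        if (language.isEmpty()) {
            return; // simplification: locales without a language contribute nothing
        }
        List<String> chain = new ArrayList<>();
        if (!country.isEmpty() && !variant.isEmpty()) {
            chain.add(baseName + "_" + language + "_" + country + "_" + variant);
        }
        if (!country.isEmpty()) {
            chain.add(baseName + "_" + language + "_" + country);
        }
        chain.add(baseName + "_" + language);
        for (String name : chain) {
            if (!names.contains(name)) {
                names.add(name);
            }
        }
    }

    /** Desired locale first, then the default locale, then the root bundle. */
    public static List<String> candidates(String baseName, Locale desired, Locale defaultLocale) {
        List<String> names = new ArrayList<>();
        addChain(names, baseName, desired);
        addChain(names, baseName, defaultLocale);
        names.add(baseName); // root resource bundle
        return names;
    }

    public static void main(String[] args) {
        // Desired locale fr_CH, default locale en_US, as in the example above.
        System.out.println(candidates("MyResources", Locale.forLanguageTag("fr-CH"), Locale.US));
    }
}
```

Run with the locales from the example, the sketch lists the same five names in the same order. In application code the lookup itself is performed by ResourceBundle.getBundle, so a helper like this is mainly useful for debugging which bundle a given user will receive.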

Resource Keys: The internal structure of the resource bundle is a list of <key, string> ordered pairs, where the key is a mnemonic identifier for the associated Unicode string. It is recommended that the mnemonic identifier be the module name followed by a short name for the string, e.g. myModule_EnterName, which might identify the string EnterName used by myModule.

Resource Bundle Types: Java supports two types of resource bundles: ListResourceBundles, which are compiled class files, and PropertyResourceBundles, which are plain text .properties files (traditionally ISO 8859-1 with \uXXXX escapes for other characters; read as UTF-8 by default since Java 9). Each approach has pros and cons. Because ListResourceBundles are compiled, no text files are left lying around for a user to edit and thereby alter the GUI. However, the bundle class must be recompiled whenever a GUI string is added or changed.
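To make the two bundle types concrete, the following minimal sketch defines the same key once in each form. The class names BundleTypesDemo and CompiledStrings and the English text are invented for illustration. The properties content is read from an in-memory string rather than a file so that the example is self-contained, and the bundles are instantiated directly, whereas an application would normally obtain them through ResourceBundle.getBundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ListResourceBundle;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleTypesDemo {

    /** Compiled form: the key/value pairs live in Java source code. */
    public static class CompiledStrings extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"myModule_EnterName", "Enter your name"},
            };
        }
    }

    /** Text form: the same pair in .properties syntax, parsed at run time. */
    public static ResourceBundle textBundle() {
        String properties = "myModule_EnterName=Enter your name\n";
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle compiled = new CompiledStrings();
        ResourceBundle text = textBundle();
        // Both bundle types are read through the same ResourceBundle API.
        System.out.println(compiled.getString("myModule_EnterName"));
        System.out.println(text.getString("myModule_EnterName"));
    }
}
```

Calling code is identical for both types, so the choice mainly affects who can edit the strings and whether a recompile is needed.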

Localization companies generally handle PropertyResourceBundles better than ListResourceBundles. In Lingoport's experience, translators sometimes inadvertently modify ListResourceBundles in ways that introduce syntax errors, which a developer must then fix. ListResourceBundles can contain text, images, and other Java objects. Consequently, if ListResourceBundles are used, Java applies the preceding resource-bundle lookup to both text and images. If PropertyResourceBundles are used and images are stored as traditional image files, code must be added to support a locale-sensitive file lookup for images and icons.
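One possible shape for such a locale-sensitive file lookup is sketched below. It reuses the candidate list that java.util.ResourceBundle.Control computes, so image names follow the same suffix convention as bundle names. The class LocalizedFileLookup, the method resolve, and the file names are hypothetical. Note that this sketch covers only the requested locale and the root name; falling back to the default locale would need an extra step.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.ResourceBundle;
import java.util.Set;
import java.util.function.Predicate;

public class LocalizedFileLookup {

    /**
     * Returns the most specific existing file name for the locale, trying for
     * example logo_fr_CA.png, then logo_fr.png, then logo.png.
     * The exists predicate decides whether a candidate is available.
     */
    public static Optional<String> resolve(String baseName, String extension,
                                           Locale locale, Predicate<String> exists) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            // toBundleName yields baseName + "_" + locale, or just baseName for the root.
            String fileName = control.toBundleName(baseName, candidate) + "." + extension;
            if (exists.test(fileName)) {
                return Optional.of(fileName);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a directory listing; a real application might test
        // java.nio.file.Files.exists on a path inside its image directory.
        Set<String> available = new HashSet<>(Arrays.asList("logo.png", "logo_fr.png"));
        System.out.println(resolve("logo", "png", Locale.CANADA_FRENCH, available::contains));
    }
}
```

Passing the existence check as a predicate keeps the lookup independent of where the images are stored (file system, JAR, or web server).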

Additional Resources: More information about Java's resource bundle implementation can be found in the Java API documentation for java.util.ResourceBundle.


ICU Resource Bundles in C/C++

The following describes the implementation of resource bundles in ICU.

Resource Bundle Naming Conventions: Resource bundle names have some base name such as ResBnd_ and a suffix indicating the locale to which the resource bundle applies, e.g.:

ResBnd_en_US
ResBnd_en_CA
ResBnd_fr_FR
ResBnd_fr_CA

Design of the Resource-Bundle Directory Structure: Typically there is one directory in which all resource bundles for an application reside. A program can locate its resources by concatenating the path name of the resource-bundle directory, the base name, and the locale, e.g.:

<Resource Directory Path Name> + 'ResBnd_' + <locale>

There is a root resource bundle named simply ResBnd_ that normally is a duplicate of the en_US resource bundle. If French is supported, the fr_FR resource bundle is duplicated as the fr resource bundle, while the en_US resource bundle is duplicated as the en resource bundle. As a result the en_US resource bundle will be present as the root, en, and en_US resource bundles. The reason for this will become clear shortly.

The root resource bundle constitutes the base localization kit that is given to localizers for translation, if resource bundles are chosen to maintain extracted text.

Resource-Bundle Lookup: With the preceding directory structure and naming conventions in place, an application can use the C resource-bundle lookup services, which ensure that users in both supported and unsupported locales see screens in the most appropriate available language.

ICU supports a default lookup mechanism for resource files. ICU first looks for the most specific bundle. If that bundle is not found (or the requested element is not contained in it), ICU falls back to a more general bundle. For example, if the de_AT_EURO bundle is requested and not found, ICU searches for the de_AT bundle and then for the de bundle. If de is not found, ICU tries a bundle for the default locale. Hence, the order of progression is:

language1 + "_" + country1 + "_" + variant1
language1 + "_" + country1
language1
default
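The progression above amounts to repeatedly dropping the last underscore-separated field of the locale ID and finally trying the default locale. The sketch below shows only that ordering. It is written in Java purely for illustration and does not call ICU; the class IcuStyleFallback and the method fallbackChain are hypothetical names, and en_US is an assumed default locale.

```java
import java.util.ArrayList;
import java.util.List;

public class IcuStyleFallback {

    /**
     * Lists the locale IDs to try, most specific first: the requested ID,
     * each shorter prefix obtained by cutting at the last underscore,
     * and finally the default locale ID.
     */
    public static List<String> fallbackChain(String localeId, String defaultLocaleId) {
        List<String> chain = new ArrayList<>();
        String current = localeId;
        while (!current.isEmpty()) {
            chain.add(current);
            int cut = current.lastIndexOf('_');
            current = (cut < 0) ? "" : current.substring(0, cut);
        }
        if (!chain.contains(defaultLocaleId)) {
            chain.add(defaultLocaleId);
        }
        return chain;
    }

    public static void main(String[] args) {
        // The requested locale from the example above, with an assumed default of en_US.
        System.out.println(fallbackChain("de_AT_EURO", "en_US"));
    }
}
```

For de_AT_EURO with a default of en_US, the chain is de_AT_EURO, de_AT, de, en_US.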

Language, country, and variant are concatenated to define the bundle name. To use a resource bundle from C, include the ures.h header file. All resources are accessed through a UResourceBundle * pointer, which is initialized by calling the ures_open function. For a fuller treatment of C/C++ resource bundles using ICU, please see the ICU website.

Resource Keys: The internal structure of the resource bundle is a list of <key, string> ordered pairs, where the key is a mnemonic identifier for the associated Unicode string. It is recommended that the mnemonic identifier be the module name followed by a short name for the string, e.g. Registration_EnterName, which might identify the string EnterName used by the Registration module.

ICU cannot use resource bundle data in text form at run time. Instead, it uses an efficient binary form. The main tool for compiling resource bundle data from text into binary form is called genrb. This tool reads a text resource file and outputs a .res file, which is the binary form of the resource bundle.