Internationalization Topics

Microsoft Code Pages

A code page is a platform-specific encoding of a character set. It can be represented as a table that maps characters to single-byte or multibyte values. Many code pages share the ASCII character set for characters in the range 0x00 - 0x7F.
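
As a rough illustration of the table view, the sketch below decodes one byte value under three single-byte code pages. The function names decode_byte and print_byte_under_code_pages and the three hard-coded entries are assumptions made for this example, not part of any Microsoft API; a real table has an entry for every byte value.

```c
#include <stdio.h>

/* Hypothetical helper: decode one byte under a given code page.
   Only byte 0xE9 is tabulated, for three single-byte code pages;
   returns 0 for anything this sketch does not cover. */
static unsigned int decode_byte(unsigned int code_page, unsigned char byte)
{
    /* Bytes 0x00-0x7F decode as ASCII in all three pages used here. */
    if (byte <= 0x7F)
        return byte;

    if (byte == 0xE9) {
        switch (code_page) {
        case 1252: return 0x00E9; /* LATIN SMALL LETTER E WITH ACUTE */
        case 1251: return 0x0439; /* CYRILLIC SMALL LETTER SHORT I */
        case 437:  return 0x0398; /* GREEK CAPITAL LETTER THETA */
        default:   break;
        }
    }
    return 0; /* not covered by this sketch */
}

/* Print how the same byte decodes under each of the three code pages. */
static void print_byte_under_code_pages(unsigned char byte)
{
    static const unsigned int pages[] = { 1252, 1251, 437 };
    size_t i;

    for (i = 0; i < sizeof pages / sizeof pages[0]; ++i)
        printf("code page %u: byte 0x%02X -> U+%04X\n",
               pages[i], (unsigned int)byte, decode_byte(pages[i], byte));
}
```

Calling print_byte_under_code_pages(0xE9) shows three different characters for the same byte, while a byte in the ASCII range such as 0x41 decodes identically under all three.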

The Microsoft run-time library uses the following types of code pages:

  • System-default ANSI code page. When an application starts, the run-time system automatically sets the multibyte code page to the operating system's default ANSI code page. To set the locale to the system-default ANSI code page, use the C call:
    setlocale(LC_ALL, "");
  • Locale code page. Many of the C run-time routines depend on the current locale setting, which in turn depends on the locale code page. On application startup, the locale-dependent routines in the Microsoft run-time library use the code page that corresponds to the "C" locale. However, you can change or query the locale code page within your application by calling setlocale.
  • Multibyte code page. In addition to locale-sensitive C run-time functions, Microsoft also supports many multibyte-character functions that depend on the application's multibyte code page setting. By default, these routines use the system-default ANSI code page. However, at run time you can query and change the multibyte code page by calling _getmbcp and _setmbcp, respectively.
  • "C" locale code page. This is the name of the code page that corresponds to the ASCII character set, and is the code page that is used as the C/C++ application's default locale code page.
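
The settings described in the list above can be inspected with a short sketch such as the following. The helper names locale_is_c and show_code_page_settings are assumptions made for this example. The _getmbcp and _setmbcp calls are Microsoft-specific, so they are compiled only under the Microsoft compiler, and code page 932 is used purely as a sample value that may not be available on a given system.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#if defined(_MSC_VER)
#include <mbctype.h>   /* _getmbcp, _setmbcp (Microsoft-specific) */
#endif

/* Returns 1 if the current locale is the "C" locale, else 0.
   Passing NULL to setlocale queries the locale without changing it. */
static int locale_is_c(void)
{
    const char *name = setlocale(LC_ALL, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}

/* Report the current state, then switch to the system default. */
static void show_code_page_settings(void)
{
    const char *name;

    printf("locale is \"C\": %s\n", locale_is_c() ? "yes" : "no");

    /* An empty string selects the system-default locale. */
    name = setlocale(LC_ALL, "");
    printf("system-default locale: %s\n", name ? name : "(unavailable)");

#if defined(_MSC_VER)
    /* Query, change, and query again the multibyte code page. */
    printf("multibyte code page: %d\n", _getmbcp());
    if (_setmbcp(932) == 0)          /* 932 is only a sample value */
        printf("multibyte code page now: %d\n", _getmbcp());
    else
        printf("code page 932 could not be selected\n");
#endif
}
```

Called first thing in main, show_code_page_settings reports the "C" locale before the switch, because the startup locale is "C" until the application calls setlocale.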

Multibyte Code Page Functions

Most multibyte-character routines in the Microsoft run-time library recognize multibyte-character sequences according to the current code page setting. This includes the _ismbc routines. The multibyte code page also affects multibyte processing in the following set of routines:

_exec functions     _mktemp             _stat
_fullpath           _spawn functions    _tempnam
_makepath           _splitpath          tmpnam

In addition, all run-time library routines that have multibyte-character argv or envp program arguments (such as the _exec and _spawn families) process these strings according to the multibyte code page. Hence these routines are also affected by a call to _setmbcp that changes the multibyte code page.
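
To see why path-handling routines need to know the multibyte code page, consider code page 932 (Shift-JIS), where the second byte of a double-byte character can have the same value as a backslash (0x5C). The sketch below is a simplified illustration, not the library's implementation: the helper names is_lead_byte_cp932 and last_separator_cp932 are assumptions made for this example, and the lead-byte ranges are hard-coded for code page 932 only.

```c
#include <stddef.h>
#include <string.h>

/* Lead-byte ranges for code page 932 (Shift-JIS): 0x81-0x9F, 0xE0-0xFC. */
static int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: find the last '\\' that is a real path separator,
   skipping the trail bytes of double-byte characters.
   Returns NULL if the path contains no separator. */
static const char *last_separator_cp932(const char *path)
{
    const char *last = NULL;
    const char *p = path;

    while (*p != '\0') {
        if (is_lead_byte_cp932((unsigned char)*p) && p[1] != '\0') {
            p += 2;            /* skip lead byte and its trail byte */
            continue;
        }
        if (*p == '\\')
            last = p;
        ++p;
    }
    return last;
}
```

For the byte string "C:\\\x95\\.txt", where the pair 0x95 0x5C encodes a single double-byte character in code page 932, strrchr(path, '\\') stops at the trail byte (offset 4), whereas last_separator_cp932 returns the real separator at offset 2. A byte-oriented search would therefore split the path in the middle of a character.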

See the MSDN Library for more information on the multibyte code page-dependent functions.

Locale Code Page Functions

A number of functions depend on the locale code page. As stated above, call setlocale to ensure that the locale is set properly before calling any of these functions.

atof, atoi, atol    mbtowc                             strftime, wcsftime
is functions        printf functions                   _strlwr
isleadbyte          scanf functions                    strtod, wcstod, strtol, wcstol, strtoul, wcstoul
localeconv          setlocale, _wsetlocale             _strupr
MB_CUR_MAX          strcoll, wcscoll                   strxfrm, wcsxfrm
_mbccpy             _stricmp, _wcsicmp, _mbsicmp       tolower, towlower
_mbclen             _stricoll, _wcsicoll               toupper, towupper
mblen               _strncoll, _wcsncoll               wcstombs
_mbstrlen           _strnicmp, _wcsnicmp, _mbsnicmp    wctomb
mbstowcs            _strnicoll, _wcsnicoll             _wtoi, _wtol
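
As a small example of this locale dependence, the sketch below asks localeconv for the decimal-point character of a named locale, which is the character that functions such as strtod and the printf family use. The helper name decimal_point_for is an assumption made for this example, and it reports '\0' when the requested locale is not available rather than guessing.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: return the decimal-point character used by the
   named locale, or '\0' if that locale is not available. The previous
   locale is restored before returning. */
static char decimal_point_for(const char *locale_name)
{
    char saved[256];
    const char *current = setlocale(LC_ALL, NULL);
    char point = '\0';

    /* Copy the name: later setlocale calls may overwrite the buffer. */
    if (current == NULL || strlen(current) >= sizeof saved)
        return '\0';
    strcpy(saved, current);

    /* setlocale returns NULL if the requested locale cannot be set. */
    if (setlocale(LC_ALL, locale_name) != NULL) {
        point = localeconv()->decimal_point[0];
        setlocale(LC_ALL, saved);
    }
    return point;
}
```

decimal_point_for("C") returns '.', while decimal_point_for("") reports whatever the system-default locale uses, which on some systems is a comma.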

See the MSDN Library for more details on C locale-dependent functions.

There are also many locale-dependent Win32 functions. See Windows C++ Locale Functions for details.

For a comprehensive list of Microsoft code page identifiers, see the Code Page Identifiers topic in the MSDN Library.