Internationalization and localization tools


Locale-Sensitive Wide Character Length Functions

 

Internationalization (I18n) Issue:

All of these functions operate on wide characters, which is to say their arguments and/or return values are of type wchar_t or pointer to wchar_t. They often replace single-byte character length functions when an application is migrated to UTF-16 or UTF-32.

The issue with this particular set of wide functions is their size argument: it is a count of wide characters, not a count of bytes.

I18n Solution:

The size of the wchar_t datatype is implementation-defined. It is typically 2 bytes where wide strings hold UTF-16 (for example, on Windows) and 4 bytes where they hold UTF-32 (for example, on Linux with glibc).

In a single-byte environment, the number of bytes and the number of characters in a character string are the same. This is not the case with wchar_t wide character strings, where the number of bytes in a string is 2 or 4 times the number of wide characters.
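The difference between the two counts can be sketched as follows. This is a minimal illustration, assuming a hosted C environment with <wchar.h>; the helper names wide_char_count and wide_byte_count are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Number of wchar_t elements in the string, excluding the terminator. */
size_t wide_char_count(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}

/* Number of bytes those elements occupy, excluding the terminator.
   This is 2 or 4 times the element count on common platforms. */
size_t wide_byte_count(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s) * sizeof(wchar_t);
}
```

For L"hello", wide_char_count returns 5 on every platform, while wide_byte_count returns 5 * sizeof(wchar_t), which depends on the platform's wchar_t size.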

For example, in a single byte application, something like the following would work correctly:

char buffer[16];
memset(buffer, '0', sizeof(buffer)); // single byte only

This code works because the size of buffer in bytes, which is what sizeof returns, is the same as the number of characters that fit in the array.

Now, suppose the code is modified to use wchar_t as follows:

wchar_t buffer[16];
wmemset(buffer, L'0', sizeof(buffer)); // overflow: sizeof gives bytes, not wide characters

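This code will not work because wmemset expects the number of wide characters in the buffer, and the sizeof operator passes the number of bytes instead. With a 2-byte wchar_t, sizeof(buffer) is 32, so wmemset writes 32 wide characters into a 16-element array; with a 4-byte wchar_t it writes 64. The count should be the number of elements, sizeof(buffer) / sizeof(buffer[0]). This is the type of problem that needs to be considered with this set of functions.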
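A corrected version of the wide character call might look like the sketch below. The macro WIDE_COUNT and the functions fill_with_zero_digit and demo_fill are hypothetical names introduced for illustration only; the macro assumes it is applied to a true array, not to a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Element count of a true array: total bytes divided by bytes per element.
   Gives the wrong answer if passed a pointer instead of an array. */
#define WIDE_COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Fill the first `count` wide characters of `buf` with L'0'.
   `count` is a number of wchar_t elements, not a number of bytes. */
wchar_t *fill_with_zero_digit(wchar_t *buf, size_t count)
{
    assert(buf != NULL);
    return wmemset(buf, L'0', count);
}

/* Returns 1 if every element of a 16-element buffer was filled. */
int demo_fill(void)
{
    wchar_t buffer[16];

    /* Passes 16 (elements), not sizeof(buffer) (32 or 64 bytes). */
    fill_with_zero_digit(buffer, WIDE_COUNT(buffer));

    for (size_t i = 0; i < WIDE_COUNT(buffer); i++) {
        if (buffer[i] != L'0') {
            return 0;
        }
    }
    return 1;
}
```

Dividing by sizeof((arr)[0]) rather than by a hard-coded 2 or 4 keeps the count correct whichever wchar_t size the platform uses.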

The functions in this set are:

_wcsncnt

_wcsnset/wcsnset

wcpncpy

wcsncat

wcsncpy

wcsnlen

wcsnlen_l

wmemchr

wmemcmp

wmemcpy

wmemmove

wmempcpy

wmemset
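The same rule applies to the string functions in this list: the length argument counts wchar_t elements. As a hedged sketch, a bounded copy built on wcsncpy might look like this; copy_wide is a hypothetical helper, and the explicit termination is a design choice made here because wcsncpy does not terminate the destination when the source is too long.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Copy `src` into `dst`, which can hold `dst_count` wchar_t elements.
   `dst_count` is an element count, not a byte count.
   The result is always terminated; long sources are truncated.
   Returns the number of wide characters stored, excluding the terminator. */
size_t copy_wide(wchar_t *dst, size_t dst_count, const wchar_t *src)
{
    assert(dst != NULL && src != NULL && dst_count > 0);

    /* Leave room for the terminator in the last element. */
    wcsncpy(dst, src, dst_count - 1);
    dst[dst_count - 1] = L'\0';

    return wcslen(dst);
}
```

A caller with an array would pass sizeof(dst) / sizeof(dst[0]) as dst_count, never sizeof(dst) alone.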

 


 

Lingoport internationalization and localization services and software