Single-Byte Character Manipulation Functions
Related Links
Link to Wide Character Manipulation Functions.
Link to Multibyte Character Manipulation Functions.
Link to Windows Generic Character Manipulation Functions.
Internationalization (I18n) Issue:
This category of potentially locale-sensitive functions operates on or manipulates
characters that are 7-bit ASCII or 8-bit single-byte characters.
I18n Solution:
The appropriate wide-character or multibyte
equivalent function should be used within the internationalized
application. In the case of a Windows Generic application, call the equivalent
generic function, and use the _MBCS or _UNICODE
define to map to the correct multibyte or wide-character function.
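The mapping described above can be sketched portably as follows. This is only an illustration of the mechanism: DEMO_UNICODE, demo_char, DEMO_T, demo_isalpha and demo_count_alpha are hypothetical names invented for this example. On Windows, <tchar.h> supplies the real equivalents (TCHAR, _T and _istalpha), and it also covers the _MBCS case, which this sketch leaves out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-ins for generic-text mappings. Defining DEMO_UNICODE
   at compile time selects the wide-character functions; otherwise the
   single-byte functions are used. */
#ifdef DEMO_UNICODE
typedef wchar_t demo_char;
#define DEMO_T(s) L##s
#define demo_isalpha(c) iswalpha((wint_t)(c))
#else
typedef char demo_char;
#define DEMO_T(s) s
#define demo_isalpha(c) isalpha((unsigned char)(c))
#endif

/* Count alphabetic characters using only the generic names, so the same
   source compiles in either configuration. */
size_t demo_count_alpha(const demo_char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != 0; ++s) {
        if (demo_isalpha(*s)) {
            ++n;
        }
    }
    return n;
}
```

A caller would write demo_count_alpha(DEMO_T("ab1c")) and leave the choice of character type to the build configuration.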
I18n Discussion:
Character testing and conversion functions
Expanded ctype functions can handle only 256 distinct character values
because their argument must be a value that can be represented
as an unsigned char (or EOF). This is despite the fact
that the functions actually take and return an int,
which is always at least 16 bits wide and usually 32. As a result, these functions
work only with single-byte code sets.
Multibyte characters require either specific multibyte functions,
or conversion to wide-characters to use the wide-character functions.
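As a minimal sketch of the second approach (converting to wide characters and then using the wide-character functions), the following uses the standard mbrtowc and iswalpha. The function name count_alpha_mb is hypothetical, and the string is assumed to be encoded according to the current LC_CTYPE locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Count alphabetic characters in a multibyte string by decoding each
   character to wchar_t and testing it with iswalpha.
   Returns (size_t)-1 if the string contains an invalid sequence. */
size_t count_alpha_mb(const char *s)
{
    mbstate_t state;
    size_t remaining;
    size_t count = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);   /* initial conversion state */
    remaining = strlen(s);

    while (remaining > 0) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s, remaining, &state);

        if (len == (size_t)-1 || len == (size_t)-2) {
            return (size_t)-1;         /* invalid or truncated sequence */
        }
        if (len == 0) {
            break;                     /* decoded a null character */
        }
        if (iswalpha((wint_t)wc)) {
            ++count;
        }
        s += len;
        remaining -= len;
    }
    return count;
}
```

An application would normally call setlocale(LC_CTYPE, "") once at startup so that mbrtowc decodes the user's encoding rather than that of the default "C" locale.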
Character I/O
The single-byte character string input/output functions generally work for multibyte-character strings because
a multibyte-encoded string contains no embedded null bytes and is terminated by a single null byte. These functions will not work
for wide-character strings because the code units of wide characters may contain all-zero octets. In the
case of wide-character strings (i.e., strings based on wchar_t characters), use the
wide-character input/output functions.
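The zero-octet problem can be made visible with a small sketch. The helper name bytes_before_first_zero is hypothetical; it simply shows how far a byte-oriented function such as strlen or fputs would get if it were handed the storage of a wide string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Number of bytes a byte-oriented function would see before hitting the
   first zero octet inside the storage of a wide string. */
size_t bytes_before_first_zero(const wchar_t *ws)
{
    assert(ws != NULL);
    /* Inspecting an object through a char pointer is permitted in C. */
    return strlen((const char *)ws);
}
```

For L"AB" the result is smaller than sizeof(wchar_t), because the code unit for 'A' already contains at least one zero octet, whereas wcslen reports the expected 2. By contrast, strlen on a multibyte string such as "\xC3\xA9" (one character in UTF-8) reaches the true terminator and reports 2 bytes.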
As for single-character input/output functions, although they take a single-byte character argument,
they can be called multiple times to output a multibyte character one byte at a time. This works both for Windows MBCS
characters, which are either 1 or 2 bytes per character, and on ANSI UTF-8 platforms, where a character
can occupy 1 to 4 bytes (the original UTF-8 definition allowed up to 6).
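A minimal sketch of writing one multibyte character with repeated putc calls is shown below. The function name put_mbchar is hypothetical, and the caller is assumed to know the byte length of the character (for example from mblen under the current locale).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write one multibyte character of len bytes using repeated putc calls.
   Returns 0 on success, EOF on a write error. */
int put_mbchar(const char *mbchar, size_t len, FILE *stream)
{
    size_t i;

    assert(mbchar != NULL && stream != NULL);
    for (i = 0; i < len; ++i) {
        /* Convert through unsigned char so that bytes >= 0x80 are not
           sign-extended on platforms where char is signed. */
        if (putc((unsigned char)mbchar[i], stream) == EOF) {
            return EOF;
        }
    }
    return 0;
}
```

For example, put_mbchar("\xC3\xA9", 2, stdout) emits the two bytes of a UTF-8 encoded U+00E9. The stream should stay byte-oriented; mixing these calls with wide-character output on the same stream is not supported by the C standard.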
Special consideration needs to be given to Windows MBCS applications, and to Windows Unicode applications
that are running on older versions of Windows (i.e., Windows 95/98/Me) that do not support
UTF-16 Unicode as the system's native encoding. In the case of a Windows MBCS application,
the system's multibyte code page will be used to either directly utilize the multibyte string (when
running on a non-Unicode system), or to convert the application's multibyte string
to a UTF-16 Unicode string before using it (on a Unicode system). In the case of a Unicode
application running on a non-Unicode system, the OS will use the system's multibyte code page
to convert the application's UTF-16 Unicode string to a multibyte string prior to use. In order
for strings to be correctly accessed in these scenarios, the application's UI language must be in
agreement with the system's multibyte code page; otherwise, characters may be lost in the
conversions.
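The kind of loss described above can be detected before it happens. The sketch below is a portable analogue rather than the Windows mechanism itself: it uses the standard wcsrtombs with a null destination to ask whether a wide string is representable in the multibyte encoding of the current LC_CTYPE locale. The function name is_representable is hypothetical. On Windows, the corresponding check is usually made with WideCharToMultiByte and its lpUsedDefaultChar output parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Returns 1 if every character of ws can be converted to the multibyte
   encoding of the current LC_CTYPE locale, 0 otherwise. */
int is_representable(const wchar_t *ws)
{
    mbstate_t state;
    const wchar_t *src = ws;

    assert(ws != NULL);
    memset(&state, 0, sizeof state);   /* initial conversion state */

    /* With a null destination, wcsrtombs only computes the required
       length and returns (size_t)-1 if a character cannot be converted. */
    return wcsrtombs(NULL, &src, 0, &state) != (size_t)-1;
}
```

In the default "C" locale, a string such as L"abc" is representable while one containing the euro sign (L"\u20AC") is not, which mirrors a UI language that disagrees with the system code page.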
There is no issue for a Windows Unicode application running on a later version of Windows (NT/2K/XP)
where the native encoding is UTF-16 Unicode. Using the wide character
functions will correctly input and output the wide characters.
See File I/O for information on reading and writing non-ASCII data
to files and streams.
Functions in this category:
_cgets
_cputs
ecvt/_ecvt
ecvt_r
fcvt/_fcvt
fcvt_r
fgetc
fgetc_unlocked
fgetchar/_fgetchar
fgets
fgets_unlocked
fputc
fputc_unlocked
fputchar/_fputchar
fputs
fputs_unlocked
gcvt/_gcvt
_gcvt_s
_getch
_getche
getc
getc_unlocked
getchar
getchar_unlocked
getdelim
getline
gets
isalnum
isalpha
isascii/__isascii
isblank
iscntrl
iscsym/__iscsym
iscsymf/__iscsymf
isdigit
isgraph
isleadbyte
islower
isprint
ispunct
isspace
isupper
isxdigit
_i64toa
itoa/_itoa
_itoa_s
ltoa/_ltoa
_ltoa_s
putc
putc_unlocked
_putch
putchar
putchar_unlocked
puts
qecvt
qecvt_r
qfcvt
qfcvt_r
qgcvt
strfry
__toascii
tolower/_tolower
toupper/_toupper
_ui64toa
_ui64toa_s
ultoa/_ultoa
_ultoa_s
ungetc
_ungetch
Locale-Sensitive C++ Methods
