Internationalization and localization tools


Culture-Sensitive C# Class

using System.Text;

public UTF8Encoding();

Internationalization (I18n) Class Overview

This class encodes Unicode characters using UCS Transformation Format, 8-bit form (UTF-8). This encoding supports all Unicode character values and surrogates.

For more information see Microsoft's MSDN online documentation. Also, see specific MSDN documentation on Encoding Properties.

I18n Issues

Use of this class probably does not pose an I18n problem. Globalyzer detects it by default because, during the internationalization process, you should be aware of every place in your code that performs a character encoding conversion. Further, UTF-8 is different from the form of Unicode used by C# internally (UTF-16, which stores each character as one or two 16-bit code units). UTF-8 is the recommended character encoding for multilingual web pages.

UTF-8 encodes Unicode characters with a variable number of bytes per character, from one to four. This encoding is optimized for the 128 ASCII characters (U+0000 through U+007F), each of which takes a single byte, yielding an efficient mechanism to encode English in an international way. The UTF-8 identifier is the Unicode byte order mark, U+FEFF, which is represented in UTF-8 as the bytes 0xEF 0xBB 0xBF. The byte order mark is used to distinguish UTF-8 text from other encodings. Note that an instance created with the parameterless constructor does not emit this identifier.
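The variable byte width and the byte order mark can be checked with a short program. This is a minimal sketch, not part of Globalyzer: the class and method names (Utf8WidthDemo, Utf8ByteCount, PreambleHex) are hypothetical and exist only for illustration, while GetByteCount, GetPreamble, and the UTF8Encoding(bool) constructor are standard System.Text members.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class Utf8WidthDemo
{
    // Returns the number of UTF-8 bytes needed to encode the given text.
    public static int Utf8ByteCount(string text)
    {
        return new UTF8Encoding().GetByteCount(text);
    }

    // Returns the encoder's preamble (byte order mark) as hyphen-separated hex.
    // emitBom maps to the encoderShouldEmitUTF8Identifier constructor argument.
    public static string PreambleHex(bool emitBom)
    {
        byte[] preamble = new UTF8Encoding(emitBom).GetPreamble();
        return BitConverter.ToString(preamble);
    }

    public static void Main()
    {
        Console.WriteLine(Utf8ByteCount("A"));            // 1 (ASCII)
        Console.WriteLine(Utf8ByteCount("\u00E9"));       // 2 (U+00E9)
        Console.WriteLine(Utf8ByteCount("\u20AC"));       // 3 (U+20AC)
        Console.WriteLine(Utf8ByteCount("\uD869\uDED6")); // 4 (surrogate pair)
        Console.WriteLine(PreambleHex(true));             // EF-BB-BF
        Console.WriteLine(PreambleHex(false).Length);     // 0 (no preamble)
    }
}
```

When reviewing a flagged instantiation, the constructor argument is worth a look: whether the identifier is written at the start of a file or stream depends on it.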

If, once you have examined a particular instantiation of the UTF8Encoding class, you determine that it does not pose I18n problems, you can use Globalyzer's Ignore Comment functionality to ensure that it isn't picked up in a subsequent scan.

Usage Example

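// The parameterless constructor creates an encoder that does not emit a byte order mark.
UTF8Encoding utf8 = new UTF8Encoding();
// '\uD869' and '\uDED6' form a surrogate pair for one supplementary character (U+2A6D6).
char[] chars = new char[] {'a', 'b', 'c',
   '\uD869', '\uDED6', 'd'};
// Yields 8 bytes: one each for 'a', 'b', 'c', 'd' and four for the surrogate pair.
byte[] bytes = utf8.GetBytes(chars);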
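To confirm that a conversion like the one above is lossless, the bytes can be decoded again and compared with the original characters. The following is a sketch under stated assumptions: Utf8RoundTripDemo, EncodeToHex, and RoundTrips are hypothetical names introduced for illustration, and only standard System.Text and System calls are used.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class Utf8RoundTripDemo
{
    // Encodes the characters as UTF-8 and formats the bytes as hyphen-separated hex.
    public static string EncodeToHex(char[] chars)
    {
        UTF8Encoding utf8 = new UTF8Encoding();
        return BitConverter.ToString(utf8.GetBytes(chars));
    }

    // Encodes and then decodes, reporting whether the original text survives.
    public static bool RoundTrips(char[] chars)
    {
        UTF8Encoding utf8 = new UTF8Encoding();
        byte[] bytes = utf8.GetBytes(chars);
        return utf8.GetString(bytes) == new string(chars);
    }

    public static void Main()
    {
        char[] chars = new char[] { 'a', 'b', 'c', '\uD869', '\uDED6', 'd' };
        Console.WriteLine(EncodeToHex(chars)); // 61-62-63-F0-AA-9B-96-64
        Console.WriteLine(RoundTrips(chars));  // True
    }
}
```

The four bytes F0 AA 9B 96 are the UTF-8 form of the single supplementary character that the surrogate pair represents in UTF-16.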

C# Encoding Information

 
