Internationalization Topics

Collation in Oracle 8i

There are some considerations for handling collation and sorting of UTF-8 encoded data. Oracle supports all linguistic definitions (sorts) for UTF-8 the same way they are supported with corresponding language specific character sets. Each linguistic sort will sensically sort the characters for the given language and treat the remaining Unicode characters as punctuation, essentially ignoring them. To conduct a linguistic sort, select active the sort definition using ALTER SESSION SET NLS_SORT= or specifying the NLS_SORT parameter in NLSSORT function.

An NLSSORT query takes the form:

SELECT value FROM table ORDER BY NLSSORT(value, 'NLS_SORT = LOCALE')

and takes approximately 50% longer to execute than an ASCII binary sorting routine. Please see prepared statements for more information regarding SQL queries.

The NLSSORT function replaces a character string with the equivalent sort string used by the linguistic sort mechanism. For a binary sort, the sort string is the same as the input string. The linguistic sort technique operates by replacing each character string with some other binary values, chosen so that sorting the resulting string produces the desired sorting sequence. When a linguistic sort is being used, NLSSORT returns the binary values that replace the original string.

To handle linguistic indexes for data in multiple languages for VARCHAR2 columns, build a single linguistic index for all languages. This can be accomplished by including a language column (LANG_COL in the example below) that contains NLS_LANGUAGE values for the corresponding column on which the index is built as a parameter to the NLSSORT function.

Please view the complete list of Oracle NLS parameters for more information.

Collation Example

The following example builds a single linguistic index for multiple languages. With this index, the rows with the same values for NLS_LANGUAGE are collated together.

CREATE INDEX i ON t (NLSSORT(col, 'NLS_SORT=' | | lang_col));

Which index to use is based on the argument of the NLSSORT function you specified in the ORDER BY clause.

SELECT col
FROM t
WHERE NLSSORT(col, 'NLS_SORT = '||lang_col) IS NOT NULL ORDER BY lang_col, NLSSORT(col, 'NLS_SORT = '||lang_col);

Note: The IS NOT NULL clause is required for using linguistic indexes. All column names should be in lower case. With single linguistic index the results are grouped by language.