East Asian languages such as Japanese, Chinese, and Korean are written with double-byte character sets (DBCS), in which a character can take two bytes, as opposed to the single byte used for European languages. Roman characters fit easily into the 128 code points of 7-bit ASCII, but Chinese, Japanese, and Korean require 16-bit codes to represent roughly 32,000 double-byte characters. In a DBCS code page such as Shift-JIS, ASCII characters still occupy a single byte, so one string can mix one-byte and two-byte characters.
Some of the main tasks involved in DBCS-enabling an English product include:
DBCS enabling (file name support, dialog box enabling, etc.)
Cultural adaptations (language-specific sort and search, IME/FEP enabling, etc.)
Date/Time reformatting
Help files, documentation, and UI translation
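If your code parses a path name by moving a pointer one byte at a time, it will probably not handle DBCS characters correctly: the pointer can land in the middle of a double-byte character, and in Shift-JIS a trail byte can even have the value 0x5C, the same as a backslash. Using DBCS-aware APIs such as CharNext and CharPrev to move from character to character eliminates this problem.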
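The following Python sketch shows why byte-wise scanning fails. In Shift-JIS (code page 932), the kanji 表 is encoded as 0x95 0x5C, so its trail byte has the same value as a backslash, and a plain byte search splits the path in the wrong place. `char_next` is a hypothetical stand-in for `CharNext` that assumes code page 932 lead-byte ranges; it walks forward from the start of the string, because a single byte cannot be identified as a lead or trail byte by looking backward alone.

```python
def is_lead_byte(b: int) -> bool:
    # Assumed code page 932 (Shift-JIS) lead-byte ranges.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def char_next(data: bytes, i: int) -> int:
    """Hypothetical CharNext stand-in: index of the character after the one at i."""
    if i >= len(data):
        return len(data)  # at the end of the string, stay put
    if is_lead_byte(data[i]) and i + 1 < len(data):
        return i + 2      # step over lead byte and trail byte together
    return i + 1


def last_separator_naive(data: bytes) -> int:
    # Byte-wise search: may match the trail byte of a double-byte character.
    return data.rfind(b"\\")


def last_separator_dbcs(data: bytes) -> int:
    # Walk character by character so trail bytes are never tested as backslashes.
    pos = -1
    i = 0
    while i < len(data):
        if data[i] == 0x5C:
            pos = i
        i = char_next(data, i)
    return pos


path = "C:\\docs\\表.txt".encode("cp932")
print(path[8:10].hex())                     # 955c: the trail byte of 表 is 0x5C
naive = last_separator_naive(path)
dbcs = last_separator_dbcs(path)
print(path[naive + 1:])                     # b'.txt' (wrong file name)
print(path[dbcs + 1:].decode("cp932"))      # 表.txt
```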
English sort engines won't work for DBCS languages, because East Asian sort orders are based largely on phonetics rather than on the alphabetical order an English sort routine assumes. In addition, some Japanese sentences mix Hankaku (half-width, 8-bit) and Zenkaku (full-width, 16-bit) phonetic characters, so your sort routine must be able to handle this mix as well.
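A minimal Python sketch of one possible approach: build a sort key that folds Hankaku katakana into Zenkaku form with Unicode NFKC normalization, then maps katakana to hiragana so that both scripts sort by sound. This is an assumption-laden illustration, not a complete Japanese collation: it works on Unicode strings rather than DBCS bytes, it handles kana only (kanji would need readings from a dictionary), and the function name `kana_sort_key` is hypothetical.

```python
import unicodedata


def kana_sort_key(s: str) -> str:
    """Hypothetical sort key: fold Hankaku to Zenkaku, then katakana to hiragana."""
    # NFKC turns half-width katakana such as "ｶﾞ" into full-width "ガ".
    s = unicodedata.normalize("NFKC", s)
    # Katakana U+30A1..U+30F6 map to hiragana U+3041..U+3096 by subtracting 0x60.
    return "".join(
        chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c
        for c in s
    )


words = ["ｶﾞ", "あ", "カ", "ｲ"]
print(sorted(words))                      # code-point order: ['あ', 'カ', 'ｲ', 'ｶﾞ']
print(sorted(words, key=kana_sort_key))   # kana order:       ['あ', 'ｲ', 'カ', 'ｶﾞ']
```

Sorting by raw code values puts every Hankaku character after every Zenkaku one; the normalized key interleaves them by pronunciation, because the hiragana block is laid out in traditional kana order.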