The Asian character sets that we support include Chinese,
          Japanese, Korean, and Thai. These can be complicated. For
          example, the Chinese sets must allow for thousands of
          different characters. See Section 9.1.12.7.1, “The cp932 Character Set”, for
          additional information about the cp932 and
          sjis character sets.
        
              big5 (Big5 Traditional Chinese)
              collations:
            
                  big5_bin
                
                  big5_chinese_ci (default)
                
              cp932 (SJIS for Windows Japanese)
              collations:
            
                  cp932_bin
                
                  cp932_japanese_ci (default)
                
              eucjpms (UJIS for Windows Japanese)
              collations:
            
                  eucjpms_bin
                
                  eucjpms_japanese_ci (default)
                
              euckr (EUC-KR Korean) collations:
            
                  euckr_bin
                
                  euckr_korean_ci (default)
                
              gb2312 (GB2312 Simplified Chinese)
              collations:
            
                  gb2312_bin
                
                  gb2312_chinese_ci (default)
                
              gbk (GBK Simplified Chinese)
              collations:
            
                  gbk_bin
                
                  gbk_chinese_ci (default)
                
              sjis (Shift-JIS Japanese) collations:
            
                  sjis_bin
                
                  sjis_japanese_ci (default)
                
              tis620 (TIS620 Thai) collations:
            
                  tis620_bin
                
                  tis620_thai_ci (default)
                
              ujis (EUC-JP Japanese) collations:
            
                  ujis_bin
                
                  ujis_japanese_ci (default)
                
          The big5_chinese_ci collation sorts on
          number of strokes.
        
For additional information about Asian collations in MySQL, see Collation-Charts.Org (big5, cp932, eucjpms, euckr, gb2312, gbk, sjis, tis620, ujis).
            Why is cp932
            needed?
          
            In MySQL, the sjis character set
            corresponds to the Shift_JIS character
            set defined by IANA, which supports JIS X0201 and JIS X0208
            characters. (See
            http://www.iana.org/assignments/character-sets.)
          
            However, the term “SHIFT JIS” has become
            vague in practice: it is often used to include the
            vendor-specific extensions to Shift_JIS.
          
            For example, “SHIFT JIS” used in Japanese
            Windows environments is a Microsoft extension of
            Shift_JIS and its exact name is
            Microsoft Windows Codepage : 932 or
            cp932. In addition to the characters
            supported by Shift_JIS,
            cp932 supports extension characters such
            as NEC special characters, NEC selected — IBM extended
            characters, and IBM selected characters.
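
To make the idea of an extension character concrete, here is a minimal sketch that uses Python's standard-library codecs, which keep a comparable split between `shift_jis` and `cp932`. It is an illustration outside MySQL and says nothing about MySQL's own conversion tables. The helper name `can_encode` is hypothetical.

```python
# Illustration only: these are Python's standard codecs, not MySQL's tables.
CIRCLED_DIGIT_ONE = "\u2460"  # an NEC special character in cp932


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(CIRCLED_DIGIT_ONE, "cp932"))      # True
print(can_encode(CIRCLED_DIGIT_ONE, "shift_jis"))  # False
print(CIRCLED_DIGIT_ONE.encode("cp932").hex())     # 8740
```

The cp932 bytes fall in the 0x87 lead-byte range, which is where the NEC special characters are located.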
          
Since MySQL 4.1, many Japanese users have experienced problems using these extension characters. These problems stem from the following factors:
MySQL automatically converts character sets.
                Character sets are converted via Unicode
                (ucs2).
              
                The sjis character set does not
                support the conversion of these extension characters.
              
There are several conversion rules from so-called “SHIFT JIS” to Unicode, and some characters are converted to Unicode differently depending on the conversion rule. MySQL supports only one of these rules (described later).
            The MySQL cp932 character set is designed
            to solve these problems. It is available as of MySQL 4.1.12.
          
            Before MySQL 4.1, it was safe to use any version of
            “SHIFT JIS” in conjunction with the
            sjis character set. However, because
            MySQL supports character set conversion beginning with 4.1,
            it is important to separate IANA
            Shift_JIS and cp932
            into two different character sets because they provide
            different conversion rules.
          
            How does cp932
            differ from sjis?
          
            The cp932 character set differs from
            sjis in the following ways:
          
                cp932 supports NEC special
                characters, NEC selected — IBM extended
                characters, and IBM selected characters.
              
                Some cp932 characters have two
                different code points, both of which convert to the same
                Unicode code point. When converting from Unicode back to
                cp932, one of the code points must be
                selected. For this “round trip conversion,”
                the rule recommended by Microsoft is used. (See
                http://support.microsoft.com/kb/170559/EN-US/.)
              
The conversion rule works like this:
If the character is in both JIS X 0208 and NEC special characters, use the code point of JIS X 0208.
If the character is in both NEC special characters and IBM selected characters, use the code point of NEC special characters.
If the character is in both IBM selected characters and NEC selected — IBM extended characters, use the code point of IBM selected characters.
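
One way to read the rule is as a single priority order: JIS X 0208 first, then NEC special, then IBM selected, then NEC selected IBM extended. The sketch below encodes that reading. The function name, the group labels, and the idea of a single linear order (the rule only states pairs) are assumptions made for illustration. The sample code points are believed to match Microsoft's table but should be checked against it.

```python
# Priority order: a group listed earlier wins over a group listed later.
ROUND_TRIP_PRIORITY = (
    "JIS X 0208",
    "NEC special",
    "IBM selected",
    "NEC selected IBM extended",
)


def pick_round_trip_code(candidates: dict[str, int]) -> int:
    """Choose the cp932 code point to emit for a Unicode character that
    has more than one cp932 code point.

    candidates maps a group label to that group's cp932 code point.
    """
    for group in ROUND_TRIP_PRIORITY:
        if group in candidates:
            return candidates[group]
    raise ValueError("no known character group among candidates")


# Illustrative duplicates (hypothetical sample data, verify before use).
approximately_equal = {"JIS X 0208": 0x81E0, "NEC special": 0x8790}
roman_numeral_one = {"NEC special": 0x8754, "IBM selected": 0xFA4A}
kanji_7e8a = {"IBM selected": 0xFA5C, "NEC selected IBM extended": 0xED40}

for sample in (approximately_equal, roman_numeral_one, kanji_7e8a):
    print(f"{pick_round_trip_code(sample):04X}")
```

With these inputs the sketch prints 81E0, 8754, and FA5C, one per line.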
                The table shown at
                http://www.microsoft.com/globaldev/reference/dbcs/932.htm
                provides information about the Unicode values of
                cp932 characters. For
                cp932 table entries with characters
                under which a four-digit number appears, the number
                represents the corresponding Unicode
                (ucs2) encoding. For table entries
                in which an underlined two-digit value appears, there is a
                range of cp932 character values that
                begin with those two digits. Clicking such a table entry
                takes you to a page that displays the Unicode value for
                each of the cp932 characters that
                begin with those digits.
              
The following links are of special interest. They correspond to the encodings for the following sets of characters:
NEC special characters:
http://www.microsoft.com/globaldev/reference/dbcs/932/932_87.htm
NEC selected — IBM extended characters:
http://www.microsoft.com/globaldev/reference/dbcs/932/932_ED.htm
http://www.microsoft.com/globaldev/reference/dbcs/932/932_EE.htm
IBM selected characters:
http://www.microsoft.com/globaldev/reference/dbcs/932/932_FA.htm
http://www.microsoft.com/globaldev/reference/dbcs/932/932_FB.htm
http://www.microsoft.com/globaldev/reference/dbcs/932/932_FC.htm
            For some characters, conversion to and from
            ucs2 is different for
            sjis and cp932. The
            following tables illustrate these differences.
          
            Conversion to ucs2:
          
| sjis/cp932 Value | sjis -> ucs2 Conversion | cp932 -> ucs2 Conversion | 
| 5C | 005C | 005C | 
| 7E | 007E | 007E | 
| 815C | 2015 | 2015 | 
| 815F | 005C | FF3C | 
| 8160 | 301C | FF5E | 
| 8161 | 2016 | 2225 | 
| 817C | 2212 | FF0D | 
| 8191 | 00A2 | FFE0 | 
| 8192 | 00A3 | FFE1 | 
| 81CA | 00AC | FFE2 | 
            Conversion from ucs2:
          
| ucs2 Value | ucs2 -> sjis Conversion | ucs2 -> cp932 Conversion | 
| 005C | 815F | 5C | 
| 007E | 7E | 7E | 
| 00A2 | 8191 | 3F | 
| 00A3 | 8192 | 3F | 
| 00AC | 81CA | 3F | 
| 2015 | 815C | 815C | 
| 2016 | 8161 | 3F | 
| 2212 | 817C | 3F | 
| 2225 | 3F | 8161 | 
| 301C | 8160 | 3F | 
| FF0D | 3F | 817C | 
| FF3C | 3F | 815F | 
| FF5E | 3F | 8160 | 
| FFE0 | 3F | 8191 | 
| FFE1 | 3F | 8192 | 
| FFE2 | 3F | 81CA | 
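
Chaining the two tables shows why the character sets need to be kept apart. The sketch below is a small model built only from the table values. It converts each sjis code to ucs2 and then to cp932, and reports which codes end up as 3F (the question mark used when no mapping exists). The dictionary and function names are made up for this example, and nothing here calls MySQL itself.

```python
# Values transcribed from the two conversion tables (hex strings).
SJIS_TO_UCS2 = {
    "5C": "005C", "7E": "007E", "815C": "2015", "815F": "005C",
    "8160": "301C", "8161": "2016", "817C": "2212",
    "8191": "00A2", "8192": "00A3", "81CA": "00AC",
}
UCS2_TO_CP932 = {
    "005C": "5C", "007E": "7E", "00A2": "3F", "00A3": "3F", "00AC": "3F",
    "2015": "815C", "2016": "3F", "2212": "3F", "2225": "8161",
    "301C": "3F", "FF0D": "817C", "FF3C": "815F", "FF5E": "8160",
    "FFE0": "8191", "FFE1": "8192", "FFE2": "81CA",
}
REPLACEMENT = "3F"  # '?', shown in the tables where no mapping exists


def sjis_to_cp932(code: str) -> str:
    """Convert one sjis code to cp932 by way of ucs2, per the tables."""
    return UCS2_TO_CP932[SJIS_TO_UCS2[code]]


# Codes that are lost entirely, and codes that survive with a different value.
lost = [c for c in SJIS_TO_UCS2 if sjis_to_cp932(c) == REPLACEMENT]
changed = [c for c in SJIS_TO_UCS2
           if sjis_to_cp932(c) not in (c, REPLACEMENT)]

print("lost:", lost)
print("changed:", changed)
```

In this model, six of the ten listed codes (8160, 8161, 817C, 8191, 8192, 81CA) become 3F, and 815F comes back as 5C. Data written under one conversion rule and read under the other is therefore not preserved.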
            Users of any Japanese character sets should be aware that
            using
            --character-set-client-handshake
            (or
            --skip-character-set-client-handshake)
            has an important effect. See
            Section 5.1.2, “Server Command Options”.
          


User Comments
As of MySQL 4.1.14, note that for Traditional Chinese (Big5), the big5_chinese_ci collation orders characters by stroke count, whereas for Simplified Chinese (GB2312), the gb2312_chinese_ci collation orders characters by their Pinyin.