ANSI encoding

honggarae 07/11/2022 453

Introduction

To let computers support more languages, two bytes in the range 0x80 to 0xFFFF are usually used to represent one character. For example, on a Chinese operating system the Chinese character 中 is stored as the two bytes [0xD6, 0xD0].
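The byte pair [0xD6, 0xD0] can be checked with a short Python sketch. It assumes the character in question is 中 and uses the standard library's `gb2312` codec; the variable names are illustrative only.

```python
# Encode one Chinese character with Python's built-in GB2312 codec.
char = "中"
encoded = char.encode("gb2312")

# Each byte is at or above 0x80, so it cannot be mistaken for 7-bit ASCII.
byte_values = [hex(b) for b in encoded]
print(byte_values)  # ['0xd6', '0xd0']
```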

ANSI encoding

Different countries and regions have developed their own standards, producing encodings such as GB2312, GBK, GB18030, Big5, and Shift_JIS. These schemes, which extend the character set by using multiple bytes to represent one character, are collectively called ANSI encoding. On Simplified Chinese Windows, ANSI encoding means GBK (code page 936, a superset of GB2312); on Traditional Chinese Windows it means Big5; on Japanese Windows it means Shift_JIS.
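To make the regional meaning of "ANSI" concrete, the sketch below encodes the same character under three code pages. The codec names `cp936`, `cp950`, and `cp932` are Python's names for the Windows code pages used on Simplified Chinese, Traditional Chinese, and Japanese systems; the dictionary and the helper `encode_everywhere` are hypothetical names chosen for illustration.

```python
# Python codec names standing in for the regional "ANSI" code pages.
CODE_PAGES = {
    "Simplified Chinese": "cp936",
    "Traditional Chinese": "cp950",
    "Japanese": "cp932",
}

def encode_everywhere(text):
    """Return the bytes each regional code page produces for the same text."""
    return {region: text.encode(codec) for region, codec in CODE_PAGES.items()}

results = encode_everywhere("中")
for region, data in results.items():
    # The same character maps to a different byte pair in each code page.
    print(region, data.hex())
```

The character is identical in all three cases, yet no two code pages agree on its bytes, which is why a file's "ANSI" content can only be read correctly when the reader knows which code page wrote it.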

The various ANSI encodings are incompatible with one another, so when information is exchanged internationally, text in two different languages cannot be stored in the same ANSI-encoded text. An ANSI encoding uses one byte for an English character and two or four bytes for a Chinese character.
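The incompatibility and the byte counts described above can be sketched in Python as follows. The helper `can_encode` and the sample strings are hypothetical; `gbk`, `euc_kr`, `shift_jis`, and `gb18030` are standard-library codecs used here as examples of regional encodings.

```python
def can_encode(text, codec):
    """Report whether every character in text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Chinese and Korean each fit their own code page, but not a shared one.
chinese, korean = "中文", "한국어"
print(can_encode(chinese, "gbk"), can_encode(korean, "euc_kr"))
print(can_encode(chinese + korean, "gbk"))

# The same bytes read under another code page turn into different text.
raw = chinese.encode("gbk")
misread = raw.decode("shift_jis", errors="replace")
print(misread)

# Byte counts in GB18030: an ASCII letter, a common hanzi, and a rarer hanzi.
lengths = [len(s.encode("gb18030")) for s in ("A", "中", "\U00020000")]
print(lengths)  # [1, 2, 4]
```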

As the multi-byte encoding format used in China and parts of the Asia-Pacific region, ANSI encoding is natively supported by both Windows and OS X. Even so, many foreign developers ignore ANSI encodings when building note-taking or text-editing software and support only the globally used UTF-8.

Other character encodings

Three kinds of text encoding are most commonly used in practice: ASCII, ANSI, and Unicode. ASCII is the basis of the latter two.

ASCII code

The basis of text encoding is ASCII, a 7-bit coding standard that covers 26 lowercase letters, 26 uppercase letters, 10 digits, 32 symbols, 33 control codes, and a space, for a total of 128 characters. Because computers usually store and exchange data in 8-bit bytes, many computer manufacturers extended ASCII with 128 additional characters, and character sets such as ANSI and Unicode were built on this basis.
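The tally of 128 can be reproduced with Python's `string` module. This is only an illustrative check; the `counts` dictionary and its keys are names invented for the example.

```python
import string

# Tally the 128 ASCII code points by the categories named in the text.
counts = {
    "lowercase letters": len(string.ascii_lowercase),
    "uppercase letters": len(string.ascii_uppercase),
    "digits": len(string.digits),
    "symbols": len(string.punctuation),
    # Control codes are 0x00-0x1F plus DEL (0x7F).
    "control codes": len([c for c in range(128) if c < 0x20 or c == 0x7F]),
    "space": 1,
}
total = sum(counts.values())

# Seven bits give 2**7 code points, matching the tally.
print(counts, total, 2 ** 7)
```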

Unicode

For English, ASCII is sufficient to encode every character, but for Chinese two bytes must be used to represent one character; this way of representing Chinese characters is customarily called double-byte encoding. Double-byte encoding handles text that mixes Chinese and English, but text that mixes different character systems, such as Chinese with Japanese or Japanese with Korean, has to be converted between character codes, which is cumbersome. To solve this problem, many companies joined together to develop a single character code that applies to every country in the world. In its original design every character, whether Eastern or Western, was represented with two bytes (later versions extended the range beyond 16 bits). This is Unicode.
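A brief Python sketch of the idea: one string can hold characters from several scripts, each with its own code point, and no conversion between code pages is needed. UTF-16 (little-endian) is used here as a stand-in for the two-byte representation; that choice and the sample string are assumptions made for illustration.

```python
# One string mixing Latin, Chinese, Japanese, and Korean characters.
mixed = "A中あ한"

# Every character has a single Unicode code point, regardless of script.
code_points = [f"U+{ord(ch):04X}" for ch in mixed]
print(code_points)

# In the 16-bit form (UTF-16), each of these characters takes two bytes.
sizes = [len(ch.encode("utf-16-le")) for ch in mixed]
print(sizes)  # [2, 2, 2, 2]

# The mixed text survives a round trip without any code page conversion.
round_trip = mixed.encode("utf-16-le").decode("utf-16-le")
print(round_trip == mixed)  # True
```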
