【Engineering】Encode

Posted by 西维蜀黍 on 2019-09-20, Last Modified on 2024-05-02

ASCII is a simple form of Unicode text, but just one of many possible encodings and alphabets. Text from non-English-speaking sources may use very different letters, and may be encoded very differently when stored in files.

ASCII

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII, although they support many additional characters.

Unicode(字符集)

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems.

The standard is maintained by the Unicode Consortium, and as of May 2019 the most recent version, Unicode 12.1, contains a repertoire of 137,994 characters covering 150 modern and historic scripts, as well as multiple symbol sets and emoji. The character repertoire of the Unicode Standard is synchronized with ISO/IEC 10646, and both are code-for-code identical.

Character Encodings(字符编码方式) - UTF-8, UTF-16, and UTF-32

Unicode can be implemented by different character encodings. The Unicode standard defines UTF-8, UTF-16, and UTF-32, and several other encodings are in use.

The most commonly used encodings are UTF-8, UTF-16, and UCS-2 (without full support for Unicode), a precursor of UTF-16; GB18030 is standardized in China and implements Unicode fully, while not an official Unicode standard.

Reference

  • Learning Python
  • Wikipedia