To display an HTML page correctly, the browser must know what character-set to use.
The character-set for the early world wide web was ASCII. ASCII supports the numbers from 0-9, the uppercase and lowercase English alphabet, and some special characters.
The Unicode ConsortiumThe Unicode Consortium develops the Unicode Standard. Their goal is to replace the existing character-sets with its standard Unicode Transformation Format (UTF).
The Unicode Consortium cooperates with the leading standards development organizations, like ISO, W3C, and ECMA.
Unicode can be implemented by different character-sets. The most commonly used encodings are UTF-8 and UTF-16:
|UTF-8||A character in UTF8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard. UTF-8 is backwards compatible with ASCII. UTF-8 is the preferred encoding for e-mail and web pages|
|UTF-16||16-bit Unicode Transformation Format is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. UTF-16 is used in major operating systems and environments, like Microsoft Windows 2000/XP/2003/Vista/CE and the Java and .NET byte code environments|
Tip: All HTML 4 processors already support UTF-8, and all XHTML and XML processors support UTF-8 and UTF-16!