Characters
Characters can also be represented in binary. Characters are usually grouped together in a character set. A character set includes:
- alphanumeric data (letters and numbers)
- symbols (*, &, : etc.)
- control characters (Backspace, Horizontal tab, Escape etc.)
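The three groups above can be illustrated with a short Python sketch. The function name `classify` and the rule it uses (codes below 32, plus 127, count as control characters) are assumptions made for this example and apply to ASCII only.

```python
def classify(ch: str) -> str:
    """Place a single ASCII character into one of the three groups above."""
    code = ord(ch)
    if code < 32 or code == 127:
        return "control"        # e.g. Backspace (8), Horizontal tab (9), Escape (27)
    if ch.isalnum():
        return "alphanumeric"   # letters and digits
    return "symbol"             # any other printable character (including space here)


for ch in ["A", "7", "*", "&", "\t", "\x1b"]:
    print(repr(ch), ord(ch), classify(ch))
```

Under this simplified rule the space character is reported as a symbol, which is a choice of the sketch rather than a formal definition.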
ASCII
ASCII was originally developed for basic computers and printers. It uses a 7-bit code to represent characters.
As more computers began to work with 8-bit groups of data, ASCII was written as 8 bits. The most significant bit was sometimes used as a parity bit to perform a parity check (a form of error checking). Other computers set the most significant bit to 0.
So ASCII still represents only 128 characters (the number available with 7 bits), even when each character is stored in 8 bits, which could otherwise represent 256.
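As a rough illustration of how the most significant bit can carry a parity bit, the sketch below assumes even parity (the total number of 1s in the byte must be even). The text does not say which parity scheme was used, and the function names are made up for this example.

```python
def with_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit ASCII code plus an even-parity bit as the MSB."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")
    parity = ones % 2            # 1 when the count of 1s is odd, making the total even
    return (parity << 7) | code


def passes_even_parity(byte: int) -> bool:
    """Check that a received byte contains an even number of 1s."""
    return bin(byte).count("1") % 2 == 0


sent = with_even_parity(ord("z"))
print(format(sent, "08b"))                 # the 7 data bits with the parity bit in front
print(passes_even_parity(sent))            # True: nothing was changed
print(passes_even_parity(sent ^ 0b100))    # False: one bit was flipped in transit
```

A single parity bit only detects an odd number of flipped bits; two flipped bits would pass the check unnoticed.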
For example, the ASCII code for lower case z is 122 and is shown below:
Parity Bit/Eighth Bit | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
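The place-value table for lower case z can be reproduced in Python. This is a minimal sketch using the built-in `ord`, `chr` and `format` functions; the variable names are illustrative only, and the eighth bit is assumed to be set to 0.

```python
code = ord("z")                       # ASCII code of lower case z
bits = format(code, "08b")            # 8-bit binary string, eighth bit left as 0

# Add up the place values wherever the bit is 1 to get back to the code.
place_values = [128, 64, 32, 16, 8, 4, 2, 1]
total = sum(value for value, bit in zip(place_values, bits) if bit == "1")

print(code)        # 122
print(bits)        # 01111010
print(total)       # 122  (64 + 32 + 16 + 8 + 2)
print(chr(total))  # z
```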
Extended ASCII
It is possible to use the most significant bit of an 8-bit byte to allow ASCII to represent 256 characters. This is known as extended ASCII. There are different versions of extended ASCII in use.
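To see why having different versions matters, the sketch below decodes the same byte with two 8-bit character sets. Latin-1 (ISO 8859-1) and code page 437 are picked here purely as examples; the text does not name any particular version.

```python
raw = bytes([0xE9])                    # one byte with the most significant bit set

as_latin1 = raw.decode("latin-1")      # interpreted as ISO 8859-1
as_cp437 = raw.decode("cp437")         # interpreted as the original IBM PC code page

print(as_latin1, as_cp437)             # two different characters from the same byte
print(as_latin1 == as_cp437)           # False

# Codes below 128 are plain ASCII and agree in both character sets.
plain = bytes([0x7A])
print(plain.decode("latin-1") == plain.decode("cp437"))   # True
```

The same stored value can therefore display as a different character depending on which extended set the reader assumes.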
Limitation of ASCII
The 128-character limit of ASCII and the 256-character limit of extended ASCII restrict the number of character sets that can be held. Representing the character sets of several different languages at once is not possible in ASCII because there are simply not enough available characters.
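This limitation can be demonstrated with Python's built-in `ascii` codec, which rejects any character outside the 128 it defines. The helper name `encodes_as_ascii` is hypothetical and exists only for this sketch.

```python
def encodes_as_ascii(text: str) -> bool:
    """Return True if every character in text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(encodes_as_ascii("hello"))                               # True
print(encodes_as_ascii("h\u00e9llo"))                          # False: e with acute accent
print(encodes_as_ascii("\u041f\u0440\u0438\u0432\u0435\u0442"))  # False: Cyrillic letters
```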
Unicode
Unicode is a universal character set. It aims to include all the characters needed for any writing system or language.
The first 65,536 code point positions in Unicode can be addressed with 16 bits and hold the most commonly used characters in a number of languages. This range is known as the Basic Multilingual Plane.
Additional supplementary planes allow around one million other code point positions to be used. As of Version 14.0, released in September 2021, the Unicode Standard contains 144,697 characters.
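The plane arithmetic can be checked with a few lines of Python. This is a sketch only: `PLANE_SIZE` and `plane_of` are names invented for the example, and it assumes the standard layout of one Basic Multilingual Plane followed by 16 supplementary planes.

```python
PLANE_SIZE = 2 ** 16                    # 65,536 code points per plane


def plane_of(ch: str) -> int:
    """Return the plane number of a character (0 is the Basic Multilingual Plane)."""
    return ord(ch) // PLANE_SIZE


print(plane_of("z"))                    # 0: ASCII letters sit at the start of plane 0
print(plane_of("\u20ac"))               # 0: the euro sign, U+20AC
print(plane_of("\U0001F600"))           # 1: an emoji, U+1F600, in a supplementary plane

supplementary_positions = 16 * PLANE_SIZE
print(supplementary_positions)          # 1048576, the "around one million" extra positions
```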