Characters

Characters can also be represented in binary. Characters are usually grouped together in a character set. A character set includes:

alphanumeric data (letters and numbers)
symbols (*, &, : etc.)
control characters (Backspace, Horizontal tab, Escape etc.)
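
As a rough illustration of the list above, the sketch below takes one character of each kind and shows the number stored for it in binary. The sample characters and the helper name to_binary are chosen here for illustration only; ord and format are standard Python built-ins.

```python
# One sample character for each kind of entry in a character set.
samples = {
    "letter": "A",
    "number": "7",
    "symbol": "&",
    "control character (horizontal tab)": "\t",
}


def to_binary(char: str) -> str:
    """Return the character's numeric code as an 8-digit binary string."""
    return format(ord(char), "08b")


for kind, char in samples.items():
    # ord() gives the character's numeric code; to_binary() shows its bit pattern.
    print(f"{kind}: code {ord(char)} -> {to_binary(char)}")
```

Each character, including the invisible tab, is simply a number that the computer stores as a pattern of bits.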

ASCII

ASCII was originally developed for basic computers and printers. It uses a 7-bit code to represent characters.

As more computers began to work with 8-bit groups of data, ASCII was written as 8 bits. The most significant bit was sometimes used as a parity bit to perform a parity check (a form of error checking). Other computers set the most significant bit to 0.
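
A parity check can be sketched as follows. This assumes even parity (the eighth bit is chosen so that the total number of 1s is even); the text does not say which scheme a given computer used, and the function names are hypothetical.

```python
def add_even_parity(code: int) -> int:
    """Place an even-parity bit in the most significant (eighth) bit of a 7-bit code."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit ASCII code (0 to 127)")
    ones = bin(code).count("1")
    parity_bit = ones % 2  # 1 only when the 7-bit code has an odd number of 1s
    return (parity_bit << 7) | code


def passes_even_parity(byte: int) -> bool:
    """Return True when the 8-bit value contains an even number of 1s."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 1000011: three 1s, so the parity bit is 1
print(format(sent, "08b"))         # 11000011
print(passes_even_parity(sent))    # True

corrupted = sent ^ 0b00000100      # flip one bit to simulate a transmission error
print(passes_even_parity(corrupted))  # False
```

A single flipped bit changes the count of 1s from even to odd, which is how the receiver detects the error. Two flipped bits would cancel out and go unnoticed.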

So 8-bit ASCII still represents only 128 characters (the number that 7 bits can encode), rather than the 256 that 8 bits could encode.

For example, the ASCII code for lower case z is 122 and is shown below:

Parity Bit/Eighth Bit	64	32	16	8	4	2	1
0	1	1	1	1	0	1	0
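
The table can be reproduced with a few lines of Python. This is only a sketch: the eighth bit is treated as a plain 0 rather than a parity bit, and the names PLACE_VALUES and bits_for are made up for the example.

```python
# Column headings of the table, with the eighth bit given its place value of 128.
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def bits_for(char: str) -> list[int]:
    """Return the 8 bits of the character's code, most significant bit first."""
    code = ord(char)
    return [(code // value) % 2 for value in PLACE_VALUES]


bits = bits_for("z")
print(bits)  # [0, 1, 1, 1, 1, 0, 1, 0]

# Adding up the place values that have a 1 underneath gives the code back.
total = sum(value for value, bit in zip(PLACE_VALUES, bits) if bit == 1)
print(total)  # 122, because 64 + 32 + 16 + 8 + 2 = 122
```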

Extended ASCII

Extended ASCII uses all 8 bits for the character code, which gives 256 possible characters.

Limitation of ASCII

The 128-character limit of ASCII and the 256-character limit of Extended ASCII restrict the number of characters that can be represented. There are simply not enough codes available to hold the character sets of several different languages at once.
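
The limit is easy to run into in practice. The sketch below uses Python's built-in "ascii" codec and, as one example of an 8-bit extended set, the "latin-1" codec (ISO 8859-1). That choice is an assumption, since the text does not name a particular extended set, and the helper name fits_in is hypothetical.

```python
def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character in text has a code in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for word in ["hello", "café", "日本語"]:
    print(word, fits_in(word, "ascii"), fits_in(word, "latin-1"))
```

Plain English text fits in both encodings, the accented é fits only in the 8-bit set, and the Japanese text fits in neither.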

Unicode

Unicode is a universal character set. It aims to include all the characters needed for any writing system or language.

The first 65,536 code point positions in Unicode fit in 16 bits and represent the most commonly used characters in a number of languages. This range is called the Basic Multilingual Plane.

Additional supplementary planes allow around one million other code point positions to be used. As of Version 14.0, released in September 2021, the Unicode Standard contains 144,697 characters.
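
The plane arithmetic can be checked with a short sketch. It assumes the standard layout of 17 planes of 65,536 code points each; the helper name plane_of is made up for illustration, while ord and sys.maxunicode are standard Python.

```python
import sys

PLANE_SIZE = 2 ** 16                      # 65,536 code points per plane
TOTAL_PLANES = 17                         # plane 0 (the BMP) plus 16 supplementary planes
SUPPLEMENTARY = (TOTAL_PLANES - 1) * PLANE_SIZE


def plane_of(char: str) -> int:
    """Return the number of the Unicode plane that holds the character."""
    return ord(char) // PLANE_SIZE


print(plane_of("z"))             # 0: inside the Basic Multilingual Plane
print(plane_of("\u20ac"))        # 0: the euro sign is also in the BMP
print(plane_of("\U0001F600"))    # 1: this emoji lives in a supplementary plane
print(SUPPLEMENTARY)             # 1048576, the "around one million" extra positions

# Python's highest code point matches the 17-plane total.
print(sys.maxunicode + 1 == TOTAL_PLANES * PLANE_SIZE)  # True
```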
