ASCII


Imagine you have some text that you want inside ~a computer~, such as the following:

aaAH! computers!

This is complicated by the fact that computers aren't very smart and don't know what letters are.


All digital information must eventually be stored as a series of 0s and 1s. This is known as binary, with each 0 or 1 being a "bit." In order to store and display text, we have to somehow convert all of our complicated human symbols into binary.

We call this process encoding. Roughly, we may say encoding is "taking information stored in one format, and turning it into another." There are essentially an infinite number of ways to do that, and computer scientists spend a lot of time arguing about them.


It's important to recognize that there's a fundamental obstacle to representing things in binary: there are no spaces to separate things! Say, for instance, you started writing out the alphabet like this, counting up in binary:

Letter  Code
a       0
b       1
c       10
...     ...

This starts out fine: a lone 1 is clearly b, and a lone 0 is a. But what about 10? Does that mean ba or c? We can't write something like 1 0 10 to mean bac, because binary has no spaces. It all has to run together in one long line, like 1010.
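To make the ambiguity concrete, here is a minimal Python sketch that lists every way a run of bits could be read under the counting-up table. The names `naive_code` and `all_decodings`, and the choice to limit the alphabet to the first four letters, are assumptions made for illustration.

```python
def naive_code(letter):
    """Unpadded binary count: a -> 0, b -> 1, c -> 10, d -> 11, ..."""
    return format(ord(letter) - ord("a"), "b")


def all_decodings(bits, alphabet="abcd"):
    """Return every way `bits` can be split into codes from the naive table."""
    if not bits:
        return [""]  # nothing left to read: one valid (empty) decoding
    results = []
    for letter in alphabet:
        code = naive_code(letter)
        if bits.startswith(code):
            # Read this letter, then decode whatever bits remain.
            for rest in all_decodings(bits[len(code):], alphabet):
                results.append(letter + rest)
    return results


print(all_decodings("10"))    # ['ba', 'c']
print(all_decodings("1010"))  # ['baba', 'bac', 'cba', 'cc']
```

With no separators, even four bits already have four different readings, so a decoder has no way to know which one was intended.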


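This introduces us to two fundamental questions any text encoding must answer:

1. Which sequence of bits stands for which character?
2. Where does one character end and the next begin?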


ASCII is a simple and widely recognized encoding for English text that fits these requirements. It's old and has some historic quirks, but for now we'll pretend it's perfect.

It solves our problem in the most straightforward way: just make everything the exact same size. It turns out that 7 bits per character gives us 128 possible codes, which is enough to represent all uppercase and lowercase letters, along with some important extras like digits, spaces, punctuation, and special characters.
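The arithmetic behind the choice of 7 bits can be checked with a short Python sketch. The variable names are illustrative, and `needed` is just one reasonable way to collect the printable characters using the standard `string` module.

```python
import string

# Number of distinct codes available with 6 and 7 bits per character.
codes_with_6_bits = 2 ** 6  # 64
codes_with_7_bits = 2 ** 7  # 128

# Letters, digits, punctuation, and the space character.
needed = string.ascii_letters + string.digits + string.punctuation + " "

print(len(needed))                                     # 95
print(len(needed) <= codes_with_6_bits)                # False: 6 bits is too few
print(all(ord(ch) < codes_with_7_bits for ch in needed))  # True: all fit in 7 bits
```

The codes left over after the 95 printable characters are used for non-printing control characters.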

For ~computer science reasons~, we usually add another 0 to the beginning to bring this up to a nice even 8 bits (one byte). So, to find where one letter begins and another ends in our long string of binary, we can simply step forward or backward 8 bits at a time, and we'll always land on a clean boundary.

Letter  Code
a       01100001
b       01100010
c       01100011
...     ...
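The fixed-width idea can be sketched in a few lines of Python. The function names `encode_ascii` and `decode_ascii` are assumptions for illustration. The decoder needs no separators because it simply cuts the string every 8 bits.

```python
def encode_ascii(text):
    """Turn ASCII text into one long string of 0s and 1s, 8 bits per character."""
    # str.encode("ascii") raises UnicodeEncodeError for non-ASCII characters.
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def decode_ascii(bits):
    """Recover the text by reading the bit string in 8-bit chunks."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return "".join(chr(int(chunk, 2)) for chunk in chunks)


bits = encode_ascii("abc")
print(bits)                # 011000010110001001100011
print(decode_ascii(bits))  # abc
```

Because every character occupies exactly 8 bits, the character at position n always starts at bit 8 * n, which is what makes stepping forward or backward trivial.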

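So, using ASCII, our text from earlier becomes:

aaAH! computers!
=
01100001011000010100000101001000001000010010000001100011011011110110110101110000011101010111010001100101011100100111001100100001

or, written out a little more clearly,

Letter   Code
a        01100001
a        01100001
A        01000001
H        01001000
!        00100001
(space)  00100000
c        01100011
o        01101111
m        01101101
p        01110000
u        01110101
t        01110100
e        01100101
r        01110010
s        01110011
!        00100001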

That's it! We've successfully taken some text and converted it to binary using ASCII encoding. You're now ready to move on to Huffman Coding, which packs everything a bit tighter.
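To try the conversion on other text, a minimal Python sketch along these lines should work. The function name `show_ascii` is a hypothetical helper, not something defined by this page, and it assumes the input contains only ASCII characters.

```python
def show_ascii(text):
    """Return the ASCII bits for `text` as (one long line, space-grouped line)."""
    # Each character becomes exactly 8 bits: a leading 0 plus its 7-bit code.
    codes = [format(byte, "08b") for byte in text.encode("ascii")]
    return "".join(codes), " ".join(codes)


long_line, grouped = show_ascii("aaAH! computers!")
print(long_line)  # all 128 bits run together, as the computer stores them
print(grouped)    # the same bits with a space every 8, easier for humans

# Swap in any other ASCII text to experiment:
print(show_ascii("Hi")[1])  # 01001000 01101001
```

The grouped form exists only for human readers. The stored data is the unbroken line, and the 8-bit rule is what lets us put the spaces back.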