
1 Learning Outcomes

2 Digital data

Data live all around us.

2.1 Example: Storing data as digital

The real world is analog: everything you hear, see, and smell varies continuously. Real numbers are a natural way to describe such quantities, but to work with them on a computer, we typically need to convert them to, or approximate them by, values that can be represented digitally.

Figure 1: Translating an analog signal to a digital representation. Four panels show the original analog wave, its discretization in time (sampling), its discretization in amplitude (quantization), and the resulting digital data.

In order to convert analog data to digital data, we must do two things:

  1. Sample: At every time step, we ask the signal, "What's your value?" Samples are usually taken at a regular interval. For example, audio CDs sample the signal 44,100 times per second, each time recording its height (amplitude).

  2. Quantize: Because a sample's height can be any fractional value, we measure it against a "yardstick" with a fixed number of tick marks. For CD audio, the yardstick has $2^{16} = 65,536$ tick marks, so each sample is stored as a 16-bit number. Each sample then "snaps" to the closest tick mark.
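The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how a real analog-to-digital converter works: a pure 440 Hz sine wave stands in for the analog signal, and the hypothetical helpers `analog`, `sample`, and `quantize` use a simple rounding scheme that maps amplitudes in [-1, 1] onto signed 16-bit integers.

```python
import math

# Illustrative assumptions: a pure 440 Hz sine wave stands in for the analog
# signal, sampled at the CD rate and quantized to signed 16-bit integers.
SAMPLE_RATE = 44_100  # samples per second (the CD rate)
BITS = 16             # bits per sample
FREQ = 440.0          # tone frequency in Hz (hypothetical choice)


def analog(t):
    """Stand-in for a continuous signal: amplitude in [-1, 1] at time t (seconds)."""
    return math.sin(2 * math.pi * FREQ * t)


def sample(signal, rate, n):
    """Step 1: ask the signal for its value at n regularly spaced times."""
    return [signal(i / rate) for i in range(n)]


def quantize(x, bits):
    """Step 2: snap x (clipped to [-1, 1]) to the nearest integer tick mark.

    Results lie in [-(2**(bits-1) - 1), 2**(bits-1) - 1], so they fit in a
    signed `bits`-bit integer (for 16 bits: -32767 .. 32767).
    """
    top = 2 ** (bits - 1) - 1
    x = max(-1.0, min(1.0, x))  # clip out-of-range values
    return round(x * top)


samples = sample(analog, SAMPLE_RATE, 8)
digital = [quantize(s, BITS) for s in samples]
print(digital)
```

This symmetric scheme never uses the code -32768; real converters and audio formats make their own choices about the exact mapping.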

When we’re all done, we have a set of 16-bit samples that we can work with. There is a lot of engineering that goes into this process. In other classes, you will learn how to sample signals, build analog-to-digital converters, and more. In this class, we focus on designing systems to represent real numbers with a limited number of bits.

2.2 Example: Inherently digital data

Not all digital data start out as analog signals; art, music, and video can be created entirely digitally, without any analog reference. For example, POV-Ray is rendering software that creates beautiful digital images that exist only in the artist's imagination. Today, entire fields of artificial intelligence focus on generating digital images and video, often entirely from digital data sources.

Example renderings: The Last Guardian, by Johnny Yip; My First CGSphere, by Robert McGregor; and an underwater scene of a large marine reptile swimming past a shipwreck on a coral reef.

3 Bits, Bytes, and Nibbles

A bit is a binary digit. It takes on the value 0 or 1. We use the phrases binary string, bitstring, bit sequence, etc. to refer to sequences of binary digits. For example, the set of length-four binary strings refers to the $2^4 = 16$ bitstrings 0000, 0001, 0010, ..., 1111.
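To make the count concrete, the short Python sketch below lists every bitstring of a given length using the standard library's `itertools.product`. The helper name `bitstrings` is hypothetical, chosen for illustration.

```python
from itertools import product


def bitstrings(n):
    """Return every binary string of length n, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


four = bitstrings(4)
print(len(four))            # 16
print(four[:3], four[-1])   # ['0000', '0001', '0010'] 1111
print(len(bitstrings(8)))   # 256, the number of distinct bytes
```

Each extra bit doubles the list, which is where the $2^N$ count comes from.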

A byte is a bitstring of length 8. We will find that it is useful to have a standard grouping of bits, so that groups of bits can represent more information. A byte can represent $2^8 = 256$ things.

How should we talk about bytes colloquially? Instead of always writing out eight bits (and having to say "zero zero one zero one one one one" for 00101111), we can write two hexadecimal digits as shorthand (and simply say 2F). Read the next section to learn how to convert between hexadecimal and binary values, and why a hexadecimal shorthand is useful.
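As a quick preview, Python's built-in `int(text, base)` and `format(value, spec)` can translate between the two spellings of the byte 00101111. This is a sketch; the variable names are illustrative.

```python
bits = "00101111"
value = int(bits, 2)            # parse the string as base 2
hex_str = format(value, "02X")  # two uppercase hex digits, zero-padded
print(hex_str)                  # 2F

back = format(int("2F", 16), "08b")  # parse as base 16, print 8 binary digits
print(back)                          # 00101111
```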

If you're curious, 4 bits is called a "nibble" (or "nybble") and can represent $2^4 = 16$ things. This is equivalent to one hexadecimal digit.
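Because a nibble is exactly one hex digit, a byte splits cleanly into two nibbles with a shift and a mask. The hypothetical helper `nibbles` below sketches this for 8-bit values.

```python
def nibbles(byte):
    """Split an 8-bit value into (high nibble, low nibble)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    high = byte >> 4    # top 4 bits
    low = byte & 0x0F   # bottom 4 bits
    return high, low


high, low = nibbles(0b00101111)
print(high, low)                             # 2 15
print(format(high, "X") + format(low, "X"))  # 2F
```

Each nibble maps to one hex digit independently, which is why converting between binary and hexadecimal needs no arithmetic beyond grouping bits in fours.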

4 BIG IDEA: Bits can represent anything!

The big idea in this first lecture is:

Bits can represent anything.

Logical Values: Commonly, 0 is false and 1 is true.

Characters: There are 26 uppercase letters (A-Z). With 5 bits, we have $2^5 = 32$ patterns, so we can assign a bit pattern to each letter, with six left over for other information.
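Here is one possible 5-bit letter code, sketched in Python. The particular assignment (A as 00000 through Z as 11001) and the helper names `encode` and `decode` are hypothetical; any one-to-one assignment of letters to patterns would work.

```python
import string

BITS = 5
# Hypothetical encoding: A -> 00000, B -> 00001, ..., Z -> 11001.
# The six patterns 11010 through 11111 are left unused.
ENCODE = {ch: format(i, f"0{BITS}b") for i, ch in enumerate(string.ascii_uppercase)}
DECODE = {code: ch for ch, code in ENCODE.items()}


def encode(word):
    """Turn an uppercase word into space-separated 5-bit codes."""
    return " ".join(ENCODE[ch] for ch in word)


def decode(codes):
    """Invert encode()."""
    return "".join(DECODE[c] for c in codes.split())


print(encode("CAB"))            # 00010 00000 00001
print(2 ** BITS - len(ENCODE))  # 6 unused patterns
```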

Colors: HTML color codes are 24-bit (3-byte) representations. Figure 2 shows the HTML color code for California Gold, 0xFDB515. You will read more about hexadecimal and binary in the next section.
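In a 24-bit color code, each byte holds one channel: red, green, then blue. A small sketch with a hypothetical `rgb` helper pulls the three bytes out of California Gold:

```python
def rgb(color):
    """Split a 24-bit color code into its red, green, and blue bytes."""
    red = (color >> 16) & 0xFF   # top byte
    green = (color >> 8) & 0xFF  # middle byte
    blue = color & 0xFF          # bottom byte
    return red, green, blue


print(rgb(0xFDB515))  # (253, 181, 21)
```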

Figure 2: HTML Color Codes

Locations/Addresses: IPv4 and IPv6 addresses, also known as Internet Protocol addresses, are 32-bit and 128-bit representations, respectively, of device addresses on the Internet. Read more about IP addresses if you're curious.
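Python's standard `ipaddress` module makes the fixed widths visible. The sketch below uses two illustrative addresses (a private IPv4 address and one from the IPv6 documentation range); `.packed` returns the raw bytes of each address.

```python
import ipaddress

v4 = ipaddress.ip_address("192.168.0.1")
v6 = ipaddress.ip_address("2001:db8::1")

# Each address is a fixed-width bit pattern: 4 bytes for IPv4, 16 for IPv6.
print(len(v4.packed) * 8, len(v6.packed) * 8)  # 32 128
print(int(v4))                                 # 3232235521
print(format(int(v4), "032b"))                 # the same address as 32 bits
```

The dotted-decimal form 192.168.0.1 is just a human-friendly spelling of one 32-bit number, written one byte at a time.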

Many types of data: You can even represent emotions, like "happy" as 00 or "grumpy" as 01. Of course, a 2-bit representation is likely not sufficient for the diverse range of human emotions. In fact, attempts to quantify human emotions (often so that computers can process them) are a huge area of research. What are the implications of using computers to sample and discretize human experience? For more, we recommend sociotechnical coursework that explores the human contexts and ethics of data.

5 Anything you can itemize, you can digitize

The big idea of this lecture to memorize:

With $N$ bits, you can represent at most $2^N$ things.

Put another way, to represent $k$ things, you need at least $N = \lceil \log_2 k \rceil$ bits.
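The formula can be computed exactly with integers. A minimal sketch, with the helper name `bits_needed` chosen for illustration: if the $k$ things are labeled $0$ through $k-1$, the number of bits needed is the bit length of the largest label, $k-1$. This avoids floating-point `math.log2`, whose result can be off by one for very large $k$.

```python
def bits_needed(k):
    """Smallest N with 2**N >= k, i.e. ceil(log2(k)), for k >= 1.

    Labels run 0 .. k-1, so N is the number of bits in the largest label.
    """
    if k < 1:
        raise ValueError("need at least one thing to represent")
    return (k - 1).bit_length()


for k in (2, 26, 32, 33, 256):
    print(k, bits_needed(k))
```

Note the edge case $k = 1$: a single thing needs no bits at all, and the function returns 0.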

How many bits are needed to represent lowercase letters in English?

There are 26 lowercase letters in the English language: a, b, ..., z.

$\log_2 (26) = \log_{10}(26)/\log_{10}(2) \approx 4.7$

We therefore need at least 5 bits.

Double check: 5 bits can represent $2^5 = 32$ things, so we can definitely represent 26 letters (with six patterns left over). 32 is the smallest power of 2 greater than or equal to the number of things we want to store, 26.
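The same arithmetic can be checked numerically. This sketch uses only the standard `math` module: it computes $\log_2 26$ by the change-of-base formula, rounds up, and confirms that 26 lies between $2^4$ and $2^5$.

```python
import math

k = 26
via_log10 = math.log10(k) / math.log10(2)       # change of base
print(round(via_log10, 2), round(math.log2(k), 2))  # 4.7 4.7

n = math.ceil(via_log10)
print(n)                          # 5
print(2 ** (n - 1) < k <= 2 ** n)  # True, since 16 < 26 <= 32
```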