Arrays

1 Learning Outcomes

Declare and initialize C arrays.
Understand that C arrays should be treated as contiguous blocks of memory, not as pointers. Array names are synonymous with the location of the first element in the array.
Translate array indexing into pointer arithmetic followed by a dereference operation.
Explain how arrays decay to pointers when used as formal parameters in function definitions or as arguments to function calls.

🎥 Lecture Video

[CS61C FA20] Lecture 05.1 - C Memory Management: Dynamic Memory Allocation

From 9:36 onwards: Arrays are not pointers example

We continue our exploration of memory by studying C arrays. On the surface, C arrays seem fairly similar to what you might recognize from Java. In this section, we learn that arrays in C are neither variables nor pointers. When used in C statements, array names often behave like pointer variable names, for reasons we will describe shortly.

To declare an array of two elements without initializing its values, we can use the statement below. This statement declares a block of memory large enough to hold two contiguous ints. It does not initialize the values, so we should assume the elements contain garbage:

int arr_uninitialized[2];

To declare and initialize an array of two elements, 795 and 635, in that order:

int arr2[] = {795, 635};

or equivalently

int arr2[2] = {795, 635};

Square-bracket indexing is one way to access elements of the array. Like many languages, C specifies zero-indexed arrays:

arr2[0]; // 795

2 Array indexing uses pointer arithmetic

Is there another way to access array elements? Yes, otherwise we would not have been so cryptic earlier.

Square-bracket indexing for C arrays is what we call “syntactic sugar”: it exists for human readability, but the C compiler translates it into two operations, pointer arithmetic followed by a dereference:

The expression arr[i] is equivalent to the expression *(arr+i). The latter treats the array name arr as a pointer to the first element, adds i to it (in units of the element size), then dereferences the result.

2.1 Example

Suppose that when compiled, Program 1 below produces the memory layout in Figure 1. q is a pointer to a 32-bit unsigned integer, while arr is an array, i.e., a 12-byte contiguous block of three 32-bit unsigned integers.

 1  #include <stdio.h>
 2  #include <stdint.h>  // provides uint32_t
 3  int main(void) {
 4    uint32_t arr[] = {50, 60, 70}; // 32-bit unsigned array
 5    uint32_t *q = arr;
 6
 7    printf("    *q: %u is %u\n", *q, q[0]);
 8    printf("*(q+1): %u is %u\n", *(q+1), q[1]);
 9    printf("*(q-1): %u is %u\n", *(q-1), q[-1]); // out of bounds: undefined behavior
10  }

[Figure 1: Memory layout for Program 1.]

Because square-bracket indexing is syntactic sugar for pointer arithmetic and dereference:

Line 7: The pointer q points to an unsigned 32-bit integer at address 0x100, which is 50. Print *q: 50 is 50.
Line 8: Adding 1 to q yields a pointer to the next 32-bit unsigned integer. Since q points to the integer at address 0x100, q+1 points to the integer at address 0x104, which is 60. Print *(q+1): 60 is 60.
Line 9: Because square-bracket indexing is syntactic sugar, negative indexing does not produce a compiler error. Instead, q-1 points to the previous 32-bit unsigned integer, at address 0xfc, which lies outside the array and holds an unknown value. Reading it is undefined behavior; in practice, this line would likely print garbage, e.g., *(q-1): 32490 is 32490.

3 Arrays are not pointers

From K&R:

There is one difference between an array name [(such as a)] and a pointer [(such as pa)] that must be kept in mind. A pointer is a variable, so pa=a and pa++ are legal. But an array name is not a variable; constructions like a=pa and a++ are illegal.

Also from K&R:

The name of an array is a synonym for the location of the initial element.

Pointers and arrays therefore differ in how they behave with the address operator, &. Consider Program 2:^[1]

 1  int *p, *q, x;
 2  int a[4];
 3  p = &x;
 4  q = a + 1;
 5
 6  *p = 1;
 7  printf("*p:%d, p:%x, &p:%x\n", *p, p, &p);
 8
 9  *q = 2;
10  printf("*q:%d, q:%x, &q:%x\n", *q, q, &q);
11
12  *a = 3;
13  printf("*a:%d, a:%x, &a:%x\n", *a, a, &a);

With the memory layout in Figure 2, the output is:

*p:1, p:108, &p:100
*q:2, q:110, &q:104
*a:3, a:10c, &a:10c

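The expression &a has the same value as a: it is the address of the array itself, i.e., the start of the contiguous memory block of ints! Unlike p and q, there is no separate pointer variable named a stored elsewhere in memory.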

Explanation

We discuss multiple declaration in a previous section.

Line 3: The int pointer p is set to the address of the int variable x.
- Line 6: Take the value p points to; set it to 1.
- *p dereferences p and gets the value at address 0x108, which is 1.
- p is a pointer variable; p’s value is an address, which is 0x108.
- &p is the address of the variable p, which is 0x100.
Line 4: The int pointer q is set to the result of a + 1, which is pointer arithmetic! In the expression, the array name a is the address of the first element in a; adding one yields the address of the second element of a, at 0x110.
- Line 9: Take the value q points to; set it to 2.
- *q dereferences q and gets the value at address 0x110, which is 2.
- q is a pointer variable; q’s value is an address, which is 0x110.
- &q is the address of the variable q, which is 0x104.
Line 2: The array a is a memory block of 4 ints. The array starts at address 0x10c, which is also the address of its first element.
- Line 12: The array name a is the address of the first element in a; the statement *a = 3; dereferences this address and sets the first element to 3.
- *a is equivalent to a[0], i.e., *(a+0). The array name a is the address of the first element in a; dereferencing gets the element itself, which is 3.
- a is the address of the first element in a by definition, which is 0x10c.
- &a gets the address of the array a, which is 0x10c.^[2]

4 Array names “decay” with functions

When used with functions, arrays decay to pointers in two ways. We use Program 3 below as an example.

 1  int bar(int arr[], size_t nelems) {
 2     … arr[…] …
 3  }
 4  int main(void) {
 5      int a[5], b[10];
 6      …
 7      bar(a, 5);
 8      …
 9  }

1. When used as formal parameters in function definitions. On Line 1 of Program 3, the parameter declaration int arr[] is syntactic sugar for int *arr. We recommend using the latter where possible to avoid confusion.

2. When passed in as arguments to function calls. On Line 7 of Program 3, the argument a is an array, but it decays to a pointer when the function is called. This decay effectively passes the address of the first element of a, i.e., &a[0], as the first argument of bar.

5 `sizeof` with arrays

We’ve discussed sizeof many times. For arrays, the compile-time operator will evaluate to the size of the array, in bytes.^[2] This observation informs the behavior of Program 4:

 1  void mystery(short arr[], int len) {
 2      printf("%d ", len);
 3      printf("%zu\n", sizeof(arr));
 4  }
 5
 6  int main() {
 7      short nums[] = {1, 2, 3, 99, 100};
 8      printf("%zu ", sizeof(nums));
 9      mystery(nums, sizeof(nums)/sizeof(short));
10      return 0;
11  }

Answer

Print output: 10 5 8

In Line 8, sizeof(nums) is evaluated in the array’s declared scope, so it evaluates to the total size of an array of five shorts, i.e., 10 (assuming 2-byte shorts).
In Line 2, the value len is the result of evaluating sizeof(nums)/sizeof(short) in main (Line 9), i.e., 10/2 = 5.
In Line 3, arr is a function parameter. The parameter declaration short arr[] is syntactic sugar for short *arr, so arr is a pointer. On a machine with 64-bit pointers, sizeof(arr) is 8.

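In practice, C programmers commonly use sizeof(nums)/sizeof(short) to count the number of elements in the array nums. Note that this only works in the scope where nums is declared as an array; inside a function that receives nums as a parameter, nums is a pointer, and the expression no longer yields the element count.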

6 Arrays are primitive! Reminders

Hopefully this section has convinced you that arrays are relatively primitive constructs:

Array declarations set aside contiguous blocks in memory.
Array names are synonymous with the location of the first element in the array.
Arrays decay to pointers when used as function parameters or function arguments.

We close with a few final reminders of how this primitive nature begets responsible C practices.

Reminder 1

Keep array sizes in constants where possible.

Instead of code that manages multiple copies of integer constants,

int i, arr[10];
for(i = 0; i < 10; i++) { ... }

choose to declare a “single source of truth”:

const int ARRAY_SIZE = 10;
int i, arr[ARRAY_SIZE];
for(i = 0; i < ARRAY_SIZE; i++) { ... }

Reminder 2

Array bounds are not checked during element access.

Element access is just pointer arithmetic followed by a dereference, so it is very easy to accidentally access memory past the end of an array. Can you find the subtle bug in this code?

const int N = 100;
int foo[N];
int i;
...
for(i = 0; i <= N; ++i) {
   foo[i] = 0;
}

Improper access past the end of an array is referred to as a buffer overflow.^[3] This very common bug can corrupt other parts of the program, including internal C data. Buffer overflow exploits are security vulnerabilities that can crash programs.

Footnotes

[1] %d: signed decimal, %x: hex. Wikipedia
[2] We thought long and hard about how to explain &a and sizeof(a) (it involved sitting in a dark room with loud music). Both operations likely boil down to reasonable C design. After all, there must be some way to refer to the address and the size of an array. Instead of erroring, these two expressions are likely the only exceptions to treating array names as synonymous with the address of the first element. If you, the reader, have a better explanation, we’d love to use it. Submit a pull request!
[3] Take Computer Security to learn more! Wikipedia