5.4. Pointers and arrays¶
Pointers are not arrays and arrays are not pointers. However, much confusion arises between them because arrays in expressions often behave like pointers. The term you’ll often see is that arrays decay into pointers.
Things don’t really “decay” in C or C++, but the name of an array resolves to an address. Given the following array:
int data[5] = {3, 5, 8, 13, 21};
The variable data[0]
stores the value 3
.
The variable data
stores the address of element 0.
Pointers store addresses, so the same of an array and a pointer both store the same kinds of data.
Any array type will implicitly convert to a pointer of the type stored in the array. The pointer is constructed to point to the first element of the array. This conversion happens whenever arrays are used in an expression where arrays are not expected, but pointers are:
#include <iostream>
int main() {
int a[3] = {13, 21, 35};
int* p = a;
std::cout << sizeof a << '\n' // prints size of array
<< sizeof p << '\n'; // prints size of a pointer
for(int n: a) { // okay: arrays can be used in range-for loops
std::cout << n << ' '; // prints elements of the array
}
// for(int n: p) { // error: no range for looping on a pointer
// arrays and pointers share the same semantics
std::cout << '\n'
<< *a << '\n' // prints the first element
<< *p << '\n' // same
<< *(a + 1) << ' ' << a[1] << '\n' // prints the second element twice
<< *(p + 1) << ' ' << p[1] << '\n'; // same
}
This behavior applies to function calling as well:
#include <iostream>
// print first element of array using pointer dereference
void g(int (&a)[3]) {
std::cout << *a << '\n';
}
// print first element of array using array semantics through pointer
void f(int* p) {
std::cout << p[0] << '\n';
}
int main() {
int a[3] = {13, 21, 35};
int* p = a;
// where arrays are acceptable, but pointers aren't, only arrays may be used
g(a); // okay: function takes an array by reference
// g(p); // error: pointers do not implicitly convert to arrays
// where pointers are acceptable, but arrays aren't, both may be used:
f(a); // okay: function takes a pointer
f(p); // okay: function takes a pointer
}
This example shows many of the concepts we have discussed so far in one spot.
5.4.1. Array indexing pitfalls¶
Pitfall #1
Arrays perform absolutely no bounds checking.
Read that again.
Good.
Now consider that no compiler will complain about this code:
1int* p = int[3];
2p[0] = 3; // OK
3p[2] = 5; // OK
4p[99] = 8; // oops! where did we write this?
5p[-7] = 8; // or this!
No compiler will inform you that on line 4 we just wrote an 8
at a location 96 positions past the end of the array.
Nor will it inform you that on line 5, we just wrote to a location
7 positions before the beginning of the array.
Most pointer examples you see will never attempt to use operator[]
to index a pointer that is not an array.
This is a good thing, but as you might expect, if you make a mistake,
the compiler has nothing to offer:
int n = 5;
int* p = &n;
int x = p[99] + 2;
Even with all compiler warnings enabled, most compilers will emit nothing at all.
No compiler will inform you that
we just accessed a piece of memory 98 ints
past the one you own.
Whatever is stored there, we then added 2 to it
and assigned that value to y
.
The compiler doesn’t even know p
is a pointer to just one int
.
Most programmers know better than to make errors this large. We’re just demonstrating here that even big mistakes can be completely ignored by the compiler. What is for more common is an off by one error where your array index or pointer address is wrong only by 1. Accessing even a single byte outside your valid memory bounds is still an error and one of the most common errors in C and C++ programs.
Pitfall #2
From the standard:
The definition of the subscript
operator[]
is thatE1[E2]
is identical to(*((E1)+(E2)))
. Because of the conversion rules that apply to the binaryoperator+
, ifE1
is an array object (equivalently, a pointer to the initial element of an array object) andE2
is an integer,E1[E2]
designates theE2
-th element ofE1
(counting from zero).
Note
What the standard doesn’t repeat here is that addition commutes, that is
\(a+b = b+a\).
A side-effect of this fact is that for any array and index pair a[i]
,
then a[i]
must be equivalent to i[a]
.
Although the standard does not strictly prohibit this syntax, doesn’t mean you should use it.
This pitfall is only a problem when using arrays of type int
with easily confused variable names.
The lesson: use variables appropriate for the scope.
In this case, perhaps a single letter (a) for the array was too short.
5.4.2. Arrays of type char
¶
In the C language,
the abstract idea of a string is implemented with an array of characters.
Arrays of char
that are null terminated are commonly called C strings.
In older C and C++ code using C strings, it’s common to see code that uses the null terminator in the C string as a loop exit condition:
Code like this can fail if the source string contains any embedded null characters. The risk is that this code works fine 99% of the time, but fails when working with character data from an uncontrolled source (a network or socket interface, for example).
Try This!
Run the previous example, but modify it, replacing the ‘Hello World’ with ‘Hello\0World’. What happens?
What warnings does the compiler display?
More to Explore
MITRE Common Weakness Enumerations Off by one error