Types

In C++, objects, references, functions, and expressions all have a property called type, which both restricts the operations that are permitted for those entities and provides semantic meaning to the otherwise generic sequences of bits.

A type is a collection of values. For example, the Boolean type consists of the values true and false. The integers also form a type. An integer is a simple type because its values contain no subparts. In comparison, a bank account object is a compound type. A bank account typically contains several pieces of information such as name, address, account number, and account balance.

The C++ type system consists of fundamental types and compound types, both described in the sections that follow.

Fundamental representations

Machines view stored data as sequences of bits that can be manipulated by specific instructions: shifts, logical operations, integer and floating-point arithmetic, and more. While machines manipulate these bit sequences efficiently, people do not.

High level languages introduce abstractions to simplify working with data. Abstractions allow the data stored in memory to be managed not as a sequence of bits, but as an integer, a floating-point number, a boolean, a character, or something else. In C++, the abstractions for the built-in numeric types (plus the types void and nullptr_t) are referred to as fundamental types.

It is natural to think that the types defined for the computer are the same as those used in mathematics. That assumption would be incorrect. It is important to understand that all of the numeric types are abstractions; that is, they are approximations of the basic concepts learned in school.

What exactly do we mean? The unsigned integer type represents the set of whole numbers. In mathematics, this set is infinite: given any number \(N\), a new integer \(M\) can be generated by simply setting \(M = N + 1\). However, this basic axiom does not apply to integers stored on a computer, because an integer on a computer occupies a fixed size: that is, an unsigned integer is stored in a finite number of bits. Another way to say this is that every numeric fundamental type has both a minimum and maximum allowed value. This situation is very different from what you are used to in math class.

One consequence of a fixed size is that basic axioms of mathematics are not always true. For example, the ordinary associative law of addition:

\[(x + y) + z = x + (y + z)\]

is satisfied only if both \(|x + y| \le \text{max int}\) and \(|y + z| \le \text{max int}\).

In practice, the upper limit of an unsigned is so large that it is not a limiting factor in many ordinary numeric calculations. When the upper limit is exceeded, however, the results are typically dramatic and the error is obvious. Overflow errors are more common when the type chosen is too small for its intended purpose.

A slightly more involved example is the problem of how to represent signed integers. In mathematics, negative numbers are represented by prefixing them with a minus (”−”) sign. However, on a computer, there are only bit sequences. Once again, programmers are faced with the problem of abstraction. What is the ‘best’ way to represent a signed integer that is both efficient and unambiguous?

As it turns out, there is no ‘best’ way. In fact, there are many ways to solve this problem, each with its own trade-offs, but over time three representations have come into common use:

  • Sign and magnitude

  • One’s complement

  • Two’s complement

While the two’s complement representation is now nearly universal, this was not always the case. In fact, during the 1960s and 1970s, debates raged about the best number format representations.

Sign and magnitude

Sign and magnitude is arguably the easiest representation to understand because it closely mirrors how signed numbers are written in mathematics. The number’s sign is stored in a sign bit (often the most significant bit): setting that bit to 0 indicates a positive number or positive zero, and setting it to 1 indicates a negative number or negative zero. The remaining bits in the number indicate the magnitude (or absolute value).

In an eight-bit byte, the seven magnitude bits can range from 0000000 (0) to 1111111 (127), so once the sign bit (the eighth bit) is added, numbers from −127 to +127 can be represented. For example, −43 encoded in an eight-bit byte is 10101011, while 43 is 00101011. A consequence of using sign-and-magnitude representation is that there are two ways to represent zero: 00000000 (+0) and 10000000 (−0).

Some early binary computers (e.g., the IBM 7090) used this representation for integers, perhaps because of its natural relation to common usage. However, it is slower and requires more complicated hardware than the one’s complement or two’s complement representations.

Sign and magnitude remains the way the sign of a floating point value is represented: the official IEEE floating point format stores a dedicated sign bit alongside the exponent and mantissa.

The program on the run tab isn’t meant to be completely understood, but it does demonstrate the bits stored in the exponent and mantissa for some floats.

Feel free to modify main() and provide your own values.

int main(int argc, char** argv) {
  puts(" x = S   exp     mantissa");
  print_float(0.1);  // print_float is defined in the full program on the run tab
  return 0;
}

The important point here is that even today, floating point numbers have two different representations for 0, a side effect of the sign and magnitude representation.

The following C program [Aspnes2014] prints the sign, exponent, and mantissa of a few small numbers.

Try This!

Run and carefully examine the results of the previous program.

How is the value of 0.1 different?

Try to list at least one error this might cause in your programs.

Try other values and see which ones have exact floating point representations and which do not. The numbers that may have exact representations are called the dyadic rationals.

One’s complement

Alternatively, a system known as one’s complement can be used to represent negative numbers. The one’s complement form of a negative binary number is obtained by applying bitwise NOT to each bit of its positive counterpart. That is, each negative number is the “complement” of the corresponding positive number.

C++ provides a complement operator ~ for this purpose. The complement of the 5 bit binary number 11100 is 00011, which is the number 3.

Note that like sign-and-magnitude representation, one’s complement has two representations of 0: 00000000 (+0) and 11111111 (−0).

A few of the 4 bit one’s complement integers are:

Decimal   Binary
   -7      1000
   -2      1101
   -1      1110
   -0      1111
    0      0000
    1      0001
    2      0010
    7      0111

One’s complement is important both historically and because it is used to generate two’s complement numbers. No modern computers store one’s complement signed integers.

Two’s complement

A variation of one’s complement that avoids the “two zeroes” problem is two’s complement. In two’s complement, a negative number is represented by the bit pattern which is one greater (in an unsigned sense) than the one’s complement of the positive value. A short 3 bit table comparing one’s complement, two’s complement, and the unsigned interpretation of the same bit patterns looks like this:

3-bit pattern   One's complement   Two's complement   Unsigned value
     100               -3                 -4                 4
     101               -2                 -3                 5
     110               -1                 -2                 6
     111               -0                 -1                 7
     000                0                  0                 0
     001                1                  1                 1
     010                2                  2                 2
     011                3                  3                 3

Negating a number is done by inverting all the bits (in other words taking the one’s complement) and then adding one to that result.

Virtually all modern computers store signed integer values using the two’s complement representation. The following program demonstrates the consistency of the two’s complement representation.

A bitset is a simple way to see the sequence of ones and zeros for an integral type.

A bitset is a templated type that must be declared with a size:

std::bitset<8> x;

The size determines the number of bits displayed and doesn’t need to match the size of the variable being examined. A value can be provided when the bitset is declared:

std::bitset<4> x = 9;
auto y = std::bitset<4>(9);

Overflow

Why bother with all this obscure discussion about type representation now? Because it is very useful to know how numbers are actually stored when debugging your code. Sometimes a value cannot be represented in the limited number of bits allowed. Examples:

unsigned, 3 bits:    8 would require at least 4 bits (1000)
sign mag., 4 bits:   8 would require at least 5 bits (01000)

When a value cannot be represented in the number of bits allowed, we say that overflow has occurred. Overflow typically arises during arithmetic operations.

example:          3 bit unsigned representation

  011 (3)
+ 110 (6)
---------
   ?  (9)     it would require 4 bits (1001) to represent
              the value 9 in unsigned rep.

Mistakes happen. Someday, you will write some code that will overflow the amount of space allocated for the type. It helps to understand what is going on if you recognize what overflow looks like for various types.

There is a standard C header that defines integral types of exact sizes:

#include <cstdint>

Once included, a family of fixed size integral types is available:

int8_t
uint8_t

int16_t
uint16_t

and many others. Use these types like any other standard type you are already familiar with.

While the size of an int may vary from machine to machine, int8_t is always guaranteed to be an 8 bit signed integer.

For this reason, the programming guidelines of many organizations prefer fixed size integral types to the generic int and long.

Ask yourself: what is the largest number we can store in an 8 bit signed integer? Then predict the output of the following program before you run it.

This video discusses the representation of int on most computers, covering:

  • How data is stored in computer’s memory.

  • Size and range of int

  • Signed and unsigned int

  • How negative numbers are stored in binary.

Overflow involving signed integral types overflows into the most significant bit, effectively changing the sign, as in the preceding example.

Unsigned integers do not, technically, overflow in the sense of becoming negative; they are, after all, unsigned. They do, however, still ‘wrap around’, and the result is that adding two values can produce a sum smaller than either operand.

Try This!

Change the types in the previous program from int8_t to uint8_t. With no other changes what do you expect the output to be?

Keeping the types as uint8_t, change the value of x to 256 and run the program again.

What do you expect to see? What did you see?

If any of this was surprising, consider making more changes:

  • Try other unsigned types, such as unsigned short or unsigned char

  • What happens if you assign -1 to a variable of unsigned type?

Preventing overflow errors

The C and C++ compilers do not check math overflow for you. You can turn on compiler warnings that can inform you about possible overflow or type conversion problems. A program can report when it happens at runtime, but by that point, the error has already occurred. It is generally preferred to check for possible overflow before attempting a calculation that might overflow.

The following program checks for addition overflow and reports an error if addition overflow would occur.

#include <iostream>
#include <limits>
#include <string>

int main () {
  int x = std::numeric_limits<int>::max();
  int y = x - 9;

  if (std::numeric_limits<int>::max() - x < y) {
    std::cerr << "addition failed: result is too big\n";
  } else {
    // addition is safe
    std::cout << "x+y = " << (x+y) << '\n';
  }

}

Similarly, checks for multiplication overflow and exponentiation overflow could use the following checks (both assume the operands are positive; log2 comes from <cmath> and CHAR_BIT from <climits>):

if (std::numeric_limits<int>::max() / x < y) {
  std::cerr << "multiplication failed: result is too big\n";
}

// number of bits in uint32_t
const int num_bits = sizeof(uint32_t) * CHAR_BIT;  // 4 bytes * 8 bits = 32
if (log2(base) * exponent > num_bits) {
  std::cerr << "exponentiation failed: result is too big\n";
}

Compound types

Most of the compound types will be covered in greater detail later in this book. Those that aren’t covered later are discussed now.

Array types

An array is a block of memory that holds one or more objects of a given type. Declare an array by giving the type of object the array holds followed by the array name and the size in square brackets:

int a[3];              // array of 3 ints
int b[3] = {4, 5, 6};  // array of 3 ints
int c[]  = {1, 2, 3};  // array of 3 ints
char name[64];         // array of 64 characters

Arrays can be constructed from any fundamental type, pointers, pointers to members, classes, enumerations, or from other arrays (in which case the array is said to be multi-dimensional). Arrays cannot be constructed from references, functions, or abstract class types.

Objects of array type cannot be modified as a whole: even though they are lvalues (e.g., the address of an array can be taken), they cannot appear on the left hand side of an assignment operator.

int a[3] = {4, 5, 6};  // array of 3 ints with initial values
int (*b)[3] = &a;      // OK: b is a pointer to an array of 3 ints
int c[3];              // array of 3 ints, uninitialized
c = a;                 // Error. Can't assign to an array
c[0] = a[0];           // OK.

It’s easy to forget that when an array is given an initializer, any elements not explicitly initialized are value-initialized (set to zero).

This is true even if only part of a multi-dimensional array is initialized with values. Consider the statement:

int a[2][4] = {{1,2,3,4}};

What is declared?

What is initialized?

Then run it.

Reference types

A reference type declares a named variable as an alias to an already-existing object or function.

References are a C++ addition - one of the few types not present in C.

A reference is required to be initialized to refer to a valid object or function. There are no references to void and no references to references.

int a = 3;
int& r1 = a; // r1 is a reference to a
r1 = 72;     // changes the value of a
int& r2;     // Error. r2 must refer to something

Once initialized, a reference always refers to the same object. The value of the object may change, but the address referred to may not.

Note

C++11 introduced a new kind of reference, the rvalue reference. We will cover it when we get into classes. All of the references discussed until then will be lvalue references.

The type std::size_t

There exists an implementation defined typedef std::size_t. The type std::size_t can represent the size in bytes of an object of any type (including arrays).

In order to use size_t you need to include the header cstdlib.

This means that size_t is guaranteed to always be big enough to use safely as an index in any array. This doesn’t mean you can’t access an invalid element of an array, only that the index can be increased without worrying about the index variable overflowing (see the previous discussion about overflow).

The purpose of size_t is to relieve the programmer from having to worry about which of the predefined unsigned types is used to represent sizes.

Code that assumes sizeof yields an unsigned int is not as portable as code that assumes it yields a size_t.

size_t is commonly used for array indexing and loop counting. Programs that use other types, such as unsigned int, for array indexing may fail, for example, on 64-bit systems when the index exceeds UINT_MAX or if it relies on 32-bit modular arithmetic.

So, if you must write a hand-rolled loop to iterate through a container that returns its size, then prefer this:

for (size_t i = 0; i < foo.size(); ++i)

over this:

for (int i = 0; i < foo.size(); ++i)

As a programmer, you need to use caution if the variable i is to be used for anything other than an index — for example, in an arithmetic expression. Avoid mixing signed and unsigned types as the results can be surprising. Also be aware that C++ uses signed integers for array subscripts and the standard library uses unsigned integers for container subscripts. This makes absolute consistency in all situations impossible.

Later on, we will cover techniques that improve on iterating through data even more.

Try This!

What do you think the output of the following program will be? If you have access to another computer, try compiling in ‘32 bit’ mode.

Run it and check your assumptions.

What sizes are different? Why?

What are the implications of these differences when writing code that needs to run on both?

Displaying numbers

The standard library provides facilities to change the base in which integers are read and displayed. Integers are always stored in binary; you can change the base used for display with setbase, dec, hex, or oct.

These functions are all I/O manipulators. They may be called with an expression such as

out << std::hex

for any out of type basic_ostream or with an expression such as

in >> std::hex

for any in of type basic_istream. For example:

std::cout << "The number 42 in octal:   " << std::oct << 42 << '\n'
          << "The number 42 in decimal: " << std::dec << 42 << '\n'
          << "The number 42 in hex:     " << std::hex << 42 << '\n';
int n;
std::istringstream("2A") >> std::hex >> n;
std::cout << std::dec << "Parsing \"2A\" as hex gives " << n << '\n';
// the output base is sticky until changed
std::cout << std::hex << "42 as hex gives " << 42
          << " and 21 as hex gives " << 21 << '\n';

This example displays a simple table of the ASCII characters in 3 different bases.

Self Check

Fix all the errors in the code below:

