Types¶
In C++, objects, references, functions, and expressions all have a property called type, which both restricts the operations that are permitted for those entities and provides semantic meaning to the otherwise generic sequences of bits.
A type is a collection of values. For example, the Boolean type consists of the values true and false. The integers also form a type. An integer is a simple type because its values contain no subparts. In comparison, a bank account object is a compound type. A bank account typically contains several pieces of information such as name, address, account number, and account balance.
The C++ type system consists of the following types:

- The type void
- The type nullptr_t
- Arithmetic types
  - Floating point types (float, double, and long double)
  - Integral types
    - The bool type (true and false)
    - The character types:
      - narrow character types (char, signed char, unsigned char)
      - wide character types (char16_t, char32_t, wchar_t)
    - signed integer types (short int, int, long int, long long int)
    - unsigned integer types (unsigned short int, unsigned int, unsigned long int, unsigned long long int)
- Compound types
Fundamental representations¶
Machines view stored data as sequences of bits that can be manipulated by specific instructions. These instructions include shifts, logical operations, integer and floating point arithmetic, and more. While machines manipulate these bit sequences efficiently, people do not.
High level languages introduce abstractions to simplify working with data. Abstractions allow the data stored in memory to be managed not as a sequence of bits, but as an integer, a floating point number, a boolean, a character, or something else. In C++, the abstractions for the built in numeric types (plus the types void and nullptr_t) are referred to as fundamental types.
It is natural to think that the types defined for the computer are the same as those used in mathematics. That assumption would be incorrect. It is important to understand that all of the numeric types are abstractions; that is, they are approximations of the basic concepts learned in school.
What exactly do we mean?
The unsigned integer type represents the set of whole numbers. In mathematics, this set is infinite: given any number \(N\), a new integer \(M\) can be generated by simply setting \(M = N + 1\). However, this basic axiom does not apply to integers stored on a computer, because an integer on a computer occupies a fixed size: that is, an unsigned is stored in a finite number of bits. Another way to say this is that every numeric fundamental type has both a minimum and maximum allowed value. This situation is very different from what you are used to in math class.
One consequence of a fixed size is that basic axioms of mathematics are not always true. For example, the ordinary associative law of addition:

\((x + y) + z = x + (y + z)\)

is satisfied only if both \(|x + y| \le \text{max int}\) and \(|y + z| \le \text{max int}\).
In practice, the upper limit of an unsigned is so large that it is not a limiting factor in many ordinary numeric calculations. When the upper limit is exceeded, however, the results are usually dramatic, so overflow errors tend to be obvious. Overflow errors may be more common when the type chosen is too small for its intended purpose.
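As a quick illustration (a minimal sketch using std::numeric_limits), adding one to the largest value an unsigned can hold wraps around to zero, so the mathematical axiom that \(N + 1 > N\) fails:

#include <iostream>
#include <limits>

int main() {
    unsigned n = std::numeric_limits<unsigned>::max();
    unsigned m = n + 1;     // unsigned arithmetic wraps around: m is 0, not n + 1
    std::cout << n << " + 1 = " << m << '\n';
}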
A slightly more involved example is the problem of how to represent signed integers. In mathematics, negative numbers are represented by prefixing them with a minus (”−”) sign. However, on a computer, there are only bit sequences. Once again, programmers are faced with the problem of abstraction. What is the ‘best’ way to represent a signed integer that is both efficient and unambiguous?
As it turns out, there is no ‘best’ way. In fact, there are many ways to solve this problem, each with its own trade-offs, but over time three representations have come into common use:

- Sign and magnitude
- One’s complement
- Two’s complement
While the two’s complement representation is now nearly universal, this was not always the case. In fact, during the 60’s and 70’s, debates raged about the best number format representations.
Sign and magnitude¶
Sign and magnitude is arguably the easiest representation to understand, because it closely mirrors how signed numbers are written in mathematics. The number’s sign is stored in a sign bit: setting that bit (often the most significant bit) to 0 means a positive number or positive zero, and setting it to 1 means a negative number or negative zero. The remaining bits in the number indicate the magnitude (or absolute value).
In eight bits the magnitude can range from 0000000 (0) to 1111111 (127). Numbers ranging from −127 to +127 can be represented once the sign bit (the eighth bit) is added. For example, −43 encoded in an eight-bit byte is 10101011 while 43 is 00101011.

A consequence of using signed magnitude representation is that there are two ways to represent zero, 00000000 (0) and 10000000 (−0).
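A minimal sketch of the encoding (the helper sign_magnitude below is an illustration invented for this example, not part of the standard library) reproduces the two patterns above:

#include <bitset>
#include <cstdlib>
#include <iostream>

// Illustration only: build the 8 bit sign-and-magnitude pattern for a
// value in the range -127..127.
std::bitset<8> sign_magnitude(int value) {
    std::bitset<8> bits(std::abs(value));   // the low 7 bits hold the magnitude
    bits[7] = (value < 0);                  // the most significant bit is the sign
    return bits;
}

int main() {
    std::cout << " 43: " << sign_magnitude(43)  << '\n';   // 00101011
    std::cout << "-43: " << sign_magnitude(-43) << '\n';   // 10101011
}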
Some early binary computers (e.g., the IBM 7090) used this representation for integers, perhaps because of its natural relation to common usage. However, it is slower and requires more complicated hardware than the one’s complement or two’s complement representations.
Signed magnitude also remains in use today: the official IEEE floating point formats store a sign bit together with the magnitude of the significand (the exponent field uses a separate, biased encoding).
The program on the run tab isn’t meant to be completely understood, but it does demonstrate the bits stored in the exponent and mantissa for some floats. Feel free to modify main() and provide your own values.
int main(int argc, char** argv) {
    puts(" x = S exp mantissa");
    print_float(0.1);
    // ... more calls to print_float in the full program on the run tab
}
The important item here is that even today, floating point numbers have two different representations for 0, a side effect of the sign and magnitude representation.
The following C program [Aspnes2014] prints the sign, exponent, and mantissa of a few small numbers.
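The program itself is on the run tab; a minimal sketch in the same spirit (not the original [Aspnes2014] code, and the output format here is an assumption) copies the bits of a float into an integer and pulls out the three IEEE fields:

#include <cstdio>
#include <cstdint>
#include <cstring>

// Sketch only: reinterpret the 32 bits of a float as an integer, then
// extract the IEEE fields (1 sign bit, 8 exponent bits, 23 mantissa bits).
void print_float(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    unsigned sign     = bits >> 31;
    unsigned exponent = (bits >> 23) & 0xffu;
    unsigned mantissa = bits & 0x7fffffu;
    std::printf("%12g = %u %3u %06x\n", x, sign, exponent, mantissa);
}

int main() {
    std::puts("           x = S exp mantissa");
    print_float(0.0f);
    print_float(-0.0f);   // a different bit pattern than +0.0
    print_float(0.1f);
    print_float(1.0f);
}

Running something like this shows that 0.0 and -0.0 differ only in the sign bit.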
Try This!

Run and carefully examine the results of the previous program.

- How is the value of 0.1 different?
- Try to list at least one error this might cause in your programs.
- Try other values and see which ones have exact floating point representations and which do not. The numbers that may have exact representations are called the dyadic rationals.
One’s complement¶
Alternatively, a system known as one’s complement can be used to represent negative numbers. The one’s complement form of a negative binary number is the bitwise NOT applied to each bit in the number. That is, each negative number is the “complement” of its positive counterpart.
C++ provides a complement operator ~ for this purpose. The complement of the 5 bit binary number 11100 is 00011, which is the number 3.
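A minimal sketch of using ~ (the mask is needed because ~ in C++ operates on the full width of the operand, not just 5 bits):

#include <cstdio>

int main() {
    unsigned x = 0b11100;         // the 5 bit pattern 11100
    unsigned c = ~x & 0b11111;    // complement, keeping only the low 5 bits
    std::printf("%u\n", c);       // prints 3
}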
Note that like sign-and-magnitude representation, one’s complement has two representations of 0: 00000000 (+0) and 11111111 (−0).
A few of the 4 bit one’s complement integers are:

| Decimal | -7 | -2 | -1 | -0 | 0 | 1 | 2 | 7 |
|---|---|---|---|---|---|---|---|---|
| Binary | 1000 | 1101 | 1110 | 1111 | 0000 | 0001 | 0010 | 0111 |
One’s complement is important both historically, and because it is used to generate two’s complement numbers. No modern computers store one’s complement signed integers.
Two’s complement¶
A variation of one’s complement that avoids the “two zeroes problem” is two’s complement. In two’s complement, negative numbers are represented by the bit pattern which is one greater (in an unsigned sense) than the one’s complement of the positive value. A short 3 bit table comparing one’s and two’s complement looks like this:
| 3-bit pattern | 100 | 101 | 110 | 111 | 000 | 001 | 010 | 011 |
|---|---|---|---|---|---|---|---|---|
| One’s complement | -3 | -2 | -1 | -0 | 0 | 1 | 2 | 3 |
| Two’s complement | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
| Unsigned value | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 |
Negating a number is done by inverting all the bits (in other words taking the one’s complement) and then adding one to that result.
Virtually all modern computers store signed integer values using the two’s complement representation. The following program demonstrates the consistency of the two’s complement representation.
A bitset is a simple way to see the sequence of ones and zeros for an integral type. A bitset is a class template that must be declared with a size:
std::bitset<8> x;
The size determines the number of bits stored and doesn’t need to match the size of the variable. A value can be provided when declared:
std::bitset<4> x = 9;
auto y = std::bitset<4>(9);
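The demonstration program referred to above is not reproduced here; a minimal sketch in the same spirit uses a bitset to show that negating a value produces the same bit pattern as inverting the bits and adding one:

#include <bitset>
#include <iostream>

int main() {
    int x = 43;
    std::cout << "   43: " << std::bitset<8>(x)      << '\n';  // 00101011
    std::cout << "  -43: " << std::bitset<8>(-x)     << '\n';  // 11010101
    std::cout << " ~x+1: " << std::bitset<8>(~x + 1) << '\n';  // also 11010101
}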
Overflow¶
Why bother with all this obscure discussion about type representation now? Because it is very useful to know how numbers are actually stored when debugging your code. Sometimes a value cannot be represented in the limited number of bits allowed. Examples:
unsigned, 3 bits: 8 would require at least 4 bits (1000)
sign mag., 4 bits: 8 would require at least 5 bits (01000)
When a value cannot be represented in the number of bits allowed, we say that overflow has occurred. Overflow occurs when doing arithmetic operations.
example: 3 bit unsigned representation

      011 (3)
    + 110 (6)
    ---------
        ? (9)   it would require 4 bits (1001) to represent
                the value 9 in unsigned representation
Mistakes happen. Someday, you will write some code that will overflow the amount of space allocated for the type. It helps to understand what is going on if you recognize what overflow looks like for various types.
There is a standard header, inherited from C, that defines integral types with exact sizes:
#include <cstdint>
Once included, a family of fixed size integral types is available:

- int8_t
- uint8_t
- int16_t
- uint16_t

and many others. Use these types like any other standard type you are already familiar with.
Where int sizes may vary from machine to machine, int8_t is guaranteed to always be an 8 bit signed integer. For this reason, the programming guidelines of many organizations prefer fixed size integral types to the generic int and long.
Ask yourself what is the largest number we can store in an 8 bit signed integer, then predict the output of the following program before you run it.
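The program itself is on the run tab; a minimal sketch of the same idea (the exact values used there may differ) is:

#include <cstdint>
#include <iostream>

int main() {
    int8_t x = 127;               // the largest value an 8 bit signed integer can hold
    int8_t y = 1;
    int8_t z = x + y;             // the sum (128) does not fit; on two's complement
                                  // machines it wraps around to -128
    std::cout << int(z) << '\n';  // cast so the value prints as a number, not a character
}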
This video covers the representation of int on most computers:

- How data is stored in a computer’s memory
- Size and range of int
- Signed and unsigned int
- How negative numbers are stored in binary
When a signed integral type overflows, the result spills into the most significant bit, effectively changing the sign, as in the preceding example.

Unsigned integers do not, technically, overflow in that they do not become negative; they are, after all, unsigned. They do, however, still ‘wrap around’, so adding two values can produce a result smaller than the true sum.
Try This!

Change the types in the previous program from int8_t to uint8_t.

- With no other changes, what do you expect the output to be?
- Keeping the types as uint8_t, change the value of x to 256 and run the program again. What do you expect to see? What did you see?

If any of this was surprising, consider making more changes:

- Try other unsigned types, short, or char.
- What happens if you assign -1 to a variable of unsigned type?
Preventing overflow errors¶
The C and C++ compilers do not check math overflow for you. You can turn on compiler warnings that can inform you about possible overflow or type conversion problems. A program can report when it happens at runtime, but by that point, the error has already occurred. It is generally preferred to check for possible overflow before attempting a calculation that might overflow.
The following program checks for addition overflow and reports an error if addition overflow would occur.
#include <iostream>
#include <limits>

int main () {
    int x = std::numeric_limits<int>::max();
    int y = x - 9;
    if (std::numeric_limits<int>::max() - x < y) {
        std::cerr << "addition failed: result is too big\n";
    } else {
        // addition is safe
        std::cout << "x+y = " << (x+y) << '\n';
    }
}
Similarly, checks for multiplication overflow and exponentiation overflow could use the following checks:
if (std::numeric_limits<int>::max() / x < y) {
    std::cerr << "multiplication failed: result is too big\n";
}

// number of bits in a uint32_t
const int num_bits = sizeof(uint32_t) * 8;
// std::log2 requires the <cmath> header
if (std::log2(base) * exponent > num_bits) {
    std::cerr << "exponentiation failed: result is too big\n";
}
Compound types¶
Most of the compound types will be covered in greater detail later in this book. Those that aren’t covered later are discussed now.
Array types¶
An array is a block of memory that holds one or more objects of a given type. Declare an array by giving the type of object the array holds followed by the array name and the size in square brackets:
int a[3]; // array of 3 ints
int b[3] = {4, 5, 6}; // array of 3 ints
int c[] = {1, 2, 3}; // array of 3 ints
char name[64]; // array of 64 characters
Arrays can be constructed from any fundamental type, pointers, pointers to members, classes, enumerations, or from other arrays (in which case the array is said to be multi-dimensional). Arrays cannot be constructed from references, functions, or abstract class types.
Objects of array type cannot be modified as a whole: even though they are lvalues (e.g. the address of an array can be taken), they cannot appear on the left hand side of an assignment operator:
int a[3] = {4, 5, 6};   // array of 3 ints with initial values
int (*b)[3] = &a;       // OK: b is a pointer to an array of 3 ints
int c[3];               // array of 3 ints, no initial values provided
c = a;                  // Error. Can't assign to an array
c[0] = a[0];            // OK.
It’s easy to forget that when an array is initialized with fewer values than it has elements, the remaining elements are value-initialized (zero, for the fundamental types). This is true even if only part of a multi-dimensional array is initialized with values. Consider the statement:
int a[2][4] = {{1,2,3,4}};
What is declared?
What is initialized?
Then run it.
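If you can’t run it right away, a minimal sketch that prints every element (and so shows exactly what was initialized) looks like this:

#include <iostream>

int main() {
    int a[2][4] = {{1, 2, 3, 4}};   // only the first row is given values
    for (int row = 0; row < 2; ++row) {
        for (int col = 0; col < 4; ++col) {
            std::cout << a[row][col] << ' ';
        }
        std::cout << '\n';
    }
    // prints:
    // 1 2 3 4
    // 0 0 0 0
}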
Reference types¶
A reference type declares a named variable as an alias to an already-existing object or function. References are a C++ addition, one of the few types not present in C. A reference is required to be initialized to refer to a valid object or function. There are no references to void and no references to references.
int a = 3;
int& r1 = a; // r1 is a reference to a
r1 = 72; // changes the value of a
int& r2; // Error. r2 must refer to something
Once initialized, a reference always refers to the same object. The value of the object may change, but the address referred to may not.
Note
C++ 11 introduced a new kind of reference, an rvalue reference. We will cover this when we get into classes. All of the references discussed until then will be lvalue references.
The type std::size_t¶

There exists an implementation defined typedef std::size_t. The type std::size_t can represent the size in bytes of any object of any type (including arrays). In order to use size_t you need to include the header cstdlib.

This means that size_t is guaranteed to always be big enough to use safely as an index in any array. This doesn’t mean you can’t access an invalid element of an array, only that the index can be increased without worrying about the index variable overflowing (see the previous discussion about overflow).

The purpose of size_t is to relieve the programmer from having to worry about which of the predefined unsigned types is used to represent sizes.
Code that assumes sizeof yields an unsigned int is not as portable as code that assumes it yields a size_t. size_t is commonly used for array indexing and loop counting. Programs that use other types, such as unsigned int, for array indexing may fail, for example, on 64-bit systems when the index exceeds UINT_MAX or when the program relies on 32-bit modular arithmetic.
So, if you must write a hand-rolled loop to loop through a container that returns its size, then prefer this:
for (size_t i = 0; i < foo.size(); ++i)
over this:
for (int i = 0; i < foo.size(); ++i)
As a programmer, you need to use caution if the variable i is to be used for anything other than an index, for example in an arithmetic expression. Avoid mixing signed and unsigned types as the results can be surprising. Also be aware that C++ uses signed integers for array subscripts and the standard library uses unsigned integers for container subscripts. This makes absolute consistency in all situations impossible.
Later on, we will cover techniques that improve on iterating through data even more.
Try This!

What do you think the output of the following program will be (a sketch is shown after this list)? If you have access to another computer, try compiling in ‘32 bit’ mode.

- Run it and check your assumptions.
- What sizes are different? Why?
- What are the implications of these differences when writing code that needs to run on both?
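The program itself is on the run tab; a minimal sketch of the kind of program meant here simply prints the sizes of several types:

#include <cstddef>
#include <iostream>

int main() {
    std::cout << "sizeof(char):      " << sizeof(char)        << '\n'
              << "sizeof(short):     " << sizeof(short)       << '\n'
              << "sizeof(int):       " << sizeof(int)         << '\n'
              << "sizeof(long):      " << sizeof(long)        << '\n'
              << "sizeof(long long): " << sizeof(long long)   << '\n'
              << "sizeof(void*):     " << sizeof(void*)       << '\n'
              << "sizeof(size_t):    " << sizeof(std::size_t) << '\n';
}

On a typical 64-bit build, pointers and size_t are 8 bytes, while in a ‘32 bit’ build they are 4; the size of long varies between platforms.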
Displaying numbers¶
The standard library provides facilities to control the numeric base used when integers are written to or read from a stream. You can change the base using setbase, or change how a number is displayed using dec, hex, or oct.
These functions are all I/O manipulators. They may be called with an expression such as out << std::hex for any out of type basic_ostream, or with an expression such as in >> std::hex for any in of type basic_istream. For example:
std::cout << "The number 42 in octal: " << std::oct << 42 << '\n'
<< "The number 42 in decimal: " << std::dec << 42 << '\n'
<< "The number 42 in hex: " << std::hex << 42 << '\n';
int n;
std::istringstream("2A") >> std::hex >> n;
std::cout << std::dec << "Parsing \"2A\" as hex gives " << n << '\n';
// the output base is sticky until changed
std::cout << std::hex << "42 as hex gives " << 42
<< " and 21 as hex gives " << 21 << '\n';
This example displays a simple table of the ASCII characters in 3 different bases.
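A minimal sketch of such a table (the example on the run tab may differ) loops over the printable ASCII characters and prints each one in decimal, octal, and hex:

#include <iomanip>
#include <iostream>

int main() {
    std::cout << "char  dec  oct  hex\n";
    for (int c = 33; c < 127; ++c) {    // the printable ASCII characters
        std::cout << "   " << char(c)
                  << std::setw(5) << std::dec << c
                  << std::setw(5) << std::oct << c
                  << std::setw(5) << std::hex << c << '\n';
    }
}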
Self Check
sc-1-8: Drag the definition from the left and drop it on the correct concept on the right. Click the "Check Me" button to see if you are correct
Review the summaries above.
- Specifying the type and name for a variable: declaring a variable
- A whole number: integer
- A name associated with a memory location: variable
- An expression that is either true or false: bool
sc-1-9: Which declarations are valid?

- int 3; (not valid: declarations must always include a name)
- const double pi; (not valid: constant declarations must always include a value)
- x = 21; (not valid: declarations must always include a type; as written, this is assignment, not declaration)
- char* s = "hello"; (correct)
- constexpr int min = 1; (correct)
sc-1-10: Drag the definition from the left and drop it on the correct concept on the right. Click the "Check Me" button to see if you are correct.

Review the summaries above.

- Setting the value of a variable the first time: initialize
- An operator that returns the remainder: modulus
- A type used to represent decimal values: double
- Changing the type of a variable: casting
sc-1-11: Given the following:

#include <cstdint>

int main() {
    int8_t x = 128, y = 1;
    auto z = x + y;
}

What is the value stored in z?
sc-1-12: Given the following:

int x = ~-1;

What is the value stored in x?
Fix all the errors in the code below:
sc-1-15: Given the following:

#include <iostream>

int main () {
    char x[2][3] = {{'a','b','c'}};
    std::cout << x[0][1] << '\n';
}

What is the value displayed?
More to Explore
From cppreference.com
types and std::size_t
typedef and type aliases
ISO CPP Super FAQ: Floating point questions
What every computer scientist should know about floating-point arithmetic
Dyadic rationals on Wikipedia
CPP Core Guidelines: Arithmetic
An interesting alternative to explore: Google Protocol Buffers use variable length zig-zag encoding