1.10. Collections¶

In addition to the numeric, Boolean, and character types, C++ also offers built-in collection types. A collection data type is a grouping of some number of other data items (possibly only zero or one) that have some shared significance or need to be operated upon together.

Arrays, vectors, strings, sets, and hash tables are among these useful C++ collection types.

1.10.1. Arrays¶

What is an Array? An array is a data structure consisting of an ordered collection of data elements, of identical type in which each element can be identified by an array index. More technically, an array data structure is an ordered arrangement of values located at equally spaced addresses in contiguous computer memory.

NOTE: A C++ array is always stored in contiguous memory. C++ arrays can be allocated in two different ways:

1. statically allocated in which the array size is fixed at compile-time and cannot change

2. dynamically allocated in which pointers are used in the allocation process so the size can change at run-time

In modern C++, the statically allocated array is typically used in situations when speed is essential or where hardware constraints exist, and a data structure called a vector is typically used when more flexibility is required.

Because C++ stores variables directly, each element of a C++ array must be of identical data type. The indices for arrays start counting with 0.

The use of arrays permits us to utilize an ordered set of memory locations that we can then manipulate as a single entity, and that at the same time gives us direct access to each individual component.

Why use an Array?

C++ is a language often used for real-time or low-level processing where speed is essential and/or there is only a fixed amount of space available.

The fact that array elements are stored in memory in contiguous memory locations making look-up via index very, very fast. In computing, a word is the unit of data used by a particular processor design, such as 32 or 64 bits. For example, an array of 100 integer variables, with indices 0 through 99, might be stored as 100 words at memory addresses 20000, 20004, 20008, … 20396. The element with index i would be located at the address 20000 + 4 × i.

Statically allocated C++ arrays must be declared with both a type and a size at compile-time.

double darray[4];
int iarray[10];
char arr2[3000];


It is also possible to initialize statically allocated arrays at compile time, in which case the number of items determines the array’s size.

int arr[] = {1, 2, 3, 4};  // This is size 4.
char arr2[] = {'a', 'b', 'c'}; // This is size 3.
string arr3[] = {"this", "is", "an", "array"}; // This is size 4.


Note that an array with a single element is not the same type as the atomic type, so the following are not the same.

double darray[] = {1.2};  // must use index to access value
double ddouble = 1.2;     // can't use index to access value


Be Cautious with Arrays

The speed and low-level control that arrays offer us as programmers is powerful… and dangerous. C++ is designed for speed, and using a C++ array will help you better understand the trade-offs inherent in this.

Here are examples of iteration.

C++ is designed for speed. C++ will not only let you iterate beyond either end of an array, but it will let you change the values beyond either end of the array with sometimes catastrophic results.

The phrase, “be careful what you wish for” is a great one to remember when programming in C++. Because C++ will generally try to do everything you ask for.

The speed of C++ comes at the cost of minimal to no error checking. Sometimes this can have perplexing results such as in the next example.

You should use an array when you have a need for speed or you need to work with hardware constraints. Otherwise, you may want to consider using another collection data type, the vector.

Q-1: In the above example, what happened to otherdata[ ] in C++?

• Nothing. Everything is fine.
• Actually, there is a problem. Look carefully.
• All data was automatically reinitialized.
• No. C++ just does what you tell it to do.
• I have no idea. Please give me a hint.
• Try again. One of these is indeed correct. Look at the memory addresses.
• The first loop went out of bounds and wrote over the values in otherdata.
• Right!
• none of the above
• One of the above is indeed correct.

Q-2: What is the correct way to declare an array in C++?

• int myarray(5);
• Check the characters at the end of the array! Right now that is a function!
• myarray[5];
• You are forgetting something important!
• int myarray[5];
• Good work!
• None of the above.
• Check the characters at the end of the array!

1.10.2. Vectors¶

Vectors use a dynamically allocated array to store their elements, so they can change size, and they have other friendly features as well. Because they use a dynamically allocated array, they use contiguous storage locations which means that their elements can be accessed and traversed, and they can also be accessed randomly using indexes. However, vectors are dynamically sized, so their size can change automatically. A new element can be inserted into or deleted from any part of a vector, and automatic reallocation for other existing items in the vector will be applied. Vectors are homogeneous, so every element in the vector must be of the same type.

Vectors are a class that is available through a library called the Standard Template Library (STL), and one uses a < > notation to indicate the data type of the elements. In order to use vectors, One needs to include the vector library.

#include <vector>

Common C++ Vector Operators

Vector Operation

Use

Explanation

[ ]

myvector[i]

access value of element at index i

=

myvector[i]=value

assign value to element at index i

push_back

myvect.push_back(item)

Appends item to the far end of the vector

pop_back

myvect.pop_back()

Deletes last item (from far end) of the vector

insert

myvect.insert(i, item)

Inserts an item at index i

erase

myvect.erase(i)

Erases an element from index i

size

myvect.size()

Returns the actual size used by elements

capacity

myvect.capacity()

Returns the size of allocated storage capacity

reserve

myvect.reserve(amount)

Request a change in capacity to amount

A very common programming task is to grow a vector using the push_back() method to append to the vector as we see in the next example. Because vectors can change size, vectors typically allocate some extra storage to accommodate for possible growth. Thus the vector typically has an actual capacity greater than the storage size strictly needed to contain its elements.

1.10.2.1. Matching¶

Q-3: Match the Vector operations with their corresponding explination. Feedback shows incorrect matches.
• [ ]
• Accesses value of an element.
• =
• Assigns value to an element.
• push_back
• Appends item to the end of the vector.
• pop_back
• Deletes last item of the vector.
• insert
• Injects an item into the vector.
• erase
• Deletes an element from the choosen index.
• size
• Returns the actual capacity used by elements.
• capacity
• Returns the ammount of allocated storage space.
• reserve
• Request a change in space to amount

In the above example, the use of reserve was optional. However, it is a good idea to use it before growing a vector in this way because it will save time. Because vectors are stored in underlying arrays which require contiguous memory, every time the vector’s size gets too large for the capacity, the entire vector must be moved to a larger location in memory, and all that copying takes time. In a typical implementation, the capacity is doubled each time. as in the example that follows.

Remembering that C++ is designed for speed, not protection, we will likely not be surprised by the following:

Q-4: Which of the following is the biggest difference between a C++ array and a C++ vector?

• Vectors can change size.
• Yes, however, there are more benefits to using vectors.
• Vectors offer many more features and protections than arrays.
• Not all of the protections of arrays are offered by vectors; one can still iterate off of either end.
• Vectors don't use contiguous memory, so elements can be inserted.
• No. Although elements can be inserted in vectors, they do require contiguous memory.
• more than one of the above
• Right! Good job!
• none of the above
• One of the above is indeed correct.

Q-5: What good is the reserve method in a vector?

• Nothing. It is completely optional.
• It is optional but it does serve a purpose. Try again.
• Using it will save time if you know the maximum size needed.
• Right!
• It is required so memory can be allocated.
• It is not required.
• none of the above
• One of the above is indeed correct.

1.10.3. Strings¶

Strings are sequential collections of zero or more characters such as letters, numbers and other symbols. There are actually two types of strings in C++ . The C++ string or just string from the <string> library is the more modern type. The old style C-string which is essentially an array of char type. The char type itself is actually distinct from both types of strings.

char cppchar = 'a';   // char values use single quotes
string cppstring = "Hello World!";  // C++ strings use double quotes
char cstring[] = {"Hello World!"};    // C-string or char array uses double quotes


In older versions of C++, you must use a char array to work with filenames. In modern C++ (from C++11 onward), however, you can use a C++ string for everything. Since C++ strings are so much nicer, I would not recommend using C-strings unless you have a reason.

Because strings are sequences, all of the typical sequence operations work as you would expect. In addition, the string library offers a huge number of methods, some of the most useful of which are shown in Table 4.

Q-6: What is the correct definition of c-strings?

• An array of characters that ends with a terminating null character. i.e. \0
• Correct! a c-string is different from a string
• A sequential data structure consisting of zero or more characters
• Close, but that is the definition of a string, not a c-string
• A data structure consisting of an ordered collection of data elements of identical type in which each element can be identified by an array index.
• Sorry, thats not a string or a c-string
• sequence container storing data of a single type that is stored in a dynamically allocated array which can change in size.
• No, that's a vector
Table 4: String Methods Provided in C++

Method Name

Use

Explanation

[ ]

astring[i]

access value of character at index i

=

astring[i]=value

change value of character at index i

+

string1 + astring2

concatenate strings

append

astring.append(string)

Append to string the end of the string

push_back

astring.push_back(char)

Appends a character to the end of the string

pop_back

astring.pop_back()

Deletes the last character from the end of the string

insert

astring.insert(i, string)

Inserts a string at a specific index

erase

astring.erase(i, j)

Erases an element from one index to another

find

astring.find(item)

Returns the index of the first occurrence of item

size

astring.size()

Returns the size of the string

1.10.3.1. Matching¶

Q-7: Match the String operations with their corresponding explination. Feedback shows incorrect matches.
• [ ]
• Accesses value of an element.
• find
• Returns the index of the first occurrence of item.
• =
• Assigns value to an element.
• push_back
• Adjoins a character to the end of the string.
• pop_back
• Removes the last character from the end of the string.
• insert
• Injects a string at a specific index.
• erase
• Deletes an element from one index to another.
• size
• Returns the capacity of the string.
• +
• connects strings.
• append
• Adjoins a string to the end of the string.

Check your understanding by completing the following question.

Q-8: Drag each data type to its' corresponding C++ initialization syntax. Feedback shows incorrect matches.
• char
• 'a'
• char array
• {'a'}
• string
• "a"

1.10.4. Hash Tables¶

A hash table is a collection of associated pairs of items where each pair consists of a key and a value. Hash tables are often called the more general term map because the associated hash function “maps” the key to the value. Nevertheless, it is better to use the more precise term, hash table because other kinds of maps are sometimes implemented with a different underlying data structure.

Each hash table has a hash function which given the key as input to the hash function returns the location of the associated value as the output. This makes look up fast.

In C++, the unordered_map implements the hash table, and the <unordered_map> library must be included as follows:

#include <unordered_map>


The syntax for hash table access is much like we might expect except that instead of using the index of the item for look-up, we use the key. Hash tables can be initialized with key-value pairs and key-value pairs can also be added later as we see in the following example. In C++, the keyword first is used for the key, and second is used for the associated value.

It is important to note that hash tables are organized by the location given by the hash function rather than being in any particular order with respect to the keys. This makes look-up extremely fast. Hence, although it is possible to iterate through a hash table, it is an odd thing to do because the data is not typically stored sequentially. Iterators of an unordered_map are implemented using pointers to point to elements of the value type as we see in the following example.

Hash Tables have both methods and operators. Table 7 describes them, and the session shows them in action.

Table 7: Important Hash Table Operators Provided in C++

Operator

Use

Explanation

[ ]

mymap[k]

Returns the value associated with k, otherwise throws error

count

mymap.count(key)

Returns true if key is in mymap, false otherwise

erase

mymap.erase(key)

Removes the entry from mymap

begin

mymap.begin()

An iterator to the first element in mymap

end

mymap.end(key)

An iterator pointing to past-the-end element of mymap

1.10.4.1. Matching¶

Q-9: Match the Hash Table operations with their corresponding explination. Feedback shows incorrect matches.
• [ ]
• Returns the value associated with the key, otherwise throws error.
• erase
• Deletes the entry from the hash table.
• count
• Returns true if key is in the hash table, and false otherwise.
• begin
• An iterator to the first element in the hash table.
• end
• An iterator pointing to past-the-end element of the hash table.

1.10.5. Unordered Sets¶

An unordered_set is an unordered collection of zero or more unique C++ data values of a particular type. To use unordered_sets, you import unordered_set from the Standard template library with #include <unorderd_set>.

Unordered_sets allow for fast retrieval of individual elements based on their value. In an unordered_set, the value of an element is at the same time its key, that identifies it uniquely. Keys are immutable, therefore, the elements in an unordered_set cannot be modified once in the container - However, they can be inserted and removed.

Unordered sets do not allow duplicates and are initialized using comma-delimited values enclosed in curly braces. The collection can be assigned to a variable as shown below.

set<int> mySet = {3, 6, 4, 78, 10}


Unordered sets support a number of methods that should be familiar to those who have worked with sets in a mathematics setting. Table 6 provides a summary. Examples of their use follow.

Table 6: Methods Provided by Sets in C++

Method Name

Use

Explanation

union

set_union()

Returns a new set with all elements from both sets

intersection

set_intersection()

Returns a new set with only those elements common to both sets

difference

set_difference()

Returns a new set with all items from first set not in second

add

aset.insert(item)

remove

aset.erase(item)

Removes item from the set

clear

aset.clear()

Removes all elements from the set

The code below is an example of a program that can detect if a specific char is in an unordered set.

the find method used for a conditional in Checker compares each item in the set with the given parameter until there is a match. the set.find(letter) == set.end() section means that if find cannot find the letter before reaching the end of the set, then letter is not contained in the set.

1.10.5.1. Matching¶

Q-10: Match the Unordered Sets operations with their corresponding explination. Feedback shows incorrect matches.
• union
• Returns a new set with all elements from both sets.
• intersection
• Returns a new set with only those elements common to both sets.
• difference
• Returns a new set with all items from first set not in second.
• Adds item to the set.
• remove
• erases item from the set.
• clear
• Removes all elements from the set.

Q-11: Which C++ structure is the best choice for a group of ordered data of a fixed length?

• array
• Correct!
• hash table
• No. hash tables are not ordered.
• string
• A string would only work for character data. Try again.
• vector
• There is a better choice given that the group is fixed length
• more than one of the above
• Only of the above is best.
Q-12: Drag each data type to its' corresponding C++ initialization syntax. Feedback shows incorrect matches.
• Array
• {“What”, “am”, “I”, "am"}
• Set
• {“What”, “am”, “I”}
• String
• “What am I”
• Hash Table
• { {“What”, “am I”} }