February 09 (Monday) February 13 (Friday)

February 11, 2004 (Wednesday)

Chapter 4: Organizing programs and data (cont'd)

Multifile compilation issues in C++ (K&M § 4.3 -- 4.5)

Although the program discussed during the last lecture was relatively small, we can break it up into several source modules which can be compiled separately and linked together to form a single executable. This is similar to what we did with the C program earlier. You can download and unzip the following zip file to see how the textbook modularizes the program across several modules.

Include files

As in C, #include files form the basis of `declaration communication' between source modules. Structure and function declarations are placed in header files -- any source modules that uses the structures and/or functions must then include the corresponding header file.

#include "filename.h"
...

We use the .h extension and enclose our own header files in quotation marks when including them in a source file. As in C, we include the header file in both the source module that defines the functions and the source modules that invoke the functions so that the consistency between the function invocation and function definition can be ensured.

One problem that can occur in C++ (and in C too, for that matter) is one of multiple inclusion, in which the same header file gets included multiple times, thereby resulting in potential problems during compilation. To get around this problem, we use a guard at the top of each header file that we create:

#ifndef GUARD_filename_h
#define GUARD_filename_h
...
#endif // GUARD_filename_h

This uses the preprocessor to test whether the GUARD_filename_h macro has been #defined. If not, then the header file is included, as normal, and the first thing that the include file does is to #define GUARD_filename_h so that if there is an attempt by the compiler to re-include this file (either directly or indirectly) in the same source module, the first #ifndef will prevent this. We must end the #ifndef with a corresponding #endif preprocessor directive at the end of the header file.

In header files, you should never use using declarations. Instead, within your header files you should always fully qualify the types and objects used (e.g. by prefacing all standard namespace entities with std::). All using declarations should be done only in the context of source modules where their effect is limited to the source file alone. If you put using declarations in your header file, you are making the tacit assumption that other developers who include your header files will also want the accompanying using declarations of your header file in their source module -- this is not always the case and can lead to naming conflicts.

Chapter 5: Using sequential containers and analyzing strings

This chapter discusses further operations that you can perform on vectors and how to iterate over the elements in a vector using iterators. Some of the efficiency concerns with vector objects are discussed and a new container, the list is also introduced.

Partitioning a `vector`

Say we want to partition our vector of Student_info objects into two separate groups: those that pass the course and those that fail. We introduce the fgrade() predicate function to determine whether or not a student has passed the course:

// predicate to determine whether a student failed
bool fgrade(const Student_info& s)
{
	return grade(s) < 60;
}
fgrade.cpp

This function calls the grade(const Student_info &) function and returns true if the student has a failing grade. The function has been added to the grade.cpp source module described earlier. We can then use this function to write another function that will partition a given vector of students into those who passed and those who failed:

#include <vector>
#include "Student_info.h"
#include "grade.h"

using std::vector;

// separate passing and failing student records: first try
vector<Student_info> extract_fails(vector<Student_info>& students)
{
	vector<Student_info> pass, fail;

	for (vector<Student_info>::size_type i = 0;
	     i != students.size(); ++i)
		if (fgrade(students[i]))
			fail.push_back(students[i]);
		else
			pass.push_back(students[i]);

	students = pass;
	return fail;
}
fails_vec1.cpp

The extract_fails() function iterates sequentially over all the elements in the students vector and calls the fgrade() function on each student. If the student has a failing grade, the student's record is stored in a fail vector; otherwise, it is stored in a pass vector -- both of these vectors are local to the extract_fails() function. After the loop has completed, the pass vector is assigned to the students parameter. This parameter was passed by reference, therefore, any changes to this parameter will be reflected by the calling function (main() in this case). Finally, the function returns the vector of students who failed the course as the return value for this function.

This function can be called in our main() program as follows.

#include <algorithm>
#include <vector>
#include <iostream>
#include <string>

#include "Student_info.h"
#include "grade.h"

//driver program for grade partitioning examples

using std::cin;
using std::cout;
using std::endl;
using std::sort;
using std::string;
using std::vector;

vector<Student_info> extract_fails(vector<Student_info>& v);

int main()
{
        vector<Student_info> vs;
        Student_info s;
        while (read(cin, s))
                vs.push_back(s);

        sort(vs.begin(), vs.end(), compare);

	vector<Student_info> fails = extract_fails(vs);

	for (vector<Student_info>::size_type i = 0; i < fails.size(); ++i)
		cout << fails[i].name << " " << grade(fails[i]) << endl;

	return 0;
}
ext_main.cpp

The `vector::erase()` method

Unfortunately, in our first attempt at the extract_fails() function, the pass and fail vectors hold duplicate copies of the elements in the original students vector. As a result, just prior to the students = pass; assignment, the function has essentially two copies of the vector storing the students. This is wasteful use of memory. We can do better by removing the actual failures from the list as we encounter them during our iteration as demonstrated below:

#include <vector>
#include "Student_info.h"
#include "grade.h"

using std::vector;

// second try: correct but potentially slow
vector<Student_info> extract_fails(vector<Student_info>& students)
{
	vector<Student_info> fail;
	vector<Student_info>::size_type i = 0;

	// invariant: elements [0,i) of `students' represent passing grades
	while (i != students.size()) {
		if (fgrade(students[i])) {
			fail.push_back(students[i]);
			students.erase(students.begin() + i);
		} else
			++i;
	}
	return fail;
}
fails_vec2.cpp

This above function will no longer keep two entire copies of the vector in memory. Instead, as the loop encounters a failing student, it will store the student's record in a fail vector and the remove that student record from the students vector. After the loop has finished, the students vector, which was passed by reference, will contain all the passing students and the fail vector will be returned by the function.

There are a few subtle points with the above code:

We cannot erase elements from a vector by merely supplying an index. Instead, we can compute an offset relative to the value returned by the students.begin() method. This method, which we saw earlier when sorting the list of students, returns a value that denotes the first element in the list. By adding i to this value, we will get a value that that denotes the ith element of the list. We can then supply this value to the erase() method of the students vector, thereby removing the element from the vector.
When we erase elements from the vector, all the elements after that erased element have to be moved back one place to fill in the gap left by the deleted element. As a result, after we erase() an element we should not increment our index variable i. If we did, then i would not reference the element after the one we just deleted. To ensure that i indexes the correct element, we increment i only when we did not remove an element from the vector.
As a result of having to shift elements during deletion of a vector element, this technique is quite slow if we have a high number of failing students in a very large class.
We must recompute the size() of the students vector on each iteration of the while loop. If we cached the number return by the size() method in another variable, say num, and then proceeded to delete students from the array, then num would no longer be an accurate count of how many students were in the vector. As a result, the while loop would run off the end of the students vector and would access elements in the vector that are out of bounds.

Iterators

Because extract_fails() is always accessing the elements sequentially, we can use another type called an iterator to access the student records individually and achieve the same behaviour as before. This is demonstrated by the following modified version of the extract_fails() function.

#include <vector>
#include "Student_info.h"
#include "grade.h"

using std::vector;

// version 3: iterators but no indexing; still potentially slow
vector<Student_info> extract_fails(vector<Student_info>& students)
{
	vector<Student_info> fail;
	vector<Student_info>::iterator iter = students.begin();

	while (iter != students.end()) {
		if (fgrade(*iter)) {
			fail.push_back(*iter);
			iter = students.erase(iter);
		} else
			++iter;
	}
	return fail;
}
fails_iters.cpp

Note that we have gotten rid of the i index variable entirely, but we are still able to access the elements of the vector sequentially. The use of an iterator rather than an index variable further reinforces the sequential nature of how we access the elements of the vector. You should note the following features of the above function:

In order to sequentially traverse the list, we create an iterator variable type called iter, as follows:
```
vector<Student_info>::iterator iter = students.begin();
```
The iterator is initialized to the first element (i.e. students.begin()). We then use this iterator to step through all the elements in the students vector by using a while loop.
If the student has a failing grade, we store it in the fail vector and then remove the student record from the students vector using erase(). Note that the erase() method takes the iterator as its argument. If the student didn't fail, we update the iterator by incrementing it. This moves the iterator to the next element in the vector. When iter reaches one passed the end of the vector, the conditional iter != students.end() will be false and the loop will terminate. Remember that students.end() denotes the position that is one passed the end of the vector.
To access the Student_info record denoted by the iterator, we can dereference the iterator variable in much the same way that we dereference a pointer. For example:
```
if (fgrade(*iter)) {
	fail.push_back(*iter);
	...
}
```
This is analogous to the previous version of extract_fails() when we used an index variable to access the elements of the vector:
```
if (fgrade(students[i])) {
	fail.push_back(students[i]);
	...
}
```
As another example, we can iterate over all the students in the vector and display their names as follows:
```
for (vector<Students_info>::const_iterator iter = students.begin();
     iter != students.end();
     ++iter) {
	cout << (*iter).name << endl;
}
```
As you recall from C, we must do the dereference before the dot operator due to precedence. C++ also supports C's -> operator to make the above syntax less clumsy:
```
	cout << iter->name << endl;
```
Note that we create a const_iterator, instead of a regular iterator because the above code fragment does not actually change the contents of the student vector. Whenever you are just reading the elements of a container (and not modifying the container), you should use a const_iterator.
After we erase() an element, our iterator (and any iterators denoting elements after it in the vector) become invalid. Fortunately, the erase() method will return an iterator value which denotes the element following the element we just deleted. Whenever we erase an element, we update iter by assigning it the return value from the erase() method.
```
iter = students.erase(iter);
```
As before, we must recompute the end() iterator on each iteration of the loop. If we had written the code as:
```
vector<Student_info>::iterator iter = students.begin(),
				     end_iter = students.end();

while (iter != end_iter) {
	...
```
then this introduces a subtle bug because whenever an element is erased from a vector, the iterator that denotes the erased element as well as all the iterators that denote elements after it (including end_iter in the above case) become invalid. Therefore, in the conditional of the while loop we reevaluate the end() iterator. Incidentally, whenever we add elements to a vector (using insert(), for example), the insertion may result in the entire vector being moved to another part of memory. As a result all iterators that denote elements in the original vector will become invalid.

The `list` type

While the above versions of extract_fails() are correct, there is a problem with its efficiency because erasure from (or insertion into) the middle of a vector can be very inefficient due to all the element shifting that must take place. Fortunately, the standard C++ library supports a list container which allows for rapid insertion into and deletion from anywhere in the container, thereby mitigating the inefficient behaviour of the vector.

Here is the main() function that changes the implementation to use a list container instead of a vector container:

#include <list>
#include <string>

#include "grade.h"
#include "Student_info.h"

using std::cin;
using std::cout;
using std::endl;
using std::list;
using std::string;

list<Student_info> extract_fails(list<Student_info>& v);

int main()
{
        list<Student_info> vs;
        Student_info s;

        while (read(cin, s))
                vs.push_back(s);

        vs.sort(compare);

	list<Student_info> fails = extract_fails(vs);

	list<Student_info>::iterator i;

	for (i = fails.begin(); i != fails.end(); ++i)
		cout << i->name << " " << grade(*i) << endl;

	return 0;
}
list_main.cpp

You'll note that the main() function is very similar to the original main() function that we saw earlier except that the vector has been replaced with a list and that we are using iterators instead of an index variable to step through the elements of the list containing the failing students. We cannot use the global list container to sort the list of students. However, the list type has its own sort() member function which can be used to sort the students stored in the vs list.

Here is the implementation of the extract_fails() function using the list container. Note, again, that its implementation is identical to the vector version that used iterators, except that vector has been replaced by list.

#include <list>
#include "Student_info.h"
#include "grade.h"

using std::list;
// version 4: use `list' instead of `vector'
list<Student_info> extract_fails(list<Student_info>& students)
{
	list<Student_info> fail;
	list<Student_info>::iterator iter = students.begin();

	while (iter != students.end()) {
		if (fgrade(*iter)) {
			fail.push_back(*iter);
			iter = students.erase(iter);
		} else
			++iter;
	}
	return fail;
}
fails_list.cpp

Unfortunately, while vectors allow random access via the indexing operator or by adding a number to an iterator, e.g.

students.begin() + i

the list type does not support such random access techniques.

The choice of whether to use a list or a vector container depends primarily on how the container is to be used. We'll discuss advantages and disadvantages of both containers in the next lecture.

Last modified: February 11, 2004 17:02:11 NST (Wednesday)