Main

February 11, 2004 (Wednesday)

Chapter 4: Organizing programs and data (cont'd)

Multifile compilation issues in C++ (K&M § 4.3 -- 4.5)

Although the program discussed during the last lecture was relatively small, we can break it up into several source modules which can be compiled separately and linked together to form a single executable. This is similar to what we did with the C program earlier. You can download and unzip the following zip file to see how the textbook modularizes the program across several modules.

Include files

As in C, #include files form the basis of `declaration communication' between source modules. Structure and function declarations are placed in header files -- any source modules that uses the structures and/or functions must then include the corresponding header file.

#include "filename.h"
...

We use the .h extension and enclose our own header files in quotation marks when including them in a source file. As in C, we include the header file in both the source module that defines the functions and the source modules that invoke the functions so that the consistency between the function invocation and function definition can be ensured.

One problem that can occur in C++ (and in C too, for that matter) is one of multiple inclusion, in which the same header file gets included multiple times, thereby resulting in potential problems during compilation. To get around this problem, we use a guard at the top of each header file that we create:

#ifndef GUARD_filename_h
#define GUARD_filename_h
...
#endif // GUARD_filename_h

This uses the preprocessor to test whether the GUARD_filename_h macro has been #defined. If not, then the header file is included, as normal, and the first thing that the include file does is to #define GUARD_filename_h so that if there is an attempt by the compiler to re-include this file (either directly or indirectly) in the same source module, the first #ifndef will prevent this. We must end the #ifndef with a corresponding #endif preprocessor directive at the end of the header file.

In header files, you should never use using declarations. Instead, within your header files you should always fully qualify the types and objects used (e.g. by prefacing all standard namespace entities with std::). All using declarations should be done only in the context of source modules where their effect is limited to the source file alone. If you put using declarations in your header file, you are making the tacit assumption that other developers who include your header files will also want the accompanying using declarations of your header file in their source module -- this is not always the case and can lead to naming conflicts.

Chapter 5: Using sequential containers and analyzing strings

This chapter discusses further operations that you can perform on vectors and how to iterate over the elements in a vector using iterators. Some of the efficiency concerns with vector objects are discussed and a new container, the list is also introduced.

Partitioning a vector

Say we want to partition our vector of Student_info objects into two separate groups: those that pass the course and those that fail. We introduce the fgrade() predicate function to determine whether or not a student has passed the course:

// predicate to determine whether a student failed
bool fgrade(const Student_info& s)
{
	return grade(s) < 60;
}
fgrade.cpp

This function calls the grade(const Student_info &) function and returns true if the student has a failing grade. The function has been added to the grade.cpp source module described earlier. We can then use this function to write another function that will partition a given vector of students into those who passed and those who failed:

#include <vector>
#include "Student_info.h"
#include "grade.h"

using std::vector;

// separate passing and failing student records: first try
vector<Student_info> extract_fails(vector<Student_info>& students)
{
	vector<Student_info> pass, fail;

	for (vector<Student_info>::size_type i = 0;
	     i != students.size(); ++i)
		if (fgrade(students[i]))
			fail.push_back(students[i]);
		else
			pass.push_back(students[i]);

	students = pass;
	return fail;
}
fails_vec1.cpp

The extract_fails() function iterates sequentially over all the elements in the students vector and calls the fgrade() function on each student. If the student has a failing grade, the student's record is stored in a fail vector; otherwise, it is stored in a pass vector -- both of these vectors are local to the extract_fails() function. After the loop has completed, the pass vector is assigned to the students parameter. This parameter was passed by reference, therefore, any changes to this parameter will be reflected by the calling function (main() in this case). Finally, the function returns the vector of students who failed the course as the return value for this function.

This function can be called in our main() program as follows.

#include <algorithm>
#include <vector>
#include <iostream>
#include <string>

#include "Student_info.h"
#include "grade.h"

//driver program for grade partitioning examples

using std::cin;
using std::cout;
using std::endl;
using std::sort;
using std::string;
using std::vector;

vector<Student_info> extract_fails(vector<Student_info>& v);

int main()
{
        vector<Student_info> vs;
        Student_info s;
        while (read(cin, s))
                vs.push_back(s);

        sort(vs.begin(), vs.end(), compare);

	vector<Student_info> fails = extract_fails(vs);

	for (vector<Student_info>::size_type i = 0; i < fails.size(); ++i)
		cout << fails[i].name << " " << grade(fails[i]) << endl;

	return 0;
}
ext_main.cpp

The vector::erase() method

Unfortunately, in our first attempt at the extract_fails() function, the pass and fail vectors hold duplicate copies of the elements in the original students vector. As a result, just prior to the students = pass; assignment, the function has essentially two copies of the vector storing the students. This is wasteful use of memory. We can do better by removing the actual failures from the list as we encounter them during our iteration as demonstrated below:

#include <vector>
#include "Student_info.h"
#include "grade.h"

using std::vector;

// second try: correct but potentially slow
vector<Student_info> extract_fails(vector<Student_info>& students)
{
	vector<Student_info> fail;
	vector<Student_info>::size_type i = 0;

	// invariant: elements [0,i) of `students' represent passing grades
	while (i != students.size()) {
		if (fgrade(students[i])) {
			fail.push_back(students[i]);
			students.erase(students.begin() + i);
		} else
			++i;
	}
	return fail;
}
fails_vec2.cpp

This above function will no longer keep two entire copies of the vector in memory. Instead, as the loop encounters a failing student, it will store the student's record in a fail vector and the remove that student record from the students vector. After the loop has finished, the students vector, which was passed by reference, will contain all the passing students and the fail vector will be returned by the function.

There are a few subtle points with the above code:

Iterators

Because extract_fails() is always accessing the elements sequentially, we can use another type called an iterator to access the student records individually and achieve the same behaviour as before. This is demonstrated by the following modified version of the extract_fails() function.

#include <vector>
#include "Student_info.h"
#include "grade.h"

using std::vector;

// version 3: iterators but no indexing; still potentially slow
vector<Student_info> extract_fails(vector<Student_info>& students)
{
	vector<Student_info> fail;
	vector<Student_info>::iterator iter = students.begin();

	while (iter != students.end()) {
		if (fgrade(*iter)) {
			fail.push_back(*iter);
			iter = students.erase(iter);
		} else
			++iter;
	}
	return fail;
}
fails_iters.cpp

Note that we have gotten rid of the i index variable entirely, but we are still able to access the elements of the vector sequentially. The use of an iterator rather than an index variable further reinforces the sequential nature of how we access the elements of the vector. You should note the following features of the above function:

The list type

While the above versions of extract_fails() are correct, there is a problem with its efficiency because erasure from (or insertion into) the middle of a vector can be very inefficient due to all the element shifting that must take place. Fortunately, the standard C++ library supports a list container which allows for rapid insertion into and deletion from anywhere in the container, thereby mitigating the inefficient behaviour of the vector.

Here is the main() function that changes the implementation to use a list container instead of a vector container:

#include <list>
#include <string>

#include "grade.h"
#include "Student_info.h"

using std::cin;
using std::cout;
using std::endl;
using std::list;
using std::string;

list<Student_info> extract_fails(list<Student_info>& v);

int main()
{
        list<Student_info> vs;
        Student_info s;

        while (read(cin, s))
                vs.push_back(s);

        vs.sort(compare);

	list<Student_info> fails = extract_fails(vs);

	list<Student_info>::iterator i;

	for (i = fails.begin(); i != fails.end(); ++i)
		cout << i->name << " " << grade(*i) << endl;

	return 0;
}
list_main.cpp

You'll note that the main() function is very similar to the original main() function that we saw earlier except that the vector has been replaced with a list and that we are using iterators instead of an index variable to step through the elements of the list containing the failing students. We cannot use the global list container to sort the list of students. However, the list type has its own sort() member function which can be used to sort the students stored in the vs list.

Here is the implementation of the extract_fails() function using the list container. Note, again, that its implementation is identical to the vector version that used iterators, except that vector has been replaced by list.

#include <list>
#include "Student_info.h"
#include "grade.h"

using std::list;
// version 4: use `list' instead of `vector'
list<Student_info> extract_fails(list<Student_info>& students)
{
	list<Student_info> fail;
	list<Student_info>::iterator iter = students.begin();

	while (iter != students.end()) {
		if (fgrade(*iter)) {
			fail.push_back(*iter);
			iter = students.erase(iter);
		} else
			++iter;
	}
	return fail;
}
fails_list.cpp

Unfortunately, while vectors allow random access via the indexing operator or by adding a number to an iterator, e.g.

students.begin() + i

the list type does not support such random access techniques.

The choice of whether to use a list or a vector container depends primarily on how the container is to be used. We'll discuss advantages and disadvantages of both containers in the next lecture.


Last modified: February 11, 2004 17:02:11 NST (Wednesday)