Main

February 16, 2004 (Monday)

Chapter 6: Using library algorithms (cont'd)

There are several other functions in the <algorithm> header, some of which are demonstrated by the following program. The purpose of the program is to open two files containing a list of words -- /usr/share/dict/words, which contains a list of several dictionary words, and custom which is assume to contain words that the user wants included in the standard dictionary. After storing all the words from both files into a vector, it then extracts those words that end in x and that have at least one upper case character. It then partitions these words into short words (words with less than eight characters) and long words and displays them.

#include	<iostream>
#include	<algorithm>
#include	<vector>
#include	<fstream>
#include	<cctype>
#include	<stdexcept>

using std::vector;
using std::string;
using std::ifstream;
using std::endl;
using std::cerr;
using std::cout;
using std::invalid_argument;
using std::transform;

void read_words(const char *file, vector<string> &words)
{
	ifstream in(file);

	if (! in)
		throw invalid_argument("Invalid file \"" + string(file) + "\"");

	string word;
	while (in >> word)
		words.push_back(word);
}

bool no_uppercase_chars(const string &str)
{
	return find_if(str.begin(), str.end(), isupper) == str.end();
}

bool not_ends_in_x(const string &str)
{
	return *(str.end() - 1) != 'x';
}

bool short_string(const string &str)
{
	return str.size() < 8;
}

string to_uppercase(const string &str)
{
	// A better way to do this would be to use transform() ...
 	string	ret;
 	string::const_iterator	it;
 	for (it = str.begin(); it != str.end(); it++) 
 		ret += toupper(*it);
 	return ret;
}

void show_words(const vector<string> &words)
{
	vector<string>::const_iterator it;
	for (it = words.begin(); it != words.end(); it++) 
		cout << *it << endl;
}

int
main()
{
	vector<string> standard;
	vector<string> custom;

	try {
		read_words("/usr/share/dict/words", standard);
		read_words("custom", custom);
	} catch (invalid_argument e) {
		cerr << e.what() << endl;
		return -1;
	}

	// Create a new vector which consists of all the strings
	// in 'standard' and all the strings in 'custom'.
	//
	vector<string> words(standard);

	copy(custom.begin(), custom.end(), back_inserter(words));
	// or equivalently...
	//       words.insert(words.end(), custom.begin(), custom.end());

	// Remove all those strings from 'words' that do not end with
	// the letter 'x'.
	//
	words.erase(remove_if(words.begin(), words.end(), not_ends_in_x),
		    words.end());

	// or equivalently...
	// 	vector<string>::iterator start_erase;
	// 	start_erase = remove_if(words.begin(), words.end(),
	// 			        not_ends_in_x);
	// 	words.erase(start_erase, words.end());

	// Create a new vector 'capitalized' that will contain all those
	// strings in 'words' that contain at least one capitalized letter.
	// Note that vector 'words' is unmodified.
	//
	vector<string> capitalized;
	remove_copy_if(words.begin(), words.end(), back_inserter(capitalized),
			no_uppercase_chars);

	// Finally create a new vector 'capitalized' which contains all the
	// strings of 'words', but capitalized.  Note that we are doing
	// the transformation `in place'.  We could have sent the
	// output to another vector using a back_inserter().
	//
	transform(capitalized.begin(), capitalized.end(),
		  capitalized.begin(), to_uppercase);

	// Partition the 'capitalized' vector into short strings and
	// long strings.
	//
	vector<string>::iterator boundary =
		stable_partition(capitalized.begin(), capitalized.end(),
				 short_string);

	// Store the long words in its own vector and remove them
	// from the capitalized vector.
	//
	vector<string> long_words(boundary, capitalized.end());
	capitalized.erase(boundary, capitalized.end());

	// Show both vectors.
	//
	cout << "Short words ending in 'x'" << endl;
	show_words(capitalized);
	cout << endl << "Long words ending in 'x'" << endl;
	show_words(long_words);
}
alg.cpp

Using ifstream for file input

The <fstream> header contains the declaration for the ifstream object which can be used to open files in much the same way that fopen() opens files in C. Although the ifstream object is introduced in Chapter 10, its usage pretty straightforward. When creating our fstream object, in, in the read_words() function above, we supply the constructor with a char * value (not a String value). We then use the in object in a boolean context in order to determine if the file was successfully opened. If not, we throw an exception invalid_argument in this case and the execution of the read_words() function will terminate. The exception is caught and handled by the main() function.

If in denotes a file that was successfully opened, then read_words() will read each word from the file and store them in the supplied words vector. Note that we use the in object exactly as we use any other istream object (such as cin). For example, we can use the >> operator to do input.

The remove_if() function

After storing words from both the standard /usr/share/dict/words and custom files into the vector words, we use the remove_if() function to remove all those words that do not end in x from the words vector:

words.erase(remove_if(words.begin(), words.end(), not_ends_in_x),
	    words.end());

Note that there are two things happening here:

First, we call the remove_if() function. The first two iterator parameters denote a range of elements to test and the third function is a predicate that is used to test all elements in the given range. remove_if() will step through each of the elements from the beginning of the range up to (but not including) the position denoted by the second iterator. If the element does not satisfy the predicate, then it is left alone. If it does satisfy the predicate, then its position is remembered and remove_if() will copy to this location the next element in the vector that does not satisfy the predicate. The end effect is that all those elements that do not satisfy the predicate will be at the front part of the vector. Upon completion remove_if() function will return an iterator that denotes the position one past the initial portion of the vector that contains all the elements that did not satisfy the predicate.

Second, we then pass this iterator and the words.end() iterator to the erase() method to actually delete them. There are a couple of important points to remember here:

Note that the not_ends_in_x() predicate supplied to the remove_if() function uses the end() iterator as if it were a pointer. It subtracts one from this iterator and dereferences it in order to test the last character of the string.

The remove_copy_if() function

After we have removed all the words from the words vector that do not end in x, we create a new vector called capitalized that we will use to store all the elements from the words vector that have at least one uppercase letter.

In order to do this we use the remove_copy_if() function. This function takes four parameters: the first two, as usual, are iterator arguments that denote the range upon which the function operates, the third parameter denotes the place to which we want to store the resulting elements (we use the back_inserter() call which allows the remove_copy_if() function to append new elements to the initially empty capitalized vector). The final parameter to the remove_copy_if() function is a predicate function that denotes what elements are to be "removed" (i.e. not copied to our target vector) from the given range.

The remove_copy_if() function, like remove_if(), will iterate over all the elements in the given range. If it finds an element that does not satisfy the predicate, it will copy it to the destination container (the capitalized vector, in this case). If the element does satisfy the predicate, it is ignored. Note that unlike the remove_if() predicate, the remove_copy_if() predicate does no rearranging of the original vector -- the words vector will not be modified in any way by remove_copy_if().

The transform() function

Next, we want to change all the words in the capitalized vector so that all their respective letters are uppercase. We can do this with the transform() function which, again, takes two iterator arguments denoting the range and a third argument denoting where to store the results and a fourth argument which is a function to be applied to each element in the range. Note that in this case, instead of using a back_inserter() in order to store the results in yet another vector, we can use capitalized.begin() in order to store the results of the transformation back into the original vector.

Upon completion of the transform() function, all the words in the capitalized vector will now consist entirely of uppercase characters. Note that the to_uppercase function is implemented using a for loop. It could, however, also be implemented using another call to transform().

The stable_partition() function

Finally, we want to partition the capitalized vector into those strings that are shorter than eight characters and those strings that are eight characters or longer. This function will iterate over all the elements in the vector and move all those elements that satisfy the predicate to the front part of the vector; the rest will be placed in the back part of the vector. The function will return an iterator that denotes the start of the first element that did not satisfy the vector.

We create a new vector, called long_words and initialize it with the iterator returned by the stable_partition() function and the capitalized.end() iterator. This has the effect of storing all the words that were eight characters or longer in the long_words vector. We then remove these words from the original capitalized vector, leaving only the short words in this vector. Finally, we call the show_words() function for the capitalized and long_words vectors.

There is also a function called partition() which can also be used to partition a vector into two separate groups according to a predicate; however, it does not ensure that the original order of the elements within each group will be preserved.


Last modified: February 20, 2004 02:01:08 NST (Friday)