Main

February 27, 2004 (Friday)

Chapter 6: Using library algorithms (conclusion)

Comparing Grading Schemes

Sections 6.2 and 6.3 of the textbook demonstrate some of the aspects of the algorithm functions discussed earlier using the the student grading example. Some of the important points in these sections include the following:

An important aspect that you should remember about the functions in the <algorithm> header is that they can rearrange or modify the elements within a container but they do not change the size of the container itself. Also notice how the functions in the <algorithm> header can make programming less tedious. Before you create a for or while loop to iterate over the elements in a container, you should first check if there is an appropriate <algorithm> function that can do the job for you.

Chapter 7: Using associative containers

Up until now, all of our arrays have been indexed using integers. The C++ Standard Library provides an implementation of an array-like class that allows us to use almost any type as the index (or key) in an array. This class is known as the associative array or map. Using the map class, it is possible to create arrays that store values that can be indexed using non-integer values. One common type to use as the index is the string.

A word counting program

Consider the following program which uses a map to keep track the number of times each word occurs on standard input:

#include <iostream>
#include <map>
#include <string>

using std::cin;
using std::cout;
using std::endl;
using std::map;
using std::string;

int main()
{
	string s;
	map<string, int> counters; // store each word and an associated counter

	// read the input, keeping track of each word and how often we see it
	while (cin >> s)
		++counters[s];

	// write the words and associated counts
	for (map<string, int>::const_iterator it = counters.begin();
	     it != counters.end(); ++it) {
		cout << it->first << "\t" << it->second << endl;
	}
	return 0;
}
wc.cpp

Creating and storing data in a map

The code creates a map structure by specifying the type of the key used to index the elements of the map and the type of the elements (known as values) stored by the map. Note that unlike vector, the map class requires two type arguments:

map<string, int> counters;

The counters object behaves similarly to an array except that instead of using integer indices, it uses strings. Each string index will map to a corresponding int value. For example, when we write:

++counters[s];

the value of counters[s] will be an integer representing how many times we have encountered the word s in the input. If this is the first time we have encountered the word, then counters[s] will be default initialized to 0. The pre-increment operator will then increase this value by one, thereby keeping track of how many times the word has been encountered.

Iterating and display keys/values in a map

The technique used to iterate over a map is identical to that used for other containers: we create an iterator of the appropriate type

map<string, int>::const_iterator

We use this iterator in a for loop to display the contents of the counters map. Notice how we display the data denoted by the iterator. Maps store their information in key/value pairs. As a result, when we dereference a map iterator, we get a pair as a result. The first element of this pair (which is the key, in this case), is denoted by first; and the second (which is the value associated with the key) is denoted by second. Therefore, we can access the key and the value by writing it->first and it->second, respectively, as is done in the body of the loop above. Note that the key value returned by it->first will always be const.

The elements in the map are stored in sorted key order so as to allow for efficient access of the values of the map. As a result, when we iterate over the map, the words will be displayed in alphabetical order. Any type that can be compared using the < operator can be used as the key of a map structure. Note that a map is different from a hash but they both perform a similar function. Unlike a map, a hash does not store its data in any particular order. We'll talk about hashes when we discuss Perl.

A cross reference table

The following program demonstrates a more complicated type used as the value of a map. The program keeps track of the line numbers on which each word in standard input occurs. Because a word may occur on several lines, we must maintain a vector of integers for each word. This type forms the value of the map; the key is the word itself.

#include <map>
#include <iostream>
#include <string>
#include <vector>

#include "split.h"

using std::cin;            using std::cout;
using std::endl;           using std::getline;
using std::istream;        using std::string;
using std::vector;         using std::map;

// find all the lines that refer to each word in the input
map<string, vector<int> >
xref(istream& in, vector<string> find_words(const string&) = split)
{
	string line;
	int line_number = 0;
	map<string, vector<int> > ret;

	// read the next line
	while (getline(in, line)) {
		++line_number;

		// break the input line into words
		vector<string> words = find_words(line);

		// remember that each word occurs on the current line
		for (vector<string>::const_iterator it = words.begin();
		     it != words.end(); ++it)
			ret[*it].push_back(line_number);
	}
	return ret;
}

int main()
{
	// call `xref' using `split' by default
	map<string, vector<int> > ret = xref(cin);

	// write the results
	for (map<string, vector<int> >::const_iterator it = ret.begin();
	     it != ret.end(); ++it) {
		// write the word
		cout << it->first << " occurs on line(s): ";

		// followed by one or more line numbers
		vector<int>::const_iterator line_it = it->second.begin();
		cout << *line_it;	// write the first line number

		++line_it;
		// write the rest of the line numbers, if any
		while (line_it != it->second.end()) {
			cout << ", " << *line_it;
			++line_it;
		}
		// write a new line to separate each word from the next
		cout << endl;
	}

	return 0;
}
xref.cpp

Syntax considerations

Note that the type of the map has a space separating the two right-angled brackets:

map<string, vector<int> > ret = xref(cin);
                       ^

This space is compulsory, since without it, the resulting >> token would be interpreted as a right shift operator.

The above definition defines ret to be a map type that associates a word (string) with a vector of integers. This vector of integers stores all the line numbers upon which the word occurs in standard input.

Default function arguments

When calling the xref() function, we have the option of not specifying the second argument (find_words) because the xref() function defines a default value for this argument:

map<string, vector<int> >
xref(istream& in, vector<string> find_words(const string&) = split)
{ ...

If we don't specify a second argument when invoking this function:

xref(cin);	// 'find_words' parameter is automatically set to 'split'

then the find_words argument will automatically be set to split by default. If we do specify a second argument:

xref(cin, find_urls);	// 'find_words' parameter is set to 'find_urls'

then the find_urls argument would override the default argument and the xref() function would use the find_urls function to extract words from the standard input (the `words' in this case would actually be URLs).

If a function's argument has a default argument, then all arguments after it must have a default argument too. For example:

int func(int i, string j = "default", double k = 1.23)
{
	...
}

int main()
{
	func(1);		// i = 1, j = "default", k = 1.23
	func(2, "string");	// i = 2, j = "string", k = 1.23
	func(3, "string", 4.5);	// i = 3, j = "string", k = 4.5
}

The following function definition is therefore invalid:

int func(int i, string j = "default", double k)	// Invalid!  (k has no default)
{
	...
}

It can be corrected by either giving the k parameter a default value or removing j's default value or by swapping the positions of the string and double parameters.

Updating the cross references

In order to keep track of the line numbers on which each word occurs, the xref() function will read each line from standard input, incrementing a variable line_number each time. It then extracts all the words from the line using the find_words() parameter function. As it iterates over all these words (using a conventional iterator construct), it adds the current line number to the appropriate entry of the map using the following statement:

ret[*it].push_back(line_number);

This statement dereferences the it iterator, yielding the current word (of type string) from the words vector under consideration. When used as an index (or key) in the ret map (i.e. ret[*it]), we get the corresponding value, which for this map is of type vector<int>. We then store the line number on the end of this vector using the push_back() method of the vector class.

A simple spell checker (not in book)

The following code demonstrates a very elementary spell checker. It first calls the read_dictionary() function which reads in all the dictionary words from /usr/share/dict/words and stores the lowercased version of these words in the dictionary map which is then returned by this function. This map is used to associate strings with boolean values — a true value means that the word is in the dictionary. There are no false values stored in the dictionary map.

#include	<iostream>
#include	<string>
#include	<map>
#include	<fstream>
#include	<stdexcept>
#include	<iomanip>

using std::string;		using std::map;
using std::ifstream;		using std::invalid_argument;
using std::cerr;		using std::cout;
using std::endl;		using std::setw;

const string dictionary_file = "/usr/share/dict/words";

string
lowercase(const string &str)
{
	// This could be better done using transform().
	string	ret;
	string::const_iterator	it;
	for (it = str.begin(); it != str.end(); it++) 
		ret += tolower(*it);
	return ret;
}

map<string, bool>
read_dictionary()
{
	ifstream in(dictionary_file.c_str());
	map<string, bool> dictionary;

	if (! in)
		throw invalid_argument("Invalid: \"" + dictionary_file + "\"");

	string word;
	while (in >> word)
		dictionary[lowercase(word)] = true;

	return dictionary;
}

bool non_alpha(char c)
{
	return !isalpha(c);
}

void
remove_non_alphas(string &word)
{
	word.erase(remove_if(word.begin(), word.end(), non_alpha), word.end());
}

map<string, int>
spell_check_file(const char *file, const map<string, bool> &dictionary)
{
	ifstream in(file);

	if (! in)
		throw invalid_argument("Invalid file \"" + string(file) + "\"");

	string word;
	map<string, int> misspellings;
	while (in >> word) {
		remove_non_alphas(word);
		if (word.empty())
			continue;
		if (dictionary.find(lowercase(word)) == dictionary.end())
			misspellings[word] ++;
	}
	return misspellings;
}

int
main(int argc, char **argv)
{
	if (argc != 2) {
		cerr << "Must specify a file name" << endl;
		return 1;
	}

	try {
		map<string, bool> dict = read_dictionary();
		map<string, int> misspellings = spell_check_file(argv[1], dict);
		map<string, int>::const_iterator it;
		for (it = misspellings.begin(); it != misspellings.end(); it++) {
			cout << setw(15) << it->first
			     << " was misspelled "
			     << it->second
			     << " time" << (it->second == 1 ? "" : "s")
			     << endl;
		}
	} catch (invalid_argument e) {
		cerr << e.what() << endl;
		return 1;
	}

	return 0;
}
sc.cpp

The c_str() method

For historical reasons, the ifstream constructor used in the read_dictionary() function takes a char* parameter and not a string. Fortunately, we can convert the dictionary_file string variable to a char* by using the string class's c_str() member function. This method is described in further detail in chapters 10 and 12 of the textbook.

Almost everything else in the above program uses code that we have seen before: The lowercase() function uses an iterator to convert a given string to lowercase (this could be more easily done using the transform() algorithm function). The remove_non_alphas() function uses the erase() algorithm function and the remove_if() function with the predicate non_alpha() to delete all nonalphabetic characters (e.g. numbers, punctuation marks etc.) from the given string.

The empty() method

After reading each word from the file and removing all non alphabetic characters, the spell_check_file() function uses the empty() method of the string class to determine if the string has any characters left. If not, then it reads in the next word. The empty() function is implemented by several other container classes including vector and list and simply returns true if the container doesn't have any elements or false otherwise.

Finding elements in a map

The purpose of the spell_check_file() function is to create and return a map that stores all the words that were misspelled in the file as well as how many times each word was misspelled. The most important thing to note about the spell_check_file() function is how it tests for the existence of a key within a map:

if (dictionary.find(lowercase(word)) == dictionary.end()) { ...

The find() method will check for the existence of the given key in the map. If it is found it will return an iterator denoting the key/value pair within the map. Otherwise, it will return an iterator denoting one passed the end of the map (i.e. the end() iterator).

If you just want to read an element from a map without modifying the map itself, do not use the [] operator. Doing so would create a (default initialized) value for that key, thereby modifying the map when all you wanted to do was just read from it. For example, if instead of find() we had written:

if (dictionary[lowercase(word)] != true) { ...

The the dictionary[lowercase(word)] expression would implicitly attempt to add a new key/value pair to the dictionary map (the key would be the lowercased version of word and the value would be the default boolean value of false). However, because the dictionary map was defined to be a const parameter to the spell_check_file() function, the above code would generate a compilation error. Therefore, when extracting key/value pairs from a map or when querying a map for the existence of a value for a given key, never use the [] operator — this operator should only be used for assigning elements to a map.

Note that another way to implement this program is to store all the dictionary words in a vector and then use the find() algorithm function (discussed in the previous chapter) to search for the words in the dictionary. However, this will be less efficient, because the find() function scans linearly over the container elements. Accessing strings that are keys in a map is more efficient.

Generating sentences and map performance

Section 7.4 of the textbook discusses how to use maps to generate sentences given a grammar. This section demonstrates a slightly more complicated use of the map class. It also describes the concept of recursion (with which you should already be familiar) and the do ... while loop construct, which we've seen before.

The chapter concludes with a section describing the relative merits of the map class when compared with hash tables. While maps can be somewhat slower that conventional hash tables, they are more than adequate for most purposes. Unlike hash tables, maps are not sensitive to the hashing function which hash tables must use to determine where a key/value pair is to be stored. Also, unlike a hash table, a map will store its key/value pairs in a strictly enforced order.


Last modified: February 27, 2004 15:04:17 NST (Friday)