# February 27, 2004 (Friday)

## Chapter 6: Using library algorithms (conclusion)

Sections 6.2 and 6.3 of the textbook demonstrate some of the aspects of the algorithm functions discussed earlier using the the student grading example. Some of the important points in these sections include the following:

• Passing an overloaded function name to an `<algorithm>` function is not possible due to the ambiguity — you must disambiguate the function by creating another function with a unique name that calls the original function. This is typically known as a wrapper function. If the original function raised an exception, then the wrapper function would be a good place to catch the exception and handle it gracefully.

In the book's student grading example, there were three `grade()` functions:

```grade(const Student_Info &);
grade(double midterm, double final, double homework);
grade(double midterm, double final, const vector<double>& homework);
```
In the following code fragment, `grade_aux()` is a wrapper function for first `grade()` function above. Note that it catches the exception thrown by the original `grade()` function.

With the `grade_aux()` function defined, the `median_analysis()` function can then use the `transform()` function defined in the `<algorithm>` header to iterate over all the students in the vector, compute the grades for each student and store them in the local `grades` vector. The median of all the overall grades is returned by the function.

```double grade_aux(const Student_info& s)
{
try {
} catch (domain_error) {
}
}

// this version works fine
double median_analysis(const vector<Student_info>& students)
{

transform(students.begin(), students.end(),
}
a1.cpp
```
• We can pass functions to other functions as demonstrated by the `write_analysis()` function defined below.
```void write_analysis(ostream& out, const string& name,
double analysis(const vector<Student_info>&),
const vector<Student_info>& did,
const vector<Student_info>& didnt)
{
out << name << ": median(did) = " << analysis(did) <<
", median(didnt) = " << analysis(didnt) << endl;
}
a2.cpp
```

Note that the third argument is declared as:

```double analysis(const vector<Student_info>&)
```

In this context, this is declaring `analysis` to be a pointer to a function that returns a `double` and takes a constant reference to a `vector` of `Student_info` objects as a parameter. We can then call the `write_analysis()` function using the `median_analysis` function defined earlier:

```write_analysis(cout, "median", median_analysis, did, didnt);
```

Incidentally, this style of declaring pointers to functions in parameter lists works in some C compilers as well (e.g. `g++`).

• There is another header called `<numeric>` that contains a useful function called `accumulate()` that can be used for totalling numbers. For example:
```vector<double> v;
// ... store some numbers in v

// Compute the total.
double total = accumulate(v.begin(), v.end(), 0.0)
```

Note that the type of the last parameter, which denotes the starting value of the summation, is important — it will dictate the type of the result. Had we used `0` instead of `0.0` the result of the function would have been an `int`.

An important aspect that you should remember about the functions in the `<algorithm>` header is that they can rearrange or modify the elements within a container but they do not change the size of the container itself. Also notice how the functions in the `<algorithm>` header can make programming less tedious. Before you create a `for` or `while` loop to iterate over the elements in a container, you should first check if there is an appropriate `<algorithm>` function that can do the job for you.

## Chapter 7: Using associative containers

Up until now, all of our arrays have been indexed using integers. The C++ Standard Library provides an implementation of an array-like class that allows us to use almost any type as the index (or key) in an array. This class is known as the associative array or `map`. Using the `map` class, it is possible to create arrays that store values that can be indexed using non-integer values. One common type to use as the index is the `string`.

### A word counting program

Consider the following program which uses a `map` to keep track the number of times each word occurs on standard input:

```#include <iostream>
#include <map>
#include <string>

using std::cin;
using std::cout;
using std::endl;
using std::map;
using std::string;

int main()
{
string s;
map<string, int> counters; // store each word and an associated counter

// read the input, keeping track of each word and how often we see it
while (cin >> s)
++counters[s];

// write the words and associated counts
for (map<string, int>::const_iterator it = counters.begin();
it != counters.end(); ++it) {
cout << it->first << "\t" << it->second << endl;
}
return 0;
}
wc.cpp
```

#### Creating and storing data in a `map`

The code creates a `map` structure by specifying the type of the key used to index the elements of the `map` and the type of the elements (known as values) stored by the `map`. Note that unlike `vector`, the `map` class requires two type arguments:

```map<string, int> counters;
```

The `counters` object behaves similarly to an array except that instead of using integer indices, it uses strings. Each `string` index will map to a corresponding `int` value. For example, when we write:

```++counters[s];
```

the value of `counters[s]` will be an integer representing how many times we have encountered the word `s` in the input. If this is the first time we have encountered the word, then `counters[s]` will be default initialized to 0. The pre-increment operator will then increase this value by one, thereby keeping track of how many times the word has been encountered.

#### Iterating and display keys/values in a `map`

The technique used to iterate over a `map` is identical to that used for other containers: we create an iterator of the appropriate type

```map<string, int>::const_iterator
```

We use this iterator in a `for` loop to display the contents of the `counters` map. Notice how we display the data denoted by the iterator. Maps store their information in key/value pairs. As a result, when we dereference a `map` iterator, we get a `pair` as a result. The first element of this pair (which is the key, in this case), is denoted by `first`; and the second (which is the value associated with the key) is denoted by `second`. Therefore, we can access the key and the value by writing `it->first` and `it->second`, respectively, as is done in the body of the loop above. Note that the key value returned by `it->first` will always be `const`.

The elements in the `map` are stored in sorted key order so as to allow for efficient access of the values of the `map`. As a result, when we iterate over the `map`, the words will be displayed in alphabetical order. Any type that can be compared using the `<` operator can be used as the key of a `map` structure. Note that a `map` is different from a hash but they both perform a similar function. Unlike a `map`, a hash does not store its data in any particular order. We'll talk about hashes when we discuss Perl.

### A cross reference table

The following program demonstrates a more complicated type used as the value of a `map`. The program keeps track of the line numbers on which each word in standard input occurs. Because a word may occur on several lines, we must maintain a `vector` of integers for each word. This type forms the value of the map; the key is the word itself.

```#include <map>
#include <iostream>
#include <string>
#include <vector>

#include "split.h"

using std::cin;            using std::cout;
using std::endl;           using std::getline;
using std::istream;        using std::string;
using std::vector;         using std::map;

// find all the lines that refer to each word in the input
map<string, vector<int> >
xref(istream& in, vector<string> find_words(const string&) = split)
{
string line;
int line_number = 0;
map<string, vector<int> > ret;

while (getline(in, line)) {
++line_number;

// break the input line into words
vector<string> words = find_words(line);

// remember that each word occurs on the current line
for (vector<string>::const_iterator it = words.begin();
it != words.end(); ++it)
ret[*it].push_back(line_number);
}
return ret;
}

int main()
{
// call `xref' using `split' by default
map<string, vector<int> > ret = xref(cin);

// write the results
for (map<string, vector<int> >::const_iterator it = ret.begin();
it != ret.end(); ++it) {
// write the word
cout << it->first << " occurs on line(s): ";

// followed by one or more line numbers
vector<int>::const_iterator line_it = it->second.begin();
cout << *line_it;	// write the first line number

++line_it;
// write the rest of the line numbers, if any
while (line_it != it->second.end()) {
cout << ", " << *line_it;
++line_it;
}
// write a new line to separate each word from the next
cout << endl;
}

return 0;
}
xref.cpp
```

#### Syntax considerations

Note that the type of the `map` has a space separating the two right-angled brackets:

```map<string, vector<int> > ret = xref(cin);
^
```

This space is compulsory, since without it, the resulting `>>` token would be interpreted as a right shift operator.

The above definition defines `ret` to be a `map` type that associates a word (`string`) with a `vector` of integers. This vector of integers stores all the line numbers upon which the word occurs in standard input.

#### Default function arguments

When calling the `xref()` function, we have the option of not specifying the second argument (`find_words`) because the `xref()` function defines a default value for this argument:

```map<string, vector<int> >
xref(istream& in, vector<string> find_words(const string&) = split)
{ ...
```

If we don't specify a second argument when invoking this function:

```xref(cin);	// 'find_words' parameter is automatically set to 'split'
```

then the `find_words` argument will automatically be set to `split` by default. If we do specify a second argument:

```xref(cin, find_urls);	// 'find_words' parameter is set to 'find_urls'
```

then the `find_urls` argument would override the default argument and the `xref()` function would use the `find_urls` function to extract words from the standard input (the `words' in this case would actually be URLs).

If a function's argument has a default argument, then all arguments after it must have a default argument too. For example:

```int func(int i, string j = "default", double k = 1.23)
{
...
}

int main()
{
func(1);		// i = 1, j = "default", k = 1.23
func(2, "string");	// i = 2, j = "string", k = 1.23
func(3, "string", 4.5);	// i = 3, j = "string", k = 4.5
}
```

The following function definition is therefore invalid:

```int func(int i, string j = "default", double k)	// Invalid!  (k has no default)
{
...
}
```

It can be corrected by either giving the `k` parameter a default value or removing `j`'s default value or by swapping the positions of the `string` and `double` parameters.

#### Updating the cross references

In order to keep track of the line numbers on which each word occurs, the `xref()` function will read each line from standard input, incrementing a variable `line_number` each time. It then extracts all the words from the line using the `find_words()` parameter function. As it iterates over all these words (using a conventional iterator construct), it adds the current line number to the appropriate entry of the `map` using the following statement:

```ret[*it].push_back(line_number);
```

This statement dereferences the `it` iterator, yielding the current word (of type `string`) from the `words` vector under consideration. When used as an index (or key) in the `ret` map (i.e. `ret[*it]`), we get the corresponding value, which for this map is of type `vector<int>`. We then store the line number on the end of this vector using the `push_back()` method of the `vector` class.

### A simple spell checker (not in book)

The following code demonstrates a very elementary spell checker. It first calls the `read_dictionary()` function which reads in all the dictionary words from `/usr/share/dict/words` and stores the lowercased version of these words in the `dictionary` map which is then returned by this function. This map is used to associate strings with boolean values — a `true` value means that the word is in the dictionary. There are no `false` values stored in the `dictionary` map.

```#include	<iostream>
#include	<string>
#include	<map>
#include	<fstream>
#include	<stdexcept>
#include	<iomanip>

using std::string;		using std::map;
using std::ifstream;		using std::invalid_argument;
using std::cerr;		using std::cout;
using std::endl;		using std::setw;

const string dictionary_file = "/usr/share/dict/words";

string
lowercase(const string &str)
{
// This could be better done using transform().
string	ret;
string::const_iterator	it;
for (it = str.begin(); it != str.end(); it++)
ret += tolower(*it);
return ret;
}

map<string, bool>
{
ifstream in(dictionary_file.c_str());
map<string, bool> dictionary;

if (! in)
throw invalid_argument("Invalid: \"" + dictionary_file + "\"");

string word;
while (in >> word)
dictionary[lowercase(word)] = true;

return dictionary;
}

bool non_alpha(char c)
{
return !isalpha(c);
}

void
remove_non_alphas(string &word)
{
word.erase(remove_if(word.begin(), word.end(), non_alpha), word.end());
}

map<string, int>
spell_check_file(const char *file, const map<string, bool> &dictionary)
{
ifstream in(file);

if (! in)
throw invalid_argument("Invalid file \"" + string(file) + "\"");

string word;
map<string, int> misspellings;
while (in >> word) {
remove_non_alphas(word);
if (word.empty())
continue;
if (dictionary.find(lowercase(word)) == dictionary.end())
misspellings[word] ++;
}
return misspellings;
}

int
main(int argc, char **argv)
{
if (argc != 2) {
cerr << "Must specify a file name" << endl;
return 1;
}

try {
map<string, int> misspellings = spell_check_file(argv[1], dict);
map<string, int>::const_iterator it;
for (it = misspellings.begin(); it != misspellings.end(); it++) {
cout << setw(15) << it->first
<< " was misspelled "
<< it->second
<< " time" << (it->second == 1 ? "" : "s")
<< endl;
}
} catch (invalid_argument e) {
cerr << e.what() << endl;
return 1;
}

return 0;
}
sc.cpp
```

#### The `c_str()` method

For historical reasons, the `ifstream` constructor used in the `read_dictionary()` function takes a `char*` parameter and not a `string`. Fortunately, we can convert the `dictionary_file` string variable to a `char*` by using the `string` class's `c_str()` member function. This method is described in further detail in chapters 10 and 12 of the textbook.

Almost everything else in the above program uses code that we have seen before: The `lowercase()` function uses an iterator to convert a given string to lowercase (this could be more easily done using the `transform()` algorithm function). The `remove_non_alphas()` function uses the `erase()` algorithm function and the `remove_if()` function with the predicate `non_alpha()` to delete all nonalphabetic characters (e.g. numbers, punctuation marks etc.) from the given string.

#### The `empty()` method

After reading each word from the file and removing all non alphabetic characters, the `spell_check_file()` function uses the `empty()` method of the `string` class to determine if the string has any characters left. If not, then it reads in the next word. The `empty()` function is implemented by several other container classes including `vector` and `list` and simply returns `true` if the container doesn't have any elements or `false` otherwise.

#### Finding elements in a `map`

The purpose of the `spell_check_file()` function is to create and return a `map` that stores all the words that were misspelled in the file as well as how many times each word was misspelled. The most important thing to note about the `spell_check_file()` function is how it tests for the existence of a key within a map:

```if (dictionary.find(lowercase(word)) == dictionary.end()) { ...
```

The `find()` method will check for the existence of the given key in the map. If it is found it will return an iterator denoting the key/value pair within the `map`. Otherwise, it will return an iterator denoting one passed the end of the `map` (i.e. the `end()` iterator).

If you just want to read an element from a `map` without modifying the `map` itself, do not use the `[]` operator. Doing so would create a (default initialized) value for that key, thereby modifying the `map` when all you wanted to do was just read from it. For example, if instead of `find()` we had written:

```if (dictionary[lowercase(word)] != true) { ...
```

The the `dictionary[lowercase(word)]` expression would implicitly attempt to add a new key/value pair to the `dictionary` map (the key would be the lowercased version of `word` and the value would be the default boolean value of `false`). However, because the `dictionary` map was defined to be a `const` parameter to the `spell_check_file()` function, the above code would generate a compilation error. Therefore, when extracting key/value pairs from a `map` or when querying a `map` for the existence of a value for a given key, never use the `[]` operator — this operator should only be used for assigning elements to a `map`.

Note that another way to implement this program is to store all the dictionary words in a `vector` and then use the `find()` algorithm function (discussed in the previous chapter) to search for the words in the dictionary. However, this will be less efficient, because the `find()` function scans linearly over the container elements. Accessing strings that are keys in a `map` is more efficient.

### Generating sentences and `map` performance

Section 7.4 of the textbook discusses how to use `map`s to generate sentences given a grammar. This section demonstrates a slightly more complicated use of the `map` class. It also describes the concept of recursion (with which you should already be familiar) and the `do ... while ` loop construct, which we've seen before.

The chapter concludes with a section describing the relative merits of the `map` class when compared with hash tables. While `map`s can be somewhat slower that conventional hash tables, they are more than adequate for most purposes. Unlike hash tables, `map`s are not sensitive to the hashing function which hash tables must use to determine where a key/value pair is to be stored. Also, unlike a hash table, a `map` will store its key/value pairs in a strictly enforced order.