February 20 (Friday) March 01 (Monday)
Sections 6.2 and 6.3 of the textbook demonstrate some of the aspects of the algorithm functions discussed earlier using the the student grading example. Some of the important points in these sections include the following:
<algorithm>
function is not possible due to the ambiguity — you must disambiguate the
function by creating another function with a unique name that calls the
original function. This is typically known as a wrapper function. If the
original function raised an exception, then the wrapper function would
be a good place to catch the exception and handle it gracefully.
In the book's student grading example, there were three grade()
functions:
In the following code fragment,grade(const Student_Info &); grade(double midterm, double final, double homework); grade(double midterm, double final, const vector<double>& homework);
grade_aux()
is a wrapper
function for first grade()
function above. Note that it
catches the exception thrown by the original grade()
function.
With the grade_aux()
function defined,
the median_analysis()
function can then
use the transform()
function defined in the
<algorithm>
header to iterate over all the students
in the vector, compute the grades for each student and store them in the
local grades
vector. The median of all the overall grades
is returned by the function.
double grade_aux(const Student_info& s)
{
try {
return grade(s);
} catch (domain_error) {
return grade(s.midterm, s.final, 0);
}
}
// this version works fine
double median_analysis(const vector<Student_info>& students)
{
vector<double> grades;
transform(students.begin(), students.end(),
back_inserter(grades), grade_aux);
return median(grades);
}
a1.cpp
write_analysis()
function defined below.
void write_analysis(ostream& out, const string& name,
double analysis(const vector<Student_info>&),
const vector<Student_info>& did,
const vector<Student_info>& didnt)
{
out << name << ": median(did) = " << analysis(did) <<
", median(didnt) = " << analysis(didnt) << endl;
}
a2.cpp
Note that the third argument is declared as:
double analysis(const vector<Student_info>&)
In this context, this is declaring analysis
to be a pointer
to a function that returns a double
and takes a constant
reference to a vector
of Student_info
objects
as a parameter. We can then call the write_analysis()
function using the median_analysis
function defined earlier:
write_analysis(cout, "median", median_analysis, did, didnt);
Incidentally, this style of declaring pointers to functions in parameter
lists works in some C compilers as well (e.g. g++
).
<numeric>
that contains
a useful function called accumulate()
that can be used
for totalling numbers. For example:
vector<double> v; // ... store some numbers in v // Compute the total. double total = accumulate(v.begin(), v.end(), 0.0)
Note that the type of the last parameter, which denotes the starting
value of the summation, is important — it will dictate the type of
the result. Had we used 0
instead of 0.0
the result of the function would have been an int
.
An important aspect that you should remember about the functions in
the <algorithm>
header is that they can rearrange
or modify the elements within a container but they do not change the
size of the container itself. Also notice how the functions in the
<algorithm>
header can make programming less tedious.
Before you create a for
or while
loop to
iterate over the elements in a container, you should first check if
there is an appropriate <algorithm>
function that
can do the job for you.
Up until now, all of our arrays have been indexed using integers.
The C++ Standard Library provides an implementation of an array-like class
that allows us to use almost any type as the index (or key) in an array.
This class is known as the associative array or map
.
Using the map
class, it is possible to create arrays that
store values that can be indexed using non-integer values. One common
type to use as the index is the string
.
Consider the following program which uses a map
to keep
track the number of times each word occurs on standard input:
#include <iostream>
#include <map>
#include <string>
using std::cin;
using std::cout;
using std::endl;
using std::map;
using std::string;
int main()
{
string s;
map<string, int> counters; // store each word and an associated counter
// read the input, keeping track of each word and how often we see it
while (cin >> s)
++counters[s];
// write the words and associated counts
for (map<string, int>::const_iterator it = counters.begin();
it != counters.end(); ++it) {
cout << it->first << "\t" << it->second << endl;
}
return 0;
}
wc.cpp
map
The code creates a map
structure by specifying the type
of the key used to index the elements of the map
and the type of the elements (known as values) stored by
the map
. Note that unlike vector
, the
map
class requires two type arguments:
map<string, int> counters;
The counters
object behaves similarly to an array
except that instead of using integer indices, it uses strings. Each
string
index will map to a corresponding int
value. For example, when we write:
++counters[s];
the value of counters[s]
will be an integer representing
how many times we have encountered the word s
in the
input. If this is the first time we have encountered the word,
then counters[s]
will be default initialized to 0.
The pre-increment operator will then increase this value by one,
thereby keeping track of how many times the word has been encountered.
map
The technique used to iterate over a map
is identical to that
used for other containers: we create an iterator of the appropriate
type
map<string, int>::const_iterator
We use this iterator in a for
loop to display the contents of
the counters
map. Notice how we display the data denoted
by the iterator. Maps store their information in key/value pairs.
As a result, when we dereference a map
iterator, we
get a pair
as a result. The first element of this pair
(which is the key, in this case), is denoted by first
;
and the second (which is the value associated with the key) is denoted
by second
. Therefore, we can access the key and the value
by writing it->first
and it->second
,
respectively, as is done in the body of the loop above. Note that
the key value returned by it->first
will always be
const
.
The elements in the map
are stored in sorted key order so
as to allow for efficient access of the values of the map
.
As a result, when we iterate over the map
, the words will be
displayed in alphabetical order. Any type that can be compared using the
<
operator can be used as the key of a map
structure. Note that a map
is different from a hash
but they both perform a similar function. Unlike a map
,
a hash does not store its data in any particular order. We'll talk
about hashes when we discuss Perl.
The following program demonstrates a more complicated type used
as the value of a map
. The program keeps track of
the line numbers on which each word in standard input occurs.
Because a word may occur on several lines, we must maintain a
vector
of integers for each word. This type forms
the value of the map; the key is the word itself.
#include <map>
#include <iostream>
#include <string>
#include <vector>
#include "split.h"
using std::cin; using std::cout;
using std::endl; using std::getline;
using std::istream; using std::string;
using std::vector; using std::map;
// find all the lines that refer to each word in the input
map<string, vector<int> >
xref(istream& in, vector<string> find_words(const string&) = split)
{
string line;
int line_number = 0;
map<string, vector<int> > ret;
// read the next line
while (getline(in, line)) {
++line_number;
// break the input line into words
vector<string> words = find_words(line);
// remember that each word occurs on the current line
for (vector<string>::const_iterator it = words.begin();
it != words.end(); ++it)
ret[*it].push_back(line_number);
}
return ret;
}
int main()
{
// call `xref' using `split' by default
map<string, vector<int> > ret = xref(cin);
// write the results
for (map<string, vector<int> >::const_iterator it = ret.begin();
it != ret.end(); ++it) {
// write the word
cout << it->first << " occurs on line(s): ";
// followed by one or more line numbers
vector<int>::const_iterator line_it = it->second.begin();
cout << *line_it; // write the first line number
++line_it;
// write the rest of the line numbers, if any
while (line_it != it->second.end()) {
cout << ", " << *line_it;
++line_it;
}
// write a new line to separate each word from the next
cout << endl;
}
return 0;
}
xref.cpp
Note that the type of the map
has a space separating the
two right-angled brackets:
map<string, vector<int> > ret = xref(cin); ^
This space is compulsory, since without it, the resulting
>>
token would be interpreted as a right shift operator.
The above definition defines ret
to be a map
type that associates a word (string
) with a vector
of integers. This vector of integers stores all the line numbers upon
which the word occurs in standard input.
When calling the xref()
function, we have the option of
not specifying the second argument (find_words
) because the
xref()
function defines a default value for this argument:
map<string, vector<int> > xref(istream& in, vector<string> find_words(const string&) = split) { ...
If we don't specify a second argument when invoking this function:
xref(cin); // 'find_words' parameter is automatically set to 'split'
then the find_words
argument will automatically be set to
split
by default. If we do specify a second argument:
xref(cin, find_urls); // 'find_words' parameter is set to 'find_urls'
then the find_urls
argument would override the default argument
and the xref()
function would use the find_urls
function to extract words from the standard input (the `words' in this
case would actually be URLs).
If a function's argument has a default argument, then all arguments after it must have a default argument too. For example:
int func(int i, string j = "default", double k = 1.23) { ... } int main() { func(1); // i = 1, j = "default", k = 1.23 func(2, "string"); // i = 2, j = "string", k = 1.23 func(3, "string", 4.5); // i = 3, j = "string", k = 4.5 }
The following function definition is therefore invalid:
int func(int i, string j = "default", double k) // Invalid! (k has no default) { ... }
It can be corrected by either giving the k
parameter a
default value or removing j
's default value or by swapping
the positions of the string
and double
parameters.
In order to keep track of the line numbers on which each word occurs,
the xref()
function will read each line from standard input,
incrementing a variable line_number
each time. It then
extracts all the words from the line using the find_words()
parameter function. As it iterates over all these words (using a
conventional iterator construct), it adds the current line number to the
appropriate entry of the map
using the following statement:
ret[*it].push_back(line_number);
This statement dereferences the it
iterator, yielding the
current word (of type string
) from the words
vector under consideration. When used as an index (or key) in
the ret
map (i.e. ret[*it]
),
we get the corresponding value, which for this map is of type
vector<int>
. We then store the line number on the
end of this vector using the push_back()
method of the
vector
class.
The following code demonstrates a very elementary spell checker. It first
calls the read_dictionary()
function which reads in all
the dictionary words from /usr/share/dict/words
and stores
the lowercased version of these words in the dictionary
map
which is then returned by this function. This map is used to associate
strings with boolean values — a true
value means that
the word is in the dictionary. There are no false
values stored in the dictionary
map.
#include <iostream>
#include <string>
#include <map>
#include <fstream>
#include <stdexcept>
#include <iomanip>
using std::string; using std::map;
using std::ifstream; using std::invalid_argument;
using std::cerr; using std::cout;
using std::endl; using std::setw;
const string dictionary_file = "/usr/share/dict/words";
string
lowercase(const string &str)
{
// This could be better done using transform().
string ret;
string::const_iterator it;
for (it = str.begin(); it != str.end(); it++)
ret += tolower(*it);
return ret;
}
map<string, bool>
read_dictionary()
{
ifstream in(dictionary_file.c_str());
map<string, bool> dictionary;
if (! in)
throw invalid_argument("Invalid: \"" + dictionary_file + "\"");
string word;
while (in >> word)
dictionary[lowercase(word)] = true;
return dictionary;
}
bool non_alpha(char c)
{
return !isalpha(c);
}
void
remove_non_alphas(string &word)
{
word.erase(remove_if(word.begin(), word.end(), non_alpha), word.end());
}
map<string, int>
spell_check_file(const char *file, const map<string, bool> &dictionary)
{
ifstream in(file);
if (! in)
throw invalid_argument("Invalid file \"" + string(file) + "\"");
string word;
map<string, int> misspellings;
while (in >> word) {
remove_non_alphas(word);
if (word.empty())
continue;
if (dictionary.find(lowercase(word)) == dictionary.end())
misspellings[word] ++;
}
return misspellings;
}
int
main(int argc, char **argv)
{
if (argc != 2) {
cerr << "Must specify a file name" << endl;
return 1;
}
try {
map<string, bool> dict = read_dictionary();
map<string, int> misspellings = spell_check_file(argv[1], dict);
map<string, int>::const_iterator it;
for (it = misspellings.begin(); it != misspellings.end(); it++) {
cout << setw(15) << it->first
<< " was misspelled "
<< it->second
<< " time" << (it->second == 1 ? "" : "s")
<< endl;
}
} catch (invalid_argument e) {
cerr << e.what() << endl;
return 1;
}
return 0;
}
sc.cpp
c_str()
method
For historical reasons, the ifstream
constructor used in
the read_dictionary()
function takes a char*
parameter and not a string
. Fortunately, we can convert
the dictionary_file
string variable to a char*
by using the string
class's c_str()
member
function. This method is described in further detail in chapters 10
and 12 of the textbook.
Almost everything else in the above program uses code that we have seen
before: The lowercase()
function uses an iterator
to convert a given string to lowercase (this could be more
easily done using the transform()
algorithm function). The
remove_non_alphas()
function uses the erase()
algorithm function and the remove_if()
function with the
predicate non_alpha()
to delete all nonalphabetic characters
(e.g. numbers, punctuation marks etc.) from the given
string.
empty()
method
After reading each word from the file and removing all non alphabetic
characters, the spell_check_file()
function uses the
empty()
method of the string
class to determine
if the string has any characters left. If not, then it reads in the next
word. The empty()
function is implemented by several other
container classes including vector
and list
and simply returns true
if the container doesn't have
any elements or false
otherwise.
map
The purpose of the spell_check_file()
function is to
create and return a map
that stores all the words
that were misspelled in the file as well as how many times each
word was misspelled. The most important thing to note about the
spell_check_file()
function is how it tests for the
existence of a key within a map:
if (dictionary.find(lowercase(word)) == dictionary.end()) { ...
The find()
method will check for the existence of the given
key in the map. If it is found it will return an iterator denoting
the key/value pair within the map
. Otherwise, it will
return an iterator denoting one passed the end of the map
(i.e. the end()
iterator).
If you just want to read an element from a map
without
modifying the map
itself, do not use the
[]
operator. Doing so would create a (default initialized)
value for that key, thereby modifying the map
when all
you wanted to do was just read from it. For example, if instead of
find()
we had written:
if (dictionary[lowercase(word)] != true) { ...
The the dictionary[lowercase(word)]
expression would
implicitly attempt to add a new key/value pair to the
dictionary
map (the key would be the lowercased
version of word
and the value would be the default
boolean value of false
). However, because the
dictionary
map was defined to be a const
parameter to the spell_check_file()
function, the above code
would generate a compilation error. Therefore, when extracting key/value
pairs from a map
or when querying a map
for
the existence of a value for a given key, never use the []
operator — this operator should only be used for assigning elements to
a map
.
Note that another way to implement this program is to store all
the dictionary words in a vector
and then use the
find()
algorithm function (discussed in the previous chapter)
to search for the words in the dictionary. However, this will be less
efficient, because the find()
function scans linearly
over the container elements. Accessing strings that are keys in a
map
is more efficient.
map
performance
Section 7.4 of the textbook discusses how to use map
s to
generate sentences given a grammar. This section demonstrates
a slightly more complicated use of the map
class.
It also describes the concept of recursion (with which you
should already be familiar) and the do ... while
loop
construct, which we've seen before.
The chapter concludes with a section describing the relative merits of
the map
class when compared with hash tables. While
map
s can be somewhat slower that conventional hash tables,
they are more than adequate for most purposes. Unlike hash tables,
map
s are not sensitive to the hashing function which
hash tables must use to determine where a key/value pair is to be stored.
Also, unlike a hash table, a map
will store its key/value
pairs in a strictly enforced order.
Last modified: February 27, 2004 15:04:17 NST (Friday)