Wednesday, March 19, 2003

Pure `virtual` functions and abstract base classes (K&M -- Chapter 15)

We discussed virtual functions during the previous class. Such functions lead to run-time (or dynamic) binding, in which the function that is actually called is not known until run-time. Typically, a base class implements some default behaviour for the virtual function, and each derived class that does not override this function inherits the implementation defined by the base class.
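As a minimal sketch of this default behaviour, consider the classes below. Shape, Circle, Blob and describe_via_base() are hypothetical names invented for illustration; they are not part of the assignment or of K&M.

```cpp
#include <string>

// Hypothetical classes for illustration only.
class Shape {
public:
	virtual ~Shape() { }
	// Default behaviour, inherited by any derived class that does not override it.
	virtual std::string describe() const { return "generic shape"; }
};

class Circle : public Shape {
public:
	// Overrides the default.
	virtual std::string describe() const { return "circle"; }
};

class Blob : public Shape {
	// No override: Blob inherits Shape::describe().
};

// The function actually called is chosen at run-time from the
// dynamic type of the object that s refers to.
std::string describe_via_base(const Shape &s)
{
	return s.describe();
}
```

Passing a Circle to describe_via_base() yields "circle", whereas passing a Blob yields "generic shape", because Blob falls back on the base class implementation.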

In many practical cases, however, it does not make sense for a base class to implement a virtual function. For example, in Assignment #6, we have a generic base class called Investment and two derived classes, Cash and Stock. We are asked to implement a virtual function which determines the value of a particular investment. However, for the Investment class, there is no way to calculate its value, because it is too generic -- there is no information available in the class from which a value can be calculated. Therefore, rather than implement an empty function for its value() method, we instead use a pure virtual function:

class Investment {
public:
	...
	virtual double value() const = 0;	// Pure virtual function.
	...
};

By appending = 0 to the function declaration, we are saying that it is not possible for the Investment class to determine its value -- the Investment class is simply too abstract to allow for a reasonable definition. Because Investment has a pure virtual function, we say that this class is an abstract base class. It is not possible to create objects of an abstract base class; the compiler will reject any attempt to do so. Similarly, objects cannot be created from any derived class that does not override all the pure virtual functions it inherits. Before a derived class can be instantiated, every pure virtual function declared in its base class (or in an ancestor of its base class) must be overridden.

Note that we can still create pointers to an abstract base class, but we cannot create objects of that type. The following example uses a small hierarchy (A, B and C) in place of Investment:


#include	<iostream>

struct A {
	virtual ~A() { }		// Virtual destructor: objects are deleted via A *.
	virtual void hello() = 0;	// Pure virtual function.
};

struct B : public A {
	virtual void hello() { std::cout << "I'm B" << std::endl; }
};

struct C : public A {
	virtual void hello() { std::cout << "I'm C" << std::endl; }
};

int
main()
{
	// A  a_object;	// Illegal! A is abstract.

	A *a;		// Okay.

	// a = new A;	// Illegal! A is abstract.

	a = new B;	// Okay
	a->hello();	// "I'm B"
	delete a;

	a = new C;	// Okay
	a->hello();	// "I'm C"
	delete a;
}

Typically, a pointer to the base class (Investment in the assignment, A in the example above) is used to point to objects of classes derived from the base class, so that virtual methods can be invoked through the pointer.
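The same idea can be sketched for the Investment hierarchy. This is only an illustration and not the assignment solution: the way Cash and Stock compute their values, the shares and price members, and the helper total_value() are all assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

class Investment {
public:
	virtual ~Investment() { }
	virtual double value() const = 0;	// Pure virtual function.
};

// Assumption: a Cash holding is worth the amount on deposit.
class Cash : public Investment {
public:
	explicit Cash(double a) : amount(a) { }
	virtual double value() const { return amount; }
private:
	double	amount;
};

// Assumption: a Stock holding is worth shares times price per share.
class Stock : public Investment {
public:
	Stock(int s, double p) : shares(s), price(p) { }
	virtual double value() const { return shares * price; }
private:
	int	shares;
	double	price;
};

// Sum the values through base-class pointers; each call to value()
// is dispatched to Cash::value() or Stock::value() at run-time.
double total_value(const std::vector<Investment *> &investments)
{
	double total = 0.0;
	for (std::size_t i = 0; i < investments.size(); ++i)
		total += investments[i]->value();
	return total;
}
```

Note that total_value() never needs to know which derived classes exist; adding a new kind of investment would not require any change to it.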

Initializing base classes from derived classes

As mentioned above, in Assignment #6, we have a base class Investment and two derived classes, Cash and Stock. Our constructor for the Investment class is fairly straightforward, as all we have to do is initialize the name data member using the constructor's member initialization list. Remember that the member initialization list is the text in the constructor between the colon and the start of the constructor's body (the body is empty in the case of the Investment constructor):

class Investment {
public:
	Investment (const std::string &n) : name(n) { }
...
protected:
	std::string	name;
};

During construction of a derived class, we often want to pass parameters to the base class so that the base portions of the class can be initialized correctly. To do this, we use the member initialization list again. For example, to initialize a Cash object, we make a constructor that takes a string reference (along with details relevant to the Cash class), then we call the Investment constructor in the member initialization list of Cash's constructor, passing the string parameter to it:

class Cash : public Investment {
public:
	Cash (const std::string &n, const double a, const double r)
		: Investment(n), amount(a), rate(r) { }
...
};

This will correctly initialize the name data member in the Investment base class. Note that even though the name data member is protected inside the base class (and not private), the Cash constructor cannot initialize name directly in its own member initialization list. This is because the name attribute is inherited from the Investment class -- it is not an immediate attribute of the Cash class, so the Investment constructor is responsible for initializing it.
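A compilable sketch of the two classes together might look as follows. The accessors get_name(), get_amount() and get_rate(), the virtual destructor, and the private amount and rate members are assumptions added so that the result can be inspected; the assignment's actual classes may differ.

```cpp
#include <string>

class Investment {
public:
	Investment(const std::string &n) : name(n) { }
	virtual ~Investment() { }
	const std::string &get_name() const { return name; }	// hypothetical accessor
protected:
	std::string	name;
};

class Cash : public Investment {
public:
	Cash(const std::string &n, const double a, const double r)
		: Investment(n), amount(a), rate(r) { }
	// Writing ": name(n), amount(a), rate(r)" instead would not compile:
	// name is a member of Investment, not of Cash.
	double get_amount() const { return amount; }	// hypothetical accessor
	double get_rate() const { return rate; }	// hypothetical accessor
private:
	double	amount;
	double	rate;
};
```

Constructing Cash("savings", 1000.0, 0.05) runs the Investment constructor first, so get_name() returns "savings" by the time the Cash members are initialized.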

Forward declarations

In the Portfolio class of Assignment #6, a forward declaration is used to inform the compiler of the presence of a class called Investment. The syntax of this declaration is quite trivial:

class Investment;

Because the Portfolio class only uses a pointer to an Investment (i.e. std::vector<Investment *> investments), there is no need to actually #include the full investment.h header file. Instead, we can simply provide a forward declaration and the compiler will be happy. If portfolio.h contained a data member of type Investment, or inline code that called Investment's member functions (as opposed to just naming a pointer or a reference), then the compiler would need the complete class definition and we would have to #include the investment.h header file in portfolio.h.

In portfolio.cpp, however, because we are actually creating Investment objects (or more precisely, objects of classes derived from the Investment class) and we must call member functions in the Investment hierarchy in order to solve the assignment, the portfolio.cpp source file must #include "investment.h". Note that stock.h and cash.h also #include the Investment header, but multiple inclusion of the same header file is prevented because of the #ifndef guards at the top of every header file.
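The split between what the header needs and what the source file needs can be sketched in a single file. The member functions add() and total(), and the derived class Fixed, are hypothetical names used only to exercise the sketch; the real Portfolio interface is defined by the assignment.

```cpp
#include <cstddef>
#include <vector>

// --- what portfolio.h needs: only the name of the class ---
class Investment;	// Forward declaration.

class Portfolio {
public:
	void add(Investment *inv) { investments.push_back(inv); }
	double total() const;	// Defined below, where Investment is complete.
private:
	std::vector<Investment *>	investments;
};

// --- what portfolio.cpp needs: the full class definition ---
// (in the assignment this would come from #include "investment.h")
class Investment {
public:
	virtual ~Investment() { }
	virtual double value() const = 0;
};

double Portfolio::total() const
{
	double sum = 0.0;
	for (std::size_t i = 0; i < investments.size(); ++i)
		sum += investments[i]->value();	// Needs the complete type.
	return sum;
}

// Hypothetical derived class, used only to exercise the sketch.
class Fixed : public Investment {
public:
	explicit Fixed(double v) : v_(v) { }
	virtual double value() const { return v_; }
private:
	double	v_;
};
```

Storing and copying Investment pointers works with the forward declaration alone; only the call to value() requires the complete class.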

The usage of a forward declaration in Assignment #6 is pedantic, at best. More practically, forward declarations are required when you are creating mutually referential classes. If class A contains a pointer to class B and class B contains a pointer to class A, then a forward declaration is necessary to break the chicken-and-egg problem that such mutually referential classes create:

class B;

class A {
	...
	B *b;
};

class B {
	...
	A *a;
};
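A fuller sketch of the same pattern shows where member function bodies have to go. The members id, set_partner(), partner_id() and get_id() are hypothetical and exist only for illustration.

```cpp
class B;	// Forward declaration: lets A mention B * before B is defined.

class A {
public:
	A() : b(0), id(1) { }
	void set_partner(B *p) { b = p; }
	int partner_id() const;	// Body must wait until B is complete.
	int get_id() const { return id; }
private:
	B	*b;
	int	id;
};

class B {
public:
	B() : a(0), id(2) { }
	void set_partner(A *p) { a = p; }
	// A is already complete here, so this body can be written inline.
	int partner_id() const { return a->get_id(); }
	int get_id() const { return id; }
private:
	A	*a;
	int	id;
};

// Now that B is complete, A can call B's member functions.
int A::partner_id() const
{
	return b->get_id();
}
```

The forward declaration is enough to declare the pointer member, but any code in A that dereferences the pointer has to be defined after the full definition of B.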

Perl

As a scripting language, Perl is quite good at string processing and report generation. Its support for regular expressions (more on those later) allows us to write very sophisticated programs that are relatively compact when compared with their C and C++ counterparts. Unfortunately, this compactness (and the widespread use of default variables) can make Perl programs quite cryptic, especially when viewed for the first time.

For details of the history of the language and some more information on what Perl is (and isn't) good for, check out Chapter 1 of the Schwartz and Phoenix (S&P) textbook.

`Hello, World!` in Perl (S&P -- Chapter 1)

Unlike Java, C and C++ programs, perl programs do not undergo two separate compilation and execution steps. Instead, compilation takes place as part of the program's execution. This is a common feature of most so-called scripting languages -- you type your script into a file and you execute it directly. Any syntax errors in your script will be reported, and the script will not run if recovery from the syntax errors is not possible.

As described on the first day of lectures, a simple Hello, world! program can be written as follows:


#!/usr/bin/perl -w

use strict;

print "Hello, world!\n"

Because perl scripts are executed directly, we must also set the execute bit of the file's permissions. For example, if we call the above file hello.pl, we have to give the following command after we type in the file:

$ chmod u+x hello.pl

We can then see that the execute bit has been set by doing a long listing of the file itself:

$ ls -l hello.pl 
-rwx------    1 donald   cs-grad        57 Mar 19 12:57 hello.pl

We can then execute the script directly:

$ ./hello.pl 
Hello, world!

It is necessary to run the chmod command only once on a file containing a perl script. It is not necessary to do so each time you modify the file or each time you want to run the script.

The very first line (after the compulsory #! sequence) tells the operating system the location of the binary to use when running the script (in our case, the perl binary is in /usr/bin/perl). Note that the special character sequence #! must be the first two characters of the file.

You can avoid both setting the execute bit and adding the #! line by running perl explicitly on the command line. For example, the following script:


use strict;

print "Hello, world!\n"

can be run directly without having to change its permissions:

$ perl hello2.pl 
Hello, world!

However, it is more common to add the special #!... line and change the permissions appropriately when writing and running perl scripts.

The -w perl option specified on the first line and the use strict; statement put perl in ``paranoid'' mode. The -w option can display a lot of (almost always) helpful warnings, and use strict; forces us to declare our variables before we actually use them. These two options can save you hours of debugging and are especially helpful for Perl novices. Whenever perl generates warnings, you would be well advised to heed them.

Scalars (S&P -- Chapter 2)

The simplest data type in perl is the scalar. Scalars, quite simply, are numbers or strings. For example, in the above program, "Hello, world!\n" is a scalar. The literals 255, 0xff and 0377 are numbers. The first one is decimal, the second is hexadecimal and the third is octal, all of which represent the same value (this technique of representing hexadecimal and octal numbers can also be used in C and C++). Unlike C and C++, you can also represent binary numbers directly in perl with the 0b prefix (e.g. 0b11111111, which is again the same value). Internally, when representing a number, perl uses a type similar to the double type in C and C++.
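For comparison, the decimal, hexadecimal and octal notations can be checked on the C++ side with a few one-line functions. The function names are hypothetical and serve only as a quick sanity check; the last one is a reminder that a leading 0 changes the base.

```cpp
// The same value written in the three notations that C++ shares with perl.
int decimal_value()	{ return 255; }
int hex_value()		{ return 0xff; }	// 0x prefix: hexadecimal
int octal_value()	{ return 0377; }	// leading 0: octal

// A common pitfall: a leading zero makes the literal octal.
int leading_zero()	{ return 010; }		// eight, not ten
```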

When writing string literals, you can use either double quotes or single quotes, but there is a very big difference. Inside double quotes, escape sequences (e.g. \n and \t) are interpreted as they are in C and C++: \n represents the newline character and \t the tab character. As we saw above, "Hello, world!\n" has a newline character at the end. Inside single quotes, however, these escape sequences are taken literally. Therefore, the last two characters of the perl string 'Hello, world!\n' are \ and n -- there is no newline character.

Note that unlike C and C++, perl does not have a concept of a character. For example, in perl 'c' is simply a string scalar of length one. It is equivalent to "c". This, of course, is not true in C and C++.
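The C++ side of this contrast can be made concrete with sizeof. This is a small sketch with hypothetical function names; it applies to C++ specifically, since in C a character literal such as 'c' has type int.

```cpp
#include <cstddef>

// In C++, 'c' is a single char, while "c" is an array of two chars:
// the letter and the terminating '\0'.
std::size_t size_of_char_literal()	{ return sizeof('c'); }
std::size_t size_of_string_literal()	{ return sizeof("c"); }
```

The two literals therefore have different types and sizes in C++, whereas in perl 'c' and "c" are the same one-character string.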

Perl also has a replication operator x which takes a string and a number. It causes the string to be replicated by the number of times specified. For example, the expression "=_-" x 5 will result in the string =_-=_-=_-=_-=_-.

Automatic conversions

One interesting thing about perl is that it automatically converts between numeric and string scalars as the context demands. For example, the operator +, as you would expect, represents numeric addition, and the operator . (that's a dot) is used for string concatenation. Therefore, the following perl statements are valid:

"123" + 4;
"123" . 4;

The first yields the scalar 127 as a result, whereas the second gives the scalar "1234". If you use a string that does not consist entirely of digits in a numeric operation, then perl will do its best to convert the string to a number: it uses the leading numeric portion and ignores the rest. For example, in perl, "123abc456" + "7" will give 130 as a result (with -w enabled, perl also warns that the first argument isn't numeric).

For a full list of operators in perl as well as their precedence and associativity, see page 32 of S&P. Of particular interest is that unlike C and C++, perl supports an exponentiation operator (**).

Last modified: Fri Mar 21 12:53:27 2003