Wednesday, April 09, 2003

Backreferences in regular expressions (from Chapter 7)

In regular expressions, it is possible to use backreferences to match a substring that was remembered (by using parentheses) earlier in the regular expression. For example, consider the following script:


#!/usr/bin/perl -w

use strict;

my $sentence = "This is is a string of text.";

print "Repeated word!: $`->$&<-$'\n" if $sentence =~ /\b(\w+)\s+\1/i;


The regular expression finds the first occurrence of a repeated word in the sentence. In the example above, it matches the $sentence scalar: (\w+) matches the first is and \1 matches the second is. Note the word boundary anchor, \b. Without it, the sentence "This is a string of text." would also match, because the is at the end of This would match (\w+) and \1 would match the word is that follows. For the same reason, a \b after \1 would be needed to avoid matching text such as "the theory", where \1 matches only the start of the second word.

The above example also demonstrates the special Perl variables $`, $' and $&. After a successful regular expression match, $` holds everything in the string before the match, $' holds everything after the match and $& holds the match itself. Therefore, the above program will display:

Repeated word!: This ->is is<- a string of text.

Note that the backreferences \2, \3, etc. can also be used, as long as the regular expression contains a corresponding number of sub-expressions grouped by parentheses. Using a backreference for which there is no corresponding group will cause an error.

Miscellaneous Perl

Using the null pattern during split

As we saw in Assignment #7, we can use the null pattern // to split a scalar into a list in which each element is one character of the scalar. Therefore split //, "hello" will return a list containing the elements h e l l o.

The localtime function

The localtime function determines the current local time. In list context it returns a list of the attributes of the current time: the seconds, minutes, hours, day of the month, month, year, day of the week, day of the year and whether or not daylight saving time is in effect. Note that the month is counted from 0 and the year from 1900. (See S&P p.164 for details.)

We can also provide localtime with a timestamp and it will return the appropriate time/date information for that timestamp. A timestamp is simply the number of seconds that have elapsed since the so-called epoch, which, for most UNIX systems, is the beginning of 1970 (00:00:00 UTC on January 1, 1970).

If we evaluate the localtime function in scalar context (instead of list context), we get a human-readable form of the date. The following Perl script demonstrates several uses of the localtime function:


#!/usr/bin/perl -w

use strict;

my @lt = localtime;
print "@lt\n";

print "The current time is ", scalar localtime, "\n";

print scalar(localtime(2**31 - 1)), "\n";


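The above script displays output like the following (the actual values depend on when, and in which time zone, the script is run):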

42 37 14 9 3 103 3 98 1
The current time is Wed Apr  9 14:37:42 2003
Mon Jan 18 23:44:07 2038
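
For comparison with C, the nine values that Perl's localtime returns correspond to the members of struct tm in <time.h>. The following is only a sketch: utc_fields and print_fields are names invented for this illustration, and gmtime is used in place of localtime so that the result does not depend on the time zone of the machine.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Break a timestamp into its UTC fields (hypothetical helper).
   Returns an all-zero struct tm if the conversion fails. */
struct tm
utc_fields(long stamp)
{
	time_t t = (time_t)stamp;
	struct tm result;
	struct tm *p = gmtime(&t);

	memset(&result, 0, sizeof result);
	if (p != NULL)
		result = *p;
	return result;
}

/* Print the fields in the order Perl's localtime returns them. */
void
print_fields(struct tm tm)
{
	printf("%d %d %d %d %d %d %d %d %d\n",
	    tm.tm_sec, tm.tm_min, tm.tm_hour, tm.tm_mday, tm.tm_mon,
	    tm.tm_year, tm.tm_wday, tm.tm_yday, tm.tm_isdst);
}
```

Calling print_fields(utc_fields(2147483647L)) from main prints 7 14 3 19 0 138 2 18 0, that is, Tue Jan 19 03:14:07 2038 in UTC. This is the same instant that the Perl script above reports in local time.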

Miscellaneous C

Enumerated types (K&R § 2.3)

One concept in C which we have not yet covered is enumerated types. To motivate this concept, assume that we are writing a program which monitors the status of some sort of data acquisition device. The device can be in several states, including OK and FAIL. If the device's buffer is full or empty, it could also be in states FULL or EMPTY, for example. If we want to associate each state with an integer, we can set the following #define macros in our program:

#define OK    0
#define FAIL  1
#define FULL  2
#define EMPTY 3

Needless to say, this is tedious and error-prone, especially if we have a lot of states. If we wanted to insert a state between OK and FAIL, we would have to renumber all the subsequent #defines by hand. Instead, we can introduce an enumerated type, as demonstrated by the following program:


#include <stdio.h>

enum status { ST_OK, ST_FAIL, ST_FULL, ST_EMPTY };

int
main()
{
	enum status st = ST_OK;
	
	/* ... */
	switch (st)   {
		case ST_OK:
			break;
		case ST_FAIL:
			fprintf(stderr, "Error\n");
			break;
		case ST_FULL:
			/* ... */
			break;
		case ST_EMPTY:
			/* ... */
			break;
		default:
			printf("Hello");
			break;
	}
	return 0;
}


Each constant of the status enumerated type is given a value for us: the first is 0 and each subsequent constant is one more than the previous one. We can then add, insert or delete as many states as we like and the compiler will keep track of the values. Note that it is common to prefix each symbolic constant of an enumerated type with a common string; in the above example, we use ST_. Because all enumeration constants in C share a single namespace, this creates a sort of artificial namespace that reduces the chance of name clashes between constants from different enumerated types.
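
As a minimal sketch of the automatic numbering, suppose we insert a hypothetical ST_BUSY state after ST_OK. (ST_BUSY, status_name and print_status_values are names invented for this illustration; they are not part of the program above.) Nothing else has to be renumbered by hand:

```c
#include <stdio.h>

/* Same states as before, plus a hypothetical ST_BUSY inserted after ST_OK. */
enum status { ST_OK, ST_BUSY, ST_FAIL, ST_FULL, ST_EMPTY };

/* Return a printable name for a status value. */
const char *
status_name(enum status st)
{
	switch (st) {
		case ST_OK:    return "ST_OK";
		case ST_BUSY:  return "ST_BUSY";
		case ST_FAIL:  return "ST_FAIL";
		case ST_FULL:  return "ST_FULL";
		case ST_EMPTY: return "ST_EMPTY";
	}
	return "unknown";
}

/* Print every constant together with the value the compiler gave it. */
void
print_status_values(void)
{
	int i;

	for (i = ST_OK; i <= ST_EMPTY; i++)
		printf("%-8s = %d\n", status_name((enum status)i), i);
}
```

Calling print_status_values() from main lists the five constants with the values 0 through 4, so ST_FAIL has moved from 1 to 2 without any edit on our part.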

If we don't want to have to say enum status each time we wish to define a status variable, we can use a typedef to declare the enumerated type as follows:


#include <stdio.h>

typedef enum { ST_OK, ST_FAIL, ST_FULL, ST_EMPTY } t_status;

int
main()
{
	t_status st = ST_OK;
	
	/* ... */
	switch (st)   {
		case ST_OK:
			break;
		case ST_FAIL:
			fprintf(stderr, "Error\n");
			break;
		case ST_FULL:
			/* ... */
			break;
		case ST_EMPTY:
			/* ... */
			break;
		default:
			printf("Hello");
			break;
	}
	return 0;
}


Internally, variables of enumerated types are stored as simple integers, and you can assign any integer value to an enumerated type variable, typically without generating errors or warnings from the compiler. However, generally speaking, you should only assign values that were defined in the enumerated type's declaration.
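
The following sketch illustrates both points. The helper names (out_of_range_demo, status_is_valid, status_from_int) are invented for this example, and mapping an invalid value to ST_FAIL is just one possible policy, not a rule of the language. The range check assumes that the constants are consecutive, which is the case when none of them is given an explicit value.

```c
#include <stdio.h>

typedef enum { ST_OK, ST_FAIL, ST_FULL, ST_EMPTY } t_status;

/* Legal C: 42 is not one of the declared constants, yet this compiles. */
int
out_of_range_demo(void)
{
	t_status st = 42;

	return (int)st;
}

/* Return 1 if value is one of the declared constants, 0 otherwise.
   Relies on the constants being consecutive from ST_OK to ST_EMPTY. */
int
status_is_valid(int value)
{
	return value >= ST_OK && value <= ST_EMPTY;
}

/* Convert a raw integer (e.g. one read from the device) to a t_status,
   mapping anything out of range to ST_FAIL. */
t_status
status_from_int(int raw)
{
	if (!status_is_valid(raw)) {
		fprintf(stderr, "Invalid status %d\n", raw);
		return ST_FAIL;
	}
	return (t_status)raw;
}
```

With these helpers, status_from_int(2) yields ST_FULL, while status_from_int(42) reports the problem on stderr and yields ST_FAIL.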

Comments on the Final Exam

The final exam will cover the entire course (C, C++ and Perl). However, because the midterm was almost entirely C, the number of questions relating to C will be relatively small. The emphasis of the exam will be on C++ and Perl. There may also be a question or two related to all three languages, so you should be familiar with the similarities/differences and strengths/weaknesses of each language relative to one another.

The structure of the final exam will likely be similar to the midterm exam, but there will be more questions (maybe eight?). There may even be a few multiple choice questions. You should review all the assignments and their solutions and make sure you understand them. (Incidentally, if you find any mistakes in the solutions, please let me know.)

Material for which you are responsible

Questions from the final exam will be based upon:

C
The online notes as well as sections in K&R that are referenced by the online notes.
C++
Chapters 0 up to and including Chapter 13 of K&M. A few concepts from Chapter 15 (pure virtual functions, abstract base classes, forward declarations) are also fair game. You are also responsible for material covered in the online notes; however, the online notes for C++ consist primarily of C++ programs taken from the textbook. All these programs are discussed in the textbook and were described in detail during class.
Perl
Chapters 1 up to and including 15 of S&P as well as the online notes. The online notes give further elaboration and more examples of the topics discussed in the above chapters. We did talk briefly about array slices and nongreedy quantifiers as well. These are mentioned in Chapter 17.
Miscellaneous
Again, make sure that you understand the assignments and their solutions. Anything covered in class (even if it isn't in the online notes or in any of the textbooks) could be asked on the final.

Instructor Availability and Other Miscellaneous Stuff


Good luck on your finals!

Last modified: Wed Apr 9 15:53:49 2003