March 15, 2004 (Monday)

Perl

As a scripting language, Perl is quite good at string processing and report generation. Its support for regular expressions (more on those later) allows us to write very sophisticated programs that are relatively compact when compared with their C and C++ counterparts. Unfortunately, this compactness (and the widespread use of default variables) can make Perl programs quite cryptic, especially when viewed for the first time.

For details of the history of the language and some more information on what Perl is (and isn't) good for, check out Chapter 1 of the Schwartz and Phoenix (S&P) textbook.

Hello, World! in Perl (S&P — Chapter 1)

Unlike Java, C, and C++ programs, Perl programs do not go through separate compilation and execution steps. Instead, compilation takes place as part of the program's execution. This is a common feature of most so-called scripting languages: you type your script into a file and you execute it directly. Any syntax errors in your script will be reported, and the script will not run if Perl cannot recover from them.

As described on the first day of lectures, a simple Hello, world! program can be written as follows:

#!/usr/bin/perl -w

use strict;

print "Hello, world!\n";
hello.pl

Note the absence of a main() function. Scripts are essentially just a sequence of commands that are parsed/executed sequentially, so there is no need for a main function.

Because Perl scripts are executed directly (i.e. there are no intermediate object files or executables generated), we must set the execute bit of the file's permissions. For example, if we call the above file hello.pl, we have to give the following command after we type in the file:

$ chmod u+x hello.pl

We can then see that the execute bit has been set by doing a long listing of the file itself:

$ ls -l hello.pl 
-rwx------    1 donald   cs-grad        57 Mar 19 12:57 hello.pl

We can then execute the script directly:

$ ./hello.pl 
Hello, world!

It is necessary to run the chmod command only once on a file containing a Perl script. It is not necessary to do so each time you modify the file or each time you want to run the script.

The very first line (after the compulsory #! sequence) tells the operating system the location of the binary to use when running the script (in our case, the Perl binary is in /usr/bin/perl). Note that the special character sequence #! must be the first two characters of the file.

You can bypass having to turn on the permission bit and using the #! line by running Perl explicitly on the command line. For example, the following script:

use strict;

print "Hello, world!\n";
hello2.pl

can be run directly without having to change its permissions:

$ perl hello2.pl 
Hello, world!

However, it is more common to add the special #!... line and change the permissions appropriately when writing and running Perl scripts.

The -w Perl option specified on the first line and the use strict; statement put Perl in "paranoid" mode. The -w option displays a lot of (almost always) helpful warnings, and use strict; forces us to declare our variables before we actually use them. These two options can save you hours of debugging and are especially helpful for Perl novices. Whenever Perl generates warnings, you would be well advised to heed them.
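To get a feel for the kind of warning Perl produces, here is a minimal sketch. It assumes the use warnings; pragma (a lexically scoped counterpart of the -w switch) and installs a warning handler so that the script can count the warnings itself. The names @caught, $count and $label are made up for illustration, and the handler and array syntax go beyond what we have covered so far.

```perl
#!/usr/bin/perl
use strict;
use warnings;    # lexically scoped counterpart of the -w switch

# Collect warnings in an array instead of letting Perl print them to STDERR.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $count;                         # declared, but never given a value
my $label = "Count: " . $count;    # concatenating undef triggers a warning

print "Perl issued ", scalar(@caught), " warning(s)\n";
print "First warning: $caught[0]";
```

The script still runs to completion; the warning simply points out that $count was used before it was given a value, which is very often a real bug.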

Scalars (S&P — Chapter 2)

The simplest data type in Perl is the scalar. Scalars, quite simply, are numbers or strings. For example, in the above program, "Hello, world!\n" is a scalar. The literals 255, 0xff and 0377 are numbers. The first one is decimal, the second is hexadecimal and the third is octal, all of which represent the same value (this technique of representing hexadecimal and octal numbers can also be used in C and C++). Unlike C and C++, you can also represent binary numbers directly in Perl with the 0b prefix (e.g. 0b11111111, which is again the same value). Internally, when representing a number, Perl uses a type similar to the double type in C and C++.
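As a quick check, the following sketch assigns the same value written in each of the four bases and prints the results. The variable names are chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $decimal = 255;
my $hex     = 0xff;          # hexadecimal literal
my $octal   = 0377;          # octal literal (leading zero)
my $binary  = 0b11111111;    # binary literal: eight 1 bits

# Numbers are always printed in decimal, whatever base the literal used.
print "$decimal $hex $octal $binary\n";    # prints: 255 255 255 255
```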

When writing string literals, you can use either double quotes or single quotes, but there is a very big difference. Inside double quotes, escape sequences (e.g. \n and \t) are interpreted as they are in C and C++: \n represents the newline character and \t represents the tab character. As we saw above, "Hello, world!\n" has a newline character at the end. Inside a single-quoted string, however, these escape sequences are taken literally. Therefore, the last two characters of the Perl string 'Hello, world!\n' are \ and n; there is no newline character in this string.
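One way to see the difference is to compare the lengths of the two strings. This sketch uses the length function (described further on) and two illustrative variable names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $double = "Hello, world!\n";    # \n is a single newline character
my $single = 'Hello, world!\n';    # \n is a backslash followed by an n

print "double-quoted length: ", length($double), "\n";    # 14
print "single-quoted length: ", length($single), "\n";    # 15
```

The visible text Hello, world! accounts for 13 characters in both cases; the double-quoted version adds one newline, while the single-quoted version adds two ordinary characters.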

Note that unlike C and C++, Perl does not have a concept of a character. For example, in Perl 'c' is simply a string scalar of length one. It is equivalent to "c". This, of course, is not true in C and C++.

Perl also has a replication operator x which takes a string and a number. It causes the string to be replicated by the number of times specified. For example, the expression "=_-" x 5 will result in the string =_-=_-=_-=_-=_-.
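A small sketch of the replication operator in use; the second line shows a typical application, drawing a horizontal rule in a report. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = "=_-" x 5;    # the three-character string repeated 5 times
my $rule    = "-" x 20;     # a row of 20 dashes

print "$pattern\n";         # prints: =_-=_-=_-=_-=_-
print "$rule\n";
```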

Automatic conversions

One interesting thing about Perl is that it automatically converts between numeric and string scalars as the context demands. For example, the operator +, as you would expect, represents numeric addition, and the operator . (that's a dot) is used for string concatenation. Therefore, the following Perl statements are valid:

"123" + 4;
"123" . 4;

The first yields the scalar 127 as a result, whereas the second gives the scalar "1234". If you use a string that does not consist entirely of digits in a numeric operation, then Perl will do its best to convert the string to a number by using the leading numeric portion. For example, in Perl, "123abc456" + "7" will give 130 as a result (with -w, Perl also warns that the first argument isn't numeric).
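The three conversions discussed above can be tried out with the sketch below. It assumes use warnings; and switches off the "isn't numeric" warning for the one line that deliberately adds a partly numeric string; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sum    = "123" + 4;    # numeric context: the string becomes 123
my $joined = "123" . 4;    # string context: the number becomes "4"

my $partial;
{
    no warnings 'numeric';           # silence the warning in this block only
    $partial = "123abc456" + "7";    # only the leading digits 123 are used
}

print "$sum $joined $partial\n";     # prints: 127 1234 130
```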

For a full list of operators in Perl as well as their precedence and associativity, see page 32 of S&P. Of particular interest is that unlike C and C++, Perl supports an exponentiation operator (**). As another example of automatic conversions, consider the length function, which determines how many characters are in a scalar.

length 1234567;
length "abcdf";

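The first invocation returns 7, because the number 1234567 is automatically converted to the string "1234567" before its characters are counted. The second returns 5.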
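The sketch below combines the exponentiation operator with length. The variable names are illustrative; note that ** associates to the right.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $power   = 2 ** 10;         # 1024
my $tower   = 2 ** 3 ** 2;     # right-associative: 2 ** (3 ** 2) = 512
my $digits  = length $power;   # the number is converted to "1024": 4 characters
my $letters = length "abc";    # 3

print "$power $tower $digits $letters\n";    # prints: 1024 512 4 3
```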

Variables

In Perl, the type of a variable is determined by its very first character. For scalars, the very first character is always a dollar sign ($). Other data types use characters such as @ and % as their prefix (more on these later). We can assign scalars to scalar variables and we can display them. For example:

#!/usr/bin/perl -w

use strict;

my $str = "Hello, world!";
my $num = 12345.678;

print "\$str is \"$str\" and \$num is $num\n";
var.pl

We assign two scalars to two variables. Note the use of the my keyword in Perl. Normally, Perl does not require that you declare your variables before you use them. Unfortunately, this can lead to very difficult-to-debug errors in your Perl scripts. The use strict; statement forces us to declare all of our variables before use. In order to declare a variable, we precede its first occurrence with the word my. As in C and C++, we can initialize the variable when we declare it, but we are not obligated to do so.

We could also use parentheses to combine the two declarations and assignments into a single statement as follows:

my ($str, $num) = ("Hello, world!", 12345.678);

The argument to print is enclosed in double quotes. As a result, the \n escape sequence at the end will be treated as a newline. We also place the two variables inside the double-quoted string. When Perl displays this string, the variable names are replaced with their actual values; this process is called interpolation. The following output is the result:

$str is "Hello, world!" and $num is 12345.678

In order to literally display a dollar sign and double quote inside this string, we must escape these characters by placing a backslash before them.

Note that interpolation does not happen inside single quoted strings. Therefore, if we had used single quotes instead of double quotes around the argument to the print function, the output would have been:

\$str is \"$str\" and \$num is $num\n

This is quite different from the output given above. There is no newline at the end of the string; the last two characters displayed are a backslash and an n. The backslashes that were used to escape special characters (the $ and the ") inside the double-quoted string are displayed literally inside the single-quoted string.
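The two quoting styles can be compared side by side with a sketch like the following, where $name and the two message variables are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "Perl";

my $double = "\$name is $name\n";    # \$ is a literal $, $name is interpolated
my $single = '\$name is $name\n';    # everything is taken literally

print $double;           # prints: $name is Perl
print $single, "\n";     # prints: \$name is $name\n
```

The explicit "\n" on the last print is needed because the single-quoted string itself contains no newline character.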

Conditionals

As expected, Perl supports the if control structure which is similar to C and C++:

#!/usr/bin/perl -w

use strict;

my $num = 123;
my $str = 5;

if ($num gt $str) {
	print "$num gt $str\n";
} else {
	print "$num le $str\n";
}

if ($num > $str) {
	print "$num > $str\n";
} else {
	print "$num <= $str\n";
}
if.pl

The braces around each of the blocks of code are required by Perl even though they each contain only one statement (this is different from C and C++). Also note that there are separate string and numeric comparison operators. When comparing numeric values, we use the traditional <, >, <=, >= and == relational operators. When comparing strings, we use the analogous lt, gt, le, ge and eq operators. Accidentally using the numeric comparison operator when comparing two strings is a common novice mistake in Perl.

The output from the above program is:

123 le 5
123 > 5

The string 123 is alphabetically less than (or equal to) 5 (since the character 1 is less than the character 5). However, the number 123 is obviously numerically greater than 5.
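The novice mistake mentioned above, using a numeric operator on strings, can be illustrated with this sketch. The variables are hypothetical, and the "isn't numeric" warning is switched off for the faulty comparison so that only the results are printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $left  = "apple";
my $right = "banana";

my $numeric_equal;
{
    no warnings 'numeric';    # -w would normally flag this comparison
    # Neither string starts with digits, so both convert to 0 and compare equal.
    $numeric_equal = ($left == $right) ? "yes" : "no";
}
my $string_equal = ($left eq $right) ? "yes" : "no";

print "== says equal: $numeric_equal\n";    # yes
print "eq says equal: $string_equal\n";     # no
```

This is one of the cases where heeding the warnings pays off: with warnings left on, Perl reports that "apple" and "banana" are not numeric.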

When a value is used in a conditional, the conditional is treated as false if the value is either 0, the empty string, or the special value undef. The string "0" is also treated as false. All other values are true.
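These rules can be checked with a short sketch. The truth subroutine is a hypothetical helper written just for this example (subroutines are covered later); it reports how Perl treats its argument in a conditional.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "true" or "false" depending on how
# Perl treats the given value in a conditional.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print "0     is ", truth(0),     "\n";    # false
print "''    is ", truth(''),    "\n";    # false
print "'0'   is ", truth('0'),   "\n";    # false
print "undef is ", truth(undef), "\n";    # false
print "'0.0' is ", truth('0.0'), "\n";    # true: not exactly the string "0"
print "'00'  is ", truth('00'),  "\n";    # true
print "' '   is ", truth(' '),   "\n";    # true: a non-empty string
```

Note that strings such as "0.0" and "00" are true even though they are numerically zero; only the exact string "0" is treated as false.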

while loops and input

Perl also supports iteration via the while control structure:

#!/usr/bin/perl -w

use strict;

print "Input a number: ";
chomp(my $num = <STDIN>);
my $sum;

while ($num) {
	$sum += $num--;
	print "Running total is $sum\n";
}
while.pl
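To trace what the loop does without typing anything in, here is a sketch of the same loop with the input fixed at 3 (an arbitrary value chosen for illustration) and $sum explicitly initialized to 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $num = 3;    # stands in for the value read from the keyboard
my $sum = 0;

# $num-- yields the current value and then decrements it, so the loop
# adds 3, then 2, then 1, and stops once $num reaches 0 (which is false).
while ($num) {
    $sum += $num--;
    print "Running total is $sum\n";    # prints 3, then 5, then 6
}
```

Because the loop only stops when $num is exactly 0, a negative or fractional input would never terminate it; a more careful script would validate the input first.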

The special notation <STDIN> means read a line from input. STDIN is analogous to stdin in C and cin in C++. It represents the standard input stream, typically the keyboard. As with C and C++, you can redirect standard input to a Perl script from a file by using the < redirection symbol on the command line, although as we'll see later, this is rarely done with Perl scripts.

The line of input assigned to the $num variable has a newline at the end of it. In order to get rid of the newline, we use the chomp function. This function, when applied to a scalar variable, will remove a newline character (if one exists) from the end of the variable. If no newline character is at the end, the function does nothing.
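The behaviour of chomp can be seen without reading any input by applying it to a string that already ends in a newline. In this sketch, $line stands in for a line read from <STDIN>; the return value of chomp is the number of characters it removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "42\n";                    # what <STDIN> would give for the input 42

my $removed       = chomp($line);     # removes the trailing newline: returns 1
my $removed_again = chomp($line);     # nothing left to remove: returns 0

print "[$line] $removed $removed_again\n";    # prints: [42] 1 0
```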


Last modified: April 15, 2004 20:39:19 NDT (Thursday)