Wednesday, March 26, 2003

Subroutines (S&P -- Chapter 4)

Like C and C++, perl has the concept of a function, which perl calls a subroutine. For example, here is a simple subroutine that adds two numbers and returns the result:


#!/usr/bin/perl -w

use strict;

sub sum {
	my ($num1, $num2) = @_;

	return $num1 + $num2;
}

print &sum(12, 43), "\n";

my ($v1, $v2) = (100, 10);
print &sum($v1, $v2), "\n";

my @a = (1, 2);
print &sum(@a), "\n";

There are several important points about the above subroutine that you should notice:

The definition of the subroutine is prefixed by the sub keyword and is given a name (sum in this example).
There is no return type specified. Perl subroutines may or may not return values -- it is up to the subroutine writer to decide. The subroutine above returns a scalar as its result, as indicated by the return operator.
Unlike C and C++, there is no parameter list. Instead, all the parameters are passed in via a default array which can be accessed inside the subroutine body. The name of this default array is @_ (not to be confused with the default scalar variable $_). We can refer to the first parameter by saying $_[0], the second parameter by saying $_[1] and so on. Again, do not confuse $_[0] with $_. They represent different scalars.
It is common in perl to extract the parameters from this default argument array by using a my list as follows:
```
my ($num1, $num2) = @_;
```
Note that $num1 and $num2 are local to the sum subroutine and cannot be accessed outside of it. This is because my variables are statically (lexically) scoped. (Perl also supports dynamic scoping via the (poorly named) local operator, but we will not be discussing this. See page 63 of S&P for all the details if you are curious.)
We call the subroutine by using the subroutine's name prefixed by the & symbol and enclosing the parameters inside parentheses. As the above script shows, we can pass in literals, scalar variables, or even an array. The expected results happen in all cases.
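
As a small sketch of the points above, the following variant reads its parameters directly as $_[0] and $_[1] instead of copying them into my variables. The name sum_by_index is made up for this illustration. The script also shows that the default scalar $_ is unaffected by the call, and that in Perl 5 a call written with parentheses works without the leading &.

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical variant of sum: index the argument array @_ directly.
sub sum_by_index {
	return $_[0] + $_[1];
}

$_ = "untouched";                     # the default scalar, unrelated to @_

print sum_by_index(12, 43), "\n";     # prints 55
print &sum_by_index(100, 10), "\n";   # prints 110
print "$_\n";                         # prints untouched
```

Copying the parameters into named my variables, as the original sum does, is usually easier to read once a subroutine takes more than one or two parameters.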

The above subroutine is a bit restrictive in that it only sums up two numbers. We can make our subroutine more flexible by allowing it to sum up a variable number of arguments:


#!/usr/bin/perl -w

use strict;

sub sum {
	my $sum = 0;

	print "No numbers given!\n" if ! @_;
	for (@_) {
		$sum += $_;
	}
	$sum;
}

print &sum(12, 43, 67, 98), "\n";

my ($v1, $v2) = (100, 10);
print &sum($v1, $v2), "\n";

my @empty = ();
print &sum(@empty), "\n";

print &sum(1..100), "\n";

Again, there are several things to note about the above script:

In our subroutine body, we access the parameters directly from the @_ argument list via the for loop. The default scalar $_ takes on each value (if any) in the parameter list in turn, and a running total is maintained in the local variable $sum.
The return value of the subroutine will be the value of the last expression evaluated (which is $sum in the above code). We do not need to explicitly use the return keyword to return a value from a subroutine. We typically use return in a perl subroutine if we wish to exit the subroutine early (due to an error condition, for example).
We can call our subroutine with a list of literals, a list of scalar variables, an empty list, or a range of integers specified by the range operator (..).
The statement:
```
	print "No numbers given!\n" if ! @_;
```
demonstrates a sort of 'postfix' form of the if statement and is quite commonly used in perl when the body of an if statement is a single line. (This notation can also be used for while loops.) Note that there are no parentheses required around the condition being tested by the if. The condition, by the way, is the negation of the @_ parameter array evaluated in scalar context, where an array yields its number of elements. The condition is therefore true when the parameter array has zero elements.
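
The early-exit use of return and the scalar-context behaviour of an array can be sketched as follows. The subroutine name sum_or_zero and the sample values are assumptions made for this illustration; returning 0 for an empty argument list is just one possible choice.

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical variant of sum that exits early on an empty argument list.
sub sum_or_zero {
	return 0 if ! @_;     # @_ in scalar context is its element count

	my $sum = 0;
	$sum += $_ for @_;    # postfix form of the for loop
	$sum;                 # last expression evaluated is the return value
}

print sum_or_zero(1..100), "\n";   # prints 5050
print sum_or_zero(), "\n";         # prints 0

my @nums = (12, 43, 67);
my $count = @nums;                 # scalar context: number of elements
print "$count\n";                  # prints 3
```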

Hashes (S&P -- Chapter 5)

Hashes in perl are roughly equivalent to the map class of the C++ standard library. They associate scalar strings (called keys) with their respective values (typically, other scalars). Unlike the map class, perl's hashes are not self-ordering. The order in which you place the elements into the hash may differ from the order in which you retrieve them. Perl does provide a way to retrieve the keys in sorted order, but you have to do so manually, as seen below:


#!/usr/bin/perl -w

use strict;

my %ip_to_host = ("134.153.48.1", "garfield", "134.153.48.2", "mirror",
	 "134.153.48.3", "phobos", );

$ip_to_host{"134.153.48.4"} = "deimos";
$ip_to_host{"134.153.48.10"} = "irma";

while (my ($ip, $host) = each %ip_to_host) {
	print "Hostname $host has IP address $ip\n";
}
print "\n";
delete $ip_to_host{"134.153.48.2"};	# Get rid of 'mirror' host.

# Display the hosts again, this time in sorted order.  Note
# that the sorting is lexicographic.  So host irma's IP
# address appears after garfield but before phobos
#
for (sort keys %ip_to_host) {
	print "Hostname $ip_to_host{$_} has IP address $_\n";
}

my %host_to_ip = reverse %ip_to_host;
print "\nGarfield has ip address $host_to_ip{'garfield'}\n";

The above script demonstrates many aspects of hash types in perl.

We can initialize a hash variable in the same way that we initialize an array variable -- with a list literal. When used to initialize a hash variable, the elements occurring at even indices of the list represent the keys and their values are represented by the following element. When initializing a hash table this way, it is important that the list have an even number of elements.
Another, more popular, way of initializing a hash is to use the big arrow operator (=>). For example, the above hash could have been initialized more intuitively as follows:
```
my %ip_to_host = ("134.153.48.1" => "garfield", "134.153.48.2" => "mirror",
	 "134.153.48.3" => "phobos", );
```
This way, it is much easier to see which element in the list is the key and which is the value. Of course, specifying one key/value pair per line is also an effective way of initializing a hash:
```
my %ip_to_host = (
	"134.153.48.1" => "garfield",
	"134.153.48.2" => "mirror",
	"134.153.48.3" => "phobos",
);
```
We can also assign individual elements to the hash by specifying the hash name followed by the key in curly braces:
```
$ip_to_host{"134.153.48.4"} = "deimos";
$ip_to_host{"134.153.48.10"} = "irma";
```
Note that if we had used square brackets, perl would treat ip_to_host as a (different) array variable, @ip_to_host, and not as the hash. Since the script uses strict and no such array has been declared, this would be a compile-time error. Without use strict, a warning would be generated instead, since the index is not entirely numeric.
The while loop demonstrates one way to iterate over the key/value pairs in a hash using the each function. The each function returns a two-element list consisting of successive key/value pairs stored in the hash. When it comes to the end of the hash, an empty list is returned and the while loop terminates. We assign each list returned by each to the $ip and $host variables (in list context).
Adding/deleting values to/from a hash while iterating over it can cause unpredictable results. Don't do this.
We can remove a key/value pair from the hash by using the delete function, as demonstrated by the above code. We can also check whether a particular key exists in a hash by using the exists function in a similar way. The exists function will return true if the specified key exists.
The for loop demonstrates a way that we can iterate over the hash, accessing the elements sorted by key. To do so we use the keys function on the hash. This returns a list containing all the keys stored in the hash table (a similar function called values returns a list containing all the values in the hash). We then sort this list using the sort function and iterate over this sorted list. Note that we are using the default variable $_ to store each key on each iteration through the loop.
Remember that sort does a lexicographical ordering on the list, so keys that contain numeric-like values (as above) will not be sorted numerically.
Finally, the above code demonstrates how we can turn a hash "inside-out" by turning the keys into the values and vice versa using the reverse function on the hash variable. Note that if the values in the original hash table are not all unique, then the resulting hash table will have fewer elements than the original.
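
The following sketch pulls together three of the points above: testing for a key with exists, sorting dotted IP addresses numerically rather than lexicographically, and losing elements when reversing a hash whose values are not unique. The comparison subroutine by_octets and the %alias data are assumptions made up for this illustration; they are not part of the original script.

```perl
#!/usr/bin/perl -w

use strict;

my %ip_to_host = (
	"134.153.48.1"  => "garfield",
	"134.153.48.10" => "irma",
	"134.153.48.3"  => "phobos",
);

# exists tests for a key without looking at its value.
print "No entry for 134.153.48.2\n" if ! exists $ip_to_host{"134.153.48.2"};

# Hypothetical helper: compare two dotted addresses octet by octet,
# numerically.  sort supplies the two items in $a and $b.
sub by_octets {
	my @x = split /\./, $a;
	my @y = split /\./, $b;
	for my $i (0 .. 3) {
		my $cmp = $x[$i] <=> $y[$i];
		return $cmp if $cmp;
	}
	return 0;
}

# Prints the .1 entry, then .3, then .10.
for (sort by_octets keys %ip_to_host) {
	print "Hostname $ip_to_host{$_} has IP address $_\n";
}

# Two keys share the value "garfield", so only one of them survives.
my %alias = (www => "garfield", ftp => "garfield", mail => "irma");
my %inverted = reverse %alias;
print scalar(keys %inverted), "\n";   # prints 2
```

Which of the two duplicate keys ends up in %inverted depends on the hash's internal ordering, so code should not rely on it.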

One final note about hashes is that, unlike scalars and arrays, hashes are not interpolated inside double quotes. Therefore, you cannot, for example, display the keys/values of a hash variable by saying print "The hash table is %h\n". The %h will be interpreted literally.
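
A minimal sketch of what does and does not interpolate is shown below. The hash %h and its contents are made up for this illustration. Individual hash elements are scalars, so they interpolate as usual; to display the whole hash, one option is to loop over the keys.

```perl
#!/usr/bin/perl -w

use strict;

my %h = (one => 1, two => 2);

print "The hash table is %h\n";     # prints: The hash table is %h
print "Element one is $h{one}\n";   # prints: Element one is 1

# Display every key/value pair, sorted by key.
for (sort keys %h) {
	print "$_ => $h{$_}\n";         # prints: one => 1, then two => 2
}
```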

Last modified: Wed Mar 26 15:50:26 2003