Friday, March 28, 2003

Some brief notes about subroutine parameter passing

The following perl script demonstrates some features of parameter passing in perl:


#!/usr/bin/perl -w

use strict;

sub scalar_pass {
	my ($value) = @_;
	$value .= " World";
	$_[0] .= " There\n";
}

my $var = "Hello";
scalar_pass($var);
print $var;
#scalar_pass("Hi");  # Error! Modification of a read-only value attempted

sub list_pass {
	my ($var, @list) = @_;
	print "$var (@list)\n";  # displays 1 (2 3 4 5) 
}

my @l = (3, 4, 5);
list_pass (1, 2, @l);

In the scalar_pass subroutine, we copy the first argument into the local $value variable and then append a string to it. This does not affect the actual parameter itself, which is $var in the first invocation of scalar_pass and the literal "Hi" in the second (commented-out) invocation. We then append another string to the $_[0] array element. Because the elements of @_ are aliases for the actual parameters, this modifies $var in the first invocation, but causes an error in the second invocation of scalar_pass because a string literal cannot be modified. Therefore, if you copy the values from the @_ parameter list to local variables you are achieving pass-by-value semantics. Using the elements of the @_ array directly (e.g. $_[0]) results in pass-by-reference semantics.
The second subroutine list_pass demonstrates perl's default tendency to 'flatten' lists that are passed in as actual parameters. The argument to the invocation of list_pass is the list (1, 2, 3, 4, 5). Inside the function itself, the assignment my ($var, @list) = @_; assigns the first scalar in the argument list to $var and the @list variable absorbs all the rest of the arguments, as shown by the output.
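
One consequence of flattening is worth a small sketch: a subroutine cannot receive two separate arrays this way, because the first array variable in the assignment absorbs everything. The subroutine name two_lists and the sample values below are made up for illustration.

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical subroutine that tries (and fails) to receive two arrays.
sub two_lists {
	my (@first, @second) = @_;   # @first absorbs every argument
	my $total = @_;              # @_ in scalar context: number of arguments
	print "received $total arguments\n";
	print "first = (@first), second = (@second)\n";
	return (scalar(@first), scalar(@second));
}

my @x = (1, 2);
my @y = (3, 4, 5);

# The two arrays are flattened into the single list (1, 2, 3, 4, 5).
my ($n_first, $n_second) = two_lists(@x, @y);
```

Here @first ends up with all five values and @second is empty, which matches the way @list absorbed the remaining arguments in list_pass.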

Input/Output (S&P -- Chapter 6)

There are several ways to read input from a file. One way mentioned in Assignment #7 is to use the <STDIN> operator in list context:

chomp (my @list = <STDIN>);

Unfortunately, if the file is large then the @list array could consume quite a bit of memory. It is common to process files in perl one line at a time. To do this we can rely on the fact that the <STDIN> operator returns undef when all the input has been consumed:

while (defined (my $line = <STDIN>)) {
	chomp($line);
	print $line, "\n";
}

If we want, we can use the special default variable $_ to store each line of the input as we read it:

while (defined ($_ = <STDIN>)) {
	chomp($_);
	print "$_\n";
}

Perl allows you to express this more succinctly as:

while (<STDIN>) {
	chomp;
	print;
	print "\n";
}

When the line input operator, <STDIN>, is the only thing in the condition of a while loop, perl will assign the line that was read to the default variable $_, which can then be used inside the body of the loop. In addition, many perl functions (e.g. chomp, length and print) will work on $_ by default if you do not specify an argument. Therefore, the call to chomp above is acting on $_. Similarly, the call to print will display the contents of the $_ variable.

Note that <STDIN> on its own, outside of a while condition, does not cause assignment to the default $_ variable; the line that was read is simply discarded.
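
The difference can be seen in a short sketch. To keep it self-contained, it reads from an in-memory filehandle ($fh, opened on the made-up string $data) rather than from STDIN; the line input operator behaves the same way on either handle.

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical sample input; an in-memory filehandle stands in for STDIN.
my $data = "first\nsecond\nthird\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

$_ = "untouched";
<$fh>;                        # reads "first\n" and throws it away
my $after_bare = $_;          # $_ was not changed by the bare read
print "After bare read: $after_bare\n";

my @seen;
while (<$fh>) {               # here each line is assigned to $_
	chomp;
	push @seen, $_;
}
print "Loop saw: @seen\n";
close($fh);
```

The bare read consumes the first line without touching $_, so the loop only sees the remaining two lines.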

The Diamond operator and command line arguments in perl

When doing input, many perl scripts use the diamond operator, <>, as demonstrated by the following script that counts word occurrences in a file:


#!/usr/bin/perl -w

use strict;

my %counter;

print "\@ARGV is (@ARGV)\n";

while (<>) {
	for my $word (split ' ') {
		$counter{$word} ++;
	}
}

for (sort { $counter{$b} <=> $counter{$a} } keys %counter) {
	print "'$_' occurred $counter{$_} time",
		$counter{$_} == 1 ? "\n" : "s\n";
}

The code demonstrates a few new features of perl that we haven't seen before.

When a perl program is started, the command line arguments are stored in the @ARGV array for us (note that variable names which are all upper case are typically 'special' variables in perl). This array serves the same purpose as the argv parameter to the main() function in C and C++ programs. Note that there is no need for an argc equivalent because the length of this array is easily determined by using @ARGV in scalar context.
Unlike C/C++, the first element of the @ARGV array is not the name of the program being run -- $ARGV[0] is actually the first argument on the command line. The special perl variable $0 stores the name of the program.
We are free to examine/modify @ARGV as we desire in our perl scripts. For example, we can shift values from the front of the array and/or examine the array as we see fit. In the above program we simply display the contents of the array inside parentheses.
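
As a rough sketch of what examining and modifying @ARGV can look like, the script below assigns to @ARGV itself so that it runs without any real command line. The -v flag and the file names are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl -w

use strict;

# For illustration only: pretend these were the command line arguments.
@ARGV = ('-v', 'a.txt', 'b.txt');

print "Program name: $0\n";

my $count = @ARGV;            # scalar context: number of arguments
print "Argument count: $count\n";

# Remove a (hypothetical) -v flag from the front of the array if present.
my $verbose = 0;
if (@ARGV && $ARGV[0] eq '-v') {
	shift @ARGV;
	$verbose = 1;
}

print "verbose = $verbose, remaining arguments: (@ARGV)\n";
```

After the shift, only the two file names are left in @ARGV.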
Next, we use the diamond operator to do input. What this operator does depends upon how the perl script was invoked:
- If there were arguments specified on the command line, e.g.
```
$ ./wc.pl arg1 arg2 arg3 ...
```
  then each of these arguments will be treated as a file name. The first file will be opened, and the diamond operator in the while condition will read a line from this file and assign it to $_. The body of the while loop will then be executed. The next line from the first file will then be read in and the process repeated. When all the lines have been read in from the first file, it is closed and the second file is opened and treated the same way. This process continues until all the lines in all the files specified on the command line have been read.
- If there were no command line arguments, then the diamond operator would attempt to read input lines from standard input and assign each line to $_ as before. The diamond operator would return undef (and cause the while loop to terminate) when all the input lines have been read in.
If you were to invoke the script as:
```
$ ./wc.pl < arg1 arg2 arg3 
```
Then the lines of arg1 would form the standard input for the program, and the shell would remove the redirection (< arg1) from the command line. @ARGV would then be set to qw/arg2 arg3/. The diamond operator would then read the lines from the arg2 and arg3 files -- the lines in arg1 would be ignored by the diamond operator. (The lines from the arg1 file could still be read from standard input, i.e. with <STDIN>.)
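The file-argument case can be tried out with a small sketch. It assumes the core File::Temp module is available, creates two throwaway files with made-up contents, and then sets @ARGV by hand to imitate passing them on the command line.

```perl
#!/usr/bin/perl -w

use strict;
use File::Temp qw(tempfile);

# Create two small temporary files (hypothetical contents).
my ($fh1, $file1) = tempfile(UNLINK => 1);
print $fh1 "one\ntwo\n";
close($fh1);

my ($fh2, $file2) = tempfile(UNLINK => 1);
print $fh2 "three\n";
close($fh2);

# Pretend the script was run with these two files as its arguments.
@ARGV = ($file1, $file2);

my @lines;
while (<>) {                  # reads file1 to the end, then file2
	chomp;
	push @lines, $_;
}

print "Read: @lines\n";
# The diamond operator shifts each file name off @ARGV as it opens it.
print "\@ARGV is now (@ARGV)\n";
```

The loop sees the lines of both files as one continuous stream, and @ARGV is empty once the loop finishes.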
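The body of the while loop splits each line of the input into words. The pattern ' ' (a string containing a single space) is a special case for split: it splits on any run of whitespace (spaces, tabs, newlines) and ignores leading whitespace, so there is no need to chomp the line first. Note that because we are not specifying the second argument to split, split will operate on the default $_ variable.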
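To see how the ' ' pattern differs from splitting on a literal single space, here is a minimal sketch using a made-up input line with leading spaces, a tab, and repeated spaces:

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical input line: leading spaces, a tab, and a run of spaces.
$_ = "  the quick\tbrown   fox\n";

my @special = split ' ';      # special case: any run of whitespace
my @literal = split / /;      # regular expression: exactly one space

print scalar(@special), " fields from split ' '\n";   # 4 fields
print scalar(@literal), " fields from split / /\n";   # 7 fields

print "words: ", join(",", @special), "\n";
```

With the / / pattern the result contains empty fields, "quick\tbrown" stays in one piece, and the last field still ends in a newline, which is why the word count script uses ' ' instead.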
Finally, we display the words in decreasing order of occurrence (the most commonly occurring word will be displayed first followed by the second most commonly occurring word etc.). The important part of the loop is the contents of the for parentheses:
```
sort { $counter{$b} <=> $counter{$a} } keys %counter
```
This line of code demonstrates two new concepts:
- It demonstrates how to numerically sort an array of numbers. Remember that sort, by default, does a lexicographical ordering. We can do a numeric sort by specifying a custom comparison function to the sort function. We do so by directly embedding the comparison function between the sort function name and the array to be sorted. For example:
```
my @array = (4, 3, 7, 1, 2);
print join(",", sort { $a <=> $b } @array), "\n";
```
  This anonymous comparison function will be called many times by the sort function. The $a and $b variables will be set to the two values that the sort function wishes to compare. We want our comparison function to return -1 if the first argument is less than the second. The comparison function should return 0 if the first argument is equal to the second. 1 should be returned if the first argument is greater than the second. Fortunately, perl has an operator that works on two numbers that does exactly this -- the <=> operator, which is sometimes called the spaceship operator. If we wanted to sort the numbers in decreasing order, we simply swap the $a and $b. This is what we do in our word count script.
- The sort line above also demonstrates how we can sort a hash by value instead of by key. Inside our comparison function, $a and $b are going to be set to the keys of the hash. In the context of our script, these keys are words that were encountered in the input. We then use these words to determine the number of times each word occurred. We can do this by giving our %counter hash the appropriate keys which will be $a and $b in our comparison function.
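
Both points can be checked with a short sketch. The numbers and the word counts below are made up, and the alphabetical tie-break (or $a cmp $b) is an addition that is not in the word count script; it is there only so that words with equal counts come out in a predictable order.

```perl
#!/usr/bin/perl -w

use strict;

# Default sort compares strings, so 10 and 100 come before 9.
my @numbers = (10, 9, 100);
my @as_strings = sort @numbers;
my @as_numbers = sort { $a <=> $b } @numbers;
print "default: @as_strings\n";   # default: 10 100 9
print "numeric: @as_numbers\n";   # numeric: 9 10 100

# Hypothetical word counts, sorted by value (highest count first).
# Ties are broken alphabetically so the output order is predictable.
my %counter = (apple => 3, pear => 1, fig => 3, kiwi => 2);
my @by_count = sort { $counter{$b} <=> $counter{$a} or $a cmp $b } keys %counter;
print "by count: @by_count\n";    # by count: apple fig kiwi pear
```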

Last modified: Fri Mar 28 17:12:32 2003