The following perl script demonstrates some features of parameter passing in perl:
    #!/usr/bin/perl -w
    use strict;

    sub scalar_pass {
        my ($value) = @_;
        $value .= " World";
        $_[0] .= " There\n";
    }

    my $var = "Hello";
    scalar_pass($var);
    print $var;
    #scalar_pass("Hi"); # Error! Modification of a read-only value attempted

    sub list_pass {
        my ($var, @list) = @_;
        print "$var (@list)\n"; # displays 1 (2 3 4 5)
    }

    my @l = (3, 4, 5);
    list_pass (1, 2, @l);
In the scalar_pass subroutine, we copy the first parameter to the local $value variable and then append a string to it. This does not affect the actual parameter itself, which is $var in the first invocation of scalar_pass and the literal "Hi" in the second (commented-out) invocation. We then append another string to the $_[0] array element. This modifies $var in the first invocation, but causes an error in the second invocation of scalar_pass because that invocation passes a string literal, which cannot be modified. Therefore, if you copy the values from the @_ parameter list to local variables, you get pass-by-value semantics. Using the elements of the @_ array directly (e.g. $_[0]) results in pass-by-reference semantics, because each element of @_ is an alias for the corresponding actual parameter.
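The difference between copying out of @_ and working on $_[0] directly can be sketched as follows. The subroutine names shout_copy and shout_in_place are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Works on a private copy: the caller's variable is left untouched.
sub shout_copy {
    my ($text) = @_;        # copy the first argument
    $text = uc $text;
    return $text;
}

# Works on $_[0], which is an alias for the caller's variable.
sub shout_in_place {
    $_[0] = uc $_[0];       # modifies the caller's variable
    return;
}

my $greeting = "hello";
my $copy = shout_copy($greeting);
print "$greeting / $copy\n";    # $greeting is still lower case here

shout_in_place($greeting);
print "$greeting\n";            # $greeting has now been upper-cased
```

Calling shout_in_place("hello") with a literal would fail at run time for the same reason the commented-out scalar_pass("Hi") call does.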
list_pass demonstrates perl's default tendency to 'flatten' lists that are passed in as actual parameters. The argument list for the invocation of list_pass is the flattened list (1, 2, 3, 4, 5). Inside the function itself, the assignment my ($var, @list) = @_; assigns the first scalar in the argument list to $var, and the @list variable absorbs all the rest of the arguments, as shown by the output.
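One consequence of flattening is that an array in the parameter assignment absorbs everything that follows it. The sketch below, using a hypothetical two_lists subroutine, shows that a second array in the same assignment receives nothing.

```perl
use strict;
use warnings;

# Hypothetical helper that tries to receive two separate arrays.
sub two_lists {
    my (@first, @second) = @_;   # @first absorbs every argument
    return (scalar(@first), scalar(@second));
}

my @x = (1, 2);
my @y = (3, 4, 5);

# The two arrays are flattened into the single list (1, 2, 3, 4, 5).
my ($n_first, $n_second) = two_lists(@x, @y);
print "first has $n_first elements, second has $n_second\n";
```

This is why list_pass places the scalar $var before @list in its assignment: anything listed after an array would be left empty.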
There are several ways to read input from a file. One way, mentioned in Assignment #7, is to use the <STDIN> operator in list context:
    chomp (my @list = <STDIN>);
Unfortunately, if the file is large then the @list array could consume quite a bit of memory. It is common to process files in perl one line at a time. To do this we can rely on the fact that the <STDIN> operator returns undef when all the input has been consumed:
    while (defined (my $line = <STDIN>)) {
        chomp($line);
        print $line, "\n";
    }
If we want, we can use the special default variable $_ to store each line of the input as we read it:
    while (defined ($_ = <STDIN>)) {
        chomp($_);
        print "$_\n";
    }
Perl allows you to express this more succinctly as:
    while (<STDIN>) {
        chomp;
        print;
        print "\n";
    }
When the line input operator, <STDIN>, is used by itself as the condition of a while loop, perl assigns each line it reads to the default variable $_, which can then be used inside the body of the loop. In addition, many perl functions (e.g. chomp, length and print) work on $_ by default if you do not specify an argument. Therefore, the call to chomp above is acting on $_. Similarly, the first call to print displays the contents of the $_ variable. Note that <STDIN> on its own, outside of a while condition, does not cause assignment to the default $_ variable.
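The following sketch contrasts a bare read with a read inside a while condition. To keep it self-contained it reads from an in-memory filehandle ($in) rather than STDIN; that substitution, and the variable names, are assumptions made for the example.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for STDIN in this sketch.
my $text = "one\ntwo\nthree\n";
open(my $in, '<', \$text) or die "cannot open in-memory file: $!";

$_ = "untouched";
<$in>;                      # reads and discards a line; $_ is not assigned
my $after_bare_read = $_;

my @seen;
while (<$in>) {             # as a while condition, each line goes into $_
    chomp;                  # chomp works on $_ by default
    push @seen, $_;
}
close($in);

print "after bare read: $after_bare_read\n";
print "lines seen in loop: @seen\n";
```

The first line of input is consumed by the bare read, so the loop only sees the remaining lines.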
When doing input, many perl scripts use the diamond operator, <>, as demonstrated by the following script that counts word occurrences in a file:
    #!/usr/bin/perl -w
    use strict;

    my %counter;

    print "\@ARGV is (@ARGV)\n";

    while (<>) {
        for my $word (split ' ') {
            $counter{$word}++;
        }
    }

    for (sort { $counter{$b} <=> $counter{$a} } keys %counter) {
        print "'$_' occurred $counter{$_} time",
            $counter{$_} == 1 ? "\n" : "s\n";
    }
The code demonstrates a few new features of perl that we haven't seen before.
Perl sets up the @ARGV array for us (note that variable names which are all upper case are typically 'special' variables in perl). This array serves the same purpose as the argv parameter to the main() function in C and C++ programs. Note that there is no need for an argc equivalent because the length of this array is easily determined by using @ARGV in scalar context.
Unlike argv in C and C++, the first element of the @ARGV array is not the name of the program being run -- $ARGV[0] is actually the first argument on the command line. The special perl variable $0 stores the name of the program.
We can manipulate @ARGV as we desire in our perl scripts. For example, we can shift values from the front of the array and/or examine the array as we see fit. In the above program we simply display the contents of the array inside parentheses.
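The points about @ARGV can be tried out with a short sketch. It assigns to @ARGV directly to simulate a command line, which is an assumption made so the example runs without real arguments; the argument values are hypothetical.

```perl
use strict;
use warnings;

# Simulate a command line (normally perl fills in @ARGV itself).
@ARGV = ('-v', 'a.txt', 'b.txt');

my $argc = @ARGV;               # an array in scalar context gives its length
print "program: $0\n";          # $0 holds the script name, not $ARGV[0]
print "argument count: $argc\n";

my $first = shift @ARGV;        # remove and return the first argument
print "first argument: $first\n";
print "remaining: (@ARGV)\n";
```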
If the script is invoked with arguments on the command line, for example:
    $ ./wc.pl arg1 arg2 arg3 ...
then each of these arguments will be treated as a file name. The first file will be opened and the diamond operator in the while condition will read a line from this file and assign it to $_. The body of the while loop will then be executed. The next line from the first file will then be read in and the process repeated. When all the lines have been read in from the first file, it is closed and the second file is opened and treated the same way. This process continues until all the lines in all the files specified on the command line have been read.
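If no arguments are given on the command line, the diamond operator reads lines from standard input instead, assigning each one to $_ as before. The diamond operator would return undef (and cause the while loop to terminate) when all the input lines have been read in.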
If the script were instead invoked as:
    $ ./wc.pl < arg1 arg2 arg3
then the lines of arg1 would essentially form the standard input for the program, and the redirection would be dropped from the command line by the shell. @ARGV would then be set to qw/arg2 arg3/. The diamond operator would then read the lines from the arg2 and arg3 files -- the lines in arg1 would be ignored by the diamond operator. (The lines from the arg1 file could still be read by reading from standard input, i.e. <STDIN>.)
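The way the diamond operator walks through the files named in @ARGV can be observed with a self-contained sketch. It creates two temporary files with the core File::Temp module and assigns their names to @ARGV; both steps are assumptions made so the example runs without a real command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small temporary files to stand in for command-line arguments.
my ($fh1, $file1) = tempfile(UNLINK => 1);
print $fh1 "alpha\nbeta\n";
close($fh1);

my ($fh2, $file2) = tempfile(UNLINK => 1);
print $fh2 "gamma\n";
close($fh2);

# Pretend the script was invoked with the two file names as arguments.
@ARGV = ($file1, $file2);

my @lines;
while (<>) {                # reads $file1 to the end, then $file2
    chomp;
    push @lines, $_;
}
print scalar(@lines), " lines read: @lines\n";
```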
The body of the while loop simply splits each line of the input on whitespace. (The special pattern ' ' splits on runs of whitespace and ignores any leading whitespace.) Note that because we are not specifying the second argument to split, split will operate on the default $_ variable.
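The pattern ' ' given to split is a special case rather than a literal single-space delimiter. A minimal sketch, using a made-up input line, compares it with the regular expression / /:

```perl
use strict;
use warnings;

my $line = "  the quick   brown fox\n";

# The special pattern ' ' splits on runs of whitespace and skips
# leading whitespace, so no empty fields are produced.
my @words = split ' ', $line;

# A single-space regular expression produces an empty field
# for every extra space.
my @fields = split / /, $line;

print scalar(@words), " words: @words\n";
print scalar(@fields), " fields when splitting on / /\n";
```

For counting words, the ' ' form is the convenient one because blank fields and the trailing newline never end up as hash keys.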
Consider the expression inside the for parentheses:
    sort { $counter{$b} <=> $counter{$a} } keys %counter
This line of code demonstrates two new concepts:
sort, by default, does a lexicographical ordering. We can do a numeric sort by specifying a custom comparison function to the sort function. We do so by directly embedding the comparison function between the sort function name and the array to be sorted. For example:
    my @array = (4, 3, 7, 1, 2);
    print join(",", sort { $a <=> $b } @array), "\n";
This anonymous comparison function will be called many times by the sort function. The $a and $b variables will be set to the two values that the sort function wishes to compare. We want our comparison function to return -1 if the first argument is less than the second, 0 if the first argument is equal to the second, and 1 if the first argument is greater than the second. Fortunately, perl has an operator that works on two numbers and does exactly this -- the <=> operator, which is sometimes called the spaceship operator. If we wanted to sort the numbers in decreasing order, we would simply swap the $a and $b. This is what we do in our word count script.
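The effect of the comparison function is easiest to see on numbers with different digit counts. The sketch below uses a made-up list and sorts it three ways:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 2);

my @default    = sort @numbers;                 # compares as strings
my @ascending  = sort { $a <=> $b } @numbers;   # compares as numbers
my @descending = sort { $b <=> $a } @numbers;   # swapped: largest first

print "default:    @default\n";     # 10 100 2 9
print "ascending:  @ascending\n";   # 2 9 10 100
print "descending: @descending\n";  # 100 10 9 2
```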
The sort line above also demonstrates how we can sort a hash by value instead of by key. Inside our comparison function, $a and $b are going to be set to the keys of the hash. In the context of our script, these keys are words that were encountered in the input. We then use these words to look up the number of times each word occurred, by indexing our %counter hash with the keys $a and $b, and compare those counts.
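Sorting a hash by value can be tried in isolation with a small sketch. The word counts below are hypothetical values standing in for what the script would collect in %counter.

```perl
use strict;
use warnings;

# Hypothetical word counts, standing in for the script's %counter hash.
my %counter = (apple => 3, pear => 1, fig => 7);

# $a and $b are keys; the block looks up and compares their values,
# with $b first so that the largest count comes first.
my @by_count = sort { $counter{$b} <=> $counter{$a} } keys %counter;

for my $word (@by_count) {
    print "$word: $counter{$word}\n";
}
```

The counts here are all different, so the order is fully determined. When two words have the same count, this comparison function alone does not say which of them comes first.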
Last modified: Fri Mar 28 17:12:32 2003