Developing
a Perl Routine
Randal L. Schwartz
I was cruising the Perl newsgroups the other day, and found the
following request. It appears to be a homework problem from a university
assignment, so I won't embarrass the original poster by including
his name. (Normally, I try to give credit to the source of inspiration
for one of my columns, so if you want your name in lights, please
email your ideas to me!)
Here's the task: start with the three-letter English abbreviations
for the seven days of the week, in their natural order. Write a
subroutine that takes two of these weekday abbreviations, and returns
a single comma-separated string with all the days in between, wrapping
around if necessary. For example, given an input of Tue and
Thu, the return value is Tue,Wed,Thu. However, an
input of Thu and Tue should wrap all the way around
to Thu,Fri,Sat,Sun,Mon,Tue. Be sure to reject (via die)
any erroneous inputs.
This doesn't sound like that difficult a task, but some interesting
subtleties arose as I was starting to solve it in my head. So, I'm
writing this column effectively in real time, as I would consider
each piece of the problem, to illustrate effective practices at
developing Perl routines.
First, I need a subroutine name. This is sometimes harder than
it looks. I want a name that's short enough that I'll
reuse it, but long enough to be descriptive and unique. Let's
start with day_string_range. Our initial subroutine looks
like:
sub day_string_range {
... code goes here ...
}
Good so far. I hope that wasn't too surprising. Next, I need
to grab the start and end values, so let's first check that they
are there, and if so, grab them:
sub day_string_range {
die "too few args: @_" if @_ < 2;
die "too many args: @_" if @_ > 2;
my ($start,$end) = @_;
...
}
Note here the use of @_ in a scalar context (with the two comparison
operators) to check the number of elements. We then create two local
variables to hold the arguments if we make it past the test.
But $start is now something like Thu. How do we
turn that into a computable value so that we can increment it? Well,
we'll need a map to map it back and forth to integers. Let's
use an array (for now) to hold the names in the proper order:
my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
Where do we put this? If we put it inside the subroutine, it'll
get initialized every time, at a slight speed penalty. However, if
we put it outside the subroutine, it needs to get executed before
the initialization occurs. Fortunately, we can create "static
local" variables using a common Perl idiom:
BEGIN {
my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
sub day_string_range {
die "too few args: @_" if @_ < 2;
die "too many args: @_" if @_ > 2;
my ($start,$end) = @_;
...
}
}
The BEGIN block causes the code to be executed at compile time,
initializing the value of @day_names before any other "normal"
code is executed. The variable is local to the block, so it won't
be seen by any other part of the program, just the subroutine inside
the BEGIN block.
Using this array, how do we turn a name into a number? An array
isn't very good for searching, except as a linear search. A
linear search might look like:
my $number_for_start;
for (0..$#day_names) {
if ($day_names[$_] eq $start) {
$number_for_start = $_;
last;
}
}
die "$start is not a day name"
unless defined $number_for_start;
This would probably suffice if we called this routine only a few times
in the program. But let's step up the efficiency a bit (and simplify
the logic) by using a hash. First, we'll convert the array trivially
into a hash with keys of the original array names and values equal
to the position within the array:
my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my %mapping; @mapping{@day_names} = 0..$#day_names;
Now we have $day_names[3] as Wed, and $mapping{Wed}
as 3, so we can go from one to the other. For symmetry, we
could have made these both hashes, but the differences in the
resulting code would be minor.
Now, how do we get from $start to $number_for_start?
Much simpler:
die "No such name: $start" unless exists $mapping{$start};
my $number_for_start = $mapping{$start};
That's a very pure way to do it. We can be slightly dirtier and
optimize knowing that there are no undef values in the hash:
defined (my $number_for_start = $mapping{$start})
or die "no such name: $start";
This is all fine and well for $start, but we need to perform
the same operation for $end. I could cut-and-paste that code
twice, making the subroutine so far as:
BEGIN {
my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my %mapping; @mapping{@day_names} = 0..$#day_names;
sub day_string_range {
die "too few args: @_" if @_ < 2;
die "too many args: @_" if @_ > 2;
my ($start,$end) = @_;
defined (my $number_for_start = $mapping{$start})
or die "no such name: $start";
defined (my $number_for_end = $mapping{$end})
or die "no such name: $end";
...
}
}
My "maintenance alarm", however, goes off when I type such
code. I've got the same code twice in the program, but merely
operating on different variables. If I were maintaining this code
for some reason (to add functionality or additional error checking,
for example), and I missed the fact that the two pieces of code must
stay in parallel, I'd probably spend a lot of time debugging.
Or, worse yet, the code would go into production to show errors in
live data.
I can solve this with a bit of indirection -- if I see it
properly as a "mapping" from a set of values to another,
the wonderful map operator pops into mind:
my ($number_for_start, $number_for_end) = map {
defined (my $ret = $mapping{$_})
or die "no such name: $_";
$ret;
} $start, $end;
Here, each value is placed into $_, and then we run the code
in the block. The last expression evaluated (in this case, $ret)
provides the elements of the output list.
But as I'm staring at this, I realize that once I've
got the value of $number_for_start, I'll never need
the original $start value again. So, another approach (again
shooting for more simplification) is to use the "in-place-ness"
of the foreach loop:
foreach ($start, $end) {
exists $mapping{$_}
or die "no such name: $_";
$_ = $mapping{$_};
}
For each of the start and end values, if a mapping for it exists,
replace the value with its mapped equivalent, otherwise die.
At this point, we've got two small, validated integers in
the range of 0 to 6. It's time to start building the return
value. We'll build it up as a list, then join the list with
commas to get a single string:
my @return;
while (1) {
push @return, $day_names[$start];
last if $start == $end;
($start += 1) %= 7;
}
Starting with the empty list, we'll push the day name for $start
onto the end of the list. If this is the end, we also quit. Otherwise,
we'll increment the value of $start, but in a "modulus
7" way, wrapping around from 6 back to 0. I really wanted to
write ++$start %= 7, but that's sadly not permitted. This
loop has stuff before the exit test, and stuff after the exit test,
which is easiest to write as an "infinite" loop with an
exit test in the middle.
For the final return value -- a simple join on the list thus
created:
return join ",", @return;
And to put that all together:
BEGIN {
my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my %mapping; @mapping{@day_names} = 0..$#day_names;
sub day_string_range {
die "too few args: @_" if @_ < 2;
die "too many args: @_" if @_ > 2;
my ($start,$end) = @_;
foreach ($start, $end) {
exists $mapping{$_}
or die "no such name: $_";
$_ = $mapping{$_};
}
my @return;
while (1) {
push @return, $day_names[$start];
last if $start == $end;
($start += 1) %= 7;
}
return join ",", @return;
}
}
It's simple, but there are a lot of steps to get it done right.
Sure, you can probably play "Perl golf" and "minimize
the [key]strokes" to write this routine in about half the number
of lines. But I think we've got enough here for the maintenance
programmer to follow along or modify nicely, and it does the job with
reasonable efficiency. Until next time, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration,
security, technical writing, and training. He has coauthored the
"must-have" standards: Programming Perl, Learning
Perl, Learning Perl for Win32 Systems, and Effective
Perl Programming, as well as writing regular columns for WebTechniques
and Unix Review magazines. He's also a frequent contributor
to the Perl newsgroups, and has moderated comp.lang.perl.announce
since its inception. Since 1985, Randal has owned and operated Stonehenge
Consulting Services, Inc.
|