Cover V10, I08

Article

aug2001.tar


Developing a Perl Routine

Randal L. Schwartz

I was cruising the Perl newsgroups the other day, and found the following request. It appears to be a homework problem from a university assignment, so I won't embarrass the original poster by including his name. (Normally, I try to give credit to the source of inspiration for one of my columns, so if you want your name in lights, please email your ideas to me!)

Here's the task: start with the three-letter English abbreviations for the seven days of the week, in their natural order. Write a subroutine that takes two of these weekday abbreviations, and returns a single comma-separated string with all the days in between, wrapping around if necessary. For example, given an input of Tue and Thu, the return value is Tue,Wed,Thu. However, an input of Thu and Tue should wrap all the way around to Thu,Fri,Sat,Sun,Mon,Tue. Be sure to reject (via die) any erroneous inputs.

This doesn't sound like that difficult a task, but some interesting subtleties arose as I was starting to solve it in my head. So, I'm writing this column effectively in real time, as I would consider each piece of the problem, to illustrate effective practices at developing Perl routines.

First, I need a subroutine name. This is sometimes harder than it looks. I want a name that's short enough that I'll reuse it, but long enough to be descriptive and unique. Let's start with day_string_range. Our initial subroutine looks like:

sub day_string_range {
  ... code goes here ...
}
Good so far. I hope that wasn't too surprising. Next, I need to grab the start and end values, so let's first check that they are there, and if so, grab them:

sub day_string_range {
  die "too few args: @_" if @_ < 2;
  die "too many args: @_" if @_ > 2;
  my ($start,$end) = @_;
  ...
}
Note here the use of @_ in a scalar context (with the two comparison operators) to check the number of elements. We then create two local variables to hold the arguments if we make it past the test.

But $start is now something like Thu. How do we turn that into a computable value so that we can increment it? Well, we'll need a map to map it back and forth to integers. Let's use an array (for now) to hold the names in the proper order:

my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
Where do we put this? If we put it inside the subroutine, it'll get initialized every time, at a slight speed penalty. However, if we put it outside the subroutine, it needs to get executed before the initialization occurs. Fortunately, we can create "static local" variables using a common Perl idiom:

BEGIN {
  my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);

  sub day_string_range {
    die "too few args: @_" if @_ < 2;
    die "too many args: @_" if @_ > 2;
    my ($start,$end) = @_;
    ...
  }
}
The BEGIN block causes the code to be executed at compile time, initializing the value of @day_names before any other "normal" code is executed. The variable is local to the block, so it won't be seen by any other part of the program, just the subroutine inside the BEGIN block.

Using this array, how do we turn a name into a number? An array isn't very good for searching, except as a linear search. A linear search might look like:

my $number_for_start;
for (0..$#day_names) {
  if ($day_names[$_] eq $start) {
    $number_for_start = $_;
    last;
  }
}
die "$start is not a day name"
  unless defined $number_for_start;
This would probably suffice if we called this routine only a few times in the program. But let's step up the efficiency a bit (and simplify the logic) by using a hash. First, we'll convert the array trivially into a hash with keys of the original array names and values equal to the position within the array:

my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my %mapping; @mapping{@day_names} = 0..$#day_names;
Now we have $day_names[3] as Wed, and $mapping{Wed} as 3, so we can go from one to the other. For symmetry, we could have made these both hashes, but the differences in the resulting code would be minor.

Now, how do we get from $start to $number_for_start? Much simpler:

die "No such name: $start" unless exists $mapping{$start};
my $number_for_start = $mapping{$start};
That's a very pure way to do it. We can be slightly dirtier and optimize knowing that there are no undef values in the hash:

defined (my $number_for_start = $mapping{$start})
  or die "no such name: $start";
This is all fine and well for $start, but we need to perform the same operation for $end. I could cut-and-paste that code twice, making the subroutine so far as:

BEGIN {
  my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
  my %mapping; @mapping{@day_names} = 0..$#day_names;

  sub day_string_range {
    die "too few args: @_" if @_ < 2;
    die "too many args: @_" if @_ > 2;
    my ($start,$end) = @_;
    defined (my $number_for_start = $mapping{$start})
      or die "no such name: $start";
    defined (my $number_for_end = $mapping{$end})
      or die "no such name: $end";
    ...
  }
}
My "maintenance alarm", however, goes off when I type such code. I've got the same code twice in the program, but merely operating on different variables. If I were maintaining this code for some reason (to add functionality or additional error checking, for example), and I missed the fact that the two pieces of code must stay in parallel, I'd probably spend a lot of time debugging. Or, worse yet, the code would go into production to show errors in live data.

I can solve this with a bit of indirection -- if I see it properly as a "mapping" from a set of values to another, the wonderful map operator pops into mind:

    my ($number_for_start, $number_for_end) = map {
      defined (my $ret = $mapping{$_})
        or die "no such name: $_";
      $ret;
    } $start, $end;
Here, each value is placed into $_, and then we run the code in the block. The last expression evaluated (in this case, $ret) provides the elements of the output list.

But as I'm staring at this, I realize that once I've got the value of $number_for_start, I'll never need the original $start value again. So, another approach (again shooting for more simplification) is to use the "in-place-ness" of the foreach loop:

    foreach ($start, $end) {
      exists $mapping{$_}
        or die "no such name: $_";
      $_ = $mapping{$_};
    }
For each of the start and end values, if a mapping for it exists, replace the value with its mapped equivalent, otherwise die.

At this point, we've got two small, validated integers in the range of 0 to 6. It's time to start building the return value. We'll build it up as a list, then join the list with commas to get a single string:

    my @return;
    while (1) {
      push @return, $day_names[$start];
      last if $start == $end;
      ($start += 1) %= 7;
    }
Starting with the empty list, we'll push the day name for $start onto the end of the list. If this is the end, we also quit. Otherwise, we'll increment the value of $start, but in a "modulus 7" way, wrapping around from 6 back to 0. I really wanted to write ++$start %= 7, but that's sadly not permitted. This loop has stuff before the exit test, and stuff after the exit test, which is easiest to write as an "infinite" loop with an exit test in the middle.

For the final return value -- a simple join on the list thus created:

  return join ",", @return;
And to put that all together:

BEGIN {
  my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
  my %mapping; @mapping{@day_names} = 0..$#day_names;

  sub day_string_range {
    die "too few args: @_" if @_ < 2;
    die "too many args: @_" if @_ > 2;
    my ($start,$end) = @_;
    foreach ($start, $end) {
      exists $mapping{$_}
        or die "no such name: $_";
      $_ = $mapping{$_};
    }
    my @return;
    while (1) {
      push @return, $day_names[$start];
      last if $start == $end;
      ($start += 1) %= 7;
    }
    return join ",", @return;
  }
  }
It's simple, but there are a lot of steps to get it done right. Sure, you can probably play "Perl golf" and "minimize the [key]strokes" to write this routine in about half the number of lines. But I think we've got enough here for the maintenance programmer to follow along or modify nicely, and it does the job with reasonable efficiency. Until next time, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as writing regular columns for WebTechniques and Unix Review magazines. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.