Cover V10, I06

It's All About Context

Randal L. Schwartz

A recent article on Slashdot (www.slashdot.org) discussed the surprising story of a high-school student who made a small Perl programming mistake that got him into a big amount of trouble. On his dynamically generated Web page, he had used the code:

my($f) = `fortune`;
when what he should have done was:

my $f = `fortune`;
Now, both of these invoke the fortune program, capturing its random quip of text. In this particular case, when the school administrators visited the boy's page, fortune had selected a quote from a William Gibson novel:

    I put the shotgun in an Adidas bag and padded it out with four pairs of tennis
    socks, not my style at all, but that was what I was aiming for: If they think
    you're crude, go technical; if they think you're technical, go crude. I'm a
    very technical boy. So I decided to get as crude as possible. These days,
    though, you have to be pretty technical before you can even aspire to
    crudeness.

    --Johnny Mnemonic, by William Gibson

Now, if you can't tell what would be in $f for both of the code fragments above, read on, and you'll see how an unwitting mistake can leave someone with an unexpected police file.

The problem is a matter of context (in more ways than one). Perl's operators are "context sensitive", in that the operator can detect whether it is being used in a place looking for a scalar rather than looking for a list, and return an appropriate result. In this case, the backtick operator returns a differing result, depending on whether it was invoked in a scalar context or a list context.

To understand this, first look at how to detect context. Starting with the basics, the right-hand side of an assignment to a scalar variable must be a scalar value:

$a = ...
Whatever's on the right has to be a scalar value, because that's the only thing that'll fit into a scalar variable.

Similarly, the right side of an assignment to an array can be any list value:

@b = ...
Let's put some things in both places and see how results differ. One that you are probably familiar with is the readline operator, spelled "less-than filehandle greater-than":

$a = <STDIN>
In this "scalar context", the readline operator returns the next line to be read, or undef at end-of-file or when an I/O error occurs.

However, the very same operator and punctuation in a "list context" yields all the remaining lines until end-of-file is reached, or an empty list if already at end of file:

@b = <STDIN>
Now, Larry Wall could have come up with two different operators for these two similar operations, but by making the operator "context sensitive", we get the savings of brainspace and keyboardspace. Apparently, we humans are fairly good at grokking context, so why not leverage off that a bit in the language?
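As a minimal sketch of the two readline behaviors, the following reads from an in-memory filehandle rather than STDIN so that it runs without any input. The sample text and the variable names ($text, $fh, $first, @rest, $eof) are assumptions made for illustration only.

```perl
use strict;
use warnings;

# An in-memory "file" stands in for STDIN so the example needs no input.
my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my $first = <$fh>;    # scalar context: just the next line, "alpha\n"
my @rest  = <$fh>;    # list context: all remaining lines
my $eof   = <$fh>;    # scalar context at end-of-file: undef
close $fh;

print "first: $first";
print "remaining lines: ", scalar(@rest), "\n";            # 2
print "at eof: ", (defined $eof ? "defined" : "undef"), "\n";
```

The operator and punctuation are identical on all three lines; only the thing being assigned to changes.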

Similarly, the matching operator in a scalar context returns a success value:

$a = /(\w+) (\d+)/
which is true if the regular expression matches $_, and false otherwise. If the result was true, we'd look in $1 for the word, and $2 for the digit string. A shorter way to do the same thing, though, is to use the same regular expression in a list context:

@b = /(\w+) (\d+)/
And now, the regular expression match operator is not returning true/false, but rather a list of two items (the two memories) or an empty list if the match was not successful. Thus, $b[0] ends up with the word, and $b[1] gets the digit-string.
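Here is a small sketch of the same regular expression in both contexts. It binds the match to an explicit variable with =~ instead of relying on $_, and the sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my $line = "widget 42";

# Scalar context: the match returns a true/false success value.
my $ok = $line =~ /(\w+) (\d+)/;

# List context: the match returns the captured memories.
my @parts = $line =~ /(\w+) (\d+)/;           # ("widget", "42")
my ($word, $number) = $line =~ /(\w+) (\d+)/;

# A failed match in list context returns an empty list.
my @none = "no digits here" =~ /(\w+) (\d+)/;

print "matched\n" if $ok;
print "word=$word number=$number\n";
print "failed match returned ", scalar(@none), " items\n";   # 0
```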

In both of these operators, the scalar interpretation and the list interpretation are related, but not by any predictable formula. That's the way it is in general. You can't apply a general rule, except that there are no rules. It's whatever Larry thought would be the most practical and useful, and least surprising (well, least surprising to Larry).

A few more examples to keep getting our feet progressively more wet, and then we'll look some more at detecting context.

In a scalar context, gmtime returns a human-readable string of the GMT time (defaulting to the current time, but optionally converting any UNIX-epoch integer timestamp). But in a list context, it returns a nine-element list containing the various second, minute, hour (and so on) pieces of the time for easy manipulation.
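To make the gmtime difference concrete, this sketch converts the fixed timestamp 0 (the start of the UNIX epoch) so the results are the same on every run. The variable names and the date format built with sprintf are choices made for this example.

```perl
use strict;
use warnings;

# Scalar context: one human-readable string.
my $stamp = gmtime(0);          # "Thu Jan  1 00:00:00 1970"

# List context: nine separate fields.
my @fields = gmtime(0);

# Pick off the leading fields; year is years since 1900, month is 0-based.
my ($sec, $min, $hour, $mday, $mon, $year) = gmtime(0);
my $date = sprintf "%04d-%02d-%02d", $year + 1900, $mon + 1, $mday;

print "$stamp\n";
print scalar(@fields), " fields\n";   # 9
print "$date\n";                      # 1970-01-01
```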

The readdir operator acts similarly to the readline operator, returning the "next" name from a directory in a scalar context, but all the remaining names in a list context.

And finally, a common operation is to use the name of an array in both contexts. The "operator" of @x in a list context yields the current elements of the @x array. However, in a scalar context, the same "operator" yields the number of elements in that same array (sometimes called the "length of the array", but that can be confusing, so I'd rather not use that here).

Please note on that last example that at no time does Perl first extract all the elements in the scalar context, only to then somehow "convert" it to a count. From the very beginning, Perl knows that the @x operation is in a scalar context and performs the "scalar" version of that operation.

Put another way, there is no way to "coerce" or "convert" a list to a scalar, because it can never happen, in spite of what some of the so-called commercial Perl documentation incorrectly implies.
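A short sketch of an array name used in both contexts follows. The array contents are invented for illustration; scalar() is the built-in that explicitly supplies scalar context.

```perl
use strict;
use warnings;

my @x = qw(fred barney wilma);

my @copy  = @x;           # list context: the elements themselves
my $count = @x;           # scalar context: the number of elements
my $same  = scalar(@x);   # scalar() forces scalar context explicitly

print "copy holds: @copy\n";
print "count: $count\n";   # 3
```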

So, where does context occur? Everywhere! Let's introduce a convention for a moment, to make it easy to talk. If a portion of the expression is evaluated in scalar context, let's use SCALAR to represent that:

$a = SCALAR;
And similarly, we'll show list context with LIST:

@x = LIST;
Let's look at some other common ones. Assigning to the element of an array looks like this:

$w[SCALAR] = SCALAR;
Note that the subscripting expression is evaluated in a scalar context. That means if we put an array name in the subscript, and a readline operation on the right, we get the scalar meaning of both:

$w[@x] = <STDIN>;
and assign a single line (or undef) to the element of @w, indexed by the number of elements currently in @x. As an aside, that's always evaluated before the assignment starts to happen, so:

$w[@w] = <STDIN>;
adds the next line to the end of @w, although you'll probably scare people by doing that.
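The append-by-subscript trick can be tried without STDIN. In this sketch a plain string stands in for the line that would have been read, and push is shown alongside as the more conventional way to get the same effect; the array contents are made up.

```perl
use strict;
use warnings;

my @w = qw(a b);

# The subscript is a scalar context, so @w yields its element count (2),
# which is exactly the index one past the current last element.
$w[@w] = "c";

# The conventional spelling of the same append.
push @w, "d";

print "@w\n";   # a b c d
```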

Slices are in list context, even with only a single value for an index:

@w[LIST] = LIST;
@w[3] = LIST;
Even hash slices work that way:

@h{LIST} = LIST;
Lists of scalars are always lists, even with only a single value (or no values) on the left:

($a, $b, $c) = LIST;
($a) = LIST;
() = LIST;
And then we have the context provided by some common operations:

foreach (LIST) { ... }
if (SCALAR) { ... }
while (SCALAR) { ... }
@w = map { LIST } LIST;
@x = grep { SCALAR } LIST;
One useful rule is that anything being evaluated for a true/false value is always a scalar, as shown in the if, while, and grep items above.
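As an illustration of the true/false rule, this sketch uses an array name as a condition (true while it still has elements) and a grep block that is evaluated once per item for its truth value. The data and variable names are assumptions for the example.

```perl
use strict;
use warnings;

my @queue  = qw(a b c);
my $passes = 0;

# The condition is a scalar context, so @queue yields its element count,
# which is true until the array is empty.
while (@queue) {
    shift @queue;
    $passes++;
}

my $state;
if (@queue) { $state = "busy" } else { $state = "empty" }

# The grep block is evaluated for true/false once per item of the list.
my @long = grep { length($_) > 3 } qw(cat horse ox zebra);

print "passes: $passes, queue is $state\n";   # passes: 3, queue is empty
print "long words: @long\n";                  # horse zebra
```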

Subroutines act "at a distance". The return value of a subroutine is always evaluated in the context of the invocation of the subroutine. Here's the basic form:

$a = &fred(LIST); sub fred { ....; return SCALAR; }
@b = &barney(LIST); sub barney { ....; return LIST; }
But what if I had used fred for both of those? Yes, the context would pass through, and be different for different invocations! If that makes your head spin, try not to do that for a while until you fully understand it.
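Here is a sketch of one subroutine invoked in both contexts. The body of fred and the helper context_name are hypothetical; wantarray is the Perl built-in that reports the context of the current subroutine call, used here only to make the pass-through visible.

```perl
use strict;
use warnings;

sub fred {
    my @names = qw(fred barney wilma);
    return @names;    # evaluated in whatever context fred was called in
}

my $n   = fred();     # scalar context passes through: the count, 3
my @all = fred();     # list context passes through: the elements

# Hypothetical helper that simply reports the context of its call.
sub context_name {
    return wantarray ? "list" : "scalar";
}

my $seen_scalar   = context_name();   # "scalar"
my ($seen_list)   = context_name();   # "list", thanks to the parentheses

print "n=$n all=@all\n";
print "$seen_scalar, $seen_list\n";
```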

Speaking of subroutines: a common thing to do is to create a lexical variable (often called a my-variable) to hold incoming subroutine arguments or temporary values, as in:

sub marine {
  my ($a) = @_;
  ...
}
In this case, if the parentheses are included, we get list context (imagine the my is not there). All the elements of @_ get returned, but only the first is stored into $a (the remainder are ignored).

However, the same expression without parentheses provides scalar context to the right side:

my $a = @_;
which gets the number of elements in @_ (the argument list). There's not one that's "more right"; you need to learn the difference, and use the appropriate one.
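The two forms can be compared side by side. In this sketch the subroutine names first_arg and arg_count are invented for illustration, and the variables avoid $a because sort gives that name a special meaning.

```perl
use strict;
use warnings;

# Parentheses: list assignment, so the first argument is kept.
sub first_arg {
    my ($first) = @_;
    return $first;
}

# No parentheses: scalar assignment, so @_ yields the argument count.
sub arg_count {
    my $count = @_;
    return $count;
}

my $got_first = first_arg("x", "y", "z");   # "x"
my $got_count = arg_count("x", "y", "z");   # 3

print "first=$got_first count=$got_count\n";
```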

And that brings us full circle to the question I posed at the beginning. What is the difference? Backquotes in a scalar context generate the entire value as one string:

my $f = `fortune`;
but the same expression in a list context generates a list of items (one line per item, just like reading from a file), only the first of which can fit into the scalar on the left:

my ($f) = `fortune`;
So $f gets just the first line of the fortune, harmless for those one-liners, but pretty devastating when a school official sees that a student has apparently written:

    I put the shotgun in an Adidas bag and padded it out with four pairs of tennis

on a Web page, in light of the tragic school shootings we hear about these days. Never mind that a simple reload of the page would have shown something different each time, or that this is really just a random quote.
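Since fortune may not be installed everywhere, the difference can be reproduced with any command that prints more than one line. This sketch is an assumption-laden stand-in: it uses the running perl itself (via $^X) as the external command, printing the numbers 1 through 3 on separate lines, and the variable names are made up.

```perl
use strict;
use warnings;

# Stand-in for fortune: a command that prints three lines.
# $^X is the path of the perl binary currently running this script.
my $cmd = qq{"$^X" -le "print for 1..3"};

my $whole   = `$cmd`;    # scalar context: all output as one string
my ($first) = `$cmd`;    # list context: one item per line, only the first kept
my @lines   = `$cmd`;    # list context: every line

print "whole output is ", length($whole), " characters\n";   # 6
print "first line only: $first";
print "line count: ", scalar(@lines), "\n";                   # 3
```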

The police were called, the boy was questioned, and he now has a police file simply because he added some erroneous parentheses. No charges resulted, but the embarrassment here is certainly unwelcome. (I say this from personal experience -- my own ongoing saga about misplaced understandings and resulting criminal charges can be found in the archive at the address below.)

http://www.lightlink.com/fors/
The embarrassment was also avoidable with a little more care in programming and quality-assurance testing. So when you hack Perl, and you wonder about context, get the text right or you may end up a con. Until next time, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as writing regular columns for WebTechniques and Unix Review magazines. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.