It's All About Context
Randal L. Schwartz
A recent article on Slashdot (www.slashdot.org) discussed
the surprising story of a high-school student who made a small Perl
programming mistake that got him into a great deal of trouble. On
his dynamically generated Web page, he had used the code:
my($f) = `fortune`;
when what he should have done was:
my $f = `fortune`;
Now, both of these invoke the fortune program, capturing its
random quip of text. In this particular case, when the school administrators
visited the boy's page, fortune had selected a quote from
a William Gibson novel:
I put the shotgun in an Adidas bag and padded it out with four pairs of tennis
socks, not my style at all, but that was what I was aiming for: If they think
you're crude, go technical; if they think you're technical, go crude. I'm a
very technical boy. So I decided to get as crude as possible. These days,
though, you have to be pretty technical before you can even aspire to
crudeness.
--Johnny Mnemonic, by William Gibson
Now, if you can't tell what would be in $f for both
of the code fragments above, read on, and you'll see how an
unwitting mistake can leave someone with an unexpected police file.
The problem is a matter of context (in more ways than one). Perl's
operators are "context sensitive", in that the operator
can detect whether it is being used in a place looking for a scalar
rather than looking for a list, and return an appropriate result.
In this case, the backtick operator returns a differing result,
depending on whether it was invoked in a scalar context or a list
context.
To understand this, first look at how to detect context. Starting
with the basics, the right-hand side of an assignment to a scalar
variable must be a scalar value:
$a = ...
Whatever's on the right has to be a scalar value, because that's
the only thing that'll fit into a scalar variable.
Similarly, the right side of an assignment to an array can be
any list value:
@b = ...
Let's put some things in both places and see how results differ.
One that you are probably familiar with is the readline operator,
spelled "less-than filehandle greater-than":
$a = <STDIN>
In this "scalar context", the readline operator returns
the next line to be read, or undef when there is nothing more to
read (at end-of-file, or after an I/O error).
However, the very same operator and punctuation in a "list
context" yields all the remaining lines until end-of-file is
reached, or an empty list if already at end of file:
@b = <STDIN>
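Here is a small sketch of the two behaviors side by side. To keep it self-contained, an in-memory filehandle stands in for STDIN; that substitution, and the names $text, $fh, $first, @rest, and $after, are assumptions made only for this illustration.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for STDIN (an assumption of this sketch).
my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my $first = <$fh>;    # scalar context: just the next line
my @rest  = <$fh>;    # list context: every remaining line
my $after = <$fh>;    # scalar context at end-of-file: undef

print "first line: $first";
print "remaining lines: ", scalar(@rest), "\n";
print "after end-of-file: ", (defined $after ? $after : "undef"), "\n";
```

The same three characters of punctuation do three different jobs here, depending only on what sits to the left of the equals sign.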
Now, Larry Wall could have come up with two different operators for
these two similar operations, but by making the operator "context
sensitive", we get the savings of brainspace and keyboardspace.
Apparently, we humans are fairly good at grokking context, so why
not leverage off that a bit in the language?
Similarly, the matching operator in a scalar context returns a
success value:
$a = /(\w+) (\d+)/
which is true if the regular expression matches $_, and false
otherwise. If the result was true, we'd look in $1 for
the word, and $2 for the digit string. A shorter way to do
the same thing, though, is to use the same regular expression in a
list context:
@b = /(\w+) (\d+)/
And now, the regular expression match operator is not returning true/false,
but rather a list of two items (the two memories) or an empty list
if the match was not successful. Thus, $b[0] ends up with the
word, and $b[1] gets the digit-string.
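A quick sketch of the same match in both contexts, using a made-up input string. The variable names ($ok, @parts, @none) are mine, chosen so they do not collide with the special $a and $b variables.

```perl
use strict;
use warnings;

$_ = "widget 42";

my $ok    = /(\w+) (\d+)/;    # scalar context: true or false
my @parts = /(\w+) (\d+)/;    # list context: the two memories

print "matched\n" if $ok;
print "word=$parts[0] digits=$parts[1]\n";    # word=widget digits=42

$_ = "no digits here";
my @none = /(\w+) (\d+)/;     # failed match in list context: empty list
print "captures after a failed match: ", scalar(@none), "\n";    # 0
```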
In both of these operators, the scalar interpretation and the
list interpretation are related, but not by any predictable formula.
That's the way it is in general. You can't apply a general
rule, except that there are no rules. It's whatever Larry thought
would be the most practical and useful, and least surprising (well,
least surprising to Larry).
A few more examples to keep getting our feet progressively more
wet, and then we'll look some more at detecting context.
In a scalar context, gmtime returns a human-readable string
of the GMT time (defaulting to the current time, but optionally
converting any UNIX-epoch integer timestamp). But in a list context,
it returns a nine-element list containing the various second, minute,
hour (and so on) pieces of the time for easy manipulation.
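To make that concrete, here is a sketch that feeds gmtime the timestamp 0 (the start of the UNIX epoch) in both contexts. The fixed timestamp is my choice so the output is repeatable; the variable names follow the usual perlfunc ordering of the nine fields.

```perl
use strict;
use warnings;

my $stamp = gmtime(0);    # scalar context: one human-readable string
my @parts = gmtime(0);    # list context: nine numeric pieces

print "$stamp\n";                        # Thu Jan  1 00:00:00 1970
print scalar(@parts), " elements\n";     # 9 elements

# The list form is the one to use when you want to do arithmetic.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = gmtime(0);
printf "%04d-%02d-%02d\n", $year + 1900, $mon + 1, $mday;    # 1970-01-01
```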
The readdir operator acts similarly to the readline
operator, returning the "next" name from a directory in
a scalar context, but all the remaining names in a list context.
And finally, a common operation is to use the name of an array
in both contexts. The "operator" of @x in a list
context yields the current elements of the @x array. However,
in a scalar context, the same "operator" yields the number
of elements in that same array (sometimes called the "length
of the array", but that can be confusing, so I'd rather
not use that here).
Please note on that last example that at no time does Perl first
extract all the elements in the scalar context, only to then somehow
"convert" it to a count. From the very beginning, Perl
knows that the @x operation is in a scalar context and performs
the "scalar" version of that operation.
Put another way, there is no way to "coerce" or "convert"
a list to a scalar, because it can never happen, in spite
of what some of the so-called commercial Perl documentation incorrectly
implies.
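As a sketch of the array-name case, with a made-up three-element array:

```perl
use strict;
use warnings;

my @x = qw(fred barney wilma);

my @copy  = @x;            # list context: the elements themselves
my $count = @x;            # scalar context: how many elements there are
my $same  = scalar(@x);    # scalar() forces scalar context explicitly

print "copy:  @copy\n";    # copy:  fred barney wilma
print "count: $count\n";   # count: 3
```

Note that $count is 3, not "fred" and not "wilma". Nothing was extracted and then squeezed down; the scalar version of @x simply asks a different question.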
So, where does context occur? Everywhere! Let's introduce
a convention for a moment, to make it easy to talk. If a portion
of the expression is evaluated in scalar context, let's use
SCALAR to represent that:
$a = SCALAR;
And similarly, we'll show list context with LIST:
@x = LIST;
Let's look at some other common ones. Assigning to the element
of an array looks like this:
$w[SCALAR] = SCALAR;
Note that the subscripting expression is evaluated in a scalar context.
That means if we had an array name on the left, and a readline operation
on the right, we'll use scalar meanings for both:
$w[@x] = <STDIN>;
and assign a single line (or undef) to the element of @w,
indexed by the number of elements currently in @x. As
an aside, that's always evaluated before the assignment starts
to happen, so:
$w[@w] = <STDIN>;
adds the next line to the end of @w, although you'll
probably scare people by doing that.
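Here is a sketch of that append trick in action. Again an in-memory filehandle stands in for STDIN, which is an assumption made so the example runs without any typing.

```perl
use strict;
use warnings;

# In-memory stand-in for STDIN (an assumption of this sketch).
my $text = "one\ntwo\n";
open my $in, '<', \$text or die "cannot open in-memory file: $!";

my @w = ("zero\n");

$w[@w] = <$in>;    # subscript is @w in scalar context: 1
$w[@w] = <$in>;    # now @w has two elements, so the subscript is 2

print "elements: ", scalar(@w), "\n";    # elements: 3
print "last: $w[2]";                     # last: two
```

The more conventional spelling, push @w, scalar(<STDIN>), does the same job and is less likely to startle whoever maintains the code after you.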
Slices are in list context, even with only a single value for
an index:
@w[LIST] = LIST;
@w[3] = LIST;
Even hash slices work that way:
@h{LIST} = LIST;
Lists of scalars are always lists, even with only a single value (or
no values) on the left:
($a, $b, $c) = LIST;
($a) = LIST;
() = LIST;
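The following sketch shows what list context on the right can mean in practice: a single scalar in parentheses still slurps the whole file. The in-memory filehandle and all the names here are assumptions for illustration.

```perl
use strict;
use warnings;

# In-memory stand-in for a real input file (an assumption of this sketch).
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my ($only) = <$fh>;    # list context: all three lines are read, one is kept
print "kept: $only";                          # kept: one
print "handle is exhausted\n" if eof($fh);    # the other lines are gone

my @src = qw(a b c d);

my @w;
@w[0, 1] = @src;          # array slice: takes 'a' and 'b'

my %h;
@h{qw(x y)} = @src;       # hash slice: x => 'a', y => 'b'

print "slice: @w\n";                 # slice: a b
print "hash:  $h{x} $h{y}\n";        # hash:  a b
```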
And then we have the context provided by some common operations:
foreach (LIST) { ... }
if (SCALAR) { ... }
while (SCALAR) { ... }
@w = map { LIST } LIST;
@x = grep { SCALAR } LIST;
One useful rule is that anything being evaluated for a true/false
value is always a scalar, as shown in the if, while,
and grep items above.
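A short sketch of a few of those constructs, using a made-up array of three numbers:

```perl
use strict;
use warnings;

my @x = (1, 2, 3);

# if (SCALAR): @x yields its count, 3, which is true.
my $nonempty = 0;
if (@x) { $nonempty = 1 }

# map { LIST } LIST: the block may return several items per input.
my @pairs = map { ($_, $_ * 10) } @x;

# grep { SCALAR } LIST: the block is only asked for true or false.
my @odd = grep { $_ % 2 } @x;

# while (SCALAR): an empty array counts zero elements, so the body never runs.
my @empty;
my $loops = 0;
while (@empty) { $loops++ }

print "nonempty: $nonempty\n";    # nonempty: 1
print "pairs: @pairs\n";          # pairs: 1 10 2 20 3 30
print "odd: @odd\n";              # odd: 1 3
print "loops: $loops\n";          # loops: 0
```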
Subroutines act "at a distance". The return value of
a subroutine is always evaluated in the context of the invocation
of the subroutine. Here's the basic form:
$a = &fred(LIST); sub fred { ....; return SCALAR; }
@b = &barney(LIST); sub barney { ....; return LIST; }
But what if I had used fred for both of those? Yes, the context
would pass through, and be different for different invocations! If
that makes your head spin, try not to do that for a while until you
fully understand it.
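For the brave, here is a sketch of context passing through a subroutine. The body of fred is my own invention (it just returns gmtime of a fixed timestamp), and context_name is a hypothetical helper built on Perl's wantarray function, which reports the context the current subroutine was called in.

```perl
use strict;
use warnings;

# The return expression is evaluated in whatever context fred was called in.
sub fred { return gmtime(0) }

my $s = fred();    # scalar context passes through: one string
my @l = fred();    # list context passes through: nine elements

print "scalar call: $s\n";                          # Thu Jan  1 00:00:00 1970
print "list call:   ", scalar(@l), " elements\n";   # 9 elements

# wantarray is true in list context, false but defined in scalar context,
# and undef in void context.
sub context_name {
    return wantarray         ? "list"
         : defined wantarray ? "scalar"
         :                     "void";
}

my $c1   = context_name();
my ($c2) = context_name();
print "$c1, $c2\n";    # scalar, list
```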
Speaking of subroutines: a common thing to do is to create a lexical
variable (often called a my-variable) to hold incoming subroutine
arguments or temporary values, as in:
sub marine {
my ($a) = @_;
...
}
In this case, if the parentheses are included, we get list context
(imagine the my is not there). All the elements of @_
are returned, but only the first is stored into $a
(the remainder are ignored).
However, the same expression without parentheses provides scalar
context to the right side:
my $a = @_;
which gets the number of elements in @_ (the argument
list). There's not one that's "more right"; you
need to learn the difference, and use the appropriate one.
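The two forms can be put side by side in a sketch like this one. The subroutine names first_arg and arg_count are hypothetical, picked only to describe what each form ends up doing.

```perl
use strict;
use warnings;

sub first_arg {
    my ($first) = @_;    # parentheses: list context, first argument kept
    return $first;
}

sub arg_count {
    my $count = @_;      # no parentheses: scalar context, number of arguments
    return $count;
}

print first_arg('x', 'y', 'z'), "\n";    # x
print arg_count('x', 'y', 'z'), "\n";    # 3
```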
And that brings us full circle to the question I posted at the
beginning. What is the difference? Backquotes in a scalar
context generate the entire value as one string:
my $f = `fortune`;
but the same expression in a list context generates a list of items
(one line per item, just like reading from a file), only the first
of which can fit into the scalar on the left:
my ($f) = `fortune`;
So $f gets just the first line of the fortune, harmless for
those one-liners, but pretty devastating when a school official sees
that a student has apparently written:
I put the shotgun in an Adidas bag and padded it out with four pairs of tennis
on a Web page, in light of the tragic school shootings we hear
about these days. Never mind that a simple reload of the page would
have shown something different each time, or that this is really just
a random quote.
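You can reproduce the difference without having fortune installed. In the sketch below, a pair of echo commands stands in for fortune; that substitution, and the assumption that backticks run a POSIX-style shell, are mine and not part of the original story.

```perl
use strict;
use warnings;

# Two echo commands stand in for fortune (assumes a POSIX-style shell).
my $all     = `echo first line; echo second line`;    # scalar: whole output
my ($first) = `echo first line; echo second line`;    # list: first line kept
my @lines   = `echo first line; echo second line`;    # list: one item per line

print "scalar context got:\n$all";
print "parenthesized scalar got:\n$first";
print "list context got ", scalar(@lines), " lines\n";
```

With the parentheses, everything after the first newline is quietly thrown away, which is exactly how a multi-line quote turns into a single alarming sentence.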
The police were called, the boy was questioned, and now has a
police file simply because he added some erroneous parentheses.
No charges resulted, but the embarrassment here is certainly unwelcome.
(I say this from personal experience -- my own ongoing saga
about misplaced understandings and resulting criminal charges can
be found in the archive at:
http://www.lightlink.com/fors/)
The embarrassment was also avoidable with a little more care in programming
and quality-assurance testing. So when you hack Perl, and you wonder
about context, get the text right or you may end up a con. Until next
time, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration,
security, technical writing, and training. He has coauthored the
"must-have" standards: Programming Perl, Learning
Perl, Learning Perl for Win32 Systems, and Effective
Perl Programming, as well as writing regular columns for WebTechniques
and Unix Review magazines. He's also a frequent contributor
to the Perl newsgroups, and has moderated comp.lang.perl.announce
since its inception. Since 1985, Randal has owned and operated Stonehenge
Consulting Services, Inc.