Sidebar: Regular Expression Metacharacters
The following are the metacharacters used within regular expressions by the editors ed, ex, and vi; the stream editor, sed; the search programs grep and egrep; and the little language, awk. Some of the metacharacters are used by all of these programs; others are used by only a few of them.
. (Period, used by all) Matches any single character except newline. Example: "a.c" matches "abc," "a2c," or "a c."
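One convenient way to try these patterns is Python's `re` module. Treat it as an assumed stand-in: its syntax is a Perl-style dialect close to egrep's, not identical to ed, grep, or awk. This sketch checks the period example against whole strings. The helper name `dot_matches` is illustrative only.

```python
import re

def dot_matches(text: str) -> bool:
    # fullmatch requires the entire string to fit the pattern "a.c".
    # By default Python's "." also refuses to match a newline.
    return re.fullmatch(r"a.c", text) is not None

for s in ["abc", "a2c", "a c", "a\nc", "ac"]:
    print(repr(s), dot_matches(s))
```

The first three strings match. "a\nc" fails because of the newline, and "ac" fails because "." must consume exactly one character.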
[] (Brackets, used by all) Defines a set of characters, any one of which may match. Example: "a[bc]d" matches "abd" or "acd." The following sub-metacharacters can be used within the brackets.
^ (Caret) If used as the first character of the set, it negates the set: any character except those in the set matches. Example: "ab[^cd]e" could match "abae" or "ab4e," since neither of those has "c" or "d" as the third character.
- (Hyphen) If used between two characters in a set, it specifies a range of ASCII characters. It is a literal hyphen when it is the first character of the set, the first character after the caret, or the last character of the set. Examples:
[0-9] Match any single digit.
[a-z] Match any lowercase letter.
[A-Za-z] Match any letter.
[^A-Za-z0-9] Match any character that is not alphanumeric.
] (Close bracket) If used as the first character of the set, or as the first character after a caret, it is a literal close-bracket character rather than the bracket that ends the set. Example: "[]ab]" matches "]" or "a" or "b."
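The bracket rules can be checked in Python's `re` module, used here as an assumed stand-in. For these simple sets, including the literal leading "-" and "]", its bracket syntax agrees with the tools above. The helper name `matches` is illustrative.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of text fits pattern."""
    return re.fullmatch(pattern, text) is not None

examples = [
    (r"a[bc]d", "abd"),      # set: b or c
    (r"a[bc]d", "aed"),      # e is not in the set
    (r"ab[^cd]e", "abae"),   # caret negates the set
    (r"ab[^cd]e", "abce"),   # c is excluded
    (r"[0-9]", "7"),         # a range of digits
    (r"[A-Za-z]", "Q"),      # two ranges in one set
    (r"[^A-Za-z0-9]", "%"),  # negated ranges: non-alphanumeric
    (r"[-a]", "-"),          # a leading hyphen is literal
    (r"[]ab]", "]"),         # a leading ] is literal
]
for pattern, text in examples:
    print(f"{pattern!r:16} {text!r:8} {matches(pattern, text)}")
```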
* (Asterisk, or star, used by all) Matches zero or more of the previous character or set. Example: "ab*c" matches "abc" or "abbbbbc," but also matches "ac" because zero repetitions are allowed. It always matches the longest possible string.
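A small sketch of the star in Python's `re` module, assumed here as a stand-in. For a single starred item, Python's greedy `*` also takes the longest run. Python, however, is a backtracking engine that tries alternatives in order, so the "longest possible" rule can differ once alternation is involved. The last line illustrates that difference.

```python
import re

pattern = re.compile(r"ab*c")
for s in ["ac", "abc", "abbbbbc", "adc"]:
    print(s, pattern.fullmatch(s) is not None)

# Greedy: b* consumes as many b's as it can before the match ends.
longest = re.search(r"ab*", "xabbbc").group()
print(longest)  # abbb

# Python takes the first alternative that works; a POSIX
# leftmost-longest matcher would report "ab" here instead.
first_alt = re.search(r"a|ab", "ab").group()
print(first_alt)  # a
```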
+ (Plus, used by egrep and awk) Matches one or more of the previous character or set. Example: "ab+c" matches "abc" or "abbbbbc," but not "ac," since there must be at least one "b."
? (Question mark, used by egrep and awk) Matches zero or one of the previous character or set. Example: "ab?c" matches "ac" or "abc" only.
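The plus and question mark can be compared on the same inputs with Python's `re` module, an assumed stand-in whose `+` and `?` behave like egrep's here. The helper name `fits` is illustrative.

```python
import re

def fits(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

# "+" needs at least one b; "?" allows at most one b.
for s in ["ac", "abc", "abbc"]:
    print(f"{s:5} ab+c={fits(r'ab+c', s)}  ab?c={fits(r'ab?c', s)}")
```

Only "abc" satisfies both patterns. "ac" fits only "ab?c", and "abbc" fits only "ab+c".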
\{min,max\} (Escaped braces, used by ed, sed, and grep) Follows a character or set to specify a repeat count. The braces contain a single number min, a number min followed by a comma, or two numbers min and max separated by a comma, where 0 <= min <= max <= 255. With just \{min\} it matches exactly min times; with \{min,\} it matches at least min times; with \{min,max\} it matches from min to max times, inclusive. Examples: "a\{5\}" matches five "a" characters; "a\{5,\}" (notice the comma) requires at least five "a" characters, although more than five will also match; "a\{5,7\}" requires at least five "a" characters but matches no more than seven.
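Python's `re` module, assumed here as a stand-in, writes the same repeat counts without backslashes: `a{5}` rather than `a\{5\}`, as egrep-style syntax does. The sketch checks whole strings. It also shows that an unanchored search still finds five "a" characters inside a longer run, which is how a line-searching tool would see it. The helper name `fits` is illustrative.

```python
import re

def fits(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

print(fits(r"a{5}", "aaaaa"), fits(r"a{5}", "aaaaaa"))        # True False
print(fits(r"a{5,}", "aaaaaaaaa"), fits(r"a{5,}", "aaaa"))    # True False
print(fits(r"a{5,7}", "aaaaaaa"), fits(r"a{5,7}", "aaaaaaaa"))  # True False

# Searching (not full-matching) finds five a's within seven.
found = re.search(r"a{5}", "aaaaaaa").group()
print(found)  # aaaaa
```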
^ (Caret, used by all) When used as the first character of the expression, it anchors the match to the beginning of a line. Example: "^abc" matches if and only if "abc" is at the beginning of a line.
$ (Dollar, used by all) When used as the last character of the expression, it anchors the match to the end of a line. Example: "abc$" matches if and only if "abc" is at the end of a line.
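Both anchors can be tried with Python's `re` module, assumed here as a stand-in. The first part filters a list of lines the way a line-oriented search would. In Python, `^` and `$` anchor to the whole string unless `re.MULTILINE` is given, so the last part uses that flag to treat each line of a multi-line string separately.

```python
import re

lines = ["abc at start", "ends with abc", "abc", "middle abc here"]
starts = [s for s in lines if re.search(r"^abc", s)]
ends = [s for s in lines if re.search(r"abc$", s)]
print(starts)  # ['abc at start', 'abc']
print(ends)    # ['ends with abc', 'abc']

# With re.MULTILINE, ^ and $ also match next to each newline.
text = "x\nabc\ny"
per_line = re.findall(r"^abc$", text, flags=re.MULTILINE)
print(per_line)  # ['abc']
```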
\ (Backslash, used by all) "Escapes" the following metacharacter, turning it into a literal character. Example: "a\.b" matches the literal string "a.b," not "a" followed by any character followed by "b."
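The difference an escape makes shows up directly in Python's `re` module, assumed here as a stand-in. Python raw strings (`r"..."`) keep the backslash intact for the regex engine. `re.escape` is the standard-library way to escape literal text.

```python
import re

escaped = re.compile(r"a\.b")   # literal period
unescaped = re.compile(r"a.b")  # any character
for s in ["a.b", "axb"]:
    print(s, bool(escaped.fullmatch(s)), bool(unescaped.fullmatch(s)))

# re.escape adds the needed backslashes to literal text.
print(re.escape("a.b"))  # a\.b
```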
() (Parentheses, used by egrep and awk) Groups a subexpression to which any of the repetition metacharacters, such as *, +, or ?, may apply. Example: "(abc)+" matches "abc" or "abcabc," since "+" matches one or more of the previous subexpression.
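A short sketch in Python's `re` module, an assumed stand-in that groups with plain parentheses as egrep does. It shows that a repetition metacharacter after a group applies to the whole group, and that `?` on a group still allows at most one copy. The helper name `fits` is illustrative.

```python
import re

def fits(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

# Without parentheses, + applies only to the "c".
print(fits(r"abc+", "abcabc"))    # False
# With parentheses, + repeats the whole group.
print(fits(r"(abc)+", "abcabc"))  # True
# ? makes the group optional: zero or one copy, never two.
print(fits(r"(abc)?", ""), fits(r"(abc)?", "abc"), fits(r"(abc)?", "abcabc"))
```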
\(\) (Escaped parentheses, used by ed, ex, and sed) Surrounds a subexpression, tagging it for subsequent reference. A reference uses the notation "\#", where "#" is a single digit from 1 to 9: "\1" refers to the first tagged subexpression, "\2" to the second, and so on. Example: "abc\(def\)ghi" matches "abcdefghi," and the "def" subexpression can be referred to later as "\1."
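Python's `re` module, assumed here as a stand-in, tags subexpressions with plain parentheses, so sed's `\(def\)` becomes `(def)`. It uses the same `\1` notation both in a replacement and inside the pattern itself. The sample strings are made up for illustration.

```python
import re

# The first group captures "def".
m = re.fullmatch(r"abc(def)ghi", "abcdefghi")
print(m.group(1))  # def

# \1 in the replacement refers back to the first group, as in sed.
replaced = re.sub(r"abc(def)ghi", r"<\1>", "xx abcdefghi yy")
print(replaced)  # xx <def> yy

# \1 inside the pattern requires an exact repeat of what the group matched.
print(bool(re.fullmatch(r"(ab)\1", "abab")), bool(re.fullmatch(r"(ab)\1", "abba")))
```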
\<\> (Escaped angle brackets, used by ed, ex, and vi) Used with a character or set to match only at the beginning of a word (\<) or at the end of a word (\>). Example: "\<abc" matches "abcde" but not "123abc," while "abc\>" matches "123abc" but not "abcde," presuming those are words in the midst of other text. "\<abc\>" matches only if "abc" is a whole word.
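Python's `re` module, assumed here as a stand-in, has no `\<` or `\>`. This sketch approximates them with lookarounds on `\w` (letters, digits, and underscore) and uses `\b` for the whole-word case. The word-character definition in vi and the other editors may differ from Python's, which is Unicode-aware for `str` patterns, so treat this as an approximation.

```python
import re

word_start = re.compile(r"(?<!\w)abc")   # roughly \<abc
word_end = re.compile(r"abc(?!\w)")      # roughly abc\>
whole_word = re.compile(r"\babc\b")      # roughly \<abc\>

for text in ["abcde", "123abc", "x abc y"]:
    print(f"{text!r:10}",
          bool(word_start.search(text)),
          bool(word_end.search(text)),
          bool(whole_word.search(text)))
```

As in the examples above, "abcde" satisfies only the word-start pattern, "123abc" only the word-end pattern, and the stand-alone "abc" satisfies all three.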
| (Vertical bar, OR, used by egrep and awk) Separates two regular expressions that are OR'd together, so that a match of either one is a match of the whole. Example: "abc|def" matches a line containing "abc" or "def" (or both) anywhere in it.
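An egrep-style line filter takes only a few lines of Python. This sketch assumes Python's `re.search` as the stand-in matcher, and the sample lines are made up. It keeps every line that contains either alternative anywhere.

```python
import re

lines = ["abc only", "only def", "abc and def", "neither"]
# Keep each line in which either alternative occurs anywhere.
selected = [line for line in lines if re.search(r"abc|def", line)]
print(selected)  # ['abc only', 'only def', 'abc and def']
```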
The following metacharacters can be used in ex (and one of them also in sed) in the replacement part of a search-and-replace operation.
& (Ampersand, used by ex and sed) Stands for the entire text matched by the search expression, inserted into the replacement. Example: "s/[0-9]\{5\}/000&/" says to search for any five consecutive digits (the expression between the first and second slashes) and replace them with three "0" characters followed by the same five digits that matched, whatever they may be (as specified by the replacement between the second and third slashes). In other words, it replaces a five-digit number with an equivalent eight-digit number padded with leading zeros.
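In Python's `re.sub`, assumed here as a stand-in, the whole match is written `\g<0>` rather than `&`, and braces are unescaped. Python replaces every match by default. Passing `count=1` mimics a substitute command without the g flag, which changes only the first match on a line. The sample text is made up.

```python
import re

line = "12345 and 67890"

# \g<0> plays the role of sed's "&": the entire matched text.
first_only = re.sub(r"[0-9]{5}", r"000\g<0>", line, count=1)
every = re.sub(r"[0-9]{5}", r"000\g<0>", line)
print(first_only)  # 00012345 and 67890
print(every)       # 00012345 and 00067890
```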
~ (Tilde, used by ex) Reuses the replacement expression from the previous substitute command. Example: if the command shown for the ampersand is followed by the new command "s/[0-9]\{10\}/~/", it will find a 10-digit number and replace it with an equivalent 13-digit number, since the previous replacement prepended three "0" digits to whatever matched.
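Python's `re` module has no equivalent of the tilde, so this sketch emulates it with a hypothetical `Substituter` class. The class remembers the last replacement template and expands `~` to it before substituting. It is a simplification: it ignores ex's escaped `\~` and its other replacement rules, and it uses Python's `\g<0>` in place of `&`.

```python
import re

class Substituter:
    """Hypothetical helper mimicking ex's "~" in replacements."""

    def __init__(self) -> None:
        self.previous = ""  # last replacement template used

    def sub(self, pattern: str, replacement: str, text: str) -> str:
        # Expand "~" to the previous replacement, then remember the result.
        replacement = replacement.replace("~", self.previous)
        self.previous = replacement
        # count=1: change only the first match, like s/// without g.
        return re.sub(pattern, replacement, text, count=1)

ex = Substituter()
first = ex.sub(r"[0-9]{5}", r"000\g<0>", "zip 12345")
second = ex.sub(r"[0-9]{10}", "~", "phone 5551234567")
print(first)   # zip 00012345
print(second)  # phone 0005551234567
```

Because `~` expands to the whole previous template, `\g<0>` now refers to the 10-digit match, which yields the 13-digit result described above.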