Cover V01, I03

sep92.tar


Sidebar: Regular Expression Metacharacters

The following are the metacharacters used within regular expressions by the editors ed, ex, and vi; the stream editor, sed; the search programs grep and egrep; and the little language, awk. Some metacharacters are recognized by all of these programs. Others are recognized by only a few.

. (Period, used by all) Matches any character except newline. Example: "a.c" matches "abc," "a2c," or "a c."
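As a rough illustration, here is the same test written with Python 3's re module. Python's syntax is closer to egrep's than to ed's or sed's, but "." behaves the same way in all of them, so this sketch carries over directly.

```python
import re

# "." matches any one character except newline.
pattern = re.compile(r"a.c")

for text in ["abc", "a2c", "a c", "a\nc", "ac"]:
    print(repr(text), pattern.search(text) is not None)
```

The first three strings print True. The last two print False: "." does not match a newline, and "ac" has no character at all between "a" and "c".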

[] (Brackets, used by all) Defines a set of characters, any one of which may match. Example: "a[bc]d" matches "abd" or "acd." The following sub-metacharacters have special meaning within the brackets:

^ (Caret) If used as the first character of the set, it negates the set: every character except those in the set matches. Example: "ab[^cd]e" could match "abae" or "ab4e," since neither has "c" or "d" as its third character.

- (Hyphen) If used between two characters in a set, it specifies a range of ASCII characters. It is an ordinary hyphen when it is the first character of the set, the first character after the caret, or the last character of the set. Examples:
  [0-9] Match any single digit.
  [a-z] Match any lowercase letter.
  [A-Za-z] Match any letter.
  [^A-Za-z0-9] Match any non-alphanumeric character.

] (Close bracket) If used as the first character of the set, or the first character after a caret, it is an ordinary close-bracket character rather than the end of the set. Example: "[]ab]" matches "]," "a," or "b."
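A small Python 3 sketch of the bracket rules follows. The helper name matches() is made up for this example. Python's re module treats "^", "-", and a leading "]" inside brackets the same way the programs above do.

```python
import re

def matches(pattern, text):
    # Like grep, report whether the pattern occurs anywhere in the text.
    return re.search(pattern, text) is not None

print(matches(r"a[bc]d", "acd"))         # True: "c" is in the set
print(matches(r"ab[^cd]e", "ab4e"))      # True: "4" is neither "c" nor "d"
print(matches(r"ab[^cd]e", "abce"))      # False: "c" is excluded
print(matches(r"[^A-Za-z0-9]", "abc1"))  # False: every character is alphanumeric
print(matches(r"[a-]", "-"))             # True: a trailing hyphen is literal
print(matches(r"[]ab]", "]"))            # True: a leading "]" is literal
```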

* (Asterisk, or star, used by all) Matches zero or more occurrences of the preceding character or set. Example: "ab*c" matches "abc" or "abbbbbc," but it also matches "ac," because zero occurrences are allowed. It always matches the longest possible string.
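The sketch below uses Python 3's re module to show both points. One caveat: Python's star is greedy, meaning it takes as many characters as it can. The POSIX tools instead choose the leftmost-longest match overall. In simple cases like this one, both rules give the same answer.

```python
import re

# "b*" accepts zero "b" characters, so "ac" matches.
print(re.search(r"ab*c", "ac") is not None)    # True

# The star takes as many "b" characters as it can.
print(re.search(r"ab*", "abbbbbc").group())    # abbbbb
```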

+ (Plus, used by egrep and awk) Matches one or more occurrences of the preceding character or set. Example: "ab+c" matches "abc" or "abbbbbc," but not "ac," since there must be at least one "b."
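Here is the same example in Python 3. The re module accepts "+" exactly as egrep and awk write it.

```python
import re

# "+" requires at least one "b" between "a" and "c".
for text in ["abc", "abbbbbc", "ac"]:
    print(text, re.search(r"ab+c", text) is not None)
```

This prints True for "abc" and "abbbbbc" and False for "ac".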

? (Question mark, used by egrep and awk) Matches zero or one occurrence of the preceding character or set. Example: "ab?c" matches "ac" or "abc" only.
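The following Python 3 sketch uses re.fullmatch, which requires the whole string to match. That mirrors the "only" in the example: two "b" characters are too many.

```python
import re

# "?" makes the "b" optional: zero or one, never two.
for text in ["ac", "abc", "abbc"]:
    print(text, re.fullmatch(r"ab?c", text) is not None)
```

This prints True, True, False.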

\{min,max\} (Escaped braces, used by ed, sed, and grep) Follows a character or set to specify how many times it must repeat. The braces contain a number min, optionally followed by a comma, optionally followed by a second number max, where 0 <= min <= max <= 255. With \{min\} it matches exactly min times. With \{min,\} it matches at least min times. With \{min,max\} it matches from min to max times inclusive. Examples: "a\{5\}" matches exactly five "a" characters. "a\{5,\}" (notice the comma) matches five or more. "a\{5,7\}" matches at least five but no more than seven consecutive "a" characters.
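As an illustration only, Python 3's re module writes the repeat count without backslashes, as egrep-style syntax does: {5}, {5,}, {5,7}. This sketch tests runs of four to eight "a" characters against each form, using re.fullmatch so that the whole run must fit the count.

```python
import re

# Columns: run length, then results for a{5}, a{5,}, a{5,7}.
for n in range(4, 9):
    run = "a" * n
    print(n,
          re.fullmatch(r"a{5}", run) is not None,
          re.fullmatch(r"a{5,}", run) is not None,
          re.fullmatch(r"a{5,7}", run) is not None)
```

Only a run of 5 satisfies the first column. Runs of 5 and up satisfy the second, and runs of 5 through 7 satisfy the third. A search that looks anywhere in the line, as grep does, still reports a match for "a{5,7}" inside a run of eight, because seven of the eight characters satisfy it. The upper limit only rules out longer runs when the pattern is anchored or bounded by other characters.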

^ (Caret, used by all) When used as the first character of the expression, it anchors the match to the beginning of a line. Example: "^abc" matches if and only if "abc" is at the beginning of a line.

$ (Dollar, used by all) When used as the last character of the expression, it anchors the match to the end of a line. Example: "abc$" matches if and only if "abc" is at the end of a line.
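A short Python 3 sketch of both anchors follows. Each string stands for one line with its newline already removed. Python's "$" also matches just before a trailing newline, so the example avoids that case.

```python
import re

# Columns: the line, then whether "^abc" and "abc$" match it.
for line in ["abcdef", "defabc", "abc"]:
    print(line,
          re.search(r"^abc", line) is not None,
          re.search(r"abc$", line) is not None)
```

"abcdef" matches only the first pattern, "defabc" only the second, and "abc" matches both.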

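\ (Backslash, used by all) "Escapes" the following metacharacter, turning it into an ordinary character. Example: "a\.b" matches only "a.b," not "a," then any character, then "b." In several of these programs the backslash also works the other way, giving special meaning to otherwise ordinary characters, as in \{\}, \(\), and \<\>.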
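The escape works the same way in Python 3's re module, shown below as an illustration. The patterns are written as raw strings (r"...") so that Python itself leaves the backslashes alone and passes them to the regular expression.

```python
import re

# Unescaped, "." matches any character; escaped, only a literal period.
print(re.search(r"a.b", "axb") is not None)    # True
print(re.search(r"a\.b", "axb") is not None)   # False
print(re.search(r"a\.b", "a.b") is not None)   # True

# re.escape adds a backslash before each metacharacter in a literal string.
print(re.escape("a.b"))                        # a\.b
```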

() (Parentheses, used by egrep and awk) Creates a subexpression to which any of the modifier metacharacters, such as *, +, or ?, may apply. Example: "(abc)?" matches either nothing or "abc," since "?" matches zero or one occurrence of the preceding subexpression. Similarly, "(abc)+" matches "abc," "abcabc," and so on.
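Here is a Python 3 sketch of grouping. The surrounding "x" and "y" are arbitrary characters added for this example so that the optional group has something fixed on either side.

```python
import re

# "(abc)?" makes the whole group optional, but allows only one copy.
for text in ["xy", "xabcy", "xabcabcy"]:
    print(text, re.search(r"x(abc)?y", text) is not None)

# "(abc)+" repeats the group, so two copies are accepted.
print(re.search(r"x(abc)+y", "xabcabcy") is not None)   # True
```

The loop prints True, True, False. The two copies of "abc" in the last string are too many for "?".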

\(\) (Escaped parentheses, used by ed, ex, and sed) Surrounds a subexpression, tagging it for later reference. The reference uses the notation "\#", where "#" is a single digit from 1 to 9: "\1" refers to the first tagged subexpression, "\2" to the second, and so on. Example: "abc\(def\)ghi" matches "abcdefghi," and the text matched by "def" can be referred to later as "\1."
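As a sketch only: Python 3's re module tags with plain, unescaped parentheses, where ed and sed write \( \). The "\1" reference is the same in both. The sample strings below are invented for illustration.

```python
import re

# Python's "abc(def)ghi" corresponds to ed's "abc\(def\)ghi".
m = re.search(r"abc(def)ghi", "abcdefghi")
print(m.group(1))                                                # def

# A reference inside the pattern: find a doubled word.
print(re.search(r"\b(\w+) \1\b", "this is is a typo").group())   # is is

# A reference in the replacement, like sed's s/\(.*\)-\(.*\)/\2-\1/
print(re.sub(r"(.*)-(.*)", r"\2-\1", "left-right"))              # right-left
```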

\<\> (Escaped angle brackets, used by ed, ex, and vi) Restricts the match to the beginning of a word (\<) or the end of a word (\>). Example: "\<abc" matches "abcde" but not "123abc," while "abc\>" matches "123abc" but not "abcde," presuming those are words in the midst of other text. "\<abc\>" matches only if "abc" is a whole word.
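Python 3's re module has no \< or \>. Its closest equivalent is \b, which marks a word boundary at either end of a word. The sketch below substitutes \b and reproduces the examples. As in vi, digits count as word characters.

```python
import re

print(re.search(r"\babc", "xx abcde yy") is not None)    # True: "abc" starts a word
print(re.search(r"\babc", "xx 123abc yy") is not None)   # False: "abc" is mid-word
print(re.search(r"abc\b", "xx 123abc yy") is not None)   # True: "abc" ends a word
print(re.search(r"\babc\b", "xx abcde yy") is not None)  # False: not a whole word
```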

| (Vertical bar, OR, used by egrep and awk) Separates two regular expressions that are OR'd together, so a match for either one satisfies the whole expression. Example: "abc|def" matches any line containing "abc," "def," or both.
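Here is a Python 3 sketch of alternation. Python tries alternatives from left to right and takes the first one that succeeds, whereas egrep prefers the longest match. The difference only shows up when the alternatives overlap, as in "ab|abc", and it does not affect this example.

```python
import re

# A line matches if it contains either alternative, or both.
for line in ["abc here", "def here", "abc and def", "neither"]:
    print(line, re.search(r"abc|def", line) is not None)

# findall lists every non-overlapping match in the line.
print(re.findall(r"abc|def", "abc and def"))   # ['abc', 'def']
```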

The following metacharacters can be used in ex (and one also in sed) in the replacement expression of a search and replace operation.

& (Ampersand, used by ex and sed) Stands for the entire string matched by the search expression. Example: the substitute command "s/[0-9]\{5\}/000&/" searches for any five consecutive digits (the expression between the first and second slashes). It replaces them with three "0" characters followed by the same five digits that matched, whatever they may be (the replacement between the second and third slashes). In other words, it turns a five-digit number into an equivalent eight-digit number. Without the "g" flag, only the first match on each line is replaced.
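For comparison, here is a Python 3 sketch. In re.sub's replacement string, the whole match is written \g<0> rather than "&". The count=1 argument plays the role of omitting the "g" flag. The sample line is invented.

```python
import re

line = "10001 and 94110"

# \g<0> is Python's spelling of "&", the entire matched text.
first_only = re.sub(r"[0-9]{5}", r"000\g<0>", line, count=1)
every_one = re.sub(r"[0-9]{5}", r"000\g<0>", line)

print(first_only)   # 00010001 and 94110
print(every_one)    # 00010001 and 00094110
```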

~ (Tilde, used by ex) Stands for the replacement text of the previous substitute command. Example: after the ampersand example above, the command "s/[0-9]\{10\}/~/" finds a 10-digit number and replaces it with an equivalent 13-digit number. The "~" reuses the previous replacement, "000&," which prepends three "0" digits to whatever matches.
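Python 3's re module has no "~", so the sketch below imitates it with a small hypothetical class, Substituter, that remembers the previous replacement text. It is a simplification, not how ex is implemented. It expands "~" first and then treats "&" as the whole current match, which reproduces the 13-digit result described above. Escaped ampersands and tildes are not handled.

```python
import re

class Substituter:
    """Hypothetical stand-in for ex's substitute command.

    It remembers the previous replacement so that "~" can reuse it.
    Escaped ampersands and tildes are not handled.
    """

    def __init__(self):
        self.last_replacement = ""

    def substitute(self, pattern, replacement, text):
        # Expand "~" into the previous replacement, then remember the result.
        replacement = replacement.replace("~", self.last_replacement)
        self.last_replacement = replacement
        # Translate ex's "&" into Python's \g<0> (the whole match).
        python_repl = replacement.replace("&", r"\g<0>")
        # count=1 replaces only the first match, as ex does without "g".
        return re.sub(pattern, python_repl, text, count=1)


ex = Substituter()
print(ex.substitute(r"[0-9]{5}", "000&", "zip 10001"))      # zip 00010001
print(ex.substitute(r"[0-9]{10}", "~", "tel 5551234567"))   # tel 0005551234567
```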