REGEXP(6)
NAME
regexp – regular expression notation
DESCRIPTION
A regular expression specifies a set of strings of characters. A member of this set of strings is said to be matched by the regular expression. In many applications a delimiter character, commonly /, bounds a regular expression. In the following specification for regular expressions the word ‘character’ means any character (rune) but newline.
The syntax for a regular expression e0 is
    e3:  literal | charclass | '.' | '^' | '$' | '(' e0 ')'
    e2:  e3
      |  e2 REP
    REP: '*' | '+' | '?'
    e1:  e2
      |  e1 e2
    e0:  e1
      |  e0 '|' e1
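The grammar can be read as a recursive-descent parser, one function per nonterminal. The following is a minimal sketch in Python, not the implementation used by any of the tools listed under SEE ALSO; the function names, the tuple-based tree, and the error messages are all assumptions made for illustration, and delimiter handling is left out.

```python
# Sketch of a recursive-descent parser for the grammar above.
# Names and the nested-tuple tree shape are illustrative assumptions.
META = set(".*+?[]()|\\^$")

def parse(src):
    """Parse src as e0 and return a nested-tuple syntax tree."""
    tree, pos = parse_e0(src, 0)
    if pos != len(src):
        raise ValueError(f"unexpected {src[pos]!r} at offset {pos}")
    return tree

def parse_e0(src, pos):
    # e0: e1 | e0 '|' e1
    left, pos = parse_e1(src, pos)
    while pos < len(src) and src[pos] == "|":
        right, pos = parse_e1(src, pos + 1)
        left = ("alt", left, right)
    return left, pos

def parse_e1(src, pos):
    # e1: e2 | e1 e2   (one or more e2 in sequence)
    left, pos = parse_e2(src, pos)
    while pos < len(src) and src[pos] not in "|)":
        right, pos = parse_e2(src, pos)
        left = ("cat", left, right)
    return left, pos

def parse_e2(src, pos):
    # e2: e3 | e2 REP
    node, pos = parse_e3(src, pos)
    while pos < len(src) and src[pos] in "*+?":
        node = ("rep", src[pos], node)
        pos += 1
    return node, pos

def parse_e3(src, pos):
    # e3: literal | charclass | '.' | '^' | '$' | '(' e0 ')'
    if pos >= len(src):
        raise ValueError("unexpected end of expression")
    c = src[pos]
    if c == "(":
        node, pos = parse_e0(src, pos + 1)
        if pos >= len(src) or src[pos] != ")":
            raise ValueError("missing ')'")
        return ("group", node), pos + 1
    if c == "[":
        end = pos + 1
        while end < len(src) and src[end] != "]":
            end += 2 if src[end] == "\\" else 1   # skip escaped characters
        if end >= len(src) or end == pos + 1:
            raise ValueError("malformed character class")
        return ("class", src[pos + 1:end]), end + 1
    if c == "\\":
        if pos + 1 >= len(src):
            raise ValueError("trailing backslash")
        return ("lit", src[pos + 1]), pos + 2
    if c in ".^$":
        return (c,), pos + 1
    if c in META:
        raise ValueError(f"unexpected {c!r} at offset {pos}")
    return ("lit", c), pos + 1
```

For example, parse("ab*") yields a concatenation of the literal a with a starred literal b, which shows that REP binds more tightly than concatenation.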
A literal is any non-metacharacter, or a metacharacter (one of .*+?[]()|\^$), or the delimiter preceded by \.
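To turn arbitrary text into a sequence of literals, each metacharacter and the delimiter can be given a leading backslash. The sketch below assumes / as the delimiter; quote is a hypothetical helper written for this example, not a function provided by any tool described here. It applies outside character classes only.

```python
# The metacharacters listed above; the backslash itself is one of them.
META = ".*+?[]()|\\^$"

def quote(text, delimiter="/"):
    """Backslash-escape every metacharacter and the delimiter in text."""
    return "".join(
        "\\" + c if c in META or c == delimiter else c
        for c in text
    )

print(quote("1/2 (approx.)"))
```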
A charclass is a nonempty string s bracketed [s] (or [^s]); it matches any character in (or not in) s. A negated character class never matches newline. A substring a-b, with a and b in ascending order, stands for the inclusive range of characters between a and b. In s, the metacharacters -, ], an initial ^, and the regular expression delimiter must be preceded by a \; other metacharacters have no special meaning and may appear unescaped.
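The rules for a character class can be sketched as a small membership test. In the Python below, class_matches is a hypothetical helper that takes the text between the brackets; it is an illustration of the rules as stated, not the library's code. It also assumes that a non-negated class cannot contain newline, since ‘character’ excludes newline in this specification.

```python
def class_matches(body, ch):
    """Return True if ch is matched by the class [body] (body may start with ^)."""
    negated = body.startswith("^")
    if negated:
        body = body[1:]

    # First pass: resolve backslash escapes into (char, was_escaped) pairs.
    items = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            items.append((body[i + 1], True))
            i += 2
        else:
            items.append((body[i], False))
            i += 1

    # Second pass: fold each unescaped a-b into an inclusive range.
    ranges = []
    j = 0
    while j < len(items):
        low = items[j][0]
        if j + 2 < len(items) and items[j + 1] == ("-", False):
            high = items[j + 2][0]
            if low > high:
                raise ValueError("range endpoints must be in ascending order")
            ranges.append((low, high))
            j += 3
        else:
            ranges.append((low, low))
            j += 1

    if ch == "\n":
        return False  # a negated class never matches newline (see assumption above)
    inside = any(low <= ch <= high for low, high in ranges)
    return inside != negated
```

With this sketch, class_matches("a-c", "b") is true, while class_matches("a\\-c", "b") is false because the escaped - is an ordinary member rather than a range operator.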
A . matches any character. A ^ matches the beginning of a line; $ matches the end of the line.
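These three operators can be tried out with Python's re module, which is a different dialect from the one specified here but treats them similarly. Note the assumption: Python anchors ^ and $ at each line only when re.MULTILINE is given, and its . also excludes newline by default.

```python
import re

text = "one\ntwo\nsix"

# '.' matches any single character except newline.
# re.MULTILINE makes ^ and $ anchor at every line, not only the whole string.
first_chars = re.findall(r"^.", text, re.MULTILINE)
last_chars = re.findall(r".$", text, re.MULTILINE)
whole_lines = re.findall(r"^t.o$", text, re.MULTILINE)

print(first_chars, last_chars, whole_lines)
```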
The REP operators match zero or more (*), one or more (+), or zero or one (?) instances, respectively, of the preceding regular expression e2.
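The effect of each REP operator can be checked with a short Python sketch. Python's re module is used here only because it gives these three operators the same meaning; matches is a helper defined for this example.

```python
import re

def matches(pattern, s):
    """True if the whole string s is matched by pattern."""
    return re.fullmatch(pattern, s) is not None

subjects = ("ac", "abc", "abbc")

# For each pattern, record whether it matches "ac", "abc" and "abbc".
results = {p: [matches(p, s) for s in subjects]
           for p in ("ab*c", "ab+c", "ab?c")}

# REP applies to the preceding e2 only, so parentheses change the operand.
whole_group = matches("(ab)+", "abab")   # the group "ab" is repeated
last_char = matches("ab+", "abab")       # only "b" is repeated

for pattern, row in results.items():
    print(pattern, row)
```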
A concatenated regular expression, e1 e2, matches a match to e1 followed by a match to e2.
An alternative regular expression, e0 | e1, matches either a match to e0 or a match to e1.
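Because the grammar builds e0 from whole e1 sequences, concatenation binds more tightly than |. A brief Python sketch of the consequence follows; Python's re module agrees with the grammar on this point, and matches is a helper defined for this example.

```python
import re

def matches(pattern, s):
    """True if the whole string s is matched by pattern."""
    return re.fullmatch(pattern, s) is not None

subjects = ("ab", "cd", "abd", "acd")

# ab|cd groups as (ab)|(cd): either the string "ab" or the string "cd".
loose = [s for s in subjects if matches("ab|cd", s)]

# Parentheses restrict the alternation to the middle character.
grouped = [s for s in subjects if matches("a(b|c)d", s)]

print(loose, grouped)
```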
A match to any part of a regular expression extends as far as possible without preventing a match to the remainder of the regular expression.
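This rule can be observed by capturing the parts of a match. The Python sketch below is an illustration only: it sticks to repetition, where Python's re module behaves as described, and avoids alternation, which Python resolves by trying alternatives in order.

```python
import re

# a* would like all three a's, but must leave one for the trailing "ab".
m = re.match(r"(a*)(ab)", "aaab")
first, rest = m.group(1), m.group(2)

# With nothing after it, a* extends over every a available.
longest = re.match(r"a*", "aaa").group(0)

print(first, rest, longest)
```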
SEE ALSO
awk(1), ed(1), grep(1), sam(1), sed(1), regexp(2)