
3.15. Regular Expressions

In order to fully utilize the power of shell scripting, you need to master regular expressions.

3.15.1. A Brief Introduction to Regular Expressions

An expression is a set of characters that has an interpretation above and beyond its literal meaning. A quote symbol ("), for example, may denote speech by a person, ditto, or a meta-meaning for the symbols that follow. Regular expressions are sets of characters and metacharacters that UNIX tools such as grep, sed, and awk interpret as patterns to match.

The main uses for regular expressions (REs) are text searches and string manipulation. An RE matches a single character or a set of characters.
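The two main uses named above, searching and manipulating text, can be sketched with grep and sed. This is a minimal illustration, not taken from the original text: the sample lines are made up, and the RE is a plain word so that no metacharacters are involved yet.

```shell
#!/bin/bash
# Made-up sample text, one item per line.
text='apple pie
banana split
cherry tart
apple crumble'

# Search: grep prints only the lines that contain the RE "apple".
matches=$(printf '%s\n' "$text" | grep 'apple')
echo "$matches"

# Manipulation: sed replaces every "apple" with "pear"
# (the g flag means every occurrence on a line, not just the first).
edited=$(printf '%s\n' "$text" | sed 's/apple/pear/g')
echo "$edited"
```

grep leaves its input unchanged and acts as a filter, while sed rewrites each line it reads and passes the result on.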

  • The asterisk * matches any number of repeats of the character or RE preceding it, including zero.

  • The dot . matches any one character, except a newline.

  • The question mark ? matches zero or one of the previous RE. It is generally used for matching single characters.

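  • The plus + matches one or more of the previous RE. It serves a role similar to the *, but does not match zero occurrences.

    The ? and + belong to the extended RE set, as used by egrep (grep -E) and awk. In the basic REs of grep and sed they are ordinary characters, although the GNU versions of these tools accept the escaped forms \? and \+.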

  • The caret ^ matches the beginning of a line, but sometimes, depending on context, negates the meaning of a set of characters in an RE.

  • The dollar sign $ at the end of an RE matches the end of a line.

  • Brackets [...] enclose a set of characters to match in a single RE.

    [xyz] matches the characters x, y, or z.

    [c-n] matches any of the characters in the range c to n.

    [^b-d] matches any single character except those in the range b to d. This is an instance of ^ negating, or inverting, the meaning of the following RE (a role similar to that of ! in a different context).

  • The backslash \ escapes a special character, which means that character gets interpreted literally.

    A \$ reverts to its literal meaning of "dollar sign", rather than its RE meaning of end-of-line.

  • Escaped "curly brackets" \{ \} indicate the number of occurrences of a preceding RE to match.

    In basic REs, as used by grep and sed, the curly brackets must be escaped; unescaped, they match only the literal characters { and }. Extended REs (egrep, grep -E) use unescaped { } for the same purpose.

    [0-9]\{5\} matches exactly five digits (characters in the range of 0 to 9).

    Caution

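    Curly brackets are not available as an RE in traditional versions of awk. GNU awk accepts them with the --re-interval option, and enables them by default from version 4.0 on.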
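To see several of these metacharacters side by side, the following sketch runs grep over a small made-up word list. The show function is a hypothetical helper defined only for this illustration. Patterns passed with -E are extended REs; the rest are basic REs. Most patterns are wrapped in ^...$ so that they must match the whole word rather than just part of it.

```shell
#!/bin/bash
# Made-up sample words, one per line.
words='color
colour
colouur
cat
cot
c9t
12345
1234
$5 price'

# Hypothetical helper: print the sample words that match an RE.
# Any grep options (such as -E for extended REs) go before the pattern.
show() {
  printf '%s\n' "$words" | grep "$@"
}

echo '== dot: any one character between c and t'
show 'c.t'                  # cat, cot, c9t
echo '== asterisk: zero or more u'
show '^colou*r$'            # color, colour, colouur
echo '== question mark: zero or one u (extended RE)'
show -E '^colou?r$'         # color, colour
echo '== plus: one or more u (extended RE)'
show -E '^colou+r$'         # colour, colouur
echo '== dollar sign: r at the end of the line'
show 'r$'                   # color, colour, colouur
echo '== brackets: a or o'
show '^c[ao]t$'             # cat, cot
echo '== negated brackets: anything except a to o'
show '^c[^a-o]t$'           # c9t
echo '== escaped curly brackets: exactly five digits'
show '^[0-9]\{5\}$'         # 12345
echo '== backslash: a literal dollar sign at line start'
show '^\$'                  # $5 price
```

Without the ^ and $ anchors, [0-9]\{5\} would also match the first five digits of a longer number such as 123456.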

"Sed & Awk", by Dougherty and Robbins (see Bibliography) gives a very complete and lucid treatment of REs.

3.15.2. Using REs in scripts

Sed, awk, and Perl, used as filters in scripts, take REs as arguments when "sifting" or transforming files or I/O streams. See Example A-4 and Example A-8 for illustrations of this.
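As a small illustration of REs used by filters in a pipeline, the sketch below lets awk sift the lines of a stream and sed transform what remains. The log lines and their layout (date, level, message) are made up for this example and are not taken from Example A-4 or Example A-8. The interval \{4\} appears only in the sed command, in keeping with the caution about curly brackets in awk.

```shell
#!/bin/bash
# Made-up log lines standing in for a file or I/O stream.
log='2024-01-05 ERROR disk full
2024-01-05 INFO backup done
2024-01-06 ERROR network down'

# awk sifts: keep lines matching the RE / ERROR / and print their first field.
# sed transforms: delete a leading four-digit year and its hyphen.
error_days=$(printf '%s\n' "$log" | awk '/ ERROR / { print $1 }' | sed 's/^[0-9]\{4\}-//')
echo "$error_days"

# awk can also count matching lines; n+0 prints 0 when nothing matched.
error_count=$(printf '%s\n' "$log" | awk '/ ERROR / { n++ } END { print n+0 }')
echo "$error_count"
```

Each tool reads from standard input and writes to standard output, so the same commands work on a file or on the output of any other command.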