Regular expressions are a powerful tool for finding, examining and modifying text. With their general pattern notation, they work almost like a mini programming language that lets you describe and parse text. They enable you to search for patterns within a string and to extract matches flexibly and precisely. However, you should note that this extra power comes with added overhead: regular expressions are generally slower than the more basic string functions. Use regular expressions only when you have a particular need that the simpler functions cannot meet.
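PHP has supported two different types of regular expressions: POSIX-extended and Perl-Compatible Regular Expressions (PCRE). The PCRE functions are more commonly used, and they are both more powerful and faster than the POSIX ones. Note that the POSIX functions (the ereg family) were deprecated in PHP 5.3 and removed in PHP 7, so new code should use the PCRE functions (the preg_ family).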
In a regular expression, most characters match only themselves. For instance, if you search for the regular expression “foo” in the string “only a fool does not use regular expressions”, you get a match because “foo” occurs in that string. Some characters have a special meaning. For instance, a dollar sign ($) at the end of a regular expression indicates that it must match the end of the string. Similarly, a caret (^) at the beginning of a regular expression indicates that it must match the beginning of the string. Characters that match themselves are called literals, while characters that have special meanings are called metacharacters.
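The difference between a literal pattern and an anchored one can be sketched with preg_match(), which returns 1 when the pattern matches and 0 when it does not. This is a minimal illustration, assuming the PCRE functions and the conventional / delimiters; the variable names are made up for the example.

```php
<?php
// Sample text taken from the paragraph above.
$text = 'only a fool does not use regular expressions';

// A literal pattern matches anywhere in the string.
$hasFoo = preg_match('/foo/', $text);                       // 1: "foo" occurs inside "fool"

// ^ requires the match to begin at the start of the string.
$startsWithOnly = preg_match('/^only/', $text);             // 1
$startsWithFoo  = preg_match('/^foo/', $text);              // 0: "foo" is not at the start

// $ requires the match to finish at the end of the string.
$endsWithExpressions = preg_match('/expressions$/', $text); // 1
```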
The dot (.) metacharacter matches any single character except newline (\n). Hence, the pattern h.t matches hat, hot, hit, hut and h7t. The vertical pipe (|) metacharacter is used for alternatives in a regular expression. It behaves much like a logical OR operator, and you should use it if you want to construct a pattern that matches more than one set of characters. For instance, the pattern Monday|Tuesday|Wednesday matches strings that contain “Monday” or “Tuesday” or “Wednesday”. Parentheses are used to group sequences. For example, (fri|satur)day matches “friday” or “saturday”. Using parentheses to group characters for alternation is called grouping.
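As a rough sketch of the dot, alternation and grouping in PHP, the following uses preg_match() with the patterns from the paragraph above. The sample strings and variable names are assumptions chosen for illustration, and the ^ and $ anchors are added to the dot example so that the whole candidate string is tested.

```php
<?php
// The dot matches any single character except a newline.
$dotResults = [];
foreach (['hat', 'hot', 'h7t', 'ht', "h\nt"] as $candidate) {
    $dotResults[$candidate] = preg_match('/^h.t$/', $candidate);
}
// 'hat', 'hot' and 'h7t' give 1; 'ht' and "h\nt" give 0.

// Alternation: any one of the listed alternatives may match.
$tuesday = preg_match('/Monday|Tuesday|Wednesday/', 'See you on Tuesday'); // 1
$friday  = preg_match('/Monday|Tuesday|Wednesday/', 'See you on Friday');  // 0

// Grouping: the alternation applies only inside the parentheses.
// The third argument receives the full match and each captured group.
preg_match('/(fri|satur)day/', 'see you next friday', $matches);
// $matches[0] is 'friday' and $matches[1] is 'fri'.
```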
If we want to match a literal metacharacter in a pattern, we have to escape it with a backslash. For example, \. matches a literal dot and \$ matches a literal dollar sign.
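To see why escaping matters, here is a small sketch that compares an unescaped dot with an escaped one. The price string is an invented example. preg_quote() is PHP's built-in helper for escaping metacharacters in arbitrary text; its second argument names the delimiter to escape as well.

```php
<?php
$price = 'Total: $5.00';

// Unescaped, the dot still matches any character, so "5x00" also matches.
$loose = preg_match('/5.00/', '5x00');       // 1

// Escaped, the dot only matches a literal dot.
$strict = preg_match('/5\.00/', '5x00');     // 0

// Both $ and . are escaped to match the literal text "$5.00".
$literal = preg_match('/\$5\.00/', $price);  // 1

// preg_quote() adds the backslashes for us.
$quoted = preg_quote('$5.00', '/');          // \$5\.00
$viaQuote = preg_match('/' . $quoted . '/', $price); // 1
```

Single-quoted PHP strings are used for the patterns so that the backslashes and dollar signs reach the regular expression engine unchanged.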
To specify a set of acceptable characters in a pattern, we can either build a character class ourselves, or use a predefined one. A character class lets us represent a set of characters as a single item. We can build our own character class by enclosing the acceptable characters in square brackets. A character class matches any one of the characters in the class. For example, the character class [abc] matches a, b or c. To define a range of characters, we just add the first and last characters separated by a hyphen. For example, to match all alphanumeric characters: [a-zA-Z0-9]. We can also create a negated character class, which matches any character that is not in the class. To create a negated character class, we start the character class with ^: [^0-9].
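The character classes described above might be used as follows. This is only a sketch: the input strings are invented, and preg_match_all() and preg_replace() are used here simply as convenient ways to show which characters a class accepts.

```php
<?php
// [abc] matches exactly one of a, b or c.
$inClass    = preg_match('/^[abc]$/', 'b');  // 1
$notInClass = preg_match('/^[abc]$/', 'd');  // 0

// A range-based class: collect every alphanumeric character.
$count = preg_match_all('/[a-zA-Z0-9]/', 'a-b_c 1!', $found);
// $count is 4 and $found[0] is ['a', 'b', 'c', '1'].

// A negated class: remove everything that is not a digit.
$digitsOnly = preg_replace('/[^0-9]/', '', '(555) 123-4567');
// $digitsOnly is '5551234567'.
```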
The metacharacters +, *, ?, and {} affect the number of times a pattern should be matched. + means “match one or more of the preceding expression”, * means “match zero or more of the preceding expression”, and ? means “match zero or one of the preceding expression”. Curly braces {} are used differently. With a single integer, {n} means “match exactly n occurrences of the preceding expression”; with one integer and a comma, {n,} means “match n or more occurrences of the preceding expression”; and with two comma-separated integers, {n,m} means “match at least n, but no more than m, occurrences of the preceding expression”.
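The quantifiers can be compared side by side in a short sketch. The patterns and test strings below are illustrative assumptions, and each pattern is anchored with ^ and $ so that the quantifier has to account for the whole string.

```php
<?php
// + : one or more of the preceding expression.
$plusMany = preg_match('/^ab+c$/', 'abbbc');        // 1
$plusNone = preg_match('/^ab+c$/', 'ac');           // 0: at least one b is required

// * : zero or more of the preceding expression.
$starNone = preg_match('/^ab*c$/', 'ac');           // 1: zero b's is acceptable

// ? : zero or one of the preceding expression.
$optionalAbsent = preg_match('/^colou?r$/', 'color');   // 1
$optionalTwice  = preg_match('/^colou?r$/', 'colouur'); // 0: only one u is allowed

// {n} : exactly n occurrences.
$exactlyFour = preg_match('/^[0-9]{4}$/', '2024');  // 1

// {n,} : n or more occurrences.
$atLeastTwo = preg_match('/^[0-9]{2,}$/', '7');     // 0: only one digit

// {n,m} : between n and m occurrences, inclusive.
$twoToFour   = preg_match('/^[0-9]{2,4}$/', '123');   // 1
$tooManyNums = preg_match('/^[0-9]{2,4}$/', '12345'); // 0: five digits exceed the maximum
```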