What's the problem with regular expressions?

Regular expressions can seem tricky, hard to read and hard to write, especially for beginners.

Let’s see if it’s just a misunderstanding.

Getting started

A regular expression, also called regex or regexp, is a search pattern. It lets you match a specific subsequence within a sequence of characters.

Many different engines can run your pattern against a string. The most popular flavors are probably POSIX and PCRE.

Depending on the engine you use, you may get different results, but we won’t cover that in this tutorial.

PHP includes PCRE functions that apply your regex to a string. For example, preg_match() searches a subject string for a match to the given pattern:

$chocolateString = "I want more chocolate in my chocolate.";
if (preg_match("/chocolate/", $chocolateString) === 1) {
    echo "I see some chocolate";
}

The “/” characters are the delimiters of your pattern. Be careful, regexes are case-sensitive by default:

$chocolateString = "I want more Chocolate in my Chocolate.";
if (preg_match("/chocolate/", $chocolateString) === 1) {
    // this won't match here
} else {
    echo "I don't see any chocolate";
}

Fortunately, you can use the i modifier to fix this:

$chocolateString = "I want more Chocolate in my Chocolate.";
if (preg_match("/chocolate/i", $chocolateString) === 1) {
    echo "I see some chocolate";
}
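
A word on the delimiters mentioned above: as far as I know, PHP accepts most non-alphanumeric characters as delimiters, not just “/”. The sketch below assumes a made-up path ($path is only an illustration) and shows why another delimiter such as “#” can be easier to read when the pattern itself contains slashes:

```php
<?php
// Hypothetical subject, used only for illustration.
$path = '/shop/chocolate/dark';

// With "/" as the delimiter, every slash inside the pattern needs a backslash.
$escaped = preg_match('/^\/shop\/chocolate\//', $path);

// With "#" as the delimiter, the slashes can be written as-is.
$readable = preg_match('#^/shop/chocolate/#', $path);

// preg_quote() escapes regex special characters in a literal string;
// its second argument also escapes the delimiter you picked.
$needle = 'chocolate/dark';
$quoted = preg_match('/' . preg_quote($needle, '/') . '/', $path);

var_dump($escaped === 1, $readable === 1, $quoted === 1);
```

All three calls look for the same kind of thing; only the readability of the pattern changes.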

More advanced examples

You can use metacharacters to include alternatives and repetitions in your pattern:

$chocolateString = "I want more Chocolate in my Chocolate.";
if (preg_match('/(choco|late)/', $chocolateString) === 1) {
    echo "I see some choco but it might be too late!";
}

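The | metacharacter separates alternative branches: the pattern matches as soon as either “choco” or “late” is found. Here only “late” matches, because the string spells “Chocolate” with a capital C and the pattern is case-sensitive.

You can even define subpatterns: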
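
The repetitions mentioned above are written with quantifiers. Here is a minimal sketch of the most common ones; the matches() helper is hypothetical and only wraps preg_match() to keep the examples short:

```php
<?php
// Hypothetical helper: true when the pattern matches somewhere in the subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// ? means "zero or one": the final "s" is optional.
var_dump(matches('/chocolates?/', 'one chocolate'));  // bool(true)
var_dump(matches('/chocolates?/', 'two chocolates')); // bool(true)

// + means "one or more": at least one "m" is required.
var_dump(matches('/yum+/', 'yummmm'));                // bool(true)
var_dump(matches('/yum+/', 'yu'));                    // bool(false)

// * means "zero or more": the "o" after "ch" may be missing entirely.
var_dump(matches('/cho*co/', 'chco'));                // bool(true)

// {2} means "exactly two": the group "choco" must appear twice in a row.
var_dump(matches('/(choco){2}/', 'chocochoco'));      // bool(true)
var_dump(matches('/(choco){2}/', 'chocolate'));       // bool(false)
```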

<?php
$chocolateString = "I'm already late but I want more milk in my chocomilk.";
if (preg_match("/choco(late|milk)/i", $chocolateString) === 1) {
    echo "I see some milk or chocolate.";
}

Here we don’t care about the first “late”. The parentheses group the alternatives, so the pattern matches either “chocolate” or “chocomilk”, but not “choco” or “late” on their own. You can verify that with the following:

<?php
$chocolateString = "I'm already late but I want more milk in my chocomilk.";
$test = preg_match_all("/choco(late|milk)/", $chocolateString, $matches);
print_r($matches);

This would print:

Array
(
    [0] => Array
        (
            [0] => chocomilk
        )

    [1] => Array
        (
            [0] => milk
        )

)

Regex can be bad

As you can see, you have to use a specific syntax to make your search pattern work. At the end of the day, it’s just a matter of practice and of picking the right modifiers and quantifiers.

But problems can happen. A bad pattern can take much longer to process, because it makes the engine try a lot of unnecessary combinations instead of going straight for what you need.

This is not micro-optimization! A bad regex can take thousands of milliseconds where a well-written one takes a few tens of milliseconds at most.
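
To make this more concrete, here is a rough sketch that times two patterns against the same input. The nested quantifier in /^(a+)+$/ is a classic example of a pattern that can force the engine to retry a huge number of combinations when the match fails. The timePattern() helper and the test string are my own illustration, and the numbers you get depend on your PHP version and PCRE settings (with the default backtrack limit, preg_match() may give up and return false instead of 0):

```php
<?php
// Hypothetical helper: runs a pattern once and reports result, time and PCRE error code.
function timePattern(string $pattern, string $subject): array
{
    $start = hrtime(true);
    $result = preg_match($pattern, $subject);
    $elapsedMs = (hrtime(true) - $start) / 1e6;

    return ['result' => $result, 'ms' => $elapsedMs, 'error' => preg_last_error()];
}

// A run of "a" followed by a character that makes the whole match fail.
$subject = str_repeat('a', 22) . '!';

// Nested quantifiers: many ways to split the run of "a" before giving up.
$slow = timePattern('/^(a+)+$/', $subject);

// Same intent, a single quantifier: fails after one pass.
$fast = timePattern('/^a+$/', $subject);

foreach (['slow' => $slow, 'fast' => $fast] as $name => $run) {
    printf(
        "%s: result=%s, %.3f ms, error code=%d\n",
        $name,
        var_export($run['result'], true),
        $run['ms'],
        $run['error']
    );
}
```

Neither pattern matches, but on a typical setup the first one should spend noticeably more time finding that out, or stop early with an error code other than PREG_NO_ERROR.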

Use regex101 to test your pattern

Regex101 is one of the most popular online testing tools. It has a really nice interface with great features such as save and share, a debugger and a code generator.

Besides, it will provide some useful explanations to help you understand why your pattern is not working.

Don’t use regex for everything

PHP has built-in filters for validation. For example, instead of trying to write a custom regex pattern to validate e-mail addresses, use the following filter:

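$email = "willy@example.com"; // sample address, for illustration only
if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) { echo "This e-mail address looks valid."; }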

There are filters to validate IPs, integers, URLs and even regexp.
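
Here is a short sketch of those other filters. The sample values and the “choco-” product code format are made up for the example. Note that filter_var() returns the filtered value on success and false on failure, so a strict comparison is safer (a valid integer 0 would otherwise look like a failure):

```php
<?php
// Each call returns the (possibly converted) value on success, false on failure.
$ip  = filter_var('192.168.0.1', FILTER_VALIDATE_IP);
$int = filter_var('42', FILTER_VALIDATE_INT); // converted to the integer 42
$url = filter_var('https://example.com/choco', FILTER_VALIDATE_URL);

// Invalid input gives false.
$notInt = filter_var('forty-two', FILTER_VALIDATE_INT);

// FILTER_VALIDATE_REGEXP checks the value against a pattern passed as an option.
// The "choco-<digits>" format is a hypothetical product code.
$code = filter_var('choco-42', FILTER_VALIDATE_REGEXP, [
    'options' => ['regexp' => '/^choco-\d+$/'],
]);

var_dump($ip, $int, $url, $notInt, $code);
```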

If you only need to check that a string contains a specific substring, it’s probably a better idea to use something like strpos():

$target = 'choco';
$string = 'chocolate, chocolate, chocolate';
$pos    = strpos($string, $target);

if ($pos !== false) {
    echo "This has some choco.";
}
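
The strict comparison with false matters here: strpos() returns the position of the match, and in this example that position is 0. The sketch below illustrates the pitfall and two related functions, stripos() for a case-insensitive search and str_contains(), which only exists from PHP 8 on (hence the function_exists() guard):

```php
<?php
$string = 'chocolate, chocolate, chocolate';

// "choco" is found at the very start of the string, so the position is 0.
$pos = strpos($string, 'choco');

// A loose check treats position 0 as "not found", which is wrong here.
$looseCheck  = (bool) $pos;      // false
$strictCheck = $pos !== false;   // true

// stripos() does the same search without caring about case.
$caseInsensitive = stripos($string, 'CHOCO') !== false; // true

// str_contains() returns a boolean directly (PHP 8+), fall back otherwise.
$contains = function_exists('str_contains')
    ? str_contains($string, 'choco')
    : $strictCheck;

var_dump($pos, $looseCheck, $strictCheck, $caseInsensitive, $contains);
```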

Likewise, if you need to parse HTML or XML, using a parser is a much better idea than trying to match things with a regex pattern.
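
As an illustration, here is a minimal sketch that uses PHP’s DOMDocument class (it assumes the DOM extension is enabled, which is usually the case) to pull links out of a small, made-up HTML snippet:

```php
<?php
// Made-up markup, for illustration only.
$html = '<ul>'
      . '<li><a href="/dark">Dark chocolate</a></li>'
      . '<li><a href="/milk">Milk chocolate</a></li>'
      . '</ul>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);

// Map each link target to its text.
$links = [];
foreach ($document->getElementsByTagName('a') as $anchor) {
    $links[$anchor->getAttribute('href')] = $anchor->textContent;
}

print_r($links);
```

The parser copes with nesting, attribute order and quoting styles, which are exactly the things that make a regex-based approach fragile.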

But learn regex anyway

There are some great PHP functions and libraries out there. Most of the time, you can get what you want without any regex.

However, you should still be able to read regexes. It’s not rare to get questions about them during interviews.

Besides, regexes are available in many languages, so sooner or later you will have to deal with them.