Lesson 04 – Regular Expressions

Regular expressions describe text patterns.

Characters that have a special meaning in a regular expression, instead of matching themselves literally, are called metacharacters.

=~ is the binding operator. It applies the regular expression on its right to the string on its left.

Characters written between [ and ] form a character class. A character class matches exactly one character, which must be one of those listed, before the rest of the regular expression is evaluated.

Inside a character class, - indicates a range and ^, when it is the first character, indicates negation.
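
As a small illustration of ranges and negation inside a character class, here is a minimal sketch. The test strings and variable names are made up for this example.

```perl
use strict;
use warnings;

# [0-9a-f] combines two ranges: one character that is a lowercase hex digit.
my $has_hex = ("colour #ff8800" =~ /#[0-9a-f]/) ? 1 : 0;

# [^0-9] is a negated class: one character that is NOT a digit.
my $has_nondigit = ("12345" =~ /[^0-9]/) ? 1 : 0;

# A class matches exactly one character, so gr[ae]y matches
# "gray" and "grey" but not "graey".
my @matched = grep { /gr[ae]y/ } qw(gray grey graey);

print "has_hex=$has_hex has_nondigit=$has_nondigit matched=@matched\n";
```

This should print has_hex=1 has_nondigit=0 matched=gray grey.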

Perl has shortcuts for the most common character classes.

[a-zA-Z0-9_] can be written as \w and [^a-zA-Z0-9_] as \W.

Metacharacters.

. means match any character except a newline
\w means match any alphanumeric character or the underscore
\W means match any character that is not alphanumeric or the underscore
\d means match any character that is a digit
\D means match any character that is not a digit
\s means match any character that is a whitespace such as a space, newline or a tab
\S means match any character that is not a whitespace
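^ means match the beginning of the string (or of each line when the m modifier is used)
$ means match the end of the string or just before a final newline (or the end of each line when the m modifier is used)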

^ and $ are called anchor metacharacters. They’re also sometimes called assertions.
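
The metacharacters listed above can be tried out with a short script. This is only a sketch; the sample string and variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $line = "Order 66: two pizzas";

# \d is a digit, so \d: matches the "6:" in "66:".
my $digit_colon = ($line =~ /\d:/) ? 1 : 0;

# \s is a whitespace character and \w a word character: matches " 6".
my $space_word = ($line =~ /\s\w/) ? 1 : 0;

# . matches any single character except a newline.
my $dot_newline = ("\n" =~ /./) ? 1 : 0;

# ^ anchors the pattern to the beginning, $ to the end.
my $starts_with_order = ($line =~ /^Order/) ? 1 : 0;
my $ends_with_order   = ($line =~ /Order$/) ? 1 : 0;

print "digit_colon=$digit_colon space_word=$space_word dot_newline=$dot_newline\n";
print "starts_with_order=$starts_with_order ends_with_order=$ends_with_order\n";
```

Only dot_newline and ends_with_order should come out as 0: the dot refuses the newline, and "Order" is at the beginning of the string, not the end.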

Quantifiers describe how many times the element just before them (a character, a character class or a group) must occur.

* means zero or more
+ means one or more
? means zero or one time
{n} means n times where n is an integer
{n,m} means at least n and at most m times
{n,} means n or more times
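
To see the difference between the quantifiers, the sketch below applies each one to the letter a in the same small word list. The list and variable names are made up for this example, and the patterns are anchored with ^ and $ so that the whole word has to match.

```perl
use strict;
use warnings;

my @words = qw(ct cat caat caaat);

my @star     = grep { /^ca*t$/ }     @words;  # zero or more a's
my @plus     = grep { /^ca+t$/ }     @words;  # one or more a's
my @optional = grep { /^ca?t$/ }     @words;  # zero or one a
my @exactly  = grep { /^ca{2}t$/ }   @words;  # exactly two a's
my @between  = grep { /^ca{1,2}t$/ } @words;  # one or two a's
my @at_least = grep { /^ca{2,}t$/ }  @words;  # two or more a's

print "*     : @star\n";      # ct cat caat caaat
print "+     : @plus\n";      # cat caat caaat
print "?     : @optional\n";  # ct cat
print "{2}   : @exactly\n";   # caat
print "{1,2} : @between\n";   # cat caat
print "{2,}  : @at_least\n";  # caat caaat
```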

Modifiers.

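i (Ignore case): letters match regardless of case
s (Single line): . also matches a newline
u (Unicode): use Unicode rules when matching
m (Multiline): ^ and $ also match at the beginning and end of each line
x (Verbose): whitespace and comments inside the pattern are ignored
l (Locale): use the rules of the current locale when matching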
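
A modifier is written after the closing delimiter of the regular expression. The following sketch tries i, m, s and x on a two-line string; the string and the variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $text = "Pizza\npasta";

# i: "pizza" matches "Pizza" because case is ignored.
my $ignore_case = ($text =~ /pizza/i) ? 1 : 0;

# Without m, ^ only matches at the very beginning of the string.
my $caret_plain     = ($text =~ /^pasta/)  ? 1 : 0;
# With m, ^ also matches right after the newline.
my $caret_multiline = ($text =~ /^pasta/m) ? 1 : 0;

# Without s, . does not match the newline between the two words.
my $dot_plain  = ($text =~ /Pizza.pasta/)  ? 1 : 0;
# With s, . matches the newline as well.
my $dot_single = ($text =~ /Pizza.pasta/s) ? 1 : 0;

# x: the pattern can be spread out and commented.
my $verbose = ($text =~ /
    Pizza   # first word
    \n      # the newline between the words
    pasta   # second word
/x) ? 1 : 0;

print "i=$ignore_case ^=$caret_plain ^m=$caret_multiline ";
print ".=$dot_plain .s=$dot_single x=$verbose\n";
```

This should print i=1 ^=0 ^m=1 .=0 .s=1 x=1.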

m/regular expression here/ is the match operator. When the delimiters are slashes the m can be left out, so it is the same as /regular expression here/. It checks whether the string on the left of =~ matches the text pattern.
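
As a quick check, the sketch below runs the same match with and without the leading m. The variable names are only for this example.

```perl
use strict;
use warnings;

my $sentence = "I love eating spaghetti.";

# Both forms are the same match operator.
my $with_m    = ($sentence =~ m/spaghetti/) ? "match" : "no match";
my $without_m = ($sentence =~ /spaghetti/)  ? "match" : "no match";

# A pattern that does not occur in the string makes the match false.
my $other = ($sentence =~ /pizza/) ? "match" : "no match";

print "m/spaghetti/: $with_m\n";
print "/spaghetti/:  $without_m\n";
print "/pizza/:      $other\n";
```

The first two lines should report a match and the last one no match.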

s/find this regular expression/replace with this text/

The substitution operator s/// finds text that matches a regular expression and replaces it with other text.

The following example substitutes spaghetti with pizza:

#!/usr/bin/perl

use strict;
use warnings;

my $sentence = "I love eating spaghetti.";

$sentence =~ s/spaghetti/pizza/;

print $sentence, "\n";

This example changes the number of slices on every line to 4:

my $order = "3 slices of plain pizza
5 slices of pepperoni pizza";

$order =~ s/\d+/4/g;
print "Your order has been changed to:\n", $order, "\n";

The g modifier means match the regex globally, so every run of digits in the string is replaced with 4, not just the first one.

The program prints this on the screen:

Your order has been changed to:
4 slices of plain pizza
4 slices of pepperoni pizza

When you want to extract a portion of a string based on your regular expression, put parentheses around each part of the pattern whose match you want to keep. The text matched by the first pair of parentheses is stored in $1, the text matched by the second pair in $2, and so on. We call this process capturing.
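
Here is a minimal sketch of capturing, reusing the pizza order from earlier. The pattern and the variable names $count and $topping are assumptions made for the illustration. The capture variables are only read after the match has succeeded, since a failed match leaves them holding whatever an earlier match stored.

```perl
use strict;
use warnings;

my $order = "3 slices of pepperoni pizza";
my ($count, $topping);

# First pair of parentheses captures the number, second pair the topping.
if ($order =~ /(\d+) slices of (\w+) pizza/) {
    ($count, $topping) = ($1, $2);
    print "count=$count topping=$topping\n";
}
else {
    print "The order did not match.\n";
}
```

This should print count=3 topping=pepperoni.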

If you read perlrequick, there is this example:

($hours, $minutes, $second) = ($time =~ /(\d\d):(\d\d):(\d\d)/);

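In list context, the match returns the captured values:

($time =~ /(\d\d):(\d\d):(\d\d)/) # returns $1, $2, $3

The values are assigned to ($hours, $minutes, $second).
The parentheses around the variables on the left are required: they put the match in list context, so it returns the list of captured values instead of just true or false. The parentheses around the match on the right are optional, because =~ binds more tightly than =. They only make the grouping easier to read.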
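
A quick way to see what the parentheses do is to run the assignment in its different forms. This is a sketch with an invented time string; the numbered variable names are only for this example.

```perl
use strict;
use warnings;

my $time = "09:41:07";

# Parentheses around the match, as in perlrequick.
my ($h1, $m1, $s1) = ($time =~ /(\d\d):(\d\d):(\d\d)/);

# No parentheses around the match: =~ is evaluated before =,
# so the result is the same.
my ($h2, $m2, $s2) = $time =~ /(\d\d):(\d\d):(\d\d)/;

# No parentheses on the left: scalar context, so the match
# only reports success (1) instead of returning the captures.
my $matched = $time =~ /(\d\d):(\d\d):(\d\d)/;

print "with parentheses:    $h1 $m1 $s1\n";
print "without parentheses: $h2 $m2 $s2\n";
print "scalar context:      $matched\n";
```

The first two lines should both show 09 41 07, and the last one 1.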

Note: Programming Perl points out that an easy mistake is to think that \w matches a word. It matches a single word character. Use \w+ to match a whole word.
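
The difference is easy to check by capturing what each pattern matches. A minimal sketch, with a made-up string and variable names:

```perl
use strict;
use warnings;

my $text = "pepperoni pizza";

# \w matches exactly one word character.
my ($one_char) = $text =~ /(\w)/;

# \w+ matches one or more word characters, up to the space.
my ($one_word) = $text =~ /(\w+)/;

print "\\w  captured: $one_char\n";
print "\\w+ captured: $one_word\n";
```

The first pattern should capture only p, the second the whole word pepperoni.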

While learning how to write regexes, I found this tool very useful: http://gskinner.com/RegExr/
