Implementing Regular Expressions

Python re Module

To perform regular expressions in Python, we first need to import Python’s built-in regular expression module, re.

We can use the PYTHEX website to test Python regular expression matching!

import re # Lets us perform regular expression matching

Then, we can compile a regular expression object using compile()! This object can then be used to match strings for a particular pattern.

# can be used to match 3 digits XXX
regex_object = re.compile(r"[0-9]{3}")

Note the use of the r before the string. In Python, this stands for raw string, meaning escape sequences \ won’t be translated!

Below we provide some of the commonly used functions in re. Assume we’ve already compiled a regular expression object regex for use.

Match Objects

If we obtain a Match object from a re function, there is a variety of information we can pull from it.

# assume match is a Match object 
span = match.span() # tuple containing start and end position of the match
string = match.string # string passed in ("hello world")
group = match.group() # part of the string where the match was found

Additionally, we can even use Match objects for string parsing! In some languages like Python, the precedence operator ( ) is also an operator for grouping values that can be pulled out of our regular expression!

# matches phone number XXXXXXXXXX
# we group by the first 3 digits, then the next 3 digits, finally the last 4 digits
regex = re.compile(r"([0-9]{3})([0-9]{3})([0-9]{4})")
m = regex.match("1234567890")
 
print(m.group(0)) # "123456890" - 0 returns the entire match
print(m.group(1)) # "123"
print(m.group(2)) # "456"
print(m.group(3)) # "7890"

Group IDs are assigned in the order of their open parentheses, as we read the string left to right! Note that we can nest groups together, though we cannot overlap groups.

regex.findall(pattern,string)

Finds all matches of a regular expression in a string, or an empty list if no matches are found.

regex = re.compile(r"cmsc3[0-9]{2}")
regex.findall("cmsc330 cmsc250 cmsc351") # ['cmsc330', 'cmsc351']
regex.findall("cmsc110 cmsc131 cmsc216") # []

regex.fullmatch(string)

Returns a Match object if the entire string matches the regular expression, or None of this does not occur.

regex = re.compile(r"ab*")
regex.fullmatch("abbbbbb") # Returns Match object
regex.fullmatch("abbbbc") # None

regex.match(string)

Returns a Match object if any substring starting from the beginning of string matches the regular expression, or None of this does not occur.

regex = re.compile(r"ab*")
regex.fullmatch("abbbcc") # Returns Match object
regex.fullmatch("cabbb") # None