Regular expressions in Python let you search for patterns within a string; similar functionality is available on Linux through the grep command.
As one tech writer put it, "Knowing regular expressions can mean the difference between solving a problem in 3 steps and solving it in 3000 steps."
Pattern matching is a vital function in browsers and MS Office; for example, the Ctrl+F find feature is a simple form of pattern search, and regular expressions generalize the same idea. The following sections cover the main features of regular expressions in Python.
The following example illustrates the simplicity of regular expressions:
Matching a phone number with ordinary string-handling code takes around 20 lines, whereas with a regular expression it can be reduced to about 3 lines.
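import re
phonenumregx = re.compile(r'\d{3}-\d{3}-\d{3}')  # pattern assumed here: three hyphen-separated groups of three digits
mo = phonenumregx.search('My number is 415-123-123')
print('Phone number found ' + mo.group())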
The search() method will return None if the regex pattern is not found in the string. If the pattern is found, the search() method returns a Match object.
Match objects have a group() method that will return the actual matched text from the searched string.
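Because search() can return None, it is safer to check the result before calling group(). Here is a minimal sketch of that check; the helper name find_phone and the pattern are only illustrative assumptions, not part of the re module.

```python
import re

# Hypothetical pattern: three groups of three digits separated by hyphens.
phone_regex = re.compile(r'\d{3}-\d{3}-\d{3}')

def find_phone(text):
    """Return the first matching phone number in text, or None if absent."""
    mo = phone_regex.search(text)
    if mo is None:       # search() found nothing
        return None
    return mo.group()    # the matched substring

found = find_phone('My number is 415-123-123')
missing = find_phone('No number here')
print(found)
print(missing)
```

Calling group() directly on a None result would raise an AttributeError, which is why the check comes first.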
Pattern Matching Types
The | symbol is called a pipe and is used to match one of several alternatives. For example, abc|xyz matches either abc or xyz.
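A small sketch of the pipe in action, assuming the abc|xyz pattern from the text; the sample strings and variable names are made up for illustration.

```python
import re

# Either alternative may match; search() returns the first match in the text.
either_regex = re.compile(r'abc|xyz')

first = either_regex.search('xyz comes before abc here').group()
all_matches = either_regex.findall('abc, xyz, abd, abc')

print(first)
print(all_matches)
```

Note that 'abd' is skipped, because it matches neither alternative.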
The * and + Matching
The * matches zero or more occurrences of the preceding element,
whereas + matches one or more occurrences.
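The difference is easiest to see side by side. The sketch below uses two illustrative patterns (ab*c and ab+c, chosen only for this example) and checks which sample strings each one matches in full.

```python
import re

star_regex = re.compile(r'ab*c')   # 'a', zero or more 'b', then 'c'
plus_regex = re.compile(r'ab+c')   # 'a', one or more 'b', then 'c'

samples = ['ac', 'abc', 'abbbc']

# fullmatch() succeeds only if the whole string matches the pattern.
star_hits = [s for s in samples if star_regex.fullmatch(s)]
plus_hits = [s for s in samples if plus_regex.fullmatch(s)]

print(star_hits)
print(plus_hits)
```

Only the * version accepts 'ac', since that string contains no 'b' at all.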
Shorthand character classes are abbreviations for commonly used sets of characters. The shorthand character classes are as follows:
\d : Any numeric digit.
\D : Any character that is not a numeric digit.
\w : Any letter, numeric digit, or the underscore.
\W : Any character that is not a letter, numeric digit, or the underscore.
\s : Any space, tab, or newline character.
\S : Any character that is not a space, tab, or newline.
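To make the list above concrete, here is a short sketch that applies three of the shorthand classes to one sample string. The string and variable names are invented for the example.

```python
import re

text = 'Room 12, floor_3!'

digits = re.findall(r'\d+', text)      # runs of digits
words = re.findall(r'\w+', text)       # runs of letters, digits, underscores
non_space = re.findall(r'\S+', text)   # runs of non-whitespace characters

print(digits)
print(words)
print(non_space)
```

Notice that \w+ drops the comma and the exclamation mark, while \S+ keeps them because they are not whitespace.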
We can also define our own character classes, as in the following example:
regex = re.compile(r'[aeiouAEIOU]')
regex.findall('My name is TechRRB')
The above expression matches the vowels present in the sentence passed to it and returns them as a list, here ['a', 'e', 'i', 'e'].
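Any set of characters can go between the square brackets. As a rough sketch, the snippet below runs the vowel class next to a second, made-up class for punctuation marks; the second pattern and the sample sentence are assumptions for illustration only.

```python
import re

vowel_regex = re.compile(r'[aeiouAEIOU]')
punct_regex = re.compile(r'[.,!?]')   # hypothetical class: four punctuation marks

vowels = vowel_regex.findall('My name is TechRRB')
marks = punct_regex.findall('Hello, world! How are you?')

print(vowels)
print(marks)
```

Inside square brackets the dot and question mark are treated as literal characters, so they need no backslash.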
The Caret and Dollar Symbols
The caret (^) symbol is used to match at the beginning of a string.
The dollar ($) symbol is used to match at the end of a string.
>>> beginsWithHello = re.compile(r'^Hello')
>>> beginsWithHello.search('Hello world!')
<_sre.SRE_Match object; span=(0, 5), match='Hello'>
>>> beginsWithHello.search('He said hello.') == None
True
In the above example we use the caret symbol to match Hello at the beginning of the string. In the first case the Match object shows the matched text; in the second case nothing matches, so search() returns None and the comparison evaluates to True.
The example for the dollar symbol is as follows:
>>> endsWithNumber = re.compile(r'\d$')
>>> endsWithNumber.search('Your number is 42')
<_sre.SRE_Match object; span=(16, 17), match='2'>
>>> endsWithNumber.search('Your number is forty two.') == None
True
In this code, the $ is used to match a digit at the end of the string.
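The two symbols can also be used together, so that the whole string has to match the pattern. The sketch below is a script version of the examples above, plus one combined pattern that is an illustrative addition rather than something from the original examples.

```python
import re

begins_with_hello = re.compile(r'^Hello')
ends_with_digit = re.compile(r'\d$')
whole_string_is_digits = re.compile(r'^\d+$')  # both anchors: the entire string must be digits

starts = begins_with_hello.search('Hello world!') is not None
no_start = begins_with_hello.search('He said hello.') is None
last_digit = ends_with_digit.search('Your number is 42').group()
all_digits = whole_string_is_digits.search('12345') is not None
not_all_digits = whole_string_is_digits.search('123 45') is None

print(starts, no_start, last_digit, all_digits, not_all_digits)
```

The string '123 45' fails the combined pattern because the space in the middle is not a digit.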
The Wildcard Character
The . (dot) in a regular expression is called the wildcard character; it matches any single character except a newline.
To make the dot match newlines as well, pass re.DOTALL as the second argument to re.compile().
>>> atRegex = re.compile(r'.at')
>>> atRegex.findall('The cat in the hat sat on the flat mat.')
['cat', 'hat', 'sat', 'lat', 'mat']
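The effect of re.DOTALL shows up as soon as the text contains a newline. Here is a minimal sketch comparing the same pattern with and without the flag; the sample text and variable names are invented for the example.

```python
import re

text = 'first line\nsecond line'

no_newline = re.compile(r'.*')               # dot stops at the newline
with_newline = re.compile(r'.*', re.DOTALL)  # dot also matches the newline

short = no_newline.search(text).group()
full = with_newline.search(text).group()

print(repr(short))
print(repr(full))
```

Without the flag the match ends at the first line; with re.DOTALL it runs through the newline to the end of the text.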
The sub() Method to Substitute Strings
The sub() method of Regex objects takes two arguments.
The first argument is the string that replaces each match.
The second is the string to be searched, in which the substitutions are made.
The sub() method returns a new string with the substitutions applied.
>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'
The above code replaces every occurrence of Agent followed by a word with CENSORED, hence the output.
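As another quick sketch of sub(), the snippet below masks every digit in a string. The pattern and sample strings are made up for illustration; it also shows that a string with no matches comes back unchanged.

```python
import re

digit_regex = re.compile(r'\d')

# Each single digit is replaced by 'X'.
masked = digit_regex.sub('X', 'Card 1234 expires 09/27')

# With no matches, sub() returns the original text as is.
untouched = digit_regex.sub('X', 'no digits here')

print(masked)
print(untouched)
```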
Regular expressions are very important in application programming. In this article I briefly discussed regular expressions in Python; if you found it useful, please share it with your friends. Comments are welcome.