Welcome to the Treehouse Community

Create a variable named contacts that is an re.search()

"Create a new variable named contacts that is an re.search() where the pattern catches the email address and phone number from string. Name the email pattern email and the phone number pattern phone. The comma and spaces should not be part of the groups."

emails.py
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

contacts = re.search(r^
    (?P<email>[-\w\d.+]+@[-\w\d.]+) 
     ,\s
     (?P<phone>\d\{3}-\d\{3}-\d{4}$
     , string)

6 Answers

Chris Freeman
Treehouse Moderator 68,441 Points

A few things needed cleaning up:

contacts = re.search(r''' #<-- Wrap in multiline quote. Remove leading caret (^)
    (?P<email>[-\w\d.+]+@[-\w\d.]+) 
     ,\s
     (?P<phone>\d{3}-\d{3}-\d{4})''' #<-- Add closing paren. Remove extra backslashes. Remove trailing $
     , string, re.X | re.M) #<-- Add verbose flag (for multiline quote); re.M (multiline) only changes how ^ and $ match, so it is optional here
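To check the corrected pattern, here is a minimal runnable sketch that pastes in the challenge's string and prints both named groups. The verbose-mode comments are added for illustration only; re.search stops at the first match, so only Kenneth's line is returned.

```python
import re

# Sample data copied from the challenge's emails.py
string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

contacts = re.search(r'''
    (?P<email>[-\w\d.+]+@[-\w\d.]+)  # email: name, @, domain
    ,\s                              # comma and space, matched but not captured
    (?P<phone>\d{3}-\d{3}-\d{4})     # phone in 555-555-5555 form
''', string, re.X | re.M)

# Only the first matching line is returned by re.search
print(contacts.group('email'))  # kenneth+challenge@teamtreehouse.com
print(contacts.group('phone'))  # 555-555-5555
```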

Thank you, Chris. When do I get to use the caret (^)?

Chris Freeman
Treehouse Moderator 68,441 Points

The caret anchors the pattern to the beginning of the string, or to the beginning of any line when the re.M flag is set. The current pattern could also have been written to match anything from the beginning of the line up to the email address:

contacts = re.search(r''' #<-- Wrap in multiline quote
    ^.*?  #<-- match beginning of line followed by anything, non-greedy so the email group keeps the full username
    (?P<email>[-\w\d.+]+@[-\w\d.]+) 
     ,\s
     (?P<phone>\d{3}-\d{3}-\d{4})'''
     , string, re.X | re.M) #<-- verbose flag (for multiline quote) and multiline flag (so ^ can match at the start of any line)
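When padding the front of the pattern with "match anything", greediness matters. This sketch compares a greedy ^.* with a lazy ^.*? on the first line of the challenge data: the greedy version backs off only as far as it must, leaving the email group a single character before the @.

```python
import re

line = 'Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove'
email_phone = r'(?P<email>[-\w\d.+]+@[-\w\d.]+),\s(?P<phone>\d{3}-\d{3}-\d{4})'

# Greedy: .* takes as much of the line as it can, so the email group gets only "e@..."
greedy = re.search(r'^.*' + email_phone, line)
# Lazy: .*? takes as little as it can, so the email group gets the whole address
lazy = re.search(r'^.*?' + email_phone, line)

print(greedy.group('email'))  # e@teamtreehouse.com
print(lazy.group('email'))    # kenneth+challenge@teamtreehouse.com
```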

Is there any way you can better clarify when ^ and $ are needed and when they are not? I am unable to follow the post below.

Chris Freeman
Treehouse Moderator 68,441 Points

When developing a regular expression, you are defining a pattern that is compared to a string, or a portion of the string, to find a match. In a very simple sense, think of aligning the pattern starting at each character of the string and checking for a match. The pattern is shifted one character at a time, rechecking for a match each time.

The special characters of caret (^) and dollar sign ($) represent specific alignment instructions for the pattern.

The caret indicates the pattern comparison must start at the first character of the searched string (or at the beginning of any line in a multi-line string, when the re.M flag is set). The dollar sign indicates the pattern comparison must end at the last character of the searched string (or at the end of any line in a multi-line string, again with re.M). These special characters can be thought of as anchors that tie the pattern to a specific position.
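A small sketch of how re.M changes where the anchors can match, using a made-up two-line string:

```python
import re

text = 'first line\nsecond line'

# Without re.M, ^ only matches at the very start of the string
start_plain = re.findall(r'^\w+', text)
# With re.M, ^ also matches right after each newline
start_multi = re.findall(r'^\w+', text, re.M)

# $ matches at the end of the string (or just before a final newline),
# and with re.M also at the end of every line
end_plain = re.findall(r'\w+$', text)
end_multi = re.findall(r'\w+$', text, re.M)

print(start_plain, start_multi)  # ['first'] ['first', 'second']
print(end_plain, end_multi)      # ['line'] ['line', 'line']
```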

There are many reasons to choose to use these anchoring characters:

  • If your search pattern may match in multiple places within a string, an anchor restricts it to the occurrence at the start (^) or the end ($).

  • Using both ^ and $, you can force your pattern to consume the entire searched string.

  • Using an anchor can improve the performance of your regex by reducing the amount of backtracking done during pattern matching.

I'm sure there are others I've left off.

The caret and dollar sign are only needed if your regex pattern must match at the start or end of the searched string, or of a line in a multi-line string.
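A minimal sketch of the first two uses listed above, with made-up strings: an anchor picks out the occurrence at the start or end, and using both anchors forces the whole string to match.

```python
import re

nums = '12 34 56'
print(re.findall(r'\d+', nums))   # ['12', '34', '56']  every occurrence
print(re.findall(r'^\d+', nums))  # ['12']  only the one at the start
print(re.findall(r'\d+$', nums))  # ['56']  only the one at the end

phone = r'\d{3}-\d{3}-\d{4}'
# Unanchored, the phone pattern is found inside a longer string...
print(bool(re.search(phone, 'call 555-555-5555 now')))              # True
# ...but with ^ and $ the pattern must consume the entire string
print(bool(re.search('^' + phone + '$', 'call 555-555-5555 now')))  # False
print(bool(re.search('^' + phone + '$', '555-555-5555')))           # True
```

Python's re.fullmatch() is another way to require that the whole string matches, without writing the anchors yourself.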

Nate Yu
2,992 Points

I'm confused about why the email pattern has a "-" in both [] sets. It's necessary for email addresses, right? But I still don't know why.

Chris Freeman
Treehouse Moderator 68,441 Points

The hyphen character is valid in usernames and in domain names, so it needs to be included. In regex, a hyphen is also used to specify a range of characters, like 0-9. To include a literal hyphen as a matching character in a character set, list it first (listing it last or escaping it as \- also works) so it isn't interpreted as part of a range.
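A quick sketch of both behaviours. The address joe@tree-house.com is hypothetical, made up only to have a hyphen in the domain.

```python
import re

# Hypothetical address with a hyphen in the domain, made up for this example
address = 'joe@tree-house.com'

# Without a hyphen in the sets, the match stops at the first "-"
print(re.search(r'[\w.+]+@[\w.]+', address).group())    # joe@tree
# With the hyphen listed first in each set, it is a literal character
print(re.search(r'[-\w.+]+@[-\w.]+', address).group())  # joe@tree-house.com

# Between two characters, a hyphen defines a range: [a-c] means a, b, or c
print(re.findall(r'[a-c]+', 'a-b-c'))   # ['a', 'b', 'c']
# Escaped, it is literal: [a\-c] means a, -, or c
print(re.findall(r'[a\-c]+', 'a-b-c'))  # ['a-', '-c']
```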

Keep in mind the challenge is using simplified domain names. The full regex check for a valid domain name is very complicated.

Packages such as Django have built-in validators for URLs and email addresses.

Hi Chris,

I had the same solution as this, except for the phone number, where I had: ([\d]{3}-[\d]{3}-[\d]{4})

How come this didn't work?

Chris Freeman
Treehouse Moderator 68,441 Points

Timothy Tseng, I’ve added the character set notation and it also works. Perhaps there is some other error. Could you include your full code?

Did you include the group name, as in (?P<phone>...)?
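A sketch checking both points on the first line of the challenge data: [\d]{3} and \d{3} match the same text, but without the ?P<phone> part the group only has a number, not a name.

```python
import re

line = 'Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove'

# [\d]{3} and \d{3} describe exactly the same thing: three digits
plain = re.search(r'(?P<phone>\d{3}-\d{3}-\d{4})', line)
char_set = re.search(r'(?P<phone>[\d]{3}-[\d]{3}-[\d]{4})', line)
print(plain.group('phone'), char_set.group('phone'))  # 555-555-5555 555-555-5555

# Without ?P<phone>, the group is only numbered, so a lookup by name fails
unnamed = re.search(r'([\d]{3}-[\d]{3}-[\d]{4})', line)
print(unnamed.group(1))  # 555-555-5555
try:
    unnamed.group('phone')
except IndexError:
    print('no group named "phone" in this pattern')
```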

Bright Zhao
2,505 Points

This is my solution for this challenge; it passed:

contacts = re.search(r'''
    ^[\w]+,\s[\w]+,\s
    (?P<email>[\w+.]*@[\w.]*)
    ,\s
    (?P<phone>\d{3}-\d{3}-\d{4})
    ,\s
    @[\w]+$
''', string, re.X|re.I|re.M)

twitters = re.search(r'''
    @[\w]+$
''', string, re.X|re.M)

Is there any way to improve it?
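One possible tidy-up, offered only as a sketch and not as the official answer: \w needs no surrounding brackets, re.I can be dropped because the pattern contains only \w, \d and punctuation (none of which depend on case), and + replaces * so an empty name or domain is not accepted.

```python
import re

# Sample data copied from the challenge's emails.py
string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

contacts = re.search(r'''
    ^\w+,\s\w+,\s                  # last name, first name (matched, not captured)
    (?P<email>[\w+.]+@[\w.]+)      # email
    ,\s
    (?P<phone>\d{3}-\d{3}-\d{4})   # phone
    ,\s
    @\w+$                          # Twitter handle at the end of the line
''', string, re.X | re.M)

print(contacts.group('email'))  # kenneth+challenge@teamtreehouse.com
print(contacts.group('phone'))  # 555-555-5555
```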

Got it!

Christopher Gunawan
10,754 Points

contacts = re.search(r'''
    (?P<email>[.+\w]+@[.\w]+) #email
    ,\s
    (?P<phone>\d{3}-\d{3}-\d{4}) #phone
''', string, re.X|re.M)

Here is my answer. My question is why do I need to add the ",\s" if I do not want to capture it?

Chris Freeman
Treehouse Moderator 68,441 Points

The pattern must match the text exactly, character for character. So the ",\s" lets the pattern match the comma and space between the email address and the phone number, but since it is outside of any group, it won't be captured.
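A minimal sketch of that point, using one line trimmed from the challenge data: the ", " is part of the overall match but not of either group, and leaving ",\s" out makes the search fail because nothing in the pattern can match the comma and space.

```python
import re

line = 'kenneth+challenge@teamtreehouse.com, 555-555-5555'

# ",\s" sits outside both groups: it must match, but it is not captured
m = re.search(r'(?P<email>[.+\w]+@[.\w]+),\s(?P<phone>\d{3}-\d{3}-\d{4})', line)
print(m.groupdict())  # {'email': 'kenneth+challenge@teamtreehouse.com', 'phone': '555-555-5555'}
print(m.group(0))     # whole match, still including the ', ' between the groups

# Without ",\s", nothing bridges the comma and space, so the search fails
print(re.search(r'(?P<email>[.+\w]+@[.\w]+)(?P<phone>\d{3}-\d{3}-\d{4})', line))  # None
```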

import re

contacts = re.search(r'''
    (?P<email>[.+\w]+@[.\w]+),\s
    (?P<phone>\d{3}-\d{3}-\d{4}) #phone
''', string, re.X|re.M)

twitters = re.search(r''' @[\w]+$ ''', string, re.X|re.M)
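For the twitters search, this sketch shows why re.M matters for the $ anchor. re.X is left out here only because this short pattern has no whitespace or comments to ignore.

```python
import re

# Sample data copied from the challenge's emails.py
string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

# With re.M, $ can match at the end of every line, so search stops at the first line's handle
first_line_handle = re.search(r'@[\w]+$', string, re.M).group()
# Without re.M, $ only matches at the end of the whole string, i.e. on the last line
last_line_handle = re.search(r'@[\w]+$', string).group()

print(first_line_handle)  # @kennethlove
print(last_line_handle)   # @joykesten
```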