Now that we can search for just about anything, let's organize our results a bit better. Regular expressions give us indexed and named groups, both of which are super-handy.
New terms
- ([abc]) - creates a group that contains a set for the letters 'a', 'b', and 'c'. This could later be accessed from the Match object as .group(1).
- (?P<name>[abc]) - creates a named group that contains a set for the letters 'a', 'b', and 'c'. This could later be accessed from the Match object as .group('name').
- .groups() - method that returns all of the groups on a Match object as a tuple.
- re.MULTILINE or re.M - flag that makes ^ and $ match at the beginning and end of each line in your text, not just at the beginning and end of the whole string.
- ^ - specifies, in a pattern, the beginning of the string (or of each line, with re.MULTILINE).
- $ - specifies, in a pattern, the end of the string (or of each line, with re.MULTILINE).
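Here is a minimal sketch of the group-related terms above, using Python's re module. The sample string "cab" is just an illustration.

```python
import re

text = "cab"

# Indexed group: the parentheses capture whatever the set [abc] matched.
indexed = re.search(r"([abc])", text)
print(indexed.group(1))       # c

# Named group: (?P<name>...) captures the same text under a name.
named = re.search(r"(?P<name>[abc])", text)
print(named.group("name"))    # c
print(named.group(1))         # c  (named groups are still numbered)

# .groups() returns every captured group as a tuple.
pair = re.search(r"([abc])([abc])", text)
print(pair.groups())          # ('c', 'a')
```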
0:00
When we create our patterns, they just match the entire thing. We've seen it already: our match objects only have one group in them. It's often really handy to have multiple groups defined inside your pattern, so that you can later access just the parts of the text that you care about. In our case, making a group for the email address, a group for the phone number, and a group for the name would make it a lot simpler to pull those out later and use them.
0:22
So, at this point, we've gotten to where we can catch pretty much anything in our text file. I think what we should do now is use all of these things at once. This might get a little confusing, so we're going to break them up into groups. Let's start with our normal print and re.findall. Then let's write a large verbose pattern, so we're going to need re.X. All right, now we can write our pattern. I'm going to add some extra lines here so I can make this a little more readable.
1:00
All right, we define groups with parentheses. In our first group, we want to capture the last name and the first name. For the last name, we need a set of hyphens, word characters, and spaces, any number of those from zero on up. Then we need a comma and an actual space, and then that same set of hyphens, word characters, and spaces again, and that's our group: last name, comma, space, first name. Then there's a tab. Let's add a little comment noting that this is the last and first names.
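A minimal sketch of the name group on its own, assuming a hypothetical tab-separated line in "Last, First" format (the sample records here are made up). The set [-\w ] keeps its literal space even under re.X, because verbose mode only ignores whitespace outside character classes.

```python
import re

# Hypothetical sample line: "Last, First", a tab, then the rest of the record.
line = "Doe, Jane\tjane@example.com"

name_pattern = re.compile(r"""
    ([-\w ]*,\s[-\w ]+)\t   # Last and first names, then a tab
""", re.X)

match = name_pattern.search(line)
print(match.group(1))   # Doe, Jane

# Hypothetical record with no last name: the * lets that part be empty.
no_last = name_pattern.search(", Tim\ttim@example.com")
print(no_last.group(1))   # , Tim
```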
1:42
Okay, now for our email address, which is the next thing in our line. We need a set that covers those characters: hyphens, word characters, numbers, periods, and plus signs. We've got one or more of those. Then we have an at symbol, and then again hyphens, word characters, digits, and periods, one or more of those, and then there's a tab. And this is our email.
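The email group can be sketched the same way. The address below is made up; inside a set, . and + are literal characters, so they don't need escaping.

```python
import re

email_pattern = re.compile(r"""
    ([-\w\d.+]+@[-\w\d.]+)\t   # Email: local part, an @, the domain, then a tab
""", re.X)

# Hypothetical address with a period and a plus sign in the local part.
line = "jane.doe+news@mail.example.com\t(555) 555-1234"
email = email_pattern.search(line).group(1)
print(email)   # jane.doe+news@mail.example.com
```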
2:14
All right, what comes next? The phone number. Remember, we have to escape the parentheses, and we want to mark them as optional. So there's an optional opening parenthesis, then three digits, then an optional closing parenthesis. There's an optional hyphen and an optional space. Then there are three digits, a hyphen, and four digits. That's our group, and then there's a tab. We'll label that phone.
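A sketch of the phone group as narrated, checked against a few made-up numbers with fullmatch so each sample has to match completely.

```python
import re

# Escaped, optional parentheses around the area code; optional hyphen and space.
phone_pattern = re.compile(r"""
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})   # Phone
""", re.X)

samples = ["(555) 555-1234", "555-555-1234", "555 555-1234", "5555551234"]
results = {s: phone_pattern.fullmatch(s) is not None for s in samples}
print(results)
```

The first three samples match; the last one doesn't, because the pattern requires a hyphen before the final four digits.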
2:49
Then we have the job and the company they work for. This is a lot like the group that captures the names, but we don't need much in it: it's pretty much just word characters and spaces. There can be one or more of those, a comma, some sort of whitespace, and then word characters and spaces again. And then, of course, there's a tab. We'll label that job and company.
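A sketch of the job-and-company group. Writing "word characters and spaces" as [\w\s] is an assumption about the exact set; the sample records are made up.

```python
import re

# Word characters and whitespace, a comma, whitespace, more of the same, a tab.
job_pattern = re.compile(r"""
    ([\w\s]+,\s[\w\s]+)\t   # Job and company
""", re.X)

found = job_pattern.search("Teacher, Treehouse\t@someone")
print(found.group(1))   # Teacher, Treehouse

# A period is not in [\w\s], so a company name like "Example Co." fails here.
missing = job_pattern.search("Engineer, Example Co.\t")
print(missing)          # None
```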
3:18
The last thing we put on some of the lines, at least, is a Twitter account. Let's grab that; Twitter is actually really easy to grab. It's just \w and \d, because underscores are already included in \w. You can't have hyphens, and you can't really have special characters; you can just have letters and numbers (plus that underscore). So that's it for Twitter, and let's label it Twitter.
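A quick check of the Twitter character set against some made-up handles. \w matches letters, digits, and the underscore, so [\w\d] accepts exactly the same characters as \w alone, and hyphens or periods are rejected.

```python
import re

handle = re.compile(r"[\w\d]+")   # the set as narrated
word_only = re.compile(r"\w+")    # \w by itself

samples = ["jane_doe99", "Ada2024", "jane-doe", "jane.doe"]
checks = {s: handle.fullmatch(s) is not None for s in samples}
print(checks)

# Both patterns accept the same samples, since \d is a subset of \w.
same = all((word_only.fullmatch(s) is not None) == checks[s] for s in samples)
print(same)   # True
```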
3:47
All right, so that's our pattern. It's a really long regex, and there are actually a couple of problems with it, things it won't catch. But let's run it and see what we get. We'll come down here and run python address_book.py. You'll notice there's an opening parenthesis: that's a tuple. Do you see the tuple? The tuple shows all of the little groups we caught; each item in the tuple is one of our groups. That's pretty awesome, and we're going to come back to it.
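With more than one group in the pattern, re.findall returns one tuple per match, with one item per group. Here is a sketch with just the name and email groups and made-up data:

```python
import re

# Hypothetical two-line excerpt of a tab-separated address book.
data = "Doe, Jane\tjane@example.com\nRoe, Rick\trick@example.com\n"

# Two groups: name and email.
pattern = r"([-\w ]*,\s[-\w ]+)\t([-\w\d.+]+@[-\w\d.]+)"

results = re.findall(pattern, data)
print(results)
# [('Doe, Jane', 'jane@example.com'), ('Roe, Rick', 'rick@example.com')]
```

With exactly one group, findall returns a list of strings instead, and with no groups it returns a list of the full matches.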
4:19
Do you notice anything missing? Dave's not here. And King Arthur isn't in here. That's because they don't have some of the items we're looking for, and since they don't match exactly, they don't get included. So we should go back and mark a couple of things as optional. We're also going to use a couple of other tricks here. Let's see. The first thing we're going to do is add a symbol right here at the start: the caret, which means the beginning of the string. Okay. To complement that, right down here after that closing parenthesis, we're going to put in a dollar sign, which marks the end of the string.
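A minimal sketch of what ^ and $ change, using a made-up string: without them a pattern can match any slice of the text; with them the match has to start at the beginning and stop at the end.

```python
import re

text = "ext. 555-1234 (office)"

# Unanchored: finds the number anywhere in the string.
anywhere = re.search(r"\d{3}-\d{4}", text)
print(anywhere.group())   # 555-1234

# Anchored: ^ and $ force the pattern to span the whole string.
anchored = re.search(r"^\d{3}-\d{4}$", text)
print(anchored)           # None

print(re.search(r"^\d{3}-\d{4}$", "555-1234").group())   # 555-1234
```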
5:04
Okay, we've got another trick to go with those in just a minute, but remember them. Tim doesn't have a last name, so we'll mark the last name as completely optional. Everybody's got an email address, and I don't think there's anything we need to change there.
5:20
Let's see. They all have jobs listed, but some of them don't have phone numbers. So we want to change this: the phone number is optional, so we make that whole group optional. If someone doesn't have a phone number, it just won't be there. And if someone doesn't have a Twitter account, the character after the job will actually be a newline, so that tab won't be there. We want to mark that tab as optional, too.
5:58
Over here in the company name, we should really add a dot as a possible character, because we've got that one with "Co." in it, and we want to be able to catch it. And some people don't have Twitter accounts, so let's make Twitter optional as well.
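Putting the optional pieces together, here is a sketch of the whole pattern with unnamed groups. The data is made up, the character sets are a reconstruction from the narration, and the @ in the Twitter group is an assumption about the data format. The flags are combined with |, and re.M lets ^ and $ apply to each line. When an optional group doesn't participate, re.findall reports it as an empty string.

```python
import re

# Hypothetical records; the second has no phone number and no Twitter handle.
data = (
    "Doe, Jane\tjane@example.com\t(555) 555-1234\tEngineer, Example Co.\t@janedoe\n"
    "Roe, Rick\trick@example.com\t\tTeacher, Some School"
)

pattern = r"""
    ^([-\w ]*,\s[-\w ]+)\t                # Last and first names
    ([-\w\d.+]+@[-\w\d.]+)\t              # Email
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t      # Phone (optional group)
    ([\w\s]+,\s[\w\s.]+)\t?               # Job and company; the tab is optional
    (@[\w\d]+)?$                          # Twitter (optional group)
"""

rows = re.findall(pattern, data, re.X | re.M)
for row in rows:
    print(row)
```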
6:13
The other thing we need to add comes from marking the beginning and end of the string: our string is this entire thing, the whole file, but we want each line to be its own string, right? So we need to add re.MULTILINE. What that says is: treat each \n as the end of a string, and whatever follows it as the start of a new one. So it turns our one big string into a lot of strings, as far as ^ and $ are concerned. If we want, we can use re.M instead of re.MULTILINE; either way works.
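A small before-and-after sketch of re.MULTILINE on a made-up two-line string. Without the flag, ^ and $ only match at the edges of the whole string, so a "one entry per line" pattern finds nothing; with it, each line counts.

```python
import re

# Hypothetical two-line string.
data = "Doe, Jane\nRoe, Rick\n"

entry = r"""
    ^([-\w ]*),\s([-\w ]+)$   # one 'Last, First' entry per line
"""

without_m = re.findall(entry, data, re.X)
with_m = re.findall(entry, data, re.X | re.M)   # re.M is the same flag as re.MULTILINE

print(without_m)   # []
print(with_m)      # [('Doe', 'Jane'), ('Roe', 'Rick')]
```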
6:51
All right, let's try this one out. Look at that: we've got a whole lot more stuff. I do believe we've got everything for everyone. There's the doctor, even with his big email address. We got Tim. We got everybody in there; all of our stuff is there. That's amazing. That's awesome.
7:11
What we want to do now, though, is make this regular expression even more useful. It's really handy as it is, but it just gives us a list of tuples when we use findall. No matter what we did, we would only get tuples, and we'd have to work with index positions. What I want to do instead is turn each result into a dictionary, so that I can use that dictionary to do something else.
7:34
So let's take our groups and make them named groups. To do that, we don't have to change anything inside the groups; it all gets to stay the same. We just add a couple of things: a question mark and a capital P, which is what makes it a named group, and then the name inside less-than and greater-than signs. We're going to name this first group name, because that's what it is. The second group we'll name email, the third phone, the fourth job, and the last one twitter. All right? I think that's pretty good.
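A minimal sketch of named groups on two of the fields, with a made-up line. Python's syntax is (?P<name>...) with a capital P; a lowercase p is rejected when the pattern compiles.

```python
import re

pattern = re.compile(r"""
    (?P<name>[-\w ]*,\s[-\w ]+)\t      # Last and first names
    (?P<email>[-\w\d.+]+@[-\w\d.]+)    # Email
""", re.X)

match = pattern.search("Doe, Jane\tjane@example.com")
print(match.group("name"))    # Doe, Jane
print(match.group("email"))   # jane@example.com
print(match.group(2))         # jane@example.com (named groups keep their numbers)

# The P must be uppercase: (?p<...>) raises re.error.
try:
    re.compile(r"(?p<name>abc)")
    lowercase_p_works = True
except re.error:
    lowercase_p_works = False
print(lowercase_p_works)      # False
```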
8:18
But instead of doing all of this here and printing it, let's make this a little easier for ourselves. Let's say line equals, and do a search instead. Then we need to get rid of one of these. So line is a search, and because a search stops at the first match, for right now it's just going to be that first line, which is me. But we can print out what this gets. So let's print out line, and then let's also print out line.groupdict(). Let's see what these two things do.
8:52
Okay, let's come down here and run address_book.py again. When we print out line, we get a match object, and the match object holds a whole bunch of stuff. But when we print the dictionary, look what we get: a dictionary that has the name, the email address, and the job. Yes, the job picks up the \t, but that's okay. Twitter gets kennethlove, and phone gets the phone number. So we got all this stuff. That's so much better than what we were getting before, when we were just getting tuples.
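A sketch of the named-group version with search() and groupdict(), using made-up records. The group names follow the narration; the character sets are a reconstruction, and the @ in the Twitter group is an assumption about the data. The job value ends in '\t' here as well: \s also matches a tab, so the greedy company part absorbs it and the optional \t? then matches nothing.

```python
import re

# Hypothetical records; the second has no phone number and no Twitter handle.
data = (
    "Doe, Jane\tjane@example.com\t(555) 555-1234\tEngineer, Example Co.\t@janedoe\n"
    "Roe, Rick\trick@example.com\t\tTeacher, Some School"
)

pattern = re.compile(r"""
    ^(?P<name>[-\w ]*,\s[-\w ]+)\t               # Last and first names
    (?P<email>[-\w\d.+]+@[-\w\d.]+)\t            # Email
    (?P<phone>\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t    # Phone (optional)
    (?P<job>[\w\s]+,\s[\w\s.]+)\t?               # Job and company
    (?P<twitter>@[\w\d]+)?$                      # Twitter (optional)
""", re.X | re.M)

# search() stops at the first match, so this is only the first record.
line = pattern.search(data)
first = line.groupdict()
print(first)

# A group that didn't participate shows up as None in groupdict().
second = pattern.search(data.splitlines()[1]).groupdict()
print(second["phone"], second["twitter"])   # None None
```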
9:20
In our next video, we've got just two last big steps, and then we'll have turned this into something absolutely amazing.

9:26
Wow, using groups, especially named groups, makes our string act almost like an object or dictionary. We've turned a simple string into really useful data. Good job, us! All right, just a bit more to go, and we'll have this in the bag. In our next video, let's look at making reusable patterns and how we can loop over our addresses in a more useful manner.