You have completed Regular Expressions in Python!

Groups

9:45 with Kenneth Love

Now that we can search for just about anything, let's organize our results a bit better. Regular expressions give us indexed and named groups, both of which are super-handy.

Teacher's Notes

New terms

([abc]) - creates a group that contains a set for the letters 'a', 'b', and 'c'. This could be later accessed from the Match object as .group(1)
(?P<name>[abc]) - creates a named group that contains a set for the letters 'a', 'b', and 'c'. This could later be accessed from the Match object as .group('name').
.groups() - method to show all of the groups on a Match object.
re.MULTILINE or re.M - flag that makes ^ and $ match at the beginning and end of each line in your text, not just at the beginning and end of the whole string.
^ - specifies, in a pattern, the beginning of the string.
$ - specifies, in a pattern, the end of the string.
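
As a rough illustration of the terms above, here is a minimal sketch. The sample strings and variable names are made up for this example and are not taken from the course files.

```python
import re

# Group 1 is accessed by index; the second group is named "name".
match = re.search(r'([abc])(?P<name>[abc])', 'x ab y')
first = match.group(1)        # text captured by the first group
named = match.group('name')   # text captured by the named group
everything = match.groups()   # tuple holding every captured group

text = "a\nb\nc"
# Without re.M, ^ and $ anchor to the start and end of the whole string.
whole_string = re.findall(r'^[abc]$', text)
# With re.M, ^ and $ also anchor to the start and end of each line.
per_line = re.findall(r'^[abc]$', text, re.M)

print(first, named, everything)
print(whole_string, per_line)
```

The three-line string only matches line by line once re.M is passed, which is the behavior the flag entry describes.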

When we create our patterns, they just match the entire thing. 0:00

We've seen it already where our match objects only have one group in them. 0:02

It's often really handy to have multiple groups defined inside of your pattern, so 0:06

that you can later access just parts of the text that you care about. 0:10

Like for our case, making a group for the email address, and a group for 0:13

the phone number, and a group for 0:16

the name would make it a lot simpler later to pull those out and use them. 0:18

So, at this point, 0:22

we've gotten to where we can catch pretty much anything in our text file. 0:23

So I think what we should do now is we should kind of just use all these 0:28

things at once. 0:30

This might get a little confusing, so 0:33

what we're gonna do is we're actually gonna break these up into groups. 0:34

So let's start this out with our normal print and re.findall. 0:38

And then let's do a large verbose one and we're gonna need re.X. 0:45

So, all right. 0:51

Now we can write our pattern. 0:53

I'm gonna add in some extra lines here just so 0:55

I can make this a little bit more readable. 0:57

All right, so we define groups with parentheses. 1:00

So, our first group here, we wanna capture the last name and the first name. 1:04

So for the last name, we need that. 1:09

So hyphens, word characters, and spaces. 1:17

Any number of those from zero on up. 1:20

And then we need a comma, and we need an actual space, and 1:23

then we need hyphen w space again, and that's our group. 1:27

So that's last name, comma space first name. 1:33

And then there's gonna be a tab. 1:36

All right. So let's make a little note of 1:38

that last and first names. 1:39
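
A minimal sketch of what that first group might look like on its own, assuming a tab-separated line like the ones described. The sample line and the variable names are invented for illustration.

```python
import re

# Hypothetical sample line: "Last, First", a tab, then the rest of the record.
sample = "Smith, Anna-Lee\tanna@example.com\n"

names = re.findall(r'''
    ([-\w ]*,\s[-\w ]+)\t   # Last and first names
''', sample, re.X)

print(names)
```

With re.X the extra whitespace and the comment are ignored, but the space inside the square brackets still counts because it sits in a character set.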

Okay. So now for 1:42

our email address, which was our next thing in our line. 1:43

Those, oops, those should cover our items. 1:49

So hyphens, word characters, numbers, periods, and plus signs. 1:53

So we've got one or more of those. 2:00

We have an at symbol, and then we again have hyphens, 2:01

word characters, digits, and periods. 2:05

One or more of those, and then there's a tab. 2:08

And this is for our email. 2:12
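
Here is a rough sketch of the email group by itself. The address and the surrounding line are made-up sample data.

```python
import re

# Hypothetical sample line with a name, an email, and a phone number.
sample = "Smith, Anna\tanna.lee+work@mail.example.com\t555-555-0100\n"

emails = re.findall(r'''
    ([-\w\d.+]+@[-\w\d.]+)\t   # Email
''', sample, re.X)

print(emails)
```

The plus sign and periods before the @ are picked up because they are listed in the first set.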

All right, so what comes next? 2:14

Well, next is the phone number. 2:16

So, remember we have to escape these parentheses and 2:18

we wanna mark them as optional. 2:21

So, then there's three digits. 2:23

And there is a closing parenthesis that is optional. 2:27

There is a hyphen that is optional, and a space that is optional. 2:31

And then there are three numbers, a hyphen, and the four numbers. 2:36

That's our group, and then there's a tab. 2:42

So we'll say that's phone. 2:43
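
A minimal sketch of the phone group, tried against three invented formats. The numbers and the names `phone` and `samples` are assumptions for this example.

```python
import re

phone = re.compile(r'''
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})\t   # Phone
''', re.X)

# Hypothetical numbers, each followed by a tab like a field in the file.
samples = "(555) 555-0101\t555-555-0102\t555 555-0103\t"

found = phone.findall(samples)
print(found)
```

The optional parentheses, hyphen, and space let one pattern cover all three styles.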

Yep, all right. 2:47

Then we have the job and the company that they work for. 2:49

So, this is a whole lot like our one that captures the names, but 2:53

we don't have a lot of stuff in here. 2:59

So, it's pretty much just word characters and spaces. 3:02

So there can be one or more of those, 3:05

a comma, some sort of white space, and then again words and spaces. 3:07

And then of course there's a tab. 3:14

So job and company. 3:16
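
A rough sketch of the job and company group in isolation, using a made-up field. Note that `\s` in the sets also matches tabs and newlines, so this is only tried on a short sample here.

```python
import re

# Hypothetical "Job, Company" field followed by a tab and a handle.
sample = "Teacher, Example School\t@annasmith"

jobs = re.findall(r'''
    ([\w\s]+,\s[\w\s]+)\t   # Job and company
''', sample, re.X)

print(jobs)
```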

And then the last thing that we put in there on some of the lines at 3:18

least is a Twitter account. 3:21

So, let's grab that, Twitter is actually really easy to grab. 3:23

It's just \w\d, because, 3:27

I guess, no, underscores are being included in slash w. 3:31

You can't have hyphens, you can't really have special characters, 3:36

you can just have numbers and letters. 3:39

So, that's that for Twitter, and let's mark that Twitter. 3:40
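
A minimal sketch of the Twitter group, with an invented handle. It also shows that `\w` already covers digits and underscores.

```python
import re

# Hypothetical end of a record: job field, tab, then the handle.
sample = "Teacher, Example School\t@anna_smith99\n"

handles = re.findall(r'(@[\w\d]+)', sample)
print(handles)
```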

All right. So, that's our pattern. 3:47

Now it's a really long regex, and there's actually a couple of problems with this, 3:49

things that it won't catch. 3:53

But let's run it and see what we get. 3:55

So, we'll come down here, python address_book. 3:58

And we can see, like, you notice that there's an opening parenthesis, there's a 4:02

tuple. 4:07

Yeah, you see the tuple? 4:08

And the tuple shows all of our little groups that we caught. 4:09

Each item in the tuple is one of our groups. 4:13

So that's pretty awesome. 4:16

We're gonna come back to that. 4:17

Do you notice there's anything missing? 4:19

Dave's not here. 4:21

And King Arthur isn't in here. 4:23

And the reason is because they don't have some of the items that we're looking for. 4:25

So since they don't match exactly, they don't get included. 4:32
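
Putting the five groups together might look something like this sketch. The three records are invented stand-ins for the course's address file, and the names `data`, `pattern`, and `rows` are assumptions.

```python
import re

# Hypothetical records: the second has no Twitter handle,
# the third has no phone number.
data = (
    "Smith, Anna\tanna@example.com\t(555) 555-0101\tTeacher, Example School\t@annasmith\n"
    "Jones, Bob\tbob@example.com\t555-555-0102\tEditor, Example Press\n"
    "Lee, Cara\tcara@example.org\t\tPilot, Example Air\t@caralee\n"
)

pattern = r'''
    ([-\w ]*,\s[-\w ]+)\t               # Last and first names
    ([-\w\d.+]+@[-\w\d.]+)\t            # Email
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})\t     # Phone
    ([\w\s]+,\s[\w\s]+)\t               # Job and company
    (@[\w\d]+)                          # Twitter
'''

rows = re.findall(pattern, data, re.X)
for row in rows:
    print(row)   # each row is a tuple with one item per group
```

Only the first record comes back, because every group is still required and the other two records are each missing a field.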

So what we should do is we should go back and 4:36

mark a couple of things as being optional. 4:38

We're also gonna do a couple of other tricks here. 4:41

So, let's see. 4:43

The first thing we're gonna do is we're actually gonna add a symbol right here. 4:44

We're gonna add the caret, and that means the beginning of the string. 4:49

Okay. 4:54

And to complement that, right down here right after that closing parenthesis, 4:54

we're gonna put in a dollar sign, which marks the end of the string. 4:59

'Kay, we've got another trick we're gonna do for 5:04

that in just a minute, but remember those. 5:06

So Tim doesn't have a last name. 5:10

So we'll mark those as completely optional. 5:12

And everybody's got email. 5:15

I don't think there's anything we need to change on email. 5:17

And some of them. 5:20

Let's see. 5:22

I think they all have phone numbers. 5:23

Some of them, however, don't have jobs listed. 5:25

So, rather, they have jobs listed. 5:30

They may not have if they don't have a phone number, 5:33

then we mar, oh, sorry, yeah, we wanna change this. 5:37

A phone number is optional. 5:41

We wanna make that phone number optional. 5:42

If they don't have a phone number, it won't be there. 5:44

The tab after job, if they don't have a Twitter account, 5:48

the tab after job will actually be a new line. 5:51

So that tab won't be there. 5:54

We wanna mark that tab as being optional. 5:55

And really over here in the company name, 5:58

we should add in a dot as being a possible character. 6:01

Because we've got that one, that co dot. 6:04

So we want to be able to mark that, or catch it. 6:05

And then some of them don't have Twitter accounts, so 6:08

let's make Twitter optional as well. 6:10

The other thing we need to add, because we marked beginning and end of the string. 6:13

And our string is this entire thing. 6:17

We want our string to be in one line. 6:20

Right? 6:24

So what we need to do is we need to add in re.MULTILINE. 6:25

And what that says is: treat each line return, meaning our slash n, 6:29

Treat that as the end of the string. 6:33

So, it turns our one big string into a lot of strings, 6:35

as far as the regular expression engine is concerned. 6:39

Okay? 6:42

If we want we can do re.M, instead of re.MULTILINE. 6:43

So either way that's gonna work. 6:48
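
A sketch of how the pattern might look after those changes, again on invented records. The data and variable names are assumptions, not the course's files.

```python
import re

# Hypothetical records: no Twitter on the second, no last name
# and no phone number on the third.
data = (
    "Smith, Anna\tanna@example.com\t(555) 555-0101\tTeacher, Example School\t@annasmith\n"
    "Jones, Bob\tbob@example.com\t555-555-0102\tEditor, Example Co.\n"
    ", Cara\tcara@example.org\t\tPilot, Example Air\t@caralee\n"
)

pattern = r'''
    ^([-\w ]*,\s[-\w ]+)\t               # Last and first names
    ([-\w\d.+]+@[-\w\d.]+)\t             # Email
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t     # Phone (optional)
    ([\w\s]+,\s[\w\s.]+)\t?              # Job and company
    (@[\w\d]+)?$                         # Twitter (optional)
'''

# With only re.X, ^ and $ refer to the whole three-line string.
no_multiline = re.findall(pattern, data, re.X)

# Adding re.M lets ^ and $ match at the start and end of every line.
rows = re.findall(pattern, data, re.X | re.M)

print(len(no_multiline), len(rows))
for row in rows:
    print(row)
```

Groups that did not take part in a match show up as empty strings in the tuples that findall returns.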

All right, let's try this one out. 6:51

Look at that. We've got a whole lot more stuff. 6:55

I do believe we've got everything for everyone. 6:56

There's the doctor, even with his big email address. 6:59

[BLANK_AUDIO] 7:02

We got Tim. 7:05

We got everybody in there. 7:06

All of our stuff is there. 7:07

So that's amazing. 7:09

That's awesome. 7:09

So, what we wanna do now though, is we wanna make this regular expression. 7:11

It's really handy as it is, but 7:15

it's just giving us a list of tuples when we do this findall. 7:17

And no matter what we did, we would only get tuples, and 7:21

we would get like index positions. 7:24

What I wanna do though, is I wanna be able to turn this into a dictionary, so 7:26

that I can use that dictionary and do something else with it. 7:29

So let's take our groups and make them named groups. 7:34

So the way that we do that, we don't have to change any of our code. 7:38

Our code gets to stay the same. 7:41

We just add on a couple of things. 7:42

We add a question mark and a P, and this is what makes it a name. 7:43

And then we specify the name inside of less than and greater than signs. 7:48

So we're gonna name this first group name, cuz that's what it is. 7:52

The second group we're going to name email. 7:58

The third group we will name phone. 8:02

The fourth group we'll name job. 8:06

And the last group, we'll name Twitter. 8:10
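
A small sketch of the naming syntax on just the first two groups, using a made-up line. A named group can still be reached by its index.

```python
import re

# Hypothetical line holding only a name and an email.
line = "Smith, Anna\tanna@example.com"

match = re.search(r'''
    ^(?P<name>[-\w ]*,\s[-\w ]+)\t      # Last and first names
    (?P<email>[-\w\d.+]+@[-\w\d.]+)$    # Email
''', line, re.X)

by_name = match.group('name')
by_index = match.group(1)
email = match.group('email')
print(by_name, by_index, email)
```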

All right? I think that's pretty good. 8:16

But let's actually, instead of doing all of this here and 8:18

printing, let's make this a little easier for ourselves. 8:23

Let's say line equals and let's do a search. 8:25

[BLANK_AUDIO] 8:28

And then we need to get rid of one of these. 8:30

All right? So line is a search. 8:32

For right now it's just gonna be that first line. 8:34

It's just gonna be me. 8:35

But we can print out what this gets. 8:37

So let's print out line. 8:39

[BLANK_AUDIO] 8:41

And then let's also print out line.groupdict(). 8:43

And let's see what these two things do. 8:50

So, okay, let's come down here, address book. 8:52

So when we print out line, we get this match object. 8:55

All right. 8:57

And the match object catches a whole bunch of stuff. 8:58

But when we print the dictionary, look what we get. 9:00

We've got the dictionary that has the name and email address, and the job. 9:02

Yeah, it gets the slash t on the job, but that's okay. 9:06

And Twitter gets kennethlove and the phone gets the phone number. 9:10

So we got all this stuff. 9:12

That's so much better than what we've gotten before when we 9:14

were just getting these tuples. 9:17
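
A sketch of the search and groupdict step with all five groups named. The records are invented, and the names `data`, `pattern`, `line`, `info`, and `sparse` are assumptions for this example.

```python
import re

# Hypothetical stand-in for the address file.
data = (
    "Smith, Anna\tanna@example.com\t(555) 555-0101\tTeacher, Example School\t@annasmith\n"
    "Jones, Bob\tbob@example.com\t555-555-0102\tEditor, Example Co.\n"
)

pattern = r'''
    ^(?P<name>[-\w ]*,\s[-\w ]+)\t               # Last and first names
    (?P<email>[-\w\d.+]+@[-\w\d.]+)\t            # Email
    (?P<phone>\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t    # Phone
    (?P<job>[\w\s]+,\s[\w\s.]+)\t?               # Job and company
    (?P<twitter>@[\w\d]+)?$                      # Twitter
'''

# search returns only the first match, which is the first record.
line = re.search(pattern, data, re.X | re.M)
info = line.groupdict()
print(info)

# The greedy job set swallows the tab before the handle; strip() removes it.
job = info['job'].strip()

# Optional groups that did not match come back as None in the dictionary.
sparse = re.search(pattern, "Jones, Bob\tbob@example.com\t\tEditor, Example Co.", re.X)
print(sparse.groupdict())
```

The trailing tab on the job value appears because `\s` in the job set matches the tab before the optional `\t?` gets a chance to.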

So our next video, we've got just two last big steps and 9:20

we'll have turned this into something absolutely amazing. 9:24

Wow, using groups, especially named groups, 9:26

makes our string almost act like an object or dictionary. 9:29

We've turned a simple string into really useful data, good job, us. 9:32

All right, just a bit more to go, and we'll have this in the bag. 9:37

In our next video, let's look at making reusable patterns, and 9:39

how we can loop over our addresses in a more useful manner. 9:43
