Negated sets let us specify characters and sequences that should be left out of any matches.
New terms
- [^abc] - a set that will not match, and so excludes, the letters 'a', 'b', and 'c'.
- re.IGNORECASE or re.I - flag to make a search case-insensitive. re.match('A', 'apple', re.I) would find the 'a' in 'apple'.
- re.VERBOSE or re.X - flag that allows regular expressions to span multiple lines and contain (ignored) whitespace and comments.
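A quick way to try the first two terms, as a minimal sketch. The sample strings are made up for illustration.

```python
import re

# [^abc] matches any single character except 'a', 'b', or 'c'.
not_abc = re.findall(r"[^abc]", "aabbccxyz")
print(not_abc)  # ['x', 'y', 'z']

# re.I lets the literal 'A' match the lowercase 'a' at the start of 'apple'.
first_letter = re.match("A", "apple", re.I).group()
print(first_letter)  # a
```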
Let's try a slightly harder one, slightly
weirder one perhaps.
0:00
So let's actually, let's see.
0:06
Let's comment these two both out, and
let's take our email address one.
0:08
And I wanna match all the email addresses,
just like we did before.
0:16
But if the email address ends in .gov, I
want to leave that part off.
0:19
Just pretend I have a good reason for this
cuz I, I really don't.
0:25
So, all right, this sounds like a really
good place for us to use a negative set.
0:30
And we can also write this out.
0:35
I mean, this is really, there's a lot to
this.
0:37
So, let's leave ourselves some comments,
make this a little bit easier.
0:41
So okay, first of all, yeah.
0:45
We can definitely use a negative set here.
0:48
So, let's make this a multiline string.
0:51
And we gotta end that multiline string.
0:58
You know what, we need to make this four
spaces.
1:02
There we go, all right.
1:09
And we'll end our multiline string, and
then we'll do stuff as usual against data.
1:11
All right.
So, let's take this.
1:18
I actually don't wanna catch the part
before.
1:20
I just wanna get the e-mail address.
1:23
So, let's do a \b and
1:27
then an @, and then that part.
1:32
And I don't care how many things are
there.
1:39
So find a word boundary, just leaving
myself a little note here,
1:43
an @, and then any number of characters.
1:48
All right, then what I want to ignore is
gov, and
1:55
I don't wanna get that tab that's in
there.
2:01
You can't necessarily see it, but
2:03
the space between each of these things is
a tab character.
2:07
And I know there's a tab character right
here, and
2:12
it just might catch it, so let's leave
that off.
2:14
So one or more of those is fine.
2:19
And let's leave another comment here of
ignore, wow, wow.
2:21
Ignore one or more instances of the
letters g,
2:29
o, or v and a tab.
2:36
All right.
And
2:41
then we have another b here, so match
another word boundary, all right.
2:41
And then we do data.
2:49
Now, I've done something here that needs a flag, which is that
I've spread the pattern across multiple lines.
2:50
So I need to use this VERBOSE flag.
2:57
And then, since we've got gov in there,
and we've got it in lowercase.
3:00
Just in case there was an uppercase
version, I'd want to add on the flag re.I.
3:04
And we add multiple flags with the pipe
symbol in between each of the flags.
3:09
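Here is roughly what the finished call might look like, reconstructed from the narration rather than copied from the video. The `data` rows are hypothetical stand-ins for the course file, and the exact characters inside the first set are an assumption.

```python
import re

# Hypothetical tab-separated rows, made up for this sketch.
data = (
    "Love, Kenneth\tkenneth@teamtreehouse.com\tTeacher, Treehouse\n"
    "Obama, Barack\tpresident.44@us.gov\tPresident, United States of America\n"
)

domains = re.findall(r'''
    \b@[-\w\d.]*  # A word boundary, an @, then any number of characters
    [^gov\t]+     # 1+ characters that are not 'g', 'o', 'v', or a tab
    \b            # Another word boundary
''', data, re.VERBOSE | re.I)  # combine flags with the pipe symbol

print(domains)  # ['@teamtreehouse.com', '@us.']
```

In this sketch the `.gov` suffix is dropped, but the dot before it stays in the match, because the negated set only rules out 'g', 'o', 'v', and tab.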
It's a little weird.
3:15
It's just something you get used to.
3:17
You just kinda have to remember it.
3:19
So, all right, let's try that out.
3:21
And there we go,
3:25
we've got @teamtreehouse.com,
@teamtreehouse.com, blah, blah, blah.
3:26
And then we get over here, and we've got
us, this was supposed to be us.gov, and
3:30
we've got just us.
3:34
And then we were supposed to have
empire.gov, as we've got up here, and
3:36
we just have empire.
3:39
And we're supposed to have spain.gov, and
we just got spain.
3:41
So, that's pretty cool,
3:43
we got all the email addresses, but we
left off the .gov on three of them.
3:46
So, I think that's pretty cool, pretty
handy.
3:51
Let's try another one with our VERBOSE
flag,
3:56
just to get used to doing our VERBOSE
flag.
4:00
Gonna comment this out.
4:04
All right.
4:05
So let's try another verbose pattern that
will match our names.
4:08
It'll also match our jobs, but it's still
a good practice.
4:14
So we're gonna do print(re.findall.
4:18
And then we're gonna do a multi-line
string, cuz we're gonna use verbose.
4:23
So let's do \b[-\w].
4:26
So that would be Find a word boundary 1+
4:33
hyphens or word characters.
4:40
We'll just say characters.
4:47
And a comma cuz that comma's in there.
4:49
It has to find that comma.
4:52
And then let's have it find, find
whitespace.
4:54
Find 1 whitespace.
5:00
And then let's have it find another
hyphen, a \w, or
5:02
a space as part of our set.
5:06
We'll talk about why that's different in
just a second.
5:07
1+ hyphens and characters, and explicit
spaces.
5:10
And then I want it to not find tabs or new
line characters.
5:21
Ignore tabs and newlines.
5:25
And then we wanna close this, we're gonna
run this against data, and
5:29
we're gonna do re.X.
5:34
All right.
5:36
So let's talk about this one for a second
before we run it.
5:36
So, when we do the verbose flag,
5:40
which, re.X, if you didn't guess, is the
shorthand version of re.VERBOSE.
5:43
When we do the verbose ones, the regular
expression engine ignores all of
5:49
the spaces that are just out in our
pattern.
5:54
So like, these spaces here and
5:57
these spaces here are completely ignored,
as is this comment.
6:00
So we have to mark those with this \s.
6:05
That, and, and that is whitespace.
6:09
So that matches spaces, it matches tabs,
it matches new lines.
6:12
It matches all sorts of stuff.
6:17
Actually, I don't remember if it matches
new lines or
6:18
not, but it matches spaces and tabs, and
other characters like that.
6:19
If you wanna go look up like, half tab or
letter space and
6:24
stuff like that, there's all sorts of
these spaces that are available.
6:28
So it matches all of those.
6:31
But inside of a set, we can use an
explicit space and
6:33
that will only match spaces.
6:36
It won't match tabs or newlines or
whatever.
6:40
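To see the difference between `\s` and an explicit space in a set, here is a small sketch with a made-up sample string. For reference, Python's `\s` does match newlines as well as spaces and tabs.

```python
import re

sample = "a b\tc\nd"

# \s matches the space, the tab, and the newline.
all_whitespace = re.findall(r"\s", sample)
print(all_whitespace)  # [' ', '\t', '\n']

# A space inside a set matches only the literal space.
spaces_only = re.findall(r"[ ]", sample)
print(spaces_only)  # [' ']

# Under re.X a bare space in the pattern is ignored, so 'a b' means 'ab'.
bare_space = re.findall(r"a b", "a b", re.X)
print(bare_space)  # []

# Inside a set the space is kept, even under re.X.
set_space = re.findall(r"a[ ]b", "a b", re.X)
print(set_space)  # ['a b']
```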
And then down here we want to ignore tab
and newline.
6:42
Now, why didn't we have to use re.I in
this one, or re.IGNORECASE?
6:46
The reason's because we're not matching
any explicit characters.
6:50
We're not matching, like, the letter t,
that may be uppercase or lowercase.
6:54
Since we're not matching those things,
we're matching more generic stuff like
6:59
word characters, we can leave off re.I.
7:03
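A short illustration of that point, using a made-up string: `\w` already covers both cases, while a literal like `gov` only matches the case it was written in unless re.I is added.

```python
import re

text = "Tim TIM us.gov US.GOV"

# \w is not tied to a case, so no flag is needed.
words = re.findall(r"\w+", text)
print(words)  # ['Tim', 'TIM', 'us', 'gov', 'US', 'GOV']

# A literal 'gov' only matches lowercase by default.
literal = re.findall(r"gov", text)
print(literal)  # ['gov']

# With re.I the same literal matches either case.
literal_any_case = re.findall(r"gov", text, re.I)
print(literal_any_case)  # ['gov', 'GOV']
```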
So let's run this and see what it does.
7:09
And I forgot another character.
7:13
We should have a plus sign there as well.
7:15
So let's run that again.
7:19
There we go.
7:21
So now we've got Kenneth Love and Teacher
Treehouse, Dave MacFarlane, or
7:22
MacFarlane, Dave, Teacher Treehouse, and
so on.
7:26
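The pattern at this point, with the plus sign added, might look like the sketch below. It is pieced together from the narration, and the `data` rows are hypothetical stand-ins for the course file.

```python
import re

# Hypothetical tab-separated rows, made up for this sketch.
data = (
    "Love, Kenneth\tkenneth@teamtreehouse.com\tTeacher, Treehouse\n"
    "MacFarlane, Dave\tdave@teamtreehouse.com\tTeacher, Treehouse\n"
)

names_and_jobs = re.findall(r'''
    \b[-\w]+,  # A word boundary, 1+ hyphens or word characters, and a comma
    \s         # 1 whitespace character
    [-\w ]+    # 1+ hyphens, word characters, and explicit spaces
    [^\t\n]    # 1 character that is not a tab or newline
''', data, re.X)

print(names_and_jobs)
# ['Love, Kenneth', 'Teacher, Treehouse', 'MacFarlane, Dave', 'Teacher, Treehouse']
```

The jobs show up alongside the names because "Teacher, Treehouse" has the same word, comma, space, word shape as "Love, Kenneth".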
So we got the names, and we got the where
they work.
7:29
So, of course, if we want to get Tim in
there, we need to change this to a star.
7:34
Run this again and we should get Tim.
7:40
I don't see Tim actually.
7:44
So Tim's not in there, but we will fix
that later.
7:47
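One possible reason the star alone is not enough, sketched under a loud assumption: that Tim's row starts with a comma because he has no last name. The transcript does not show the data, so the rows below are a guess made up for illustration. In that case `\b` cannot match just before the comma, since the characters on both sides of that position (a newline and a comma) are non-word characters.

```python
import re

# Hypothetical rows; the shape of Tim's row is an assumption.
data = (
    "King, Arthur\tking@example.com\n"
    ", Tim\ttim@example.com\n"
)

with_star = re.findall(r'''
    \b[-\w]*,  # A word boundary, 0+ hyphens or word characters, and a comma
    \s         # 1 whitespace character
    [-\w ]+    # 1+ hyphens, word characters, and explicit spaces
    [^\t\n]    # 1 character that is not a tab or newline
''', data, re.X)

print(with_star)  # ['King, Arthur']

# There is no word boundary directly before the comma in Tim's row.
boundary_before_comma = re.search(r"\b,", "\n, Tim")
print(boundary_before_comma)  # None
```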
We'll select everybody before we get to
the end of this.
7:52
As you can tell though, it really, really
helps to break up our patterns over
7:55
multiple lines.
7:59
And being able to annotate each line with
a comment, so that we remember what we're
8:00
doing, what we're looking for and how to
make things again.
8:04
We have a ton of choices now when we write
patterns.
8:09
They can be as flexible or strict as we
need.
8:12
Our next video will cover the real meat of
what'll make our regular expressions
8:14
capable of solving our immediate problem.
8:18