Practice Cleaning Data
Review one solution to the data cleaning challenge.
Ready to see a solution?
0:00
Let's tackle this.
0:02
You can see I created a single function.
0:04
I called it clean_data and
I'm passing in our data.
0:06
Don't forget at the top here,
0:10
I'm importing our data from
data.py at the top for us already.
0:12
I created a new list called cleaned and
0:16
I'm going to return it when all this stuff
is completed and everything is cleaned.
0:19
And then I'm just calling the function
while passing in our data and
0:24
printing that out to the console so
0:29
we can make sure that we're
doing everything correctly.
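A minimal sketch of the starting point described above. The real data.py is not shown in this transcript, so the `data` list here is a hypothetical stand-in with made-up values (only the key names come from the video):

```python
# Hypothetical stand-in for `from data import data`; the real data.py is not shown.
data = [
    {"email": "warren@example.com", "name": "Warren Bates",
     "date_joined": "2021-01-15", "admin": "False", "id": "1"},
]


def clean_data(data):
    cleaned = []  # will hold one cleaned dictionary per user
    # the cleaning steps get added here later
    return cleaned


result = clean_data(data)
print(result)  # an empty list until the cleaning steps are filled in
```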
0:32
So the first thing to notice
is that our data is a list.
0:36
So if I loop through all
the items in the list,
0:41
I'm going to get each
individual dictionary.
0:44
So let's see what that looks like.
0:47
I'm gonna say for user in data.
0:49
And then just to see how things go,
0:52
I'm going to print user so
we can see this in the terminal and
0:55
I'm gonna pull this all the way up just so
we have plenty of space.
1:00
This is app.py, there we go.
1:06
Okay, so you can see I'm
getting each individual dictionary
1:09
that's inside of our list and then I'm
returning the cleaned list at the end,
1:15
which is why there's
an empty list at the bottom.
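Roughly what that step looks like, as a sketch. The two records are hypothetical placeholders for the contents of data.py, which is not shown here:

```python
# Hypothetical records standing in for the list imported from data.py.
data = [
    {"email": "warren@example.com", "name": "Warren Bates",
     "date_joined": "2021-01-15", "admin": "False", "id": "1"},
    {"email": "megan@example.com", "name": "Megan Amendola",
     "date_joined": "2021-02-20", "admin": "True", "id": "2"},
]


def clean_data(data):
    cleaned = []
    for user in data:
        print(user)  # each item in the list is one dictionary
    return cleaned


result = clean_data(data)
print(result)  # still an empty list, printed after the dictionaries
```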
1:21
So if I hit clear,
I can pull this back down.
1:25
Okay, so we know that we're
accessing each individual user, so
1:31
now let's go through and
let's fix each one.
1:35
I'm going to create a fixed
variable and set it equal to, oops,
1:39
not a list, an empty dictionary,
make sure to use those curly brackets.
1:42
And then I'm going to go through and I'm
just gonna go through from top to bottom.
1:47
So if I look at our data.py, I'm gonna
do email, name, date_joined, admin,
1:51
id, I'm gonna do it in the same order.
1:54
That just makes sense to me.
1:57
So first, email is one of
the ones that we're not changing.
1:59
So I'm gonna do fixed.
2:04
We're gonna create a new key called email,
and
2:06
we're gonna set that equal
to the value of user["email"].
2:11
And now just to show you how this works,
I'm gonna copy this,
2:17
just so
you remember how dictionaries work.
2:21
And I'm gonna print out all of
our values for the email address.
2:25
So remember, this is gonna grab each user,
which is from each dictionary.
2:29
And it's going to grab the email key,
and it's gonna return us the value.
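As a small sketch of the dictionary access just described: square brackets with a key return the value stored under that key. The `user` record and its email address are hypothetical examples.

```python
# Hypothetical single record; in the video this comes from looping over data.
user = {"email": "warren@example.com", "name": "Warren Bates"}

fixed = {}  # curly brackets make an empty dictionary, not a list
fixed["email"] = user["email"]  # email is copied over unchanged

print(user["email"])  # looks up the "email" key and prints its value
```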
2:37
So let me run this real quick let me hit
Save, pull this up again a little bit.
2:45
Actually I can push up arrow to
get back to our last command and
2:51
there you go, you can see we
get all of the emails, perfect.
2:56
So we've got our email fixed,
next is going to be the name.
3:03
If I pop into our data.py, we see we have name.
3:09
Now we know in our directions,
number one here we need to split
3:13
the full name into two fields,
first name and last name.
3:18
And a little hint here,
I used the word split on purpose because we
3:23
can use Python's split
functionality to do that for us.
3:28
So I'm gonna do fixed.
3:33
We're gonna add a new key called first_name.
3:35
I set that equal to and
I'm gonna do user["name"]
3:40
and then we're going to split this,
we're gonna call split.
3:46
And we're gonna split it on the space.
3:50
So an empty string.
3:53
I'm gonna put one space inside of it.
3:55
We're gonna split on the space.
3:57
Oops, I need to be outside
the parentheses and
4:00
then we want the first part that's
returned, remember indexing starts at zero.
4:03
So let's check out this
right here in the console.
4:08
I'm gonna scroll this up so
we can make sure we can see it,
4:13
in our console at the same time.
4:16
So I'm gonna do python3
to go into our shell.
4:17
And I'm just gonna create a fake name,
I'll do my own Megan Amendola.
4:20
And now if I'm going to split it,
name.split,
4:27
space, close it and let's see what we get.
4:32
Okay, so when we run split,
we're going to get a list of two items
4:39
because that is what we will
get when we split on a space.
4:43
If I had put my full name in there,
it would give us three things in
4:48
the list because it would give me first,
middle, and last name.
4:52
But because we only have one space,
4:57
it's going to split this
string into two pieces.
4:59
So remember index of zero and
index of one.
5:02
So when I call this, I can then call index
of zero to get just this first part.
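The shell experiment above can be reproduced with a short sketch. The three-part name is a placeholder, not anyone's real full name:

```python
name = "Megan Amendola"

parts = name.split(" ")  # split on a single space
print(parts)             # ['Megan', 'Amendola']
print(parts[0])          # Megan (indexing starts at zero)
print(parts[1])          # Amendola

# A name with two spaces splits into three pieces (placeholder name).
three_parts = "First Middle Last".split(" ")
print(len(three_parts))  # 3
```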
5:06
If that seems confusing to you,
5:13
you can instead create a variable
called split_name or you know,
5:15
whatever you want to call it, and
then you can, oops, let's try it again.
5:20
Then you can copy.
5:26
I hate when things start
to go a little funky.
5:30
Let's try again.
5:32
There we go, you could copy this part and
5:35
save it here and then you would just do,
5:39
split_name, oops.
5:44
split_name[0].
5:48
Somehow got to two.
5:54
There we go.
5:55
So you can see if that is a little
bit easier for you to understand,
5:57
calling the split up here which
will give you a list saved
6:02
as your split name or value,
and then calling the first one.
6:08
And then essentially we do the exact
same thing, change this to last.
6:13
And then this would be,
index of one to get the last name.
6:21
So if that's much easier for
you to understand,
6:26
absolutely you can do it that way.
6:29
If you want to do it
the other way like this,
6:32
you can too and
then you can just delete that variable.
6:38
Either way totally up to you.
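Both styles side by side, as a sketch. The `last_name` key is assumed from the "first name and last name" wording in the directions, and the `user` record is a hypothetical example:

```python
user = {"name": "Warren Bates"}  # hypothetical record

# Style 1: save the split list in a variable, then index into it.
with_variable = {}
split_name = user["name"].split(" ")
with_variable["first_name"] = split_name[0]
with_variable["last_name"] = split_name[1]

# Style 2: split and index in a single expression.
one_line = {}
one_line["first_name"] = user["name"].split(" ")[0]
one_line["last_name"] = user["name"].split(" ")[1]

print(with_variable == one_line)  # True, both styles build the same dictionary
```

Note that either style assumes every name contains exactly one space; a name with a middle name would put the middle name at index 1.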
6:43
Just gonna leave it that way since
that's the way we had it at the end there.
6:46
Exit our shell, and run a clear.
6:51
Okay so we have our email and
first name and
6:56
last name all completed,
let's see what is next.
6:59
Date_joined, this is another one
that stays exactly the same.
7:04
So I'm gonna just copy this, and
7:07
I'm just gonna change
this to date_joined and
7:11
date_joined, perfect another one done.
7:16
Next is admin and this is going to
be switching it to a Boolean value.
7:22
Now again, I'm gonna show us
something here in the Python shell.
7:29
So if we have a string,
let's just call it admin anyways,
7:36
and let's set it to the string "False", and
we wanna call bool on admin
7:42
to convert it,
it's always gonna give us True,
7:48
because it's not changing this
from a string into True or False,
7:52
it's changing it and it's saying,
hey, this string has something in it,
7:57
therefore, it is True, cuz that's how
Boolean values work with strings.
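A quick sketch of that behavior: `bool()` checks whether a string is empty, not what it says.

```python
admin = "False"  # a string, not the Boolean False

converted = bool(admin)
print(converted)  # True, because the string is not empty

empty = bool("")
print(empty)      # False, only an empty string is falsy
```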
8:03
So instead, what we're going to have
to do is do a little comparison,
8:08
a little if statement here.
8:14
So if user["admin"] == "True",
8:17
then we need to make
8:24
the fixed["admin"] = True.
8:28
And then we can do,
8:34
oops, else fixed["admin"]
8:39
equals False, okay?
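The comparison could look something like this sketch. It assumes the data stores admin as the exact strings "True" and "False"; the real data.py is not shown, so the records here are hypothetical:

```python
# Hypothetical records; admin is assumed to be the string "True" or "False".
users = [{"admin": "True"}, {"admin": "False"}]

results = []
for user in users:
    fixed = {}
    if user["admin"] == "True":  # compare against the string, not the Boolean
        fixed["admin"] = True
    else:
        fixed["admin"] = False
    results.append(fixed["admin"])

print(results)  # [True, False]
```

Since the comparison itself already produces a Boolean, `fixed["admin"] = user["admin"] == "True"` would be a shorter equivalent.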
8:45
That tackles our admin and
the last one we need to tackle is ID.
8:51
Now this one we can use
the built-in functions,
8:58
so let's do fixed["id"] = int,
9:03
which is how you convert
something to an integer.
9:07
And we can do user["id"].
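A short sketch of the conversion, assuming the id is stored as a string of digits (the value here is a hypothetical example):

```python
user = {"id": "42"}  # hypothetical record with the id stored as a string

fixed = {}
fixed["id"] = int(user["id"])  # int() converts the digit string to an integer

print(fixed["id"])        # 42
print(type(fixed["id"]))  # <class 'int'>
```

If a value were not a valid integer string, `int()` would raise a `ValueError`, so messier data might need a check first.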
9:13
Okay, so that tackles all of our fields.
9:17
Now that that's complete,
we need to append this new
9:21
dictionary to our empty list here
at the top, our cleaned list.
9:25
So I'm gonna do cleaned.append(fixed),
9:30
which will append the entire
dictionary that we just built out here.
9:34
And then at the end, once we've finished
looping through all of our users and
9:39
cleaning them, it's gonna return our list
for us so we should be all complete.
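Putting the steps together, the finished function might look like this sketch. The sample records are hypothetical (data.py is not shown), and the `last_name` key and the `"True"` string comparison are assumptions based on the walkthrough:

```python
# Hypothetical stand-in for `from data import data`.
data = [
    {"email": "warren@example.com", "name": "Warren Bates",
     "date_joined": "2021-01-15", "admin": "False", "id": "1"},
    {"email": "megan@example.com", "name": "Megan Amendola",
     "date_joined": "2021-02-20", "admin": "True", "id": "2"},
]


def clean_data(data):
    cleaned = []
    for user in data:
        fixed = {}
        fixed["email"] = user["email"]                       # unchanged
        fixed["first_name"] = user["name"].split(" ")[0]     # before the space
        fixed["last_name"] = user["name"].split(" ")[1]      # after the space
        fixed["date_joined"] = user["date_joined"]           # unchanged
        if user["admin"] == "True":                          # string to Boolean
            fixed["admin"] = True
        else:
            fixed["admin"] = False
        fixed["id"] = int(user["id"])                        # string to integer
        cleaned.append(fixed)  # add the finished dictionary to the list
    return cleaned


result = clean_data(data)
print(result)
```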
9:44
It's already set up here for us to
print it and see it in the console, so
9:50
let me pull this up and
let's check our work.
9:53
All right, so we can see we have a list
here, you can see it close there.
10:00
And then inside we have,
see where it ends.
10:06
Individual dictionary items,
so that's perfect.
10:10
Email is still a string,
we have first name as Warren,
10:14
last name as Bates, awesome.
10:18
So our split name worked.
10:20
Date_joined is still the same.
10:22
Admin is now a Boolean value,
10:25
you can see it doesn't have the string
quotes up next to it anymore.
10:27
And id is now a number.
10:31
Awesome job,
I hope you had fun practicing this skill.
10:34
Keep playing around and keep having fun.
10:37