Let's continue defining some machine learning terms so that we have the vocabulary to discuss these ideas in more detail.
Vocabulary and Definitions
- Label: A category for data, or a prediction from a classification algorithm
- Classifier: A supervised machine learning model that makes a prediction about how a piece of data should be categorized
As we've learned,
a data set is composed of examples.
0:00
And each of those examples
has common features
0:03
that a model can use to perform
analysis and comparisons.
0:06
But what do we want a model
to do with that data?
0:10
Ultimately, we want it to make
some kind of a prediction.
0:14
And the prediction it
makes is called a label.
0:18
Let's go back to the earlier
example of a spam filter.
0:23
Each example, in this case an email,
has features, which
0:27
might be things like
the subject line, body, and sender.
0:32
Here, the label is whether
the message is spam or not spam.
0:37
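As a rough illustration of the vocabulary so far, here is a minimal Python sketch of a tiny data set. The `EmailExample` class, its field names, and the sample emails are all made up for this illustration and are not from the course files.

```python
from dataclasses import dataclass


@dataclass
class EmailExample:
    """One example in the data set (hypothetical structure)."""
    subject: str  # feature
    body: str     # feature
    sender: str   # feature
    label: str    # what we want to predict: "spam" or "not spam"


# A tiny, invented data set of two labeled examples.
dataset = [
    EmailExample("Free offer inside", "Click here to claim your prize",
                 "promo@example.com", "spam"),
    EmailExample("Team meeting", "Notes from this morning are attached",
                 "colleague@example.com", "not spam"),
]

# Separate the features (model input) from the labels (expected output).
features = [(e.subject, e.body, e.sender) for e in dataset]
labels = [e.label for e in dataset]
```

Keeping features and labels in separate lists mirrors the idea that a model sees the features and tries to produce the label.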
A classifier is a type of algorithm or
0:43
model that makes a prediction about how
a piece of data should be categorized.
0:46
You can think of a classifier
like a function.
0:52
Data goes in, and then the classifier
predicts the correct category for
0:55
that data.
0:59
It does this by using an existing data
set that has examples where the labels
1:01
are known.
1:05
So for a spam filter, you would
train the classifier with a data set
1:07
where lots of emails are already
labeled as spam or not spam.
1:11
And then when a new email comes in,
it can try to assign a label.
1:16
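To make the "classifier as a function" idea concrete, here is a minimal sketch in plain Python. It is not the scikit-learn approach used later in the course. It is a simple word-count classifier with add-one smoothing (in the style of naive Bayes), and the `train` and `predict` names and the training emails are assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count how often each word appears under each known label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Return the label whose training words best match the text."""
    word_counts, label_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        # Start from how common the label is in the training data.
        score = math.log(label_counts[label] / total_examples)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data: emails whose labels are already known.
training_data = [
    ("free offer click here now", "spam"),
    ("claim your free prize today", "spam"),
    ("meeting notes from this morning", "not spam"),
    ("lunch plans for friday", "not spam"),
]
model = train(training_data)

# Data goes in, a predicted label comes out.
print(predict(model, "free prize offer"))
print(predict(model, "notes for friday meeting"))
```

With only four training emails this is a toy, but it shows the two phases: learning from labeled examples, then assigning a label to a new email.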
There's one more thing I want
to mention before we carry on.
1:22
Cleaning and organizing data in different
ways can often produce different results.
1:25
In the case of the emails, you might find
that the raw data from the email doesn't
1:31
make useful features because further
heuristics need to be applied.
1:36
For a spam filter classifier,
you might create features that count
1:41
the number of spammy
phrases from a dictionary,
1:46
like "free offer" or "click here,"
or a feature that identifies
1:49
an attachment as a photo or an executable
program that might be a virus.
1:54
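The two engineered features described here could be sketched as follows. The phrase list, the file extension sets, and the function names are assumptions chosen for illustration, and a real filter would need far more careful rules.

```python
import os

# Hypothetical phrase dictionary and extension lists for illustration only.
SPAMMY_PHRASES = ("free offer", "click here")
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
EXECUTABLE_EXTENSIONS = {".exe", ".bat", ".scr"}


def count_spammy_phrases(text, phrases=SPAMMY_PHRASES):
    """Count occurrences of known spammy phrases, ignoring case."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in phrases)


def attachment_kind(filename):
    """Classify an attachment by its file extension alone."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in EXECUTABLE_EXTENSIONS:
        return "executable"
    if extension in PHOTO_EXTENSIONS:
        return "photo"
    return "other"


def extract_features(body, attachment=None):
    """Turn raw email data into features a model can use."""
    return {
        "spammy_phrase_count": count_spammy_phrases(body),
        "attachment_kind": attachment_kind(attachment) if attachment else "none",
    }


print(extract_features("FREE OFFER! Click here, click here.", "invoice.exe"))
```

Checking only the file extension is a simplification, since a file's name does not guarantee its contents.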
There's an old saying in computing
called garbage in, garbage out.
2:01
It means that if you provide
the computer with bad information, or
2:05
if you give your machine learning
model a data set that's inaccurate or
2:10
not representative of the whole truth,
2:14
then you're going to get a bad result.
2:17
And that's it.
2:20
As you can imagine, there are many more
definitions and terms in machine learning.
2:22
But those are the big ones we'll need
in order to continue with our exercise
2:26
in the next videos,
2:31
where we're going to write
our own classifier in Python
2:32
using a library called scikit-learn.
2:36