You have completed Introduction to Data Visualization with Matplotlib!
Histograms are used to show distributions of data. Let's explore the Iris data set with this chart style.
Further Reading
- Matplotlib style sheets
- Number of bins and widths for histograms
- Freedman-Diaconis rule for Histogram Bin widths
As I've mentioned, histograms are used
to show distributions of data.
0:00
This can be very useful to see
how closely grouped together or
0:04
spread out a variable is.
0:07
The area of the rectangles in
a histogram is proportional
0:09
to the frequency of the variable.
0:12
This allows for
0:14
the rough assessment of the probable
distribution of a given variable.
0:15
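To make the area claim concrete, here is a minimal sketch using NumPy rather than a chart. The values are illustrative numbers typed in for this example (an assumption, not the course's data), and the variable names are hypothetical. With `density=True`, each bar's area (height times width) should match the share of observations in that bin.

```python
import numpy as np

# Illustrative values only (an assumption, not the course's data set).
values = [4.5, 4.8, 4.9, 5.0, 5.1, 5.1, 5.3, 5.5, 5.6, 5.6,
          5.8, 5.9, 6.0, 6.1, 6.3, 6.6, 6.9]

# density=True rescales the bar heights so that the bar areas sum to 1.
heights, edges = np.histogram(values, bins=5, density=True)
widths = np.diff(edges)      # width of each bin
areas = heights * widths     # area of each rectangle

# Plain counts over the same bin edges, converted to relative frequencies.
counts, _ = np.histogram(values, bins=edges)
relative_frequency = counts / counts.sum()

print("bar areas:           ", np.round(areas, 3))
print("relative frequencies:", np.round(relative_frequency, 3))
```

When all bins have the same width, as here, bar height alone carries the same information, which is why an ordinary count histogram still shows the shape of the distribution.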
The rectangles, or
0:19
bins, in a histogram are important to
consider when doing data visualization.
0:20
Both the number of overall bins and
0:25
the bin width can have an impact on
the overall presentation of data.
0:27
From our iris data set let's generate
a histogram chart to see the distribution
0:32
of petal length.
0:36
Let's examine the petal lengths
of the iris virginica class and
0:38
visualize the distribution of that data.
0:41
Here's where we start off in
a new notebook, iris histogram,
0:43
by bringing in our data and
storing it in a list called irises.
0:47
Let's process through this list
to just obtain the petal length
0:51
of the iris virginica species.
0:54
Create a list to hold our data.
0:57
Let's also create a variable for
our bin numbers,
1:08
so we can see how changing bin
numbers impacts our visualization.
1:11
Now let's loop through our
data to get our petal lengths.
1:18
For petal in range of our iris data.
1:23
So if the species is Iris-virginica,
1:50
we'll add the petal length to
our virginica_petal_length list.
1:53
And we'll get that from our iris data.
2:12
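The loop described here could look roughly like the sketch below. The row layout is an assumption: each row is taken to be `[sepal_length, sepal_width, petal_length, petal_width, species]`, with measurements stored as strings as they would be when read from a CSV file. The four sample rows stand in for the full `irises` list, and `num_bins` is a guessed name for the bin variable.

```python
# A few sample rows standing in for the full `irises` list (assumption:
# columns are sepal length, sepal width, petal length, petal width, species).
irises = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.3", "3.3", "6.0", "2.5", "Iris-virginica"],
    ["5.8", "2.7", "5.1", "1.9", "Iris-virginica"],
]

num_bins = 10                 # kept in a variable so it is easy to change later
virginica_petal_length = []   # will hold the petal lengths we want to plot

# Walk the rows by index, as in the video, and keep only Iris-virginica.
for petal in range(len(irises)):
    if irises[petal][4] == "Iris-virginica":
        # Convert the text value to a number before storing it.
        virginica_petal_length.append(float(irises[petal][2]))

print(virginica_petal_length)
```

Looping directly over the rows (`for row in irises:`) would do the same job without indexing; the index-based form is used here only because it mirrors the narration.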
Now we can pass our data
into our plt.hist method.
2:19
This method takes several parameters,
2:22
including the number of
bins we'd like to have,
2:24
the color we'd like to set,
and an alpha value.
2:27
plt.hist pass in our
virginica_petal_length.
2:30
Our number of bins.
2:40
The color of our plot will be red.
2:45
And we give it an alpha value
to make it slightly transparent.
2:50
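As a sketch, the call just described might be written as follows. The petal lengths are illustrative values typed in for this example, and the alpha of 0.5 is an assumption, since the narration only says "slightly transparent". The `Agg` backend line is there so the snippet runs outside a notebook; it can be dropped in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Illustrative values only (an assumption, not the course's data set).
virginica_petal_length = [4.5, 4.8, 4.9, 5.0, 5.1, 5.1, 5.3, 5.5, 5.6, 5.6,
                          5.8, 5.9, 6.0, 6.1, 6.3, 6.6, 6.9]
num_bins = 10

# Data first, then the number of bins, then styling keywords.
# alpha=0.5 is an assumed value for "slightly transparent".
counts, bin_edges, patches = plt.hist(
    virginica_petal_length, num_bins, color="red", alpha=0.5
)

print(counts)      # how many values fell into each bin
print(bin_edges)   # num_bins + 1 edges, from the smallest to the largest value
```

`plt.hist` returns the per-bin counts, the bin edges, and the drawn rectangles, which is handy for checking what the chart is actually showing.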
As I've mentioned, it's always
important to add labels to your charts.
2:55
For chart title.
2:59
Iris-virginica Petal length.
3:05
We'll give that a font size of 12.
3:13
For our x-axis, for xlabel,
3:16
we'll give it what it is,
3:20
Petal length in centimeters.
3:23
Font size of 10.
3:30
And for our ylabel.
3:34
We'll just call it Probability.
3:36
And again,
we'll give that a font size of 10.
3:46
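The three labels could be added along these lines. The exact label strings are assumptions based on the narration, and the data values are illustrative. One thing worth knowing: with default arguments `plt.hist` draws raw counts per bin, so the "Probability" label follows the video rather than the scale of the axis; passing `density=True` to `plt.hist` is one way to get a normalized scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Illustrative values only (an assumption, not the course's data set).
virginica_petal_length = [4.5, 4.8, 4.9, 5.0, 5.1, 5.1, 5.3, 5.5, 5.6, 5.6,
                          5.8, 5.9, 6.0, 6.1, 6.3, 6.6, 6.9]

plt.hist(virginica_petal_length, 10, color="red", alpha=0.5)

# Title and axis labels, with the font sizes mentioned in the video.
plt.title("Iris-virginica Petal Length", fontsize=12)
plt.xlabel("Petal Length (cm)", fontsize=10)
plt.ylabel("Probability", fontsize=10)

# In a notebook, plt.show() would follow here to render the chart.
```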
Cool and then we call our show method and
run our cell.
3:54
We are shown a histogram
chart with red rectangles.
4:02
However, the rectangles
are all clumped together and
4:05
can be a challenge to differentiate.
4:08
Matplotlib includes some
chart styling options which can help out.
4:10
Let's apply matplotlib's
classic style to our chart and
4:16
see if it helps clear things up.
4:19
We'll go back up here and
under where we assign our figure size.
4:22
We'll ask it to use the classic style and
then we can run our cell.
4:32
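A minimal sketch of applying the style, assuming the figure size is set with `plt.figure(figsize=...)`. The size values are made up for this example, as are the data values. `plt.style.use("classic")` is the call that switches to the classic style sheet.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Illustrative values only (an assumption, not the course's data set).
virginica_petal_length = [4.5, 4.8, 4.9, 5.0, 5.1, 5.1, 5.3, 5.5, 5.6, 5.6,
                          5.8, 5.9, 6.0, 6.1, 6.3, 6.6, 6.9]
num_bins = 10

plt.figure(figsize=(8, 5))   # assumed size; the video's values are not given
plt.style.use("classic")     # apply matplotlib's classic style sheet

counts, bin_edges, patches = plt.hist(
    virginica_petal_length, num_bins, color="red", alpha=0.5
)

# The names of all installed style sheets, in case you want to try others.
print(plt.style.available)
```

The classic style appears to draw a dark outline around each bar, which is probably why neighboring rectangles become easier to tell apart. Note that `plt.style.use` changes settings for every chart drawn afterwards in the same session.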
That's much better.
4:40
Now we are setting our
number of bins to ten,
4:41
which is also the matplotlib default for
histograms.
4:44
Let's change that to 15 and then to 5 to
see how that impacts our visualization.
4:47
Notice here that at 15 bins
we have some empty bins.
4:58
While we get more detail about the data
set, it also spreads the data into
5:02
a broken comb look that doesn't provide as
clear a picture of the distribution.
5:07
And if we go back and set it to 5 bins.
5:11
At 5 bins,
the data isn't portrayed very well either.
5:19
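Instead of editing the bin variable and re-running the cell, one option is to draw the three versions side by side. This is a sketch of that alternative, not what the video does, and it uses illustrative values rather than the course's data, so the number of empty bins it reports will not necessarily match the video.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Illustrative values only (an assumption, not the course's data set).
virginica_petal_length = [4.5, 4.8, 4.9, 5.0, 5.1, 5.1, 5.3, 5.5, 5.6, 5.6,
                          5.8, 5.9, 6.0, 6.1, 6.3, 6.6, 6.9]

# One row of three charts, one per bin count.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

results = {}
for ax, num_bins in zip(axes, (15, 10, 5)):
    counts, bin_edges, _ = ax.hist(
        virginica_petal_length, num_bins, color="red", alpha=0.5
    )
    ax.set_title(f"{num_bins} bins", fontsize=12)
    results[num_bins] = counts
    # Empty bins are what produce the "broken comb" look.
    empty_bins = int((counts == 0).sum())
    print(f"{num_bins} bins: {empty_bins} empty")
```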
There are a variety of formulas and
considerations for the number of bins and
5:23
their widths to use.
5:27
I've included links to some resources for
these in the teacher's notes.
5:28
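As one example of such a formula, the Freedman-Diaconis rule listed in the further reading sets the bin width to 2 × IQR / n^(1/3), where IQR is the interquartile range and n is the number of observations. The sketch below computes that width by hand and then lets NumPy and matplotlib pick the bins with `bins="fd"`. The data values are illustrative, not the course's data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Illustrative values only (an assumption, not the course's data set).
values = np.array([4.5, 4.8, 4.9, 5.0, 5.1, 5.1, 5.3, 5.5, 5.6, 5.6,
                   5.8, 5.9, 6.0, 6.1, 6.3, 6.6, 6.9])

# Freedman-Diaconis: width = 2 * IQR / cube root of the sample size.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
bin_width = 2 * iqr / len(values) ** (1 / 3)

# NumPy can derive bin edges from the same rule. It rounds the number of
# bins up, so the spacing of these edges can be a little narrower than
# bin_width.
fd_edges = np.histogram_bin_edges(values, bins="fd")

# plt.hist accepts the same string in place of a bin count.
counts, bin_edges, _ = plt.hist(values, bins="fd", color="red", alpha=0.5)

print("suggested width:", round(bin_width, 3))
print("number of bins: ", len(fd_edges) - 1)
```

A rule like this gives a reasonable starting point, but it is still worth comparing a few bin counts by eye, as the video goes on to say.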
It is not uncommon in practice
to produce multiple histograms
5:33
with different numbers of bins, before
settling on the best communication tool.
5:37
Histograms are great for
exploring the distribution of data, but
5:42
our data set has many more
ways that it can be explored.
5:46
Sepal length, sepal width, and
5:49
petal width can all be explored
across all the different species.
5:50
Before the next video,
5:54
practice creating some other
histograms of this data on your own.
5:56
Next, we'll look at box plots.
5:59