There's more...
Here is another example of a scatter plot, one in which the data falls into clearly distinct segments.
The Iris flower dataset is one of the oldest and best-known datasets, introduced in 1936 by Ronald Fisher. The dataset has 50 examples each of three species of Iris: Setosa, Virginica, and Versicolor. Each example has four attributes: the length and width, in centimeters, of both sepals and petals. This dataset is widely used in machine learning (ML) for classification and clustering. We will use this dataset to demonstrate how a scatter plot can show different clusters within a dataset.
The following steps plot a scatter plot of petal width against petal length:
- Import pandas and Matplotlib, then load the Iris dataset from a .csv file using pandas:
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv('iris_dataset.csv', delimiter=',')
- In the file, each class of species is defined with descriptive names, which we will map to numeric codes as 0, 1, or 2:
iris['species'] = iris['species'].map({"setosa": 0, "versicolor": 1, "virginica": 2})
- Plot a scatter plot with the petal lengths on the x axis and the petal widths on the y axis:
plt.scatter(iris.petal_length, iris.petal_width, c=iris.species)
- Label the x and y axes:
plt.xlabel('petal length')
plt.ylabel('petal width')
- Display the graph on the screen:
plt.show()
Here is the explanation of the code:
- pd.read_csv() reads the data into the iris DataFrame.
- The species attribute in the DataFrame holds descriptive class names: setosa, versicolor, and virginica. To plot each class in a different color, the values we pass to the c argument should be numeric codes, because species names are not valid color specifications. Hence, we map them to numeric codes.
- iris['species'] = iris['species'].map() replaces the descriptive names with the numeric codes 0, 1, and 2.
- c=iris.species assigns a color to each point according to its class. The values passed to c should be numeric, which is why we mapped the names to codes in the previous step; Matplotlib converts the codes to colors through its colormap.
- plt.scatter() plots the scatter plot.
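The mapping and coloring steps explained above can be tried without the .csv file. The following is only a minimal sketch: the toy DataFrame, its six rows, and the species_code column are made-up placeholders for illustration, not real Iris measurements, and the Agg backend line is an assumption that lets the code run without a display. It also shows that a name missing from the dictionary (here, a capitalized Setosa) is mapped to NaN rather than raising an error, which is worth checking before plotting:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder rows for illustration only, NOT real Iris measurements
toy = pd.DataFrame({
    "petal_length": [1.4, 1.5, 4.5, 4.7, 6.0, 5.9],
    "petal_width": [0.2, 0.3, 1.5, 1.4, 2.5, 2.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Map descriptive names to numeric codes (kept in a new, hypothetical column)
codes = {"setosa": 0, "versicolor": 1, "virginica": 2}
toy["species_code"] = toy["species"].map(codes)

# A name that is not a key of the dictionary becomes NaN
unmapped = pd.Series(["setosa", "Setosa"]).map(codes)

# Color each point by its numeric class code
fig, ax = plt.subplots()
points = ax.scatter(toy["petal_length"], toy["petal_width"],
                    c=toy["species_code"])
ax.set_xlabel("petal length")
ax.set_ylabel("petal width")
```

With an interactive backend, calling plt.show() after this block displays the figure.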
You should see the following graph on your screen:
We can clearly see three different clusters here. However, it is not clear which color represents the setosa, versicolor, and virginica clusters. We will see how to distinguish different clusters using labels in subsequent chapters, where we will learn how to customize the plots.
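As a preview, one possible way to make the clusters identifiable is to draw each species with its own plt.scatter() call and a label, then add a legend. This is only a hedged sketch and not necessarily the approach taken in later chapters: the toy DataFrame holds made-up placeholder values rather than real Iris measurements, and the Agg backend line is an assumption so that the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder rows for illustration only, NOT real Iris measurements
toy = pd.DataFrame({
    "petal_length": [1.4, 1.5, 4.5, 4.7, 6.0, 5.9],
    "petal_width": [0.2, 0.3, 1.5, 1.4, 2.5, 2.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

fig, ax = plt.subplots()

# One scatter call per species: each call gets the next color in the
# default color cycle, and its label becomes a legend entry
for name, group in toy.groupby("species"):
    ax.scatter(group["petal_length"], group["petal_width"], label=name)

ax.set_xlabel("petal length")
ax.set_ylabel("petal width")
legend = ax.legend(title="species")

# Collect the legend entries so they can be inspected
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Because the names are used directly as labels, this variant does not need the numeric mapping step at all.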