Hands-On Unsupervised Learning with Python

Supervised hello world!

In this example, we want to show how to perform a simple linear regression with bidimensional data. In particular, let's assume that we have a custom dataset containing 100 samples, as follows:

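import numpy as np
import pandas as pd

# Time axis: 100 evenly spaced values in [0, 10], shaped as a column vector
T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
# Noisy linear signal: random slope in [1.0, 1.5) plus Gaussian noise (mean 0, std 3.5)
X = (T * np.random.uniform(1.0, 1.5, size=(100, 1))) + np.random.normal(0.0, 3.5, size=(100, 1))
# Collect both columns in a DataFrame with the names 't' and 'x'
df = pd.DataFrame(np.concatenate([T, X], axis=1), columns=['t', 'x'])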

We have also created a pandas DataFrame because it's easier to create plots using the seaborn library (https://seaborn.pydata.org). In the book, the code for the plots (using Matplotlib or seaborn) is normally omitted, but it's always present in the repository.

We want to express the dataset in a synthetic way, that is, as a linear model of the form x(t) = at + b, where a is the slope and b is the intercept.

This task can be carried out using a linear regression algorithm, as follows:

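from sklearn.linear_model import LinearRegression

# Fit an ordinary least squares model of x as a function of t
lr = LinearRegression()
lr.fit(T, X)

# X is a 2D array, so coef_ has shape (1, 1) and intercept_ has shape (1,)
print('x(t) = {0:.3f}t + {1:.3f}'.format(lr.coef_[0][0], lr.intercept_[0]))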

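The output of the last command is similar to the following (the exact values change from run to run, because the dataset is random and the code above does not set a seed):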

x(t) = 1.169t + 0.628
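
To make the fitting step less opaque, the following minimal sketch solves the same least squares problem directly with NumPy. It is an illustration and not the book's code: the seed value, the `rng` generator, and the names `A`, `w`, `slope`, and `intercept` are assumptions introduced here. Given the same data, the result should agree with LinearRegression up to numerical precision.

```python
import numpy as np

# Fixed seed: an assumption added for reproducibility (the text does not set one)
rng = np.random.default_rng(1000)

# Dataset with the same structure as in the text
T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
X = (T * rng.uniform(1.0, 1.5, size=(100, 1))) + rng.normal(0.0, 3.5, size=(100, 1))

# Design matrix [t, 1]: the column of ones models the intercept
A = np.concatenate([T, np.ones_like(T)], axis=1)

# Find w that minimizes the squared error ||A w - X||^2
w, _, _, _ = np.linalg.lstsq(A, X, rcond=None)
slope, intercept = w[0, 0], w[1, 0]

print('x(t) = {0:.3f}t + {1:.3f}'.format(slope, intercept))
```

The first row of w plays the role of lr.coef_ and the second row the role of lr.intercept_.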

We can also get visual confirmation, drawing the dataset together with the regression line, as shown in the following graph:

Dataset and regression line
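
Since the plotting code is omitted in the text, here is one possible way to draw a comparable figure with Matplotlib. It is only a sketch: the seed, the figure size, the colors, and the labels are assumptions, and the version in the book's repository may differ.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (assumption); remove to open a window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Fixed seed: an assumption added for reproducibility
rng = np.random.default_rng(1000)

# Dataset with the same structure as in the text
T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
X = (T * rng.uniform(1.0, 1.5, size=(100, 1))) + rng.normal(0.0, 3.5, size=(100, 1))

lr = LinearRegression()
lr.fit(T, X)

# Scatter plot of the samples with the fitted line on top
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(T[:, 0], X[:, 0], label='Dataset')
ax.plot(T[:, 0], lr.predict(T)[:, 0], color='r', label='Regression line')
ax.set_xlabel('t')
ax.set_ylabel('x')
ax.legend()
```

In an interactive session, calling plt.show() (without the Agg backend line) displays the figure.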

In this example, the regression algorithm minimized a squared error cost function, trying to reduce the discrepancy between the predicted values and the actual ones. The presence of zero-mean Gaussian noise has a minimal impact on the slope, because the noise is symmetric around zero and positive and negative deviations tend to cancel out.
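
The effect of the noise on the slope can be checked empirically. The sketch below repeats the fit many times, with and without the Gaussian term, and compares the average slopes. The helper `fit_slope`, the seed, and the number of trials are hypothetical choices made for this illustration. Since the random factor is uniform in [1.0, 1.5), both averages are expected to be close to 1.25, while any single fit fluctuates around that value.

```python
import numpy as np

# Fixed seed: an assumption added for reproducibility
rng = np.random.default_rng(1000)
t = np.linspace(0.0, 10.0, num=100)

def fit_slope(noise_std):
    # Same generating process as in the text, with an adjustable noise level
    x = t * rng.uniform(1.0, 1.5, size=100) + rng.normal(0.0, noise_std, size=100)
    slope, _ = np.polyfit(t, x, deg=1)
    return slope

n_trials = 2000
noisy = np.array([fit_slope(3.5) for _ in range(n_trials)])
clean = np.array([fit_slope(0.0) for _ in range(n_trials)])

print('Mean slope with noise:    {:.3f}'.format(noisy.mean()))
print('Mean slope without noise: {:.3f}'.format(clean.mean()))
print('Std of slope with noise:  {:.3f}'.format(noisy.std()))
```

The noise widens the spread of the individual estimates, but it does not shift their average.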