Supervised hello world!
In this example, we show how to perform a simple linear regression on bidimensional data. In particular, let's assume that we have a custom dataset containing 100 samples, generated as follows:
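import numpy as np
import pandas as pd

# Independent variable: 100 evenly spaced points in [0, 10], as a (100, 1) column vector
T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
# Dependent variable: t scaled by a random factor in [1.0, 1.5) plus zero-mean Gaussian noise (std 3.5)
X = (T * np.random.uniform(1.0, 1.5, size=(100, 1))) + np.random.normal(0.0, 3.5, size=(100, 1))
# Store both columns in a DataFrame for convenient inspection
df = pd.DataFrame(np.concatenate([T, X], axis=1), columns=['t', 'x'])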
We want to express the dataset in a synthetic way, as a linear function of t with a slope a and an intercept b:
x(t) = at + b
This task can be carried out using a linear regression algorithm, as follows:
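from sklearn.linear_model import LinearRegression

# Fit an ordinary least-squares linear model with t as input and x as target
lr = LinearRegression()
lr.fit(T, X)
# Since the target X is a 2D column vector, coef_ has shape (1, 1) and intercept_ has shape (1,)
print('x(t) = {0:.3f}t + {1:.3f}'.format(lr.coef_[0][0], lr.intercept_[0]))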
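The output of the last command is similar to the following (the exact values change from run to run, because the dataset is random and no seed is set):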
x(t) = 1.169t + 0.628
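Once fitted, the model can also be evaluated at values of t that were not in the dataset. The following is a minimal sketch, not part of the original example: the seed, the generator `rng`, and the names `T_new`, `X_pred`, and `X_manual` are assumptions introduced only for illustration. It checks that `predict()` gives the same result as applying the fitted slope and intercept by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumption: an arbitrary seed, used only to make this sketch reproducible
rng = np.random.default_rng(1000)

# Same data-generating process as in the example
T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
X = (T * rng.uniform(1.0, 1.5, size=(100, 1))) + rng.normal(0.0, 3.5, size=(100, 1))

lr = LinearRegression().fit(T, X)

# New inputs must be 2D, with shape (n_samples, 1)
T_new = np.array([[2.5], [12.0]])
X_pred = lr.predict(T_new)

# The prediction is the fitted line evaluated at the new points
X_manual = T_new * lr.coef_[0][0] + lr.intercept_[0]

print(X_pred.ravel())
print(np.allclose(X_pred, X_manual))
```

Note that the second point (t = 12) lies outside the range covered by the dataset, so its prediction is an extrapolation and should be treated with more caution.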
We can also get visual confirmation, drawing the dataset together with the regression line, as shown in the following graph:
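A plot of this kind can be produced with matplotlib. The following is a sketch under stated assumptions: the seed, the helper name `plot_fit`, the colors, and the figure size are illustrative choices, and the axis labels simply reuse the column names t and x.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Assumption: an arbitrary seed, used only to make this sketch reproducible
rng = np.random.default_rng(1000)

T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
X = (T * rng.uniform(1.0, 1.5, size=(100, 1))) + rng.normal(0.0, 3.5, size=(100, 1))

lr = LinearRegression().fit(T, X)


def plot_fit(T, X, model):
    """Draw the samples as a scatter plot and overlay the fitted line."""
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.scatter(T.ravel(), X.ravel(), s=20, label='Dataset')
    ax.plot(T.ravel(), model.predict(T).ravel(), color='red',
            linewidth=2, label='Regression line')
    ax.set_xlabel('t')
    ax.set_ylabel('x')
    ax.legend()
    ax.grid(True)
    return fig, ax


fig, ax = plot_fit(T, X, lr)
```

In an interactive session, calling `plt.show()` afterwards displays the figure; `fig.savefig(...)` can be used to write it to a file instead.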
In this example, the regression algorithm minimized a squared error cost function, reducing the discrepancy between the predicted values and the actual ones. The zero-mean Gaussian noise has a minimal impact on the slope because its distribution is symmetric, so positive and negative deviations tend to cancel out.
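To make the role of the squared error cost function concrete, the following sketch solves the same least-squares problem directly with NumPy and compares the result with scikit-learn. The seed and the names `A`, `a_ls`, `b_ls`, and `squared_error` are assumptions introduced for illustration; they do not appear in the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumption: an arbitrary seed, used only to make this sketch reproducible
rng = np.random.default_rng(1000)

T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
X = (T * rng.uniform(1.0, 1.5, size=(100, 1))) + rng.normal(0.0, 3.5, size=(100, 1))

# Design matrix [t, 1], so that A @ [slope, intercept] = slope * t + intercept
A = np.concatenate([T, np.ones_like(T)], axis=1)

# Least-squares solution: the parameters that minimize ||A @ theta - X||^2
theta, _, _, _ = np.linalg.lstsq(A, X, rcond=None)
a_ls, b_ls = theta[0, 0], theta[1, 0]


def squared_error(a, b):
    """Sum of squared differences between the line a*t + b and the samples."""
    return float(np.sum((a * T + b - X) ** 2))


lr = LinearRegression().fit(T, X)

print('NumPy:        x(t) = {0:.3f}t + {1:.3f}'.format(a_ls, b_ls))
print('scikit-learn: x(t) = {0:.3f}t + {1:.3f}'.format(lr.coef_[0][0], lr.intercept_[0]))
print('Cost at the solution:     {:.3f}'.format(squared_error(a_ls, b_ls)))
print('Cost with a shifted slope: {:.3f}'.format(squared_error(a_ls + 0.1, b_ls)))
```

Both approaches should yield the same slope and intercept up to numerical precision, and any perturbation of the parameters increases the cost, which is what minimizing the squared error means in practice.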