Learning Data Mining with Python (Second Edition)

Putting it all together

We can now create a workflow by combining the code from the previous sections, using the broken dataset we created earlier:

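# Relies on X_broken, y, and the imports from the previous sections
# Rescale every feature to the range 0 to 1
X_transformed = MinMaxScaler().fit_transform(X_broken)
estimator = KNeighborsClassifier()
# Evaluate the classifier on the rescaled data with cross-validation
transformed_scores = cross_val_score(estimator, X_transformed, y, scoring='accuracy')
print("The average accuracy is {0:.1f}%".format(np.mean(transformed_scores) * 100))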

We now recover our original score of 82.3 percent accuracy. The MinMaxScaler put every feature on the same scale, so no feature overpowers the others simply because its values are larger. The nearest neighbor algorithm is easily confused by features with larger values. Some algorithms handle scale differences better, while others handle them much worse!
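To see why scale matters for a distance-based method, here is a minimal sketch in plain Python. It is not the scikit-learn implementation: the helpers min_max_scale and nearest_index and the toy samples are hypothetical, made up for illustration. It rescales each feature to the range 0 to 1, as MinMaxScaler does by default, and shows that the nearest neighbor of a sample can change once a large-valued feature stops dominating the distance.

```python
import math


def min_max_scale(rows):
    """Rescale each column of rows to the range [0, 1] (hypothetical helper)."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return scaled


def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    query = rows[query_index]
    best_index, best_distance = None, math.inf
    for index, row in enumerate(rows):
        if index == query_index:
            continue
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, row)))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


# Made-up toy data: feature 0 is small (0 to 1), feature 1 is in the thousands
samples = [
    [0.10, 1000.0],  # the query sample
    [0.12, 3000.0],  # close on feature 0, far on feature 1
    [0.90, 1100.0],  # far on feature 0, close on feature 1
    [0.50, 5000.0],
]

scaled_samples = min_max_scale(samples)

# On the raw data, feature 1 dominates the distance
print(nearest_index(samples, 0))         # prints 2
# After scaling, both features contribute comparably
print(nearest_index(scaled_samples, 0))  # prints 1
```

On the raw values, the distance is decided almost entirely by the second feature, so the small differences in the first feature are ignored. After rescaling, both features lie between 0 and 1 and each has a comparable say in which sample counts as nearest.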