Effective Amazon Machine Learning

Missing from Amazon ML

Amazon ML offers supervised learning predictions for classification (binary and multiclass) and regression problems. It offers some very basic visualization of the original data and has a preset list of data transformations, such as binning or normalizing the data. It is efficient and simple. However, several functionalities that are important to the data scientist are unfortunately missing from the platform. Lacking these features may not be a deal breaker, but it nonetheless restricts the scope of problems Amazon ML can be applied to.

Some of the common machine learning features Amazon ML does not offer are as follows:

  • Unsupervised learning: It is not possible to do clustering or dimensionality reduction of your data.
  • A choice of models besides linear models: Non-linear Support Vector Machines, any type of Bayes classification, neural networks, and tree-based algorithms (decision trees, random forests, or boosted trees) are all absent. All predictions and all experiments are built on linear regression or logistic regression trained with stochastic gradient descent (SGD).
  • Rich data visualization: The available plots are limited to histograms and density plots.
  • A choice of metrics: Amazon ML uses the F1-score and ROC-AUC for classification, and RMSE for regression. It is not possible to assess model performance with any other metric within the platform.
  • Model export: You cannot download your trained model and use it outside Amazon ML.
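
The metric limitation can be worked around outside the platform: once predictions are available locally, any metric can be computed by hand. The following is a minimal sketch in plain Python, not part of Amazon ML. The function names and the sample values are hypothetical, and it assumes you already have the true labels alongside the predicted scores, for example from a batch prediction output.

```python
import math


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def log_loss(y_true, scores, eps=1e-15):
    """Average binary cross-entropy; scores are clipped away from 0 and 1."""
    total = 0.0
    for t, s in zip(y_true, scores):
        s = min(max(s, eps), 1 - eps)
        total -= t * math.log(s) + (1 - t) * math.log(1 - s)
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error for regression predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values standing in for exported predictions.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.6]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

precision, recall = precision_recall(labels, predicted)
print(precision, recall, log_loss(labels, scores))
```

The 0.5 threshold is only an illustration; a different cut-off changes precision and recall but leaves the log loss untouched, since the latter is computed on the raw scores.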

Finally, although you cannot run your own scripts (R, Python, Scala, and so on) directly within the Amazon ML platform, it is possible, and recommended, to preprocess the datasets with other AWS services such as AWS Lambda. Data manipulation beyond the transformations available in Amazon ML can also be carried out in SQL if your data is stored in one of the SQL-enabled AWS services (Athena, RDS, Redshift, and others).
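
As an illustration of this kind of external preprocessing, here is a minimal sketch of a Lambda-style function in Python. The `handler(event, context)` signature follows the usual convention for Python Lambda functions, but the event shape (raw CSV text under a `"csv"` key) and the cleaning rules are assumptions made for this example. A real function would more likely read the file from S3 and write the cleaned version back.

```python
import csv
import io


def clean_rows(csv_text):
    """Strip and lowercase every cell, and drop rows that contain an empty cell."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cells = [cell.strip().lower() for cell in row]
        # Skip blank lines and rows with at least one missing value.
        if cells and all(cells):
            writer.writerow(cells)
    return out.getvalue()


def handler(event, context):
    # Hypothetical event shape: {"csv": "<raw CSV text>"}.
    return {"csv": clean_rows(event["csv"])}


if __name__ == "__main__":
    sample = {"csv": " Red ,3\nblue,\nGreen,5\n"}
    print(handler(sample, None)["csv"])
```

Keeping the cleaning logic in `clean_rows`, separate from the handler, makes it possible to test the transformation locally before deploying anything.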