What this book covers
Here is a list of changes from the first edition by chapter:
Chapter 1, A Process for Success, has the flowchart redone to correct a typo and add further methodologies.
Chapter 2, Linear Regression – the Blocking and Tackling of Machine Learning, has the code improved, and better charts have been provided; other than that, it remains relatively close to the original.
Chapter 3, Logistic Regression and Discriminant Analysis, has the code improved and streamlined. One of my favorite techniques, multivariate adaptive regression splines, has been added; it performs well, handles non-linearity, and is easy to explain. It is my base model, with others becoming "challengers" that try to outperform it.
Chapter 4, Advanced Feature Selection in Linear Models, now includes techniques for a classification problem as well as for regression.
Chapter 5, More Classification Techniques – K-Nearest Neighbors and Support Vector Machines, has the code streamlined and simplified.
Chapter 6, Classification and Regression Trees, adds the very popular techniques provided by the xgboost package. I also added the technique of using random forest as a feature selection tool.
Chapter 7, Neural Networks and Deep Learning, has been updated with additional information on deep learning methods and has improved code for the H2O package, including hyperparameter search.
Chapter 8, Cluster Analysis, adds a methodology for unsupervised learning with random forests.
Chapter 9, Principal Components Analysis, uses a different dataset, and an out-of-sample prediction has been added.
Chapter 10, Market Basket Analysis, Recommendation Engines, and Sequential Analysis, adds sequential analysis, which, I'm discovering, is increasingly important, especially in marketing.
Chapter 11, Creating Ensembles and Multiclass Classification, has completely new content, using several great packages.
Chapter 12, Time Series and Causality, has a couple of additional years of climate data, along with a demonstration of different methods of causality testing.
Chapter 13, Text Mining, has additional data and improved code.
Chapter 14, R on the Cloud, is another chapter of new content, showing you how to get R running on the cloud simply and quickly.
Appendix A, R Fundamentals, has additional data manipulation methods.
Appendix B, Sources, has a list of sources and references.