Why Python for data science and machine learning?
Before moving on to more technical discussions, I think it's helpful to explain the choice of Python as the programming language for this book. In the last decade, research in data science and machine learning has grown rapidly, producing thousands of valuable papers and dozens of complete tools. In particular, thanks to its elegance, compactness, and efficient scientific libraries, Python has been chosen by many researchers and programmers to build a complete scientific ecosystem that has been released for free.
Nowadays, packages such as scikit-learn, SciPy, NumPy, Matplotlib, pandas, and many others represent the backbone of hundreds of production-ready systems, and their usage keeps growing. Moreover, deep learning frameworks such as Theano, TensorFlow, and PyTorch allow every Python user to create and train complex models without being limited by the speed of the interpreter, because the heavy computation is delegated to optimized native and GPU code. In fact, it's important to note that Python is no longer just a scripting language. It is used for dozens of specific tasks (for example, web development and graphics) and it can be interfaced with native code written in C or C++.
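As a small illustration of interfacing with native code, the following sketch calls the C standard library function strlen through ctypes, a module in Python's standard library. It assumes a POSIX-like system (for example, Linux or macOS) where the C library can be located at runtime; the helper name native_strlen is hypothetical and introduced only for this example.

```python
import ctypes
import ctypes.util

# Locate the C standard library. The lookup is platform dependent
# (assumption: a POSIX-like system such as Linux or macOS).
libc_path = ctypes.util.find_library("c")

# If the lookup fails, CDLL(None) exposes the symbols already loaded
# into the current process on POSIX systems.
libc = ctypes.CDLL(libc_path) if libc_path else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def native_strlen(text: str) -> int:
    """Return the length in bytes of the UTF-8 encoding of text,
    computed by the native C function strlen.
    (Hypothetical helper, for illustration only.)"""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(native_strlen("data science"))
```

In practice, libraries such as NumPy rely on compiled extension modules rather than ctypes, but the principle is the same: Python code drives routines that run as native code.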
For these reasons, Python is an optimal choice for almost any data science project and, thanks to its features, programmers from different backgrounds can learn to use it effectively in a short time. Other free solutions are also available (for example, R, Java, or Scala). R offers complete coverage of statistical and mathematical functions, but it lacks the support frameworks that are necessary to build complete applications. Conversely, Java and Scala have a complete ecosystem of production-ready libraries, but Java in particular is not as compact and easy to use as Python. Moreover, support for native code is much more complex, and the majority of libraries rely exclusively on the JVM, which can carry a performance cost.
Scala has gained an important position in the big data landscape, thanks to its functional features and the existence of frameworks such as Apache Spark (which can be employed to carry out machine learning tasks with big data). However, considering all the pros and cons, Python remains the optimal choice, and that's why it has been chosen for this book.