Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. The text illustrates all parts of the modeling process through many hands-on, real-life examples, and every chapter contains extensive R code for each step of the process. This multi-purpose text can be used as an introduction to predictive models and the overall modeling process, a practitioner’s reference handbook, or a text for advanced undergraduate or graduate-level predictive modeling courses. To that end, each chapter contains problem sets to help solidify the covered concepts and uses data available in the book’s R package. This text is intended for a broad audience as both an introduction to predictive models and a guide to applying them. Non-mathematical readers will appreciate the intuitive explanations of the techniques, while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics.
“Data Science” is one of the most exciting research and professional fields these days. It is creating a lot of buzz, both within the academy and in the business world. Detractors like to point out that most of the topics and techniques used by people who call themselves Data Scientists have been around for decades, if not longer. However, it has often been the case that a combination of topics and methodologies becomes important and concrete enough that a truly new subfield emerges.
Predictive Modeling is a particularly exciting subfield of Data Science. Thanks to a few recent high-profile, news-grabbing success stories (the 2012 US presidential election, the Netflix prize, etc.), it has attracted a lot of attention and prominence. Thanks to the increased use and availability of data in all walks of life, we are increasingly able to make reliable predictions and estimates regarding topics and issues that affect us in very substantive ways. This ability may sometimes seem almost magical, but behind it lie some very accessible ideas and techniques. “Applied Predictive Modeling” aims to expose many of these techniques in a very readable and self-contained book.
This is a very applied and hands-on book. It guides the reader through many examples that serve to illustrate main points, and it raises possible issues and considerations that are oftentimes overlooked or not sufficiently reflected upon. For instance, the way we encode something as simple as a calendar date can have a significant impact on the kind of analysis and predictive model we choose. This is the kind of information that is often not discussed in other modeling books and can sometimes take years of practical experience before its impact is fully appreciated.
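To make the calendar-date point concrete, here is a minimal sketch (in Python, not taken from the book, whose code is in R) of several ways one date can be turned into model inputs. The function name `encode_date`, the 1970 origin, and the choice of features are illustrative assumptions; which encoding helps depends on the problem.

```python
from datetime import date

def encode_date(d: date) -> dict:
    """Return several alternative numeric encodings of one calendar date."""
    origin = date(1970, 1, 1)  # arbitrary reference point (an assumption)
    return {
        "days_since_origin": (d - origin).days,   # captures a long-term trend
        "day_of_year": d.timetuple().tm_yday,     # captures seasonality
        "month": d.month,                         # coarser seasonal signal
        "day_of_week": d.weekday(),               # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,           # simple binary indicator
    }

features = encode_date(date(2012, 11, 6))
for name, value in features.items():
    print(f"{name}: {value}")
```

A model that only sees `days_since_origin` can pick up a trend but not a weekly cycle, while one that only sees `day_of_week` has the opposite limitation, which is why the encoding decision matters before any algorithm is chosen.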
The book has a fairly low access bar, but it is definitely not intended for a complete novice. It assumes a fairly decent background in statistics and the R language, and at least a passing understanding of machine learning. Many of these techniques are covered in this book, but mainly as summaries and refreshers. Each one of them could fill a book of its own, or even a whole collection of books.
One of the best features of this book is that the authors understand that predictive modeling is not just a bunch of statistical and computational techniques. Understanding the data, and how to obtain, manipulate, and format it, are some of the most crucial steps for predictive modeling (and other data-driven fields), and they are often overlooked and not sufficiently explained in many other books and papers that I have come across. The same can be said about model selection - the choice of a model and its predictive power will crucially depend on the kind of phenomena that we are predicting, as well as on what exactly we are trying to predict. This book does an excellent job of guiding the reader along these paths and instilling the intuitions required for successful predictive modeling. Here too, like with most things in life, there is no substitute for years of experience working with actual real-world problems, but going through this book will ensure that you don’t have to stumble too much with your first steps.
I recently went through "Data Scientist" job interviews, and some of the most common questions are related to the "process" of predictive modeling. For example:
- What would you do if there's a class imbalance?
- How would you measure how well your model is performing?
- What do you do if you have a lot of features, and they're correlated?!
The interviewers are essentially trying to assess if you understand the "process" of model building, and that you're resourceful enough to "know what to do" when the analysis runs into common problems.
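As an illustration of the class-imbalance question, the sketch below (Python, stdlib only, not from the book) shows why raw accuracy can mislead on imbalanced data and one common remedy, down-sampling the majority class. The helper names `rates` and `downsample` and the toy 5/95 split are assumptions made for the example.

```python
import random

def rates(actual, predicted, positive=1):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

def downsample(labels, seed=0):
    """Return row indices with every class cut down to the rarest class size."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)

# Toy data: 5 positives, 95 negatives, and a "model" that always says negative.
actual = [1] * 5 + [0] * 95
always_negative = [0] * 100

summary = rates(actual, always_negative)
balanced_rows = downsample(actual)
print(summary)
print(len(balanced_rows), "rows kept after down-sampling")
```

The always-negative model scores 95% accuracy while finding none of the positives, so sensitivity and specificity tell a more honest story. Down-sampling is only one option; it discards data, so it is a trade-off rather than a default.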
For me, this book was a terrific tour of the predictive modeling process from a practitioner's point of view. Kuhn walks through many of these considerations, such as pre-processing, missing data, ways to evaluate your model, and more. Kuhn also gives useful "intuitive" explanations of some of the more complicated, but best performing, models in the literature. While the SVM section didn't make a lot of sense, I think the explanation of Neural Networks and Tree-Based Methods was very insightful, and really helped me understand the key ideas behind these methods and why they work. I also learned many practical tips on how statisticians deal with common pitfalls in practice, such as screening correlated variables and partial least squares. Finally, the book had a great chapter on evaluating classification models.
For a statistics book, this was **very** easy to read, as I actually got through it in six to eight hours on a plane ride across the country, before an interview. Clearly, you could probably get more out of the book by systematically working through the examples and code, but I think a light read through the book was well worth it, and I learned a ton.
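The idea of screening correlated variables can be sketched as follows. This is a simple greedy heuristic written in Python for illustration, not the book's or the caret package's exact algorithm: a predictor is kept only if its absolute Pearson correlation with every already-kept predictor is below a cutoff. The names `pearson` and `screen_correlated`, the 0.9 threshold, and the toy columns are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def screen_correlated(columns, threshold=0.9):
    """Greedily keep columns whose |r| with every kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # roughly 2 * x1, so nearly redundant
    "x3": [5, 1, 4, 2, 6, 3],               # largely unrelated to x1
}
kept = screen_correlated(columns)
print(kept)
```

Because the filter is order-dependent, the first of two redundant columns wins; more careful procedures choose which column to drop based on its overall correlation with the rest.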
I think this book is best seen as a sequel to ISLR (An Introduction to Statistical Learning). It has three main features:
* Practical guidance on data preprocessing, feature engineering, and handling class imbalance
* An introduction to the caret library, which offers a uniform interface to cross-validation and hyperparameter tuning
* An overview of a larger set of models and libraries than ISLR covers
Do note that the coverage of algorithms is shallower and less mathematical than ISLR. If that's not what you want, consider reading instead.
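For readers unfamiliar with what cross-validated hyperparameter tuning involves, here is a minimal sketch in plain Python. It does not use or imitate the caret API; it only shows the underlying loop such a library automates: for each candidate value, estimate out-of-sample error with k-fold cross-validation, then pick the value with the lowest error. The k-nearest-neighbour model, the synthetic data, the candidate grid, and the interleaved fold assignment are all assumptions chosen to keep the example self-contained.

```python
import math

def knn_predict(train_x, train_y, query, k):
    """Average the responses of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_rmse(xs, ys, k_neighbors, n_folds=5):
    """Root mean squared error of k-NN, estimated by n_folds cross-validation."""
    squared_errors = []
    for fold in range(n_folds):
        # Interleaved folds: row i is held out when i % n_folds == fold.
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            pred = knn_predict(train_x, train_y, xs[i], k_neighbors)
            squared_errors.append((ys[i] - pred) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Synthetic data: a sine curve plus a small deterministic perturbation.
xs = [i / 5 for i in range(40)]
ys = [math.sin(x) + 0.1 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]

grid = [1, 3, 5, 9, 15]                      # candidate numbers of neighbours
scores = {k: cv_rmse(xs, ys, k) for k in grid}
best_k = min(scores, key=scores.get)

for k in grid:
    print(f"k={k:2d}  CV RMSE={scores[k]:.3f}")
print("selected k:", best_k)
```

The appeal of a uniform interface is that the same resampling and selection logic can be reused while only the model and its tuning grid change.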
I regard this as a more applied counterpart to more methodology-oriented resources like Elements of Statistical Learning. It applies machine learning methods that are found in readily available R libraries. In addition, one of the authors is also the lead on the caret package in R, which provides a consistent interface to a large number of the common machine learning packages.
1. Built around case studies that are woven through the text. For each chapter, the math/stats is developed first, then the computational example is at the end, so that the example can develop data manipulation, application of method, then model evaluation. I like this as it allows for more complex and messy data sets than when using a new, small example for each problem. It also allows for better discussions when illustrating the differences between methods.
2. Data manipulation/data processing is given a separate chapter early on. I appreciate the attention given to working with the data (e.g. missing value imputation). There are other resources on data handling, but not in the same place as those that address the statistics methodology.
3. Emphasis on model evaluation. There is an early chapter devoted to model evaluation. Then each major section of the book has an early chapter devoted to model evaluation for that class of problem. This is in contrast to many books that are built around types of algorithms, where model evaluation is fitted in. Methods and algorithms are relatively easy compared to the thought process of determining what is the right thing to do. It figures that this book will be strong in model evaluation when one of the authors is the lead on the caret package in R.
I used this as a supplement in teaching a data science course, for which I use a range of different resources because I need to cover working with data, model evaluation, and machine learning methods. The next time I teach this course, I will use only this book because it covers all of these aspects of the field.
Applied Predictive Modeling by Max Kuhn and Kjell Johnson is a complete examination of essential machine learning models with a clear focus on making numeric or categorical (factor) predictions. Over nearly 600 pages, the authors discuss topics ranging from data engineering to modeling and performance evaluation.
The core of Applied Predictive Modeling consists of four distinct parts:
1. General Strategies on how to manipulate and re-sample data.
2. Regression Models for making numeric predictions.
3. Classification Models for making factor predictions.
4. Other Considerations concerning model quality.
Overall, Applied Predictive Modeling is a very informative course on machine learning. It assumes some prior knowledge and might be difficult to access for someone without any background, despite leaving out unnecessary equations. (An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani would be a good read before starting this book.) Some of the book's examples are taken from the fields of medicine and pharmaceuticals, which makes them hard to understand for people outside the health sciences.
However, the book does a very good job of making machine learning in R much more systematic. It clearly shows the advantages of using the caret package (written by one of the book's authors) and how to evaluate and tune your model's performance.
If you are not entirely new to data science, this book will yield a high return for you. It makes your process of training a model more straightforward and thorough.
Great book for those who want to learn applied data science and / or programming with R.
The book can be combined with an R package of the same name, written by the authors. It contains many interesting example datasets, too. The book is more for the advanced reader who aims at applying the techniques in practice. As a prerequisite you should have some basic programming knowledge and should have taken at least one statistics (or better, chemometrics, econometrics, etc.) course. You do not have to be a mathematician. The authors provide a few theoretical equations in combination with great insights from their practical experience. So you will learn to study data that does not follow simple, linear trends. The book is pretty complete, covering most statistical techniques that are currently used in practice. You learn not only about classic regression and classification techniques, but also about decision trees, neural networks, and rule-based systems. Only if you want to dig deeper into specific fields, e.g. apply LSTM neural networks, do you have to continue with more specialized books.
I work with predictive models every day, and I'm also the author of multiple R packages. This book is the best book I own on the topic of prediction. I say that even though I don't make extensive use of machine learning models, and even though there is not a single time series model in this book (when most of my work is with time series). The applied focus and wealth of practical experience on real problems is an invaluable set of insights for anyone building predictive models, in any field, and using any algorithm. I also found the writing style clear, well organized, and easy to read. Highly Recommended.
This book was written by the creator of the package 'caret', which is a Swiss Army knife of machine learning and data pre-processing algorithms. Kuhn covers not only a variety of stats/ML algorithms, but also delves into topics related to data preprocessing, feature selection, and Type III problems. The book also has a very detailed Computational Section, where the R code is clearly laid out. This book has a lot of great stuff.
One of the best books on predictive analytics using R. This book sets the standard for readability, usage of real life examples to illustrate concepts and thorough documentation of all R code used to create graphs and results within the book. If you're getting into analytics using R, this book is a MUST HAVE.
I read this a few years ago and it was very helpful in doing machine learning in R. It has a very simple way of explaining complex topics. The primary library covered was the caret library.
I needed to read a few chapters for my thesis. It is the best textbook on applied ML that I have found. It is very well written and well organized, and it shows the mathematical foundations when needed, though without going too deep. Since I mostly use Python, the only thing I disliked was that the code is available only for R.
A good refresher for old practitioners and aspiring data scientists. Many concepts are explained in an applied manner rather than with math formulas, as is the case in The Elements of Statistical Learning.
Remarkable text. It is "" of modeling in terms of its holistic approach and comprehensive coverage.
Not suitable for beginners, prior exposure to programming and statistics is a must. It keeps the promise of its name "applied". You better consume this book after .
It's a great practical book. It doesn't bog you down with unnecessary technical details like other books. Theoretical books have their place in some people's careers. However, I think people vastly overstate the importance of reading math-heavy books in the data science industry. People need books like Applied Predictive Modeling to succeed in their careers as it deals with problems you'll come across on the job.