Statistical Inference via Data Science: A ModernDive into R and the Tidyverse provides a pathway for learning about statistical inference using data science tools widely used in industry, academia, and government. It introduces the tidyverse suite of R packages, including the ggplot2 package for data visualization, and the dplyr package for data wrangling. After equipping readers with just enough of these data science tools to perform effective exploratory data analyses, the book covers traditional introductory statistics topics like confidence intervals, hypothesis testing, and multiple regression modeling, while focusing on visualization throughout.
Features:
● Assumes minimal prerequisites, notably, no prior calculus nor coding experience
● Motivates theory using real-world data, including all domestic flights leaving New York City in 2013, the Gapminder project, and the data journalism website, FiveThirtyEight.com
● Centers on simulation-based approaches to statistical inference rather than mathematical formulas
● Uses the infer package for "tidy" and transparent statistical inference to construct confidence intervals and conduct hypothesis tests via the bootstrap and permutation methods
● Provides all code and output embedded directly in the text; also available in the online version at moderndive.com
This book is intended for individuals who would like to simultaneously start developing their data science toolbox and start learning about the inferential and modeling tools used in much of modern-day research. The book can be used in methods and data science courses and first courses in statistics, at both the undergraduate and graduate levels.
My official endorsement (literally, it's on the ):
Through apt use of analogies, hands-on exercises, and abundant opportunities to get coding, this book delivers on its promise to give a reader without a background in statistics or programming the tools necessary for understanding and conducting real-world statistical inference and data analysis.
With an emphasis on learning new concepts first "by hand," before turning to the code, it would make a particularly useful classroom companion. However, the "learning checks" provided throughout also make it a great guide for self-study.
Students and teachers alike will benefit from this thoughtful introduction, as it addresses even the smallest of details that can trip beginners up, and keep them from getting to the more fruitful parts of data analysis.
I've made multiple attempts at learning R through Lynda courses and YouTube videos, but I have to say this book is the best resource out there to learn R, and maybe statistics as well. I'm not left with the usual "what the heck did I just do?" effect that I got from other resources; instead, it has definitely given me the understanding, confidence, and ability to do statistical calculations through lines of code.
Tip: Don't copy and paste the lines of code in the book; write them yourself, over and over again. That makes them stick better in memory.
I can't be thankful enough that this gem of a book is free to access through the moderndive.com website.
The chapters on data wrangling and visualization were just a summary of the R4DS book by Hadley Wickham, so if you have read that book, just skip to the chapters on statistics. The authors have introduced new libraries for performing statistical inference in a tidy way. The applications were too simple, though. The comparison of theory-based vs. simulation-based inference methods was good, and a few more in-depth examples could have helped. Overall, it is a book targeted at beginners and is free and open source, so I can't complain much!
The authors state in the early part of the book that it was made with the following principles:
● Blur the lines between lecture and lab.
● Focus on the entire data/science research pipeline.
● It's all about the data.
● Use simulation/resampling to introduce statistical inference, not probability/mathematical formulas.
● Don't fence off students from the computation pool, throw them in!
● Complete reproducibility and customizability.
My biggest problem was probably number 4. This book was part of a mandatory course I had to take at my university, and I really wish it had not been. The lecture notes were supposed to complement the book (give the theoretical perspective), but in the end, I just wound up reading the lecture notes, making some short notes on the specific code I would need for the exam, and going on my way.
The authors operate from the perspective that you learn by doing. Personally, I prefer to understand a concept (e.g. understand a definition by looking at an example), and make a flashcard so I remember it. This book lent itself very poorly to that way of studying. The commands were never explained in all generality but rather explored through examples. The mathematical concepts were rarely if ever given rigorous definitions.
I have no doubt in my mind that the authors had good intentions with this book (it being free, and the conventional wisdom being that students should get their hands dirty to gain an understanding). Maybe it will even grow on me in the future. But right now I could not care less for it :(