A deep dive into the programming language of choice for statistics and data
With R All-in-One For Dummies, you get five mini-books in one, offering a complete and thorough resource on the R programming language and a road map for making sense of the sea of data we're all swimming in. Maybe you're pursuing a career in data science, maybe you're looking to infuse a little statistics know-how into your existing career, or maybe you're just R-curious. This book has your back. Along with providing an overview of coding in R and how to work with the language, this book delves into the types of projects and applications R programmers tend to tackle the most. You'll find coverage of statistical analysis, machine learning, and data management with R.
- Grasp the basics of the R programming language and write your first lines of code
- Understand how R programmers use code to analyze data and perform statistical analysis
- Use R to create data visualizations and machine learning programs
- Work through sample projects to hone your R coding skill

This is an excellent all-in-one resource for beginning coders who'd like to move into the data space by knowing more about R.
This book is an excellent resource for learning and excelling in R.
As someone who straddles the line between web dev, desktop programming, cybersecurity, and data analysis, I found this book a must-read for that last area.
I especially enjoyed Joseph's writing and his clear, easy-to-follow step-by-step instructions.
R is not that hard, but it isn't especially intuitive, so this book makes a great primer.
Definitely recommend it if you want to get into R.
This is actually a great book for reviewing the concepts; it covers almost all of the things I learned at graduate school. The reason I like to reread the concepts is that some of them sit inside my brain like a plastic plant that's there because it has to be. By reading them over and over again, my understanding deepens and gives them life; they begin to feel more organic inside my head. Statistics is not easy, nor is machine learning. Hopefully one day, I'll wake up feeling like a chef with ample fresh ingredients to use for cooking. Until I feel that confidence, I'll be reading and rereading :)
Favorite quotes:

1. Sometimes you're interested in part of a data frame. To isolate those columns into a data frame, use subset(). (Sketch below.)
2. read.xlsx() <- Excel files; read.csv(), read.table() <- text files. (Sketch below.)
3. gg in ggplot stands for "grammar of graphics". (Sketch below.)
4. Instead of using $, you can use with(). (Sketch below.)
5. When a histogram has fatter tails than the normal distribution, it is leptokurtic, with greater kurtosis; a platykurtic distribution has fewer extreme events than a normal distribution and negative kurtosis. (Sketch below.)
6. One type of error occurs when you believe the data show something important and you reject H0, but in reality the data are due just to chance. This is called a Type 1 error, with probability called alpha. The other type of error occurs when you don't reject H0 but the data really are due to something out of the ordinary. This is called a Type 2 error, with probability called beta.
7. A two-tailed test indicates that you're looking for a difference between the sample mean and the null-hypothesis mean, but you don't know in which direction. A one-tailed test shows that you have a pretty good idea of how the difference should come out. (Sketch below.)
8. Paired samples example: the same individual provides a score before and after a study. This is different from the assumption that choosing an individual for one sample has no bearing on the choice of an individual for the other. (Sketch below.)
9. Distributions: normal, t, chi-square, F.
10. Multiple pairwise t-tests don't work (a "thorny problem") because if each test has alpha = 0.05, the overall probability of a Type 1 error increases with the number of means. (Sketch below.)
11. When something jumps out at you that you didn't anticipate, you can make comparisons known as a posteriori tests, post hoc tests, or unplanned comparisons. (Sketch below.)
12. Epsilon represents "error" in the population. It's a catchall for "things you don't know or things you have no control over."
13. Analysis of variance and linear regression are the same thing. They're both part of what's called the general linear model (GLM). The third and final component of the general linear model is the analysis of covariance (ANCOVA). (Sketch below.)
14. Adjusted r-squared takes degrees of freedom into account. Every time you add an independent variable, you change the degrees of freedom, and r-squared is adjusted accordingly. (Sketch below.)
15. The optimal separation boundary is the one that maximizes the distance (margin) between the separation boundary and its nearest points. The lines from the two nearest points to the separation boundary are called support vectors. (Sketch below.)
16. The ratio (between sum of squares)/(within sum of squares) is a measure of how well the k-means clusters fit the data. A higher number is better. (Sketch below.)
17. Three activation functions are common. The hyperbolic tangent (tanh) takes a number and turns it into a number between -1 and 1. Sigmoid turns its input into a number between 0 and 1. Rectified linear unit (ReLU) replaces negative values with 0. By restricting the range of the output, activation functions set up a nonlinear relationship between the inputs and the outputs. Why is this important? In most real-world situations, you don't find a nice, neat linear relationship between what you try to predict and the data you use to predict it. (Sketch below.)
18. Bias is a constant that the network adds to each number coming out of the units in a layer. Bias is much like the intercept in a linear regression equation. Without the intercept, a regression line would pass through (0, 0) and might miss many of the points it's supposed to fit. (Sketch below.)
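Quotes 1 and 4 are easy to see in action. A minimal sketch using R's built-in mtcars data frame (my choice of example data, not the book's):

    # subset() pulls out rows and/or columns of a data frame;
    # the select argument isolates columns.
    small <- subset(mtcars, subset = mpg > 25, select = c(mpg, hp))
    small

    # with() evaluates an expression inside the data frame, so you can
    # write mpg and wt instead of mtcars$mpg and mtcars$wt.
    with(mtcars, mean(mpg / wt))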
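For quote 2: read.csv() and read.table() ship with base R, while read.xlsx() comes from an add-on package (openxlsx is one common choice, assumed here; the quote doesn't say which package the book uses). The file names are hypothetical placeholders:

    scores <- read.csv("scores.csv")                  # comma-separated text
    raw    <- read.table("scores.txt", header = TRUE) # whitespace-separated text
    library(openxlsx)                                 # install.packages("openxlsx")
    sheet1 <- read.xlsx("scores.xlsx", sheet = 1)     # Excel workbook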
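Quote 3's "grammar of graphics" shows up directly in ggplot2 code: a plot is data plus aesthetic mappings plus geometry layers, glued together with +. A small sketch, again on mtcars:

    library(ggplot2)                           # install.packages("ggplot2") if needed
    ggplot(mtcars, aes(x = wt, y = mpg)) +     # data + aesthetic mappings
      geom_point() +                           # a layer of points
      geom_smooth(method = "lm", se = FALSE) + # a fitted-line layer
      labs(x = "Weight (1,000 lbs)", y = "Miles per gallon")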
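Quote 5's leptokurtic/platykurtic distinction can be checked numerically. A hand-rolled sample excess-kurtosis function (no bias correction; packages such as moments offer polished versions):

    excess_kurtosis <- function(x) {
      z <- x - mean(x)
      mean(z^4) / mean(z^2)^2 - 3    # 0 for a normal distribution
    }
    set.seed(1)
    excess_kurtosis(rnorm(1e5))      # ~0: normal
    excess_kurtosis(rt(1e5, df = 5)) # > 0: fat tails, leptokurtic
    excess_kurtosis(runif(1e5))      # < 0: thin tails, platykurtic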
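Quotes 7 and 8 map onto t.test()'s alternative and paired arguments. Simulated before/after scores (invented numbers, purely illustrative):

    set.seed(42)
    before <- rnorm(20, mean = 100, sd = 15)
    after  <- before + rnorm(20, mean = 5, sd = 10)

    t.test(before, mu = 95, alternative = "two.sided") # difference, either direction
    t.test(before, mu = 95, alternative = "greater")   # you predict the direction
    t.test(after, before, paired = TRUE)               # same individuals, before and after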
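Quote 10's thorny problem is simple arithmetic, and quote 11's unplanned comparisons have a standard R workflow: fit an ANOVA, then run a post hoc test such as Tukey's HSD. A sketch on the built-in PlantGrowth data:

    m <- choose(4, 2)   # 6 pairwise t-tests among 4 means
    1 - (1 - 0.05)^m    # ~0.26 chance of at least one Type 1 error

    fit <- aov(weight ~ group, data = PlantGrowth)
    summary(fit)        # overall F test first
    TukeyHSD(fit)       # post hoc pairwise comparisons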
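Quote 13 can be verified in a few lines: aov() and lm() on the same formula report the same F ratio, because both are the general linear model underneath:

    fit_lm  <- lm(weight ~ group, data = PlantGrowth)
    fit_aov <- aov(weight ~ group, data = PlantGrowth)
    anova(fit_lm)       # regression framing
    summary(fit_aov)    # ANOVA framing; identical F and p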
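Quote 14 in practice: plain R-squared never goes down when you add predictors, while the adjusted version charges you for the spent degrees of freedom:

    fit1 <- lm(mpg ~ wt, data = mtcars)
    fit2 <- lm(mpg ~ wt + hp + drat + qsec, data = mtcars)
    summary(fit1)$r.squared; summary(fit1)$adj.r.squared
    summary(fit2)$r.squared; summary(fit2)$adj.r.squared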
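Quote 15's maximum-margin boundary is what a linear-kernel SVM fits. A sketch using the e1071 package (my choice of package) and the built-in iris data:

    library(e1071)   # install.packages("e1071")
    sv_fit <- svm(Species ~ ., data = iris, kernel = "linear")
    sv_fit$index                          # rows that serve as support vectors
    table(predict(sv_fit), iris$Species)  # confusion matrix on the training data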
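Quote 16's fit measure comes directly from kmeans() output:

    set.seed(3)
    km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)
    km$betweenss / km$tot.withinss   # higher = tighter, better-separated clusters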
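Quote 17's three activation functions take one line each in R (tanh is built in; the other two are easy to define):

    sigmoid <- function(x) 1 / (1 + exp(-x))  # squashes into (0, 1)
    relu    <- function(x) pmax(0, x)         # negatives become 0
    x <- seq(-3, 3, by = 1.5)
    rbind(x, tanh = tanh(x), sigmoid = sigmoid(x), relu = relu(x))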
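And quote 18's analogy is easy to demonstrate: dropping the intercept from a regression formula forces the line through (0, 0), just as a network layer without bias terms would be constrained:

    with_intercept <- lm(mpg ~ wt, data = mtcars)
    no_intercept   <- lm(mpg ~ wt - 1, data = mtcars)  # "- 1" removes the intercept
    coef(with_intercept)   # intercept plus slope
    coef(no_intercept)     # slope only; line passes through the origin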