├── .gitignore ├── README.md ├── ch10 ├── 1.Rmd ├── 1.html ├── 10.Rmd ├── 10.html ├── 11.Rmd ├── 11.html ├── 2.Rmd ├── 2.html ├── 3.Rmd ├── 3.html ├── 4.Rmd ├── 4.html ├── 5.Rmd ├── 5.html ├── 6.Rmd ├── 6.html ├── 7.Rmd ├── 7.html ├── 8.Rmd ├── 8.html ├── 9.Rmd ├── 9.html ├── Ch10Ex11.csv └── lab.R ├── ch2 ├── 3a.jpg ├── answers ├── applied.R ├── lab.R └── notes │ ├── Hypercube-boundary-experiment.R │ ├── curse_dimensionality.png │ ├── dimension.R │ ├── edges.png │ ├── neighbor.R │ └── neighbor.png ├── ch3 ├── 5.jpg ├── 7.Rmd ├── 7.html ├── 7.md ├── answers ├── applied.Rmd ├── applied.html ├── figure │ ├── unnamed-chunk-11.png │ ├── unnamed-chunk-111.png │ ├── unnamed-chunk-112.png │ ├── unnamed-chunk-113.png │ ├── unnamed-chunk-114.png │ ├── unnamed-chunk-115.png │ ├── unnamed-chunk-12.png │ ├── unnamed-chunk-16.png │ ├── unnamed-chunk-17.png │ ├── unnamed-chunk-2.png │ ├── unnamed-chunk-28.png │ ├── unnamed-chunk-3.png │ ├── unnamed-chunk-30.png │ ├── unnamed-chunk-31.png │ ├── unnamed-chunk-32.png │ ├── unnamed-chunk-33.png │ ├── unnamed-chunk-34.png │ ├── unnamed-chunk-36.png │ ├── unnamed-chunk-38.png │ ├── unnamed-chunk-4.png │ ├── unnamed-chunk-41.png │ ├── unnamed-chunk-42.png │ ├── unnamed-chunk-43.png │ ├── unnamed-chunk-441.png │ ├── unnamed-chunk-442.png │ ├── unnamed-chunk-443.png │ ├── unnamed-chunk-47.png │ ├── unnamed-chunk-5.png │ ├── unnamed-chunk-8.png │ ├── unnamed-chunk-81.png │ ├── unnamed-chunk-82.png │ ├── unnamed-chunk-83.png │ ├── unnamed-chunk-84.png │ └── unnamed-chunk-9.png └── lab.R ├── ch4 ├── 1.Rmd ├── 1.html ├── 1.md ├── 10.Rmd ├── 10.html ├── 10.md ├── 11.Rmd ├── 11.html ├── 11.md ├── 12.Rmd ├── 12.html ├── 12.md ├── 13.Rmd ├── 13.html ├── 13.md ├── 2.Rmd ├── 2.html ├── 2.md ├── 3.Rmd ├── 3.html ├── 3.md ├── 4.Rmd ├── 4.html ├── 4.md ├── 5.Rmd ├── 5.html ├── 5.md ├── 6.Rmd ├── 6.html ├── 6.md ├── 7.Rmd ├── 7.html ├── 7.md ├── 8.Rmd ├── 8.html ├── 8.md ├── 9.Rmd ├── 9.html ├── 9.md ├── figure │ ├── 10a.png │ ├── 11b.png │ ├── 12e.png │ └── 12f.png └── lab.R ├── ch5 ├── 1.Rmd ├── 1.html ├── 1.md ├── 2.Rmd ├── 2.html ├── 2.md ├── 3.Rmd ├── 3.html ├── 3.md ├── 4.Rmd ├── 4.html ├── 4.md ├── 5.Rmd ├── 5.html ├── 5.md ├── 6.Rmd ├── 6.html ├── 6.md ├── 7.Rmd ├── 7.html ├── 7.md ├── 8.Rmd ├── 8.html ├── 8.md ├── 9.Rmd ├── 9.html ├── 9.md ├── figure │ ├── 2g.png │ └── 8b.png └── lab.R ├── ch6 ├── 1.Rmd ├── 1.html ├── 1.md ├── 10.Rmd ├── 10.html ├── 10.md ├── 11.Rmd ├── 11.html ├── 11.md ├── 2.Rmd ├── 2.html ├── 2.md ├── 3.Rmd ├── 3.html ├── 3.md ├── 4.Rmd ├── 4.html ├── 4.md ├── 5.Rmd ├── 5.html ├── 5.md ├── 6.Rmd ├── 6.html ├── 6.md ├── 7.Rmd ├── 7.html ├── 7.md ├── 8.Rmd ├── 8.html ├── 8.md ├── 9.Rmd ├── 9.html ├── 9.md ├── figure │ ├── 6a.png │ ├── 6b.png │ ├── 8c1.png │ ├── 8c2.png │ ├── 8c3.png │ ├── 8d.png │ ├── 8e.png │ ├── 9e.png │ ├── 9f.png │ ├── 9g.png │ ├── unnamed-chunk-1.png │ ├── unnamed-chunk-2.png │ ├── unnamed-chunk-3.png │ └── unnamed-chunk-4.png └── lab.R ├── ch7 ├── 1.Rmd ├── 1.html ├── 1.md ├── 10.Rmd ├── 10.html ├── 10.md ├── 11.Rmd ├── 11.html ├── 11.md ├── 12.Rmd ├── 12.html ├── 12.md ├── 2.Rmd ├── 2.html ├── 2.md ├── 3.Rmd ├── 3.html ├── 4.Rmd ├── 4.html ├── 4.md ├── 5.Rmd ├── 5.html ├── 5.md ├── 6.Rmd ├── 6.html ├── 6.md ├── 7.Rmd ├── 7.html ├── 7.md ├── 8.Rmd ├── 8.html ├── 8.md ├── 9.Rmd ├── 9.html ├── 9.md ├── figure │ ├── 10a.png │ ├── 10b.png │ ├── 11e.png │ ├── 11f.png │ ├── 6a.png │ ├── 6aa.png │ ├── 6b.png │ ├── 6bb.png │ ├── 7_1.png │ ├── 8_1.png │ ├── 9a.png │ ├── 9c.png │ ├── 9d.png │ ├── 9f.png │ └── unnamed-chunk-1.png └── 
lab.R ├── ch8 ├── 1.Rmd ├── 1.html ├── 1.md ├── 10.Rmd ├── 10.html ├── 10.md ├── 11.Rmd ├── 11.html ├── 11.md ├── 12.Rmd ├── 12.html ├── 12.md ├── 2.Rmd ├── 2.html ├── 2.md ├── 3.Rmd ├── 3.html ├── 3.md ├── 4.Rmd ├── 4.html ├── 4.md ├── 5.Rmd ├── 5.html ├── 5.md ├── 6.Rmd ├── 6.html ├── 6.md ├── 7.Rmd ├── 7.html ├── 7.md ├── 8.Rmd ├── 8.html ├── 8.md ├── 9.Rmd ├── 9.html ├── 9.md ├── figure │ ├── 1.png │ ├── 10c.png │ ├── 10d.png │ ├── 10f.png │ ├── 11b.png │ ├── 3.png │ ├── 4b.png │ ├── 8c.png │ ├── 8c1.png │ ├── 8c2.png │ ├── 9a.png │ ├── 9d.png │ ├── 9g.png │ ├── b8.png │ └── f.png └── lab.R ├── ch9 ├── 1.Rmd ├── 1.html ├── 1.md ├── 2.Rmd ├── 2.html ├── 2.md ├── 3.Rmd ├── 3.html ├── 3.md ├── 4.Rmd ├── 4.html ├── 4.md ├── 5.Rmd ├── 5.html ├── 5.md ├── 6.Rmd ├── 6.html ├── 6.md ├── 7.Rmd ├── 7.html ├── 7.md ├── 8.Rmd ├── 8.html ├── 8.md ├── 9.R ├── 9.R.png ├── Lab_chapter9.R ├── R_exercise_video.R ├── figure │ ├── 1.png │ ├── 2a.png │ ├── 2b.png │ ├── 2c.png │ ├── 3a.png │ ├── 3b.png │ ├── 3d.png │ ├── 3e.png │ ├── 3g.png │ ├── 3h.png │ ├── 4a.png │ ├── 4b.png │ ├── 4c.png │ ├── 4d.png │ ├── 4e1.png │ ├── 4e2.png │ ├── 4e3.png │ ├── 5b.png │ ├── 5d.png │ ├── 5f.png │ ├── 5g.png │ ├── 5h.png │ ├── 6a.png │ ├── 6c.png │ ├── 7d1.png │ ├── 7d10.png │ ├── 7d11.png │ ├── 7d12.png │ ├── 7d13.png │ ├── 7d14.png │ ├── 7d15.png │ ├── 7d16.png │ ├── 7d17.png │ ├── 7d18.png │ ├── 7d19.png │ ├── 7d2.png │ ├── 7d20.png │ ├── 7d21.png │ ├── 7d3.png │ ├── 7d4.png │ ├── 7d5.png │ ├── 7d6.png │ ├── 7d7.png │ ├── 7d8.png │ ├── 7d9.png │ └── 9a.png └── lab.R ├── data ├── Auto.csv ├── Auto.data └── College.csv └── index.html /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | stat-learning 2 | ============= 3 | 4 | Notes and exercise attempts for "An Introduction to Statistical Learning" 5 | 6 | http://www.statlearning.com 7 | http://statlearning.class.stanford.edu/ 8 | 9 | "(*)" means I am not sure about the answer 10 | 11 | Try out RStudio (www.RStudio.com) as an R IDE with the knitr package. 12 | 13 | Pull requests gladly accepted. If a pull request is too much effort, please at least file a new issue. :) 14 | 15 | Visit http://asadoughi.github.io/stat-learning for an index of exercise solutions. 
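To rebuild any of the rendered answers locally, a minimal sketch (assuming the rmarkdown package, which drives knitr, is installed and the repository root is the working directory):

```r
# hypothetical example: re-render one exercise; the output file lands next to the
# source, matching the committed .html files
library(rmarkdown)
render("ch10/1.Rmd")
```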
16 | -------------------------------------------------------------------------------- /ch10/1.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Chapter 10: Exercise 1" 3 | output: html_document 4 | --- 5 | 6 | ### a 7 | Proof of Equation (10.12) $$ 8 | \frac{1}{|C_k|} \sum\limits_{i,i^{\prime} \in C_k} \sum\limits_{j=1}^p (x_{ij} - x_{i^\prime j})^2 = 9 | 2 \sum\limits_{i \in C_k} \sum\limits_{j=1}^{p} (x_{ij} - \bar{x}_{kj})^2 10 | \\ 11 | = \frac{1}{|C_k|} \sum\limits_{i,i^{\prime} \in C_k} \sum\limits_{j=1}^p ((x_{ij} - \bar{x}_{kj}) - (x_{i^\prime j} - \bar{x}_{kj}))^2 12 | \\ 13 | = \frac{1}{|C_k|} \sum\limits_{i,i^{\prime} \in C_k} \sum\limits_{j=1}^p ((x_{ij} - \bar{x}_{kj})^2 - 2 (x_{ij} - \bar{x}_{kj})(x_{i^\prime j} - \bar{x}_{kj}) + (x_{i^\prime j} - \bar{x}_{kj})^2) 14 | \\ 15 | = \frac{|C_k|}{|C_k|} \sum\limits_{i \in C_k} \sum\limits_{j=1}^p (x_{ij} - \bar{x}_{kj})^2 + 16 | \frac{|C_k|}{|C_k|} \sum\limits_{i^{\prime} \in C_k} \sum\limits_{j=1}^p (x_{i^\prime j} - \bar{x}_{kj})^2 - 17 | \frac{2}{|C_k|} \sum\limits_{i,i^{\prime} \in C_k} \sum\limits_{j=1}^p (x_{ij} - \bar{x}_{kj})(x_{i^\prime j} - \bar{x}_{kj}) 18 | \\ 19 | = 2 \sum\limits_{i \in C_k} \sum\limits_{j=1}^p (x_{ij} - \bar{x}_{kj})^2 + 0 20 | $$ 21 | 22 | ### b 23 | Equation (10.12) shows that minimizing the sum of the squared Euclidean 24 | distance for each cluster is the same as minimizing the within-cluster variance 25 | for each cluster. -------------------------------------------------------------------------------- /ch10/10.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Chapter 10: Exercise 10' 3 | output: html_document 4 | --- 5 | 6 | ### a 7 | ```{r} 8 | set.seed(2) 9 | x = matrix(rnorm(20*3*50, mean=0, sd=0.001), ncol=50) 10 | x[1:20, 2] = 1 11 | x[21:40, 1] = 2 12 | x[21:40, 2] = 2 13 | x[41:60, 1] = 1 14 | ``` 15 | The concept here is to separate the three classes amongst two dimensions. 16 | 17 | ### b 18 | ```{r 10b} 19 | pca.out = prcomp(x) 20 | summary(pca.out) 21 | pca.out$x[,1:2] 22 | plot(pca.out$x[,1:2], col=2:4, xlab="Z1", ylab="Z2", pch=19) 23 | ``` 24 | 25 | ### c 26 | ```{r} 27 | km.out = kmeans(x, 3, nstart=20) 28 | table(km.out$cluster, c(rep(1,20), rep(2,20), rep(3,20))) 29 | ``` 30 | Perfect match. 31 | 32 | ### d 33 | ```{r} 34 | km.out = kmeans(x, 2, nstart=20) 35 | km.out$cluster 36 | ``` 37 | All of one previous class absorbed into a single class. 38 | 39 | ### e 40 | ```{r} 41 | km.out = kmeans(x, 4, nstart=20) 42 | km.out$cluster 43 | ``` 44 | All of one previous cluster split into two clusters. 45 | 46 | ### f 47 | ```{r} 48 | km.out = kmeans(pca.out$x[,1:2], 3, nstart=20) 49 | table(km.out$cluster, c(rep(1,20), rep(2,20), rep(3,20))) 50 | ``` 51 | Perfect match, once again. 52 | 53 | ### g 54 | ```{r} 55 | km.out = kmeans(scale(x), 3, nstart=20) 56 | km.out$cluster 57 | ``` 58 | Poorer results than (b): the scaling of the observations effects the distance 59 | between them. 
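To make this comparison concrete, a small sketch reusing `x` and the true class labels from part (a); tabulating the scaled-data clustering against the truth shows exactly which observations get mixed together:
```{r}
true.labels = c(rep(1, 20), rep(2, 20), rep(3, 20))
table(kmeans(scale(x), 3, nstart=20)$cluster, true.labels)
```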
60 | 61 | -------------------------------------------------------------------------------- /ch10/11.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Chapter 10: Exercise 11" 3 | output: html_document 4 | --- 5 | 6 | ### a 7 | ```{r} 8 | data = read.csv("./Ch10Ex11.csv", header=F) 9 | dim(data) 10 | ``` 11 | 12 | ### b 13 | ```{r 2b} 14 | dd = as.dist(1 - cor(data)) 15 | plot(hclust(dd, method="complete")) 16 | plot(hclust(dd, method="single")) 17 | plot(hclust(dd, method="average")) 18 | ``` 19 | 20 | Two or three groups depending on the linkage method. 21 | 22 | ### c 23 | To look at which genes differ the most across the healthy patients and diseased 24 | patients, we could look at the loading vectors outputted from 25 | PCA to see which genes are used to describe the variance the most. 26 | ```{r} 27 | pr.out = prcomp(t(data)) 28 | summary(pr.out) 29 | total_load = apply(pr.out$rotation, 1, sum) 30 | indices = order(abs(total_load), decreasing=T) 31 | indices[1:10] 32 | total_load[indices[1:10]] 33 | ``` 34 | This shows one representation of the top 1% of differing genes. 35 | 36 | (*) I'm not sure this is the correct way to aggregate the loading vector. 37 | 38 | 39 | 40 | -------------------------------------------------------------------------------- /ch10/2.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Chapter 10: Exercise 2' 3 | output: html_document 4 | --- 5 | 6 | ### a 7 | ```{r 2a} 8 | d = as.dist(matrix(c(0, 0.3, 0.4, 0.7, 9 | 0.3, 0, 0.5, 0.8, 10 | 0.4, 0.5, 0.0, 0.45, 11 | 0.7, 0.8, 0.45, 0.0), nrow=4)) 12 | plot(hclust(d, method="complete")) 13 | ``` 14 | 15 | ### b 16 | ```{r 2b} 17 | plot(hclust(d, method="single")) 18 | ``` 19 | 20 | ### c 21 | (1,2), (3,4) 22 | 23 | ### d 24 | (1, 2, 3), (4) 25 | 26 | ### e 27 | ```{r 2e} 28 | plot(hclust(d, method="complete"), labels=c(2,1,4,3)) 29 | ``` -------------------------------------------------------------------------------- /ch10/3.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Chapter 10: Exercise 3' 3 | output: html_document 4 | --- 5 | 6 | ```{r} 7 | set.seed(1) 8 | x = cbind(c(1, 1, 0, 5, 6, 4), c(4, 3, 4, 1, 2, 0)) 9 | x 10 | ``` 11 | 12 | ### a 13 | ```{r} 14 | plot(x[,1], x[,2]) 15 | ``` 16 | 17 | ### b 18 | ```{r} 19 | labels = sample(2, nrow(x), replace=T) 20 | labels 21 | ``` 22 | 23 | ### c 24 | ```{r} 25 | centroid1 = c(mean(x[labels==1, 1]), mean(x[labels==1, 2])) 26 | centroid2 = c(mean(x[labels==2, 1]), mean(x[labels==2, 2])) 27 | centroid1 28 | centroid2 29 | plot(x[,1], x[,2], col=(labels+1), pch=20, cex=2) 30 | points(centroid1[1], centroid1[2], col=2, pch=4) 31 | points(centroid2[1], centroid2[2], col=3, pch=4) 32 | ``` 33 | 34 | ### d 35 | ```{r} 36 | euclid = function(a, b) { 37 | return(sqrt((a[1] - b[1])^2 + (a[2]-b[2])^2)) 38 | } 39 | assign_labels = function(x, centroid1, centroid2) { 40 | labels = rep(NA, nrow(x)) 41 | for (i in 1:nrow(x)) { 42 | if (euclid(x[i,], centroid1) < euclid(x[i,], centroid2)) { 43 | labels[i] = 1 44 | } else { 45 | labels[i] = 2 46 | } 47 | } 48 | return(labels) 49 | } 50 | labels = assign_labels(x, centroid1, centroid2) 51 | labels 52 | ``` 53 | 54 | ### e 55 | ```{r} 56 | last_labels = rep(-1, 6) 57 | while (!all(last_labels == labels)) { 58 | last_labels = labels 59 | centroid1 = c(mean(x[labels==1, 1]), mean(x[labels==1, 2])) 60 | centroid2 = c(mean(x[labels==2, 1]), mean(x[labels==2, 2])) 61 | 
print(centroid1) 62 | print(centroid2) 63 | labels = assign_labels(x, centroid1, centroid2) 64 | } 65 | labels 66 | ``` 67 | 68 | ### f 69 | ```{r} 70 | plot(x[,1], x[,2], col=(labels+1), pch=20, cex=2) 71 | points(centroid1[1], centroid1[2], col=2, pch=4) 72 | points(centroid2[1], centroid2[2], col=3, pch=4) 73 | ``` 74 | -------------------------------------------------------------------------------- /ch10/4.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Chapter 10: Exercise 4' 3 | output: html_document 4 | --- 5 | 6 | ### a 7 | Not enough information to tell. The maximal intercluster dissimilarity could be 8 | equal or not equal to the minimal intercluster dissimilarity. If the 9 | dissimilarities were equal, they would fuse at the same height. If they were 10 | not equal, the single linkage dendrogram would fuse at a lower height. 11 | 12 | ### b 13 | They would fuse at the same height because linkage does not affect leaf-to-leaf 14 | fusion. 15 | -------------------------------------------------------------------------------- /ch10/5.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Chapter 10: Exercise 5' 3 | output: html_document 4 | --- 5 | 6 | Clusters selected based on two-dimensional Euclidean distance. 7 | 8 | ### a 9 | Least socks and computers (3, 4, 6, 8) versus more socks and computers 10 | (1, 2, 7, 8). 11 | 12 | ### b 13 | Purchased computer (5, 6, 7, 8) versus no computer purchase (1, 2, 3, 4). The 14 | distance on the computer dimension is greater than the distance on the socks 15 | dimension. 16 | 17 | ### c 18 | Purchased computer (5, 6, 7, 8) versus no computer purchase (1, 2, 3, 4). -------------------------------------------------------------------------------- /ch10/6.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Chapter 10: Exercise 6" 3 | output: html_document 4 | --- 5 | 6 | ### a 7 | The statement that the first principal component "explains 10% of the variation" means that 90% of the 8 | information in the gene data set is lost by projecting the tissue sample 9 | observations onto the first principal component. Put another way, 10 | 90% of the variance in the data is not contained in the first principal 11 | component. 12 | 13 | ### b 14 | Given that the pre-analysis uncovered a time-wise linear trend in the 15 | tissue samples' first principal component scores, I would advise the researcher to 16 | include the machine used (A vs B) as a feature of the data set. This should 17 | increase the PVE of the first principal component before applying the two-sample 18 | t-test. 19 | 20 | ### c 21 | ```{r} 22 | set.seed(1) 23 | Control = matrix(rnorm(50*1000), ncol=50) 24 | Treatment = matrix(rnorm(50*1000), ncol=50) 25 | X = cbind(Control, Treatment) 26 | X[1,] = seq(-18, 18 - .36, .36) # linear trend in one dimension 27 | ``` 28 | 29 | ```{r} 30 | pr.out = prcomp(scale(X)) 31 | summary(pr.out)$importance[,1] 32 | ``` 33 | 9.911% variance explained by the first principal component. 34 | 35 | Now, adding in machine A vs B via a 10 vs 0 encoding. 36 | ```{r} 37 | X = rbind(X, c(rep(10, 50), rep(0, 50))) 38 | pr.out = prcomp(scale(X)) 39 | summary(pr.out)$importance[,1] 40 | ``` 41 | 11.54% variance explained by the first principal component. That's an 42 | improvement of 1.629%. 43 | 44 | (*) I'm sure a better simulation could be devised by someone more versed in 45 | PCA.
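One possible direction (a rough sketch only, with an assumed form for the machine effect: a gene-specific offset added to every sample run on machine A):
```{r}
set.seed(1)
X.sim = matrix(rnorm(1000*100), ncol=100)      # 1000 genes x 100 samples, no real signal
machine.shift = rnorm(1000)                    # hypothetical per-gene offset for machine A
X.sim[, 1:50] = X.sim[, 1:50] + machine.shift  # samples 1-50 assumed run on machine A
pr.sim = prcomp(scale(X.sim))
summary(pr.sim)$importance[, 1]                # PC1 PVE should rise well above the ~1% baseline
```
Including the machine label as a covariate (or regressing it out of each gene), as suggested in (b), should then shrink PC1's PVE back toward the baseline.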
46 | 47 | -------------------------------------------------------------------------------- /ch10/7.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Chapter 10: Exercise 7' 3 | output: html_document 4 | --- 5 | 6 | ```{r} 7 | library(ISLR) 8 | set.seed(1) 9 | ``` 10 | 11 | ```{r} 12 | dsc = scale(USArrests) 13 | a = dist(dsc)^2 14 | b = as.dist(1 - cor(t(dsc))) 15 | summary(b/a) 16 | ``` 17 | If the claimed proportionality holds, the ratio b/a should be roughly constant across all pairs of observations, which is what the summary above checks. 18 | -------------------------------------------------------------------------------- /ch10/8.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Chapter 10: Exercise 8' 3 | output: html_document 4 | --- 5 | 6 | ```{r} 7 | library(ISLR) 8 | set.seed(1) 9 | ``` 10 | 11 | ### a 12 | ```{r} 13 | pr.out = prcomp(USArrests, center=T, scale=T) 14 | pr.var = pr.out$sdev^2 15 | pve = pr.var / sum(pr.var) 16 | pve 17 | ``` 18 | 19 | ### b 20 | ```{r} 21 | loadings = pr.out$rotation 22 | pve2 = rep(NA, 4) 23 | dmean = apply(USArrests, 2, mean) 24 | dsdev = sqrt(apply(USArrests, 2, var)) 25 | dsc = sweep(USArrests, MARGIN=2, dmean, "-") 26 | dsc = sweep(dsc, MARGIN=2, dsdev, "/") 27 | for (i in 1:4) { 28 | proto_x = sweep(dsc, MARGIN=2, loadings[,i], "*") 29 | pc_x = apply(proto_x, 1, sum) 30 | pve2[i] = sum(pc_x^2) 31 | } 32 | pve2 = pve2/sum(dsc^2) 33 | pve2 34 | ``` 35 | -------------------------------------------------------------------------------- /ch10/9.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Chapter 10: Exercise 9' 3 | output: html_document 4 | --- 5 | 6 | ```{r} 7 | library(ISLR) 8 | set.seed(2) 9 | ``` 10 | 11 | ### a 12 | ```{r} 13 | hc.complete = hclust(dist(USArrests), method="complete") 14 | plot(hc.complete) 15 | ``` 16 | 17 | ### b 18 | ```{r} 19 | cutree(hc.complete, 3) 20 | table(cutree(hc.complete, 3)) 21 | ``` 22 | 23 | ### c 24 | ```{r} 25 | dsc = scale(USArrests) 26 | hc.s.complete = hclust(dist(dsc), method="complete") 27 | plot(hc.s.complete) 28 | ``` 29 | 30 | ### d 31 | ```{r} 32 | cutree(hc.s.complete, 3) 33 | table(cutree(hc.s.complete, 3)) 34 | table(cutree(hc.s.complete, 3), cutree(hc.complete, 3)) 35 | ``` 36 | Scaling the variables affects the maximum height of the dendrogram obtained from 37 | hierarchical clustering. From a cursory glance, it doesn't affect the bushiness of 38 | the tree obtained. However, it does affect the clusters obtained from cutting 39 | the dendrogram into 3 clusters. In my opinion, for this data set the variables should 40 | be standardized because they are measured in different units ($UrbanPop$ is a 41 | percentage, while the other three columns are arrests per 100,000 residents).
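To back up the height claim, a quick sketch using the two fits above; `hclust` stores the fusion heights in `$height`:
```{r}
range(hc.complete$height)    # unscaled: heights dominated by the large Assault values
range(hc.s.complete$height)  # scaled: maximum fusion height is far smaller
```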
42 | -------------------------------------------------------------------------------- /ch2/3a.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch2/3a.jpg -------------------------------------------------------------------------------- /ch2/lab.R: -------------------------------------------------------------------------------- 1 | # Chapter 2 Lab: Introduction to R 2 | 3 | # Basic Commands 4 | 5 | x <- c(1,3,2,5) 6 | x 7 | x = c(1,6,2) 8 | x 9 | y = c(1,4,3) 10 | length(x) 11 | length(y) 12 | x+y 13 | ls() 14 | rm(x,y) 15 | ls() 16 | rm(list=ls()) 17 | ?matrix 18 | x=matrix(data=c(1,2,3,4), nrow=2, ncol=2) 19 | x 20 | x=matrix(c(1,2,3,4),2,2) 21 | matrix(c(1,2,3,4),2,2,byrow=TRUE) 22 | sqrt(x) 23 | x^2 24 | x=rnorm(50) 25 | y=x+rnorm(50,mean=50,sd=.1) 26 | cor(x,y) 27 | set.seed(1303) 28 | rnorm(50) 29 | set.seed(3) 30 | y=rnorm(100) 31 | mean(y) 32 | var(y) 33 | sqrt(var(y)) 34 | sd(y) 35 | 36 | # Graphics 37 | 38 | x=rnorm(100) 39 | y=rnorm(100) 40 | plot(x,y) 41 | plot(x,y,xlab="this is the x-axis",ylab="this is the y-axis",main="Plot of X vs Y") 42 | pdf("Figure.pdf") 43 | plot(x,y,col="green") 44 | dev.off() 45 | x=seq(1,10) 46 | x 47 | x=1:10 48 | x 49 | x=seq(-pi,pi,length=50) 50 | y=x 51 | f=outer(x,y,function(x,y)cos(y)/(1+x^2)) 52 | contour(x,y,f) 53 | contour(x,y,f,nlevels=45,add=T) 54 | fa=(f-t(f))/2 55 | contour(x,y,fa,nlevels=15) 56 | image(x,y,fa) 57 | persp(x,y,fa) 58 | persp(x,y,fa,theta=30) 59 | persp(x,y,fa,theta=30,phi=20) 60 | persp(x,y,fa,theta=30,phi=70) 61 | persp(x,y,fa,theta=30,phi=40) 62 | 63 | # Indexing Data 64 | 65 | A=matrix(1:16,4,4) 66 | A 67 | A[2,3] 68 | A[c(1,3),c(2,4)] 69 | A[1:3,2:4] 70 | A[1:2,] 71 | A[,1:2] 72 | A[1,] 73 | A[-c(1,3),] 74 | A[-c(1,3),-c(1,3,4)] 75 | dim(A) 76 | 77 | # Loading Data 78 | 79 | Auto=read.table("Auto.data") 80 | fix(Auto) 81 | Auto=read.table("Auto.data",header=T,na.strings="?") 82 | fix(Auto) 83 | Auto=read.csv("Auto.csv",header=T,na.strings="?") 84 | fix(Auto) 85 | dim(Auto) 86 | Auto[1:4,] 87 | Auto=na.omit(Auto) 88 | dim(Auto) 89 | names(Auto) 90 | 91 | # Additional Graphical and Numerical Summaries 92 | 93 | plot(cylinders, mpg) 94 | plot(Auto$cylinders, Auto$mpg) 95 | attach(Auto) 96 | plot(cylinders, mpg) 97 | cylinders=as.factor(cylinders) 98 | plot(cylinders, mpg) 99 | plot(cylinders, mpg, col="red") 100 | plot(cylinders, mpg, col="red", varwidth=T) 101 | plot(cylinders, mpg, col="red", varwidth=T,horizontal=T) 102 | plot(cylinders, mpg, col="red", varwidth=T, xlab="cylinders", ylab="MPG") 103 | hist(mpg) 104 | hist(mpg,col=2) 105 | hist(mpg,col=2,breaks=15) 106 | pairs(Auto) 107 | pairs(~ mpg + displacement + horsepower + weight + acceleration, Auto) 108 | plot(horsepower,mpg) 109 | identify(horsepower,mpg,name) 110 | summary(Auto) 111 | summary(mpg) 112 | -------------------------------------------------------------------------------- /ch2/notes/Hypercube-boundary-experiment.R: -------------------------------------------------------------------------------- 1 | 2 | set.seed(11) 3 | ## create a data frame of 3 columns and 10,000 rows filled with 4 | ## random uniform data 5 | cube <- data.frame(matrix(runif(3*10000, min = 0, 1), ncol=3)) 6 | 7 | ## get the max and min for each row 8 | cube[, "max"] <- apply(cube, MARGIN = 1, max) 9 | cube[, "min"] <- apply(cube, MARGIN = 1, min) 10 | 11 | ## add a column to mark if the row is considered in the boundary 12 | ## initially this 
column will be filled with 0 (not boundary) 13 | cube$boundary <-0 14 | 15 | ## set boundary to 1 if min is less than .05 16 | cube$boundary[cube$min < .05] <-1 17 | ## or max greater than .95 18 | cube$boundary[cube$max > .95] <-1 19 | sum(cube$boundary) 20 | 21 | ## Expected answer 22 | 10000 * (1-.9^3) 23 | 24 | ##-------------------------------------------------------- 25 | 26 | set.seed(11) 27 | 28 | ## create a data frame of 50 columns and 10,000 rows filled with 29 | ## random uniform data 30 | hypercube <- data.frame(matrix(runif(50*10000, min = 0, 1), ncol=50)) 31 | 32 | ## get the max and min for each row 33 | hypercube[, "max"] <- apply(hypercube, MARGIN = 1, max) 34 | hypercube[, "min"] <- apply(hypercube, MARGIN = 1, min) 35 | 36 | 37 | ## add a column to mark if the row is considered in the boundary 38 | ## initially this column will be filled with 0 (not boundary) 39 | hypercube$boundary <-0 40 | 41 | ## set boundary to 1 if min is less than .05 42 | hypercube$boundary[hypercube$min < .05] <-1 43 | ## or max greater than .95 44 | hypercube$boundary[hypercube$max > .95] <-1 45 | sum(hypercube$boundary) 46 | 47 | ## Expected answer 48 | 10000 * (1-.9^50) 49 | -------------------------------------------------------------------------------- /ch2/notes/curse_dimensionality.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch2/notes/curse_dimensionality.png -------------------------------------------------------------------------------- /ch2/notes/dimension.R: -------------------------------------------------------------------------------- 1 | # This script creates plots illustrating the curse of dimensionality 2 | 3 | # Choose points, randomly distributed 4 | n <- 3000 # number of points 5 | p <- 2 # number of dimensions 6 | 7 | x <- matrix(runif(n*p), ncol=p) 8 | 9 | Fringe <- function(z){ 10 | # Determine if z lies within 0.05 distance from the boundary 11 | # z is a vector of length p 12 | any(abs(z - 0.5) > 0.45) 13 | } 14 | 15 | result <- apply(x, 1, Fringe) 16 | 17 | png('edges.png') 18 | plot(x[,1:2]) 19 | points(x[,1][result], x[,2][result], col='red') 20 | dev.off() 21 | 22 | ratio <- sum(result) / n 23 | 24 | # plot p versus ratio 25 | p <- 1:50 26 | 27 | GetRatio <- function(dim){ 28 | 1 - (0.9)^dim 29 | } 30 | ratio <- sapply(p, GetRatio) 31 | 32 | png('curse_dimensionality.png') 33 | plot(p, ratio) 34 | dev.off() 35 | -------------------------------------------------------------------------------- /ch2/notes/edges.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch2/notes/edges.png -------------------------------------------------------------------------------- /ch2/notes/neighbor.R: -------------------------------------------------------------------------------- 1 | # Graph the technique described by Trevor Hastie in Lecture 2.1 2 | 3 | set.seed(37) 4 | 5 | # Generate sample data 6 | n <- 20 7 | x0 <- sort(runif(n, min=0, max=10)) 8 | y0 <- 0.5 * x0 + rnorm(n, sd=0.2) 9 | 10 | Estimate <- function(x, neighborhood=2){ 11 | # Estimates x as the average value of y within neighborhood 12 | # Note this implementation is not k nearest neighbors, 13 | # since it uses a neighborhood of fixed size. 
14 | mean(y0[abs(x - x0) < neighborhood]) 15 | } 16 | 17 | x <- seq(0, 10, length.out=2000) 18 | yhat <- sapply(x, Estimate) 19 | 20 | png('neighbor.png') 21 | plot(x, yhat, col='green') 22 | points(x0, y0) 23 | dev.off() 24 | -------------------------------------------------------------------------------- /ch2/notes/neighbor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch2/notes/neighbor.png -------------------------------------------------------------------------------- /ch3/5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/5.jpg -------------------------------------------------------------------------------- /ch3/answers: -------------------------------------------------------------------------------- 1 | 1. In Table 3.4, the null hypothesis for "TV" is that in the presence of radio 2 | ads and newspaper ads, TV ads have no effect on sales. Similarly, the null 3 | hypothesis for "radio" is that in the presence of TV and newspaper ads, radio 4 | ads have no effect on sales. (And there is a similar null hypothesis for 5 | "newspaper".) The low p-values of TV and radio suggest that the null hypotheses 6 | are false for TV and radio. The high p-value of newspaper suggests that the null 7 | hypothesis is true for newspaper. 8 | 9 | 10 | 2. KNN classifier and KNN regression methods are closely related in formula. 11 | However, the final result of KNN classifier is the classification output for Y 12 | (qualitative), where as the output for a KNN regression predicts the 13 | quantitative value for f(X). 14 | 15 | 16 | 3. Y = 50 + 20(gpa) + 0.07(iq) + 35(gender) + 0.01(gpa * iq) - 10 (gpa * gender) 17 | 18 | (a) Y = 50 + 20 k_1 + 0.07 k_2 + 35 gender + 0.01(k_1 * k_2) - 10 (k_1 * gender) 19 | male: (gender = 0) 50 + 20 k_1 + 0.07 k_2 + 0.01(k_1 * k_2) 20 | female: (gender = 1) 50 + 20 k_1 + 0.07 k_2 + 35 + 0.01(k_1 * k_2) - 10 (k_1) 21 | 22 | Once the GPA is high enough, males earn more on average. => iii. 23 | 24 | (b) Y(Gender = 1, IQ = 110, GPA = 4.0) 25 | = 50 + 20 * 4 + 0.07 * 110 + 35 + 0.01 (4 * 110) - 10 * 4 26 | = 137.1 27 | 28 | (c) False. We must examine the p-value of the regression coefficient to 29 | determine if the interaction term is statistically significant or not. 30 | 31 | 32 | 4. (a) I would expect the polynomial regression to have a lower training RSS 33 | than the linear regression because it could make a tighter fit against data that 34 | matched with a wider irreducible error (Var(epsilon)). 35 | 36 | (b) Converse to (a), I would expect the polynomial regression to have a higher 37 | test RSS as the overfit from training would have more error than the linear 38 | regression. 39 | 40 | (c) Polynomial regression has lower train RSS than the linear fit because of 41 | higher flexibility: no matter what the underlying true relationshop is the 42 | more flexible model will closer follow points and reduce train RSS. 43 | An example of this beahvior is shown on Figure~2.9 from Chapter 2. 44 | 45 | (d) There is not enough information to tell which test RSS would be lower 46 | for either regression given the problem statement is defined as not knowing 47 | "how far it is from linear". 
If it is closer to linear than cubic, the linear 48 | regression test RSS could be lower than the cubic regression test RSS. 49 | Or, if it is closer to cubic than linear, the cubic regression test RSS 50 | could be lower than the linear regression test RSS. It is dues to 51 | bias-variance tradeoff: it is not clear what level of flexibility will 52 | fit data better. 53 | 54 | 55 | 5. See 5.jpg. 56 | 57 | 58 | 6. y = B_0 + B_1 x 59 | from (3.4): B_0 = avg(y) - B_1 avg(x) 60 | right hand side will equal 0 if (avg(x), avg(y)) is a point on the line 61 | 0 = B_0 + B_1 avg(x) - avg(y) 62 | 0 = (avg(y) - B_1 avg(x)) + B_1 avg(x) - avg(y) 63 | 0 = 0 64 | -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-11.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-111.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-111.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-112.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-112.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-113.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-113.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-114.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-114.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-115.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-115.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-12.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-16.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-17.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-17.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-2.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-28.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-28.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-3.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-30.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-30.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-31.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-31.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-32.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-32.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-33.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-33.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-34.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-34.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-36.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-36.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-38.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-38.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-4.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-4.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-41.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-41.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-42.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-42.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-43.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-43.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-441.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-441.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-442.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-442.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-443.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-443.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-47.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-47.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-5.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-8.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-81.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-81.png 
-------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-82.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-82.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-83.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-83.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-84.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-84.png -------------------------------------------------------------------------------- /ch3/figure/unnamed-chunk-9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch3/figure/unnamed-chunk-9.png -------------------------------------------------------------------------------- /ch3/lab.R: -------------------------------------------------------------------------------- 1 | # Chapter 3 Lab: Linear Regression 2 | 3 | library(MASS) 4 | library(ISLR) 5 | 6 | # Simple Linear Regression 7 | 8 | fix(Boston) 9 | names(Boston) 10 | lm.fit=lm(medv~lstat) 11 | lm.fit=lm(medv~lstat,data=Boston) 12 | attach(Boston) 13 | lm.fit=lm(medv~lstat) 14 | lm.fit 15 | summary(lm.fit) 16 | names(lm.fit) 17 | coef(lm.fit) 18 | confint(lm.fit) 19 | predict(lm.fit,data.frame(lstat=(c(5,10,15))), interval="confidence") 20 | predict(lm.fit,data.frame(lstat=(c(5,10,15))), interval="prediction") 21 | plot(lstat,medv) 22 | abline(lm.fit) 23 | abline(lm.fit,lwd=3) 24 | abline(lm.fit,lwd=3,col="red") 25 | plot(lstat,medv,col="red") 26 | plot(lstat,medv,pch=20) 27 | plot(lstat,medv,pch="+") 28 | plot(1:20,1:20,pch=1:20) 29 | par(mfrow=c(2,2)) 30 | plot(lm.fit) 31 | plot(predict(lm.fit), residuals(lm.fit)) 32 | plot(predict(lm.fit), rstudent(lm.fit)) 33 | plot(hatvalues(lm.fit)) 34 | which.max(hatvalues(lm.fit)) 35 | 36 | # Multiple Linear Regression 37 | 38 | lm.fit=lm(medv~lstat+age,data=Boston) 39 | summary(lm.fit) 40 | lm.fit=lm(medv~.,data=Boston) 41 | summary(lm.fit) 42 | library(car) 43 | vif(lm.fit) 44 | lm.fit1=lm(medv~.-age,data=Boston) 45 | summary(lm.fit1) 46 | lm.fit1=update(lm.fit, ~.-age) 47 | 48 | # Interaction Terms 49 | 50 | summary(lm(medv~lstat*age,data=Boston)) 51 | 52 | # Non-linear Transformations of the Predictors 53 | 54 | lm.fit2=lm(medv~lstat+I(lstat^2)) 55 | summary(lm.fit2) 56 | lm.fit=lm(medv~lstat) 57 | anova(lm.fit,lm.fit2) 58 | par(mfrow=c(2,2)) 59 | plot(lm.fit2) 60 | lm.fit5=lm(medv~poly(lstat,5)) 61 | summary(lm.fit5) 62 | summary(lm(medv~log(rm),data=Boston)) 63 | 64 | # Qualitative Predictors 65 | 66 | fix(Carseats) 67 | names(Carseats) 68 | lm.fit=lm(Sales~.+Income:Advertising+Price:Age,data=Carseats) 69 | summary(lm.fit) 70 | attach(Carseats) 71 | contrasts(ShelveLoc) 72 | 73 | # Writing Functions 74 | 75 | LoadLibraries 76 | LoadLibraries() 77 | LoadLibraries=function(){ 78 | library(ISLR) 79 | library(MASS) 80 | print("The libraries have been loaded.") 81 | } 82 | LoadLibraries 83 | 
LoadLibraries() 84 | -------------------------------------------------------------------------------- /ch4/1.Rmd: -------------------------------------------------------------------------------- 1 | 1 2 | ======================================================== 3 | 4 | $$ 5 | (4.2) p(X) = \frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}} 6 | $$ 7 | 8 | So, $\frac {p(X)} {1 - p(X)}$ 9 | 10 | $$ 11 | = \frac {\frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}}} 12 | {1 - \frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}}} 13 | \\ 14 | = \frac {\frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}}} 15 | { 16 | \frac {1 + e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}} 17 | - \frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}} 18 | } 19 | \\ 20 | = \frac {\frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}}} 21 | {\frac {1} {1 + e^{\beta_0 + \beta_1 X}}} 22 | \\ 23 | (4.3) \frac {p(X)} {1 - p(X)} =e^{\beta_0 + \beta_1 X} 24 | $$ 25 | -------------------------------------------------------------------------------- /ch4/1.md: -------------------------------------------------------------------------------- 1 | 1 2 | ======================================================== 3 | 4 | $$ 5 | (4.2) p(X) = \frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}} 6 | $$ 7 | 8 | So, $\frac {p(X)} {1 - p(X)}$ 9 | 10 | $$ 11 | = \frac {\frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}}} 12 | {1 - \frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}}} 13 | \\ 14 | = \frac {\frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}}} 15 | { 16 | \frac {1 + e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}} 17 | - \frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}} 18 | } 19 | \\ 20 | = \frac {\frac {e^{\beta_0 + \beta_1 X}} {1 + e^{\beta_0 + \beta_1 X}}} 21 | {\frac {1} {1 + e^{\beta_0 + \beta_1 X}}} 22 | \\ 23 | (4.3) \frac {p(X)} {1 - p(X)} =e^{\beta_0 + \beta_1 X} 24 | $$ 25 | -------------------------------------------------------------------------------- /ch4/11.Rmd: -------------------------------------------------------------------------------- 1 | 11 2 | ======================================================== 3 | 4 | ### a 5 | ```{r} 6 | library(ISLR) 7 | summary(Auto) 8 | attach(Auto) 9 | mpg01 = rep(0, length(mpg)) 10 | mpg01[mpg>median(mpg)] = 1 11 | Auto = data.frame(Auto, mpg01) 12 | ``` 13 | 14 | ### b 15 | ```{r 11b} 16 | cor(Auto[,-9]) 17 | pairs(Auto) # doesn't work well since mpg01 is 0 or 1 18 | ``` 19 | Anti-correlated with cylinders, weight, displacement, horsepower. 20 | (mpg, of course) 21 | 22 | ### c 23 | ```{r} 24 | train = (year %% 2 == 0) # if the year is even 25 | test = !train 26 | Auto.train = Auto[train,] 27 | Auto.test = Auto[test,] 28 | mpg01.test = mpg01[test] 29 | ``` 30 | 31 | ### d 32 | ```{r} 33 | # LDA 34 | library(MASS) 35 | lda.fit = lda(mpg01~cylinders+weight+displacement+horsepower, 36 | data=Auto, subset=train) 37 | lda.pred = predict(lda.fit, Auto.test) 38 | mean(lda.pred$class != mpg01.test) 39 | ``` 40 | 12.6% test error rate. 41 | 42 | ### e 43 | ```{r} 44 | # QDA 45 | qda.fit = qda(mpg01~cylinders+weight+displacement+horsepower, 46 | data=Auto, subset=train) 47 | qda.pred = predict(qda.fit, Auto.test) 48 | mean(qda.pred$class != mpg01.test) 49 | ``` 50 | 13.2% test error rate. 
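For a closer look than the single error rate (a small addition; `lda.pred` and `qda.pred` are the objects fitted above), the confusion matrices show where the two methods disagree:
```{r}
table(lda.pred$class, mpg01.test)
table(qda.pred$class, mpg01.test)
```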
51 | 52 | ### f 53 | ```{r} 54 | # Logistic regression 55 | glm.fit = glm(mpg01~cylinders+weight+displacement+horsepower, 56 | data=Auto, 57 | family=binomial, 58 | subset=train) 59 | glm.probs = predict(glm.fit, Auto.test, type="response") 60 | glm.pred = rep(0, length(glm.probs)) 61 | glm.pred[glm.probs > 0.5] = 1 62 | mean(glm.pred != mpg01.test) 63 | ``` 64 | 12.1% test error rate. 65 | 66 | ### g 67 | ```{r} 68 | library(class) 69 | train.X = cbind(cylinders, weight, displacement, horsepower)[train,] 70 | test.X = cbind(cylinders, weight, displacement, horsepower)[test,] 71 | train.mpg01 = mpg01[train] 72 | set.seed(1) 73 | # KNN(k=1) 74 | knn.pred = knn(train.X, test.X, train.mpg01, k=1) 75 | mean(knn.pred != mpg01.test) 76 | # KNN(k=10) 77 | knn.pred = knn(train.X, test.X, train.mpg01, k=10) 78 | mean(knn.pred != mpg01.test) 79 | # KNN(k=100) 80 | knn.pred = knn(train.X, test.X, train.mpg01, k=100) 81 | mean(knn.pred != mpg01.test) 82 | ``` 83 | k=1, 15.4% test error rate. 84 | k=10, 16.5% test error rate. 85 | k=100, 14.3% test error rate. 86 | K of 100 seems to perform the best. 100 nearest neighbors. 87 | -------------------------------------------------------------------------------- /ch4/12.Rmd: -------------------------------------------------------------------------------- 1 | 12 2 | ======================================================== 3 | 4 | ### a 5 | ```{r} 6 | Power = function() { 7 | 2^3 8 | } 9 | print(Power()) 10 | ``` 11 | 12 | ### b 13 | ```{r} 14 | Power2 = function(x, a) { 15 | x^a 16 | } 17 | Power2(3,8) 18 | ``` 19 | 20 | ### c 21 | ```{r} 22 | Power2(10, 3) 23 | Power2(8, 17) 24 | Power2(131, 3) 25 | ``` 26 | 27 | ### d 28 | ```{r} 29 | Power3 = function(x, a) { 30 | result = x^a 31 | return(result) 32 | } 33 | ``` 34 | 35 | ### e 36 | ```{r 12e} 37 | x = 1:10 38 | plot(x, Power3(x, 2), log="xy", ylab="Log of y = x^2", xlab="Log of x", 39 | main="Log of x^2 versus Log of x") 40 | ``` 41 | 42 | ### f 43 | ```{r 12f} 44 | PlotPower = function(x, a) { 45 | plot(x, Power3(x, a)) 46 | } 47 | PlotPower(1:10, 3) 48 | ``` 49 | -------------------------------------------------------------------------------- /ch4/12.md: -------------------------------------------------------------------------------- 1 | 12 2 | ======================================================== 3 | 4 | ### a 5 | 6 | ```r 7 | Power = function() { 8 | 2^3 9 | } 10 | print(Power()) 11 | ``` 12 | 13 | ``` 14 | ## [1] 8 15 | ``` 16 | 17 | 18 | ### b 19 | 20 | ```r 21 | Power2 = function(x, a) { 22 | x^a 23 | } 24 | Power2(3, 8) 25 | ``` 26 | 27 | ``` 28 | ## [1] 6561 29 | ``` 30 | 31 | 32 | ### c 33 | 34 | ```r 35 | Power2(10, 3) 36 | ``` 37 | 38 | ``` 39 | ## [1] 1000 40 | ``` 41 | 42 | ```r 43 | Power2(8, 17) 44 | ``` 45 | 46 | ``` 47 | ## [1] 2.252e+15 48 | ``` 49 | 50 | ```r 51 | Power2(131, 3) 52 | ``` 53 | 54 | ``` 55 | ## [1] 2248091 56 | ``` 57 | 58 | 59 | ### d 60 | 61 | ```r 62 | Power3 = function(x, a) { 63 | result = x^a 64 | return(result) 65 | } 66 | ``` 67 | 68 | 69 | ### e 70 | 71 | ```r 72 | x = 1:10 73 | plot(x, Power3(x, 2), log = "xy", ylab = "Log of y = x^2", xlab = "Log of x", 74 | main = "Log of x^2 versus Log of x") 75 | ``` 76 | 77 | ![plot of chunk 12e](figure/12e.png) 78 | 79 | 80 | ### f 81 | 82 | ```r 83 | PlotPower = function(x, a) { 84 | plot(x, Power3(x, a)) 85 | } 86 | PlotPower(1:10, 3) 87 | ``` 88 | 89 | ![plot of chunk 12f](figure/12f.png) 90 | 91 | -------------------------------------------------------------------------------- /ch4/13.Rmd: 
-------------------------------------------------------------------------------- 1 | 13 2 | ======================================================== 3 | 4 | ```{r} 5 | library(MASS) 6 | summary(Boston) 7 | attach(Boston) 8 | crime01 = rep(0, length(crim)) 9 | crime01[crim>median(crim)] = 1 10 | Boston = data.frame(Boston, crime01) 11 | 12 | train = 1:(dim(Boston)[1]/2) 13 | test = (dim(Boston)[1]/2+1):dim(Boston)[1] 14 | Boston.train = Boston[train,] 15 | Boston.test = Boston[test,] 16 | crime01.test = crime01[test] 17 | ``` 18 | 19 | ```{r} 20 | # logistic regression 21 | glm.fit = glm(crime01~.-crime01-crim, 22 | data=Boston, family=binomial, subset=train) 23 | glm.probs = predict(glm.fit, Boston.test, type="response") 24 | glm.pred = rep(0, length(glm.probs)) 25 | glm.pred[glm.probs > 0.5] = 1 26 | mean(glm.pred != crime01.test) 27 | ``` 28 | 18.2% test error rate. 29 | 30 | ```{r} 31 | glm.fit = glm(crime01~.-crime01-crim-chas-tax, 32 | data=Boston, family=binomial, subset=train) 33 | glm.probs = predict(glm.fit, Boston.test, type="response") 34 | glm.pred = rep(0, length(glm.probs)) 35 | glm.pred[glm.probs > 0.5] = 1 36 | mean(glm.pred != crime01.test) 37 | ``` 38 | 18.6% test error rate. 39 | 40 | ```{r} 41 | # LDA 42 | lda.fit = lda(crime01~.-crime01-crim, data=Boston, subset=train) 43 | lda.pred = predict(lda.fit, Boston.test) 44 | mean(lda.pred$class != crime01.test) 45 | ``` 46 | 13.4% test error rate. 47 | 48 | ```{r} 49 | lda.fit = lda(crime01~.-crime01-crim-chas-tax, data=Boston, subset=train) 50 | lda.pred = predict(lda.fit, Boston.test) 51 | mean(lda.pred$class != crime01.test) 52 | ``` 53 | 12.3% test error rate. 54 | 55 | ```{r} 56 | lda.fit = lda(crime01~.-crime01-crim-chas-tax-lstat-indus-age, 57 | data=Boston, subset=train) 58 | lda.pred = predict(lda.fit, Boston.test) 59 | mean(lda.pred$class != crime01.test) 60 | ``` 61 | 11.9% test error rate. 62 | 63 | ```{r} 64 | # KNN 65 | library(class) 66 | train.X = cbind(zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, 67 | lstat, medv)[train,] 68 | test.X = cbind(zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, 69 | lstat, medv)[test,] 70 | train.crime01 = crime01[train] 71 | set.seed(1) 72 | # KNN(k=1) 73 | knn.pred = knn(train.X, test.X, train.crime01, k=1) 74 | mean(knn.pred != crime01.test) 75 | ``` 76 | 45.8% test error rate. 77 | 78 | ```{r} 79 | # KNN(k=10) 80 | knn.pred = knn(train.X, test.X, train.crime01, k=10) 81 | mean(knn.pred != crime01.test) 82 | ``` 83 | 11.1% test error rate. 84 | 85 | ```{r} 86 | # KNN(k=100) 87 | knn.pred = knn(train.X, test.X, train.crime01, k=100) 88 | mean(knn.pred != crime01.test) 89 | ``` 90 | 49.0% test error rate. 91 | 92 | ```{r} 93 | # KNN(k=10) with subset of variables 94 | train.X = cbind(zn, nox, rm, dis, rad, ptratio, black, medv)[train,] 95 | test.X = cbind(zn, nox, rm, dis, rad, ptratio, black, medv)[test,] 96 | knn.pred = knn(train.X, test.X, train.crime01, k=10) 97 | mean(knn.pred != crime01.test) 98 | ``` 99 | 28.5% test error rate. 
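One caveat worth checking (a hypothetical follow-up, not part of the analysis above): KNN is distance-based, and these predictors sit on very different scales (e.g. `tax` versus `nox`), so standardizing them first may change the error rates noticeably:
```{r}
# same predictors as above, but standardized before computing neighbors
std.X = scale(cbind(zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black,
                    lstat, medv))
knn.pred.std = knn(std.X[train,], std.X[test,], train.crime01, k=10)
mean(knn.pred.std != crime01.test)
```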
100 | -------------------------------------------------------------------------------- /ch4/2.Rmd: -------------------------------------------------------------------------------- 1 | 2 2 | ======================================================== 3 | 4 | Assuming that $f_k(x)$ is normal, the probability that an observation $x$ is in class $k$ is given by 5 | $$ 6 | p_k(x) = \frac {\pi_k 7 | \frac {1} {\sqrt{2 \pi} \sigma} 8 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_k)^2) 9 | } 10 | {\sum { 11 | \pi_l 12 | \frac {1} {\sqrt{2 \pi} \sigma} 13 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 14 | }} 15 | $$ 16 | while the discriminant function is given by 17 | $$ 18 | \delta_k(x) = x \frac {\mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2} 19 | + \log(\pi_k) 20 | $$ 21 | 22 | *Claim: Maximizing $p_k(x)$ is equivalent to maximizing $\delta_k(x)$.* 23 | 24 | *Proof.* Let $x$ remain fixed and observe that we are maximizing over the parameter $k$. Suppose that $\delta_k(x) \geq \delta_i(x)$. We will show that $f_k(x) \geq f_i(x)$. From our assumption we have 25 | $$ 26 | x \frac {\mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2} + \log(\pi_k) 27 | \geq 28 | x \frac {\mu_i} {\sigma^2} - \frac {\mu_i^2} {2 \sigma^2} + \log(\pi_i). 29 | $$ 30 | Exponentiation is a monotonically increasing function, so the following inequality holds 31 | $$ 32 | \pi_k \exp (x \frac {\mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2}) 33 | \geq 34 | \pi_i \exp (x \frac {\mu_i} {\sigma^2} - \frac {\mu_i^2} {2 \sigma^2}) 35 | $$ 36 | Multipy this inequality by the positive constant 37 | $$ 38 | c = \frac { 39 | \frac {1} {\sqrt{2 \pi} \sigma} 40 | \exp(- \frac {1} {2 \sigma^2} x^2) 41 | } 42 | {\sum { 43 | \pi_l 44 | \frac {1} {\sqrt{2 \pi} \sigma} 45 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 46 | }} 47 | $$ 48 | and we have that 49 | $$ 50 | 51 | \frac {\pi_k 52 | \frac {1} {\sqrt{2 \pi} \sigma} 53 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_k)^2) 54 | } 55 | {\sum { 56 | \pi_l 57 | \frac {1} {\sqrt{2 \pi} \sigma} 58 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 59 | }} 60 | 61 | \geq 62 | 63 | \frac {\pi_i 64 | \frac {1} {\sqrt{2 \pi} \sigma} 65 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_i)^2) 66 | } 67 | {\sum { 68 | \pi_l 69 | \frac {1} {\sqrt{2 \pi} \sigma} 70 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 71 | }} 72 | $$ 73 | or equivalently, $f_k(x) \geq f_i(x)$. Reversing these steps also holds, so we have that maximizing $\delta_k$ is equivalent to maximizing $p_k$. -------------------------------------------------------------------------------- /ch4/2.md: -------------------------------------------------------------------------------- 1 | 2 2 | ======================================================== 3 | 4 | Assuming that $f_k(x)$ is normal, the probability that an observation $x$ is in class $k$ is given by 5 | $$ 6 | p_k(x) = \frac {\pi_k 7 | \frac {1} {\sqrt{2 \pi} \sigma} 8 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_k)^2) 9 | } 10 | {\sum { 11 | \pi_l 12 | \frac {1} {\sqrt{2 \pi} \sigma} 13 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 14 | }} 15 | $$ 16 | while the discriminant function is given by 17 | $$ 18 | \delta_k(x) = x \frac {\mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2} 19 | + \log(\pi_k) 20 | $$ 21 | 22 | *Claim: Maximizing $p_k(x)$ is equivalent to maximizing $\delta_k(x)$.* 23 | 24 | *Proof.* Let $x$ remain fixed and observe that we are maximizing over the parameter $k$. Suppose that $\delta_k(x) \geq \delta_i(x)$. We will show that $f_k(x) \geq f_i(x)$. 
From our assumption we have 25 | $$ 26 | x \frac {\mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2} + \log(\pi_k) 27 | \geq 28 | x \frac {\mu_i} {\sigma^2} - \frac {\mu_i^2} {2 \sigma^2} + \log(\pi_i). 29 | $$ 30 | Exponentiation is a monotonically increasing function, so the following inequality holds 31 | $$ 32 | \pi_k \exp (x \frac {\mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2}) 33 | \geq 34 | \pi_i \exp (x \frac {\mu_i} {\sigma^2} - \frac {\mu_i^2} {2 \sigma^2}) 35 | $$ 36 | Multipy this inequality by the positive constant 37 | $$ 38 | c = \frac { 39 | \frac {1} {\sqrt{2 \pi} \sigma} 40 | \exp(- \frac {1} {2 \sigma^2} x^2) 41 | } 42 | {\sum { 43 | \pi_l 44 | \frac {1} {\sqrt{2 \pi} \sigma} 45 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 46 | }} 47 | $$ 48 | and we have that 49 | $$ 50 | 51 | \frac {\pi_k 52 | \frac {1} {\sqrt{2 \pi} \sigma} 53 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_k)^2) 54 | } 55 | {\sum { 56 | \pi_l 57 | \frac {1} {\sqrt{2 \pi} \sigma} 58 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 59 | }} 60 | 61 | \geq 62 | 63 | \frac {\pi_i 64 | \frac {1} {\sqrt{2 \pi} \sigma} 65 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_i)^2) 66 | } 67 | {\sum { 68 | \pi_l 69 | \frac {1} {\sqrt{2 \pi} \sigma} 70 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 71 | }} 72 | $$ 73 | or equivalently, $f_k(x) \geq f_i(x)$. Reversing these steps also holds, so we have that maximizing $\delta_k$ is equivalent to maximizing $p_k$. 74 | -------------------------------------------------------------------------------- /ch4/3.Rmd: -------------------------------------------------------------------------------- 1 | 3 2 | ======================================================== 3 | 4 | $$ 5 | p_k(x) = \frac {\pi_k 6 | \frac {1} {\sqrt{2 \pi} \sigma_k} 7 | \exp(- \frac {1} {2 \sigma_k^2} (x - \mu_k)^2) 8 | } 9 | {\sum { 10 | \pi_l 11 | \frac {1} {\sqrt{2 \pi} \sigma_l} 12 | \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) 13 | }} 14 | \\ 15 | \log(p_k(x)) = \frac {\log(\pi_k) + 16 | \log(\frac {1} {\sqrt{2 \pi} \sigma_k}) + 17 | - \frac {1} {2 \sigma_k^2} (x - \mu_k)^2 18 | } 19 | {\log(\sum { 20 | \pi_l 21 | \frac {1} {\sqrt{2 \pi} \sigma_l} 22 | \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) 23 | })} 24 | \\ 25 | \log(p_k(x)) 26 | \log(\sum { 27 | \pi_l 28 | \frac {1} {\sqrt{2 \pi} \sigma_l} 29 | \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) 30 | }) 31 | = \log(\pi_k) + 32 | \log(\frac {1} {\sqrt{2 \pi} \sigma_k}) + 33 | - \frac {1} {2 \sigma_k^2} (x - \mu_k)^2 34 | \\ 35 | \delta(x) 36 | = \log(\pi_k) + 37 | \log(\frac {1} {\sqrt{2 \pi} \sigma_k}) + 38 | - \frac {1} {2 \sigma_k^2} (x - \mu_k)^2 39 | $$ 40 | 41 | As you can see, $\delta(x)$ is a quadratic function of $x$. 
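Expanding the square in the last expression makes the quadratic term explicit (an added step using only the quantities above):
$$
\delta_k(x)
= -\frac{1}{2 \sigma_k^2} x^2
+ \frac{\mu_k}{\sigma_k^2} x
- \frac{\mu_k^2}{2 \sigma_k^2}
+ \log(\pi_k)
+ \log(\frac {1} {\sqrt{2 \pi} \sigma_k})
$$
Because each class has its own $\sigma_k^2$, the coefficient $-\frac{1}{2 \sigma_k^2}$ on $x^2$ differs across classes and does not cancel when comparing discriminants, so the Bayes decision boundary is quadratic in $x$; under the LDA assumption of a common $\sigma^2$, the $x^2$ terms cancel and the boundary is linear.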
42 | 43 | -------------------------------------------------------------------------------- /ch4/3.md: -------------------------------------------------------------------------------- 1 | 3 2 | ======================================================== 3 | 4 | $$ 5 | p_k(x) = \frac {\pi_k 6 | \frac {1} {\sqrt{2 \pi} \sigma_k} 7 | \exp(- \frac {1} {2 \sigma_k^2} (x - \mu_k)^2) 8 | } 9 | {\sum { 10 | \pi_l 11 | \frac {1} {\sqrt{2 \pi} \sigma_l} 12 | \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) 13 | }} 14 | \\ 15 | \log(p_k(x)) = \frac {\log(\pi_k) + 16 | \log(\frac {1} {\sqrt{2 \pi} \sigma_k}) + 17 | - \frac {1} {2 \sigma_k^2} (x - \mu_k)^2 18 | } 19 | {\log(\sum { 20 | \pi_l 21 | \frac {1} {\sqrt{2 \pi} \sigma_l} 22 | \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) 23 | })} 24 | \\ 25 | \log(p_k(x)) 26 | \log(\sum { 27 | \pi_l 28 | \frac {1} {\sqrt{2 \pi} \sigma_l} 29 | \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) 30 | }) 31 | = \log(\pi_k) + 32 | \log(\frac {1} {\sqrt{2 \pi} \sigma_k}) + 33 | - \frac {1} {2 \sigma_k^2} (x - \mu_k)^2 34 | \\ 35 | \delta(x) 36 | = \log(\pi_k) + 37 | \log(\frac {1} {\sqrt{2 \pi} \sigma_k}) + 38 | - \frac {1} {2 \sigma_k^2} (x - \mu_k)^2 39 | $$ 40 | 41 | As you can see, $\delta(x)$ is a quadratic function of $x$. 42 | 43 | -------------------------------------------------------------------------------- /ch4/4.Rmd: -------------------------------------------------------------------------------- 1 | 4 2 | ======================================================== 3 | 4 | ### a. 5 | On average, 10%. For simplicity, ignoring cases when X < 0.05 and X > 0.95. 6 | 7 | ### b. 8 | On average, 1% 9 | 10 | ### c. 11 | On average, $0.10^{100} * 100 = 10^{-98}$%. 12 | 13 | ### d. 14 | As $p$ increases linear, observations that are geometrically near decrease 15 | exponentially. 16 | 17 | ### e. 18 | $$ 19 | p = 1, l = 0.10 20 | \\ 21 | p = 2, l = \sqrt{0.10} ~ 0.32 22 | \\ 23 | p = 3, l = 0.10^{1/3} ~ 0.46 24 | \\ 25 | ... 26 | \\ 27 | p = N, l = 0.10^{1/N} 28 | $$ 29 | -------------------------------------------------------------------------------- /ch4/4.md: -------------------------------------------------------------------------------- 1 | 4 2 | ======================================================== 3 | 4 | ### a. 5 | On average, 10%. For simplicity, ignoring cases when X < 0.05 and X > 0.95. 6 | 7 | ### b. 8 | On average, 1% 9 | 10 | ### c. 11 | On average, $0.10^{100} * 100 = 10^{-98}$%. 12 | 13 | ### d. 14 | As $p$ increases linear, observations that are geometrically near decrease 15 | exponentially. 16 | 17 | ### e. 18 | $$ 19 | p = 1, l = 0.10 20 | \\ 21 | p = 2, l = \sqrt{0.10} ~ 0.32 22 | \\ 23 | p = 3, l = 0.10^{1/3} ~ 0.46 24 | \\ 25 | ... 26 | \\ 27 | p = N, l = 0.10^{1/N} 28 | $$ 29 | -------------------------------------------------------------------------------- /ch4/5.Rmd: -------------------------------------------------------------------------------- 1 | 5 2 | ======================================================== 3 | 4 | ### a. 5 | If the Bayes decision boundary is linear, we expect QDA to perform better on the 6 | training set because it's higher flexiblity will yield a closer fit. On the test 7 | set, we expect LDA to perform better than QDA because QDA could overfit the 8 | linearity of the Bayes decision boundary. 9 | 10 | ### b. 11 | If the Bayes decision bounary is non-linear, we expect QDA to perform better 12 | both on the training and test sets. 13 | 14 | ### c. 
15 | We expect the test prediction accuracy of QDA relative to LDA to improve, in 16 | general, as the the sample size $n$ increases because a more flexibile method 17 | will yield a better fit as more samples can be fit and variance is offset by 18 | the larger sample sizes. 19 | 20 | ### d. 21 | False. With fewer sample points, the variance from using a more flexible method, 22 | such as QDA, would lead to overfit, yielding a higher test rate than LDA. 23 | -------------------------------------------------------------------------------- /ch4/5.md: -------------------------------------------------------------------------------- 1 | 5 2 | ======================================================== 3 | 4 | ### a. 5 | If the Bayes decision boundary is linear, we expect QDA to perform better on the 6 | training set because it's higher flexiblity will yield a closer fit. On the test 7 | set, we expect LDA to perform better than QDA because QDA could overfit the 8 | linearity of the Bayes decision boundary. 9 | 10 | ### b. 11 | If the Bayes decision bounary is non-linear, we expect QDA to perform better 12 | both on the training and test sets. 13 | 14 | ### c. 15 | We expect the test prediction accuracy of QDA relative to LDA to improve, in 16 | general, as the the sample size $n$ increases because a more flexibile method 17 | will yield a better fit as more samples can be fit and variance is offset by 18 | the larger sample sizes. 19 | 20 | ### d. 21 | False. With fewer sample points, the variance from using a more flexible method, 22 | such as QDA, would lead to overfit, yielding a higher test rate than LDA. 23 | -------------------------------------------------------------------------------- /ch4/6.Rmd: -------------------------------------------------------------------------------- 1 | 6 2 | ======================================================== 3 | 4 | $$ 5 | p(X) = \frac {\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)} 6 | {1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)} 7 | \\ 8 | X_1 = hours studied, X_2 = undergrad GPA 9 | \\ 10 | \beta_0 = -6, \beta_1 = 0.05, \beta_2 = 1 11 | $$ 12 | 13 | ### a. 14 | $$ 15 | X = [40 hours, 3.5 GPA] 16 | \\ 17 | p(X) = \frac {\exp(-6 + 0.05 X_1 + X_2)} {1 + \exp(-6 + 0.05 X_1 + X_2)} 18 | \\ 19 | = \frac {\exp(-6 + 0.05 40 + 3.5)} {1 + \exp(-6 + 0.05 40 + 3.5)} 20 | \\ 21 | = \frac {\exp(-0.5)} {1 + \exp(-0.5)} 22 | \\ 23 | = 37.75\% 24 | $$ 25 | 26 | ### b. 27 | $$ 28 | X = [X_1 hours, 3.5 GPA] 29 | \\ 30 | p(X) = \frac {\exp(-6 + 0.05 X_1 + X_2)} {1 + \exp(-6 + 0.05 X_1 + X_2)} 31 | \\ 32 | 0.50 = \frac {\exp(-6 + 0.05 X_1 + 3.5)} {1 + \exp(-6 + 0.05 X_1 + 3.5)} 33 | \\ 34 | 0.50 (1 + \exp(-2.5 + 0.05 X_1)) = \exp(-2.5 + 0.05 X_1) 35 | \\ 36 | 0.50 + 0.50 \exp(-2.5 + 0.05 X_1)) = \exp(-2.5 + 0.05 X_1) 37 | \\ 38 | 0.50 = 0.50 \exp(-2.5 + 0.05 X_1) 39 | \\ 40 | \log(1) = -2.5 + 0.05 X_1 41 | \\ 42 | X_1 = 2.5 / 0.05 = 50 hours 43 | $$ 44 | 45 | 46 | -------------------------------------------------------------------------------- /ch4/6.md: -------------------------------------------------------------------------------- 1 | 6 2 | ======================================================== 3 | 4 | $$ 5 | p(X) = \frac {\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)} 6 | {1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)} 7 | \\ 8 | X_1 = hours studied, X_2 = undergrad GPA 9 | \\ 10 | \beta_0 = -6, \beta_1 = 0.05, \beta_2 = 1 11 | $$ 12 | 13 | ### a. 
14 | $$ 15 | X = [40 hours, 3.5 GPA] 16 | \\ 17 | p(X) = \frac {\exp(-6 + 0.05 X_1 + X_2)} {1 + \exp(-6 + 0.05 X_1 + X_2)} 18 | \\ 19 | = \frac {\exp(-6 + 0.05 40 + 3.5)} {1 + \exp(-6 + 0.05 40 + 3.5)} 20 | \\ 21 | = \frac {\exp(-0.5)} {1 + \exp(-0.5)} 22 | \\ 23 | = 37.75\% 24 | $$ 25 | 26 | ### b. 27 | $$ 28 | X = [X_1 hours, 3.5 GPA] 29 | \\ 30 | p(X) = \frac {\exp(-6 + 0.05 X_1 + X_2)} {1 + \exp(-6 + 0.05 X_1 + X_2)} 31 | \\ 32 | 0.50 = \frac {\exp(-6 + 0.05 X_1 + 3.5)} {1 + \exp(-6 + 0.05 X_1 + 3.5)} 33 | \\ 34 | 0.50 (1 + \exp(-2.5 + 0.05 X_1)) = \exp(-2.5 + 0.05 X_1) 35 | \\ 36 | 0.50 + 0.50 \exp(-2.5 + 0.05 X_1)) = \exp(-2.5 + 0.05 X_1) 37 | \\ 38 | 0.50 = 0.50 \exp(-2.5 + 0.05 X_1) 39 | \\ 40 | \log(1) = -2.5 + 0.05 X_1 41 | \\ 42 | X_1 = 2.5 / 0.05 = 50 hours 43 | $$ 44 | 45 | 46 | -------------------------------------------------------------------------------- /ch4/7.Rmd: -------------------------------------------------------------------------------- 1 | 7 2 | ======================================================== 3 | $$ 4 | p_k(x) = \frac {\pi_k 5 | \frac {1} {\sqrt{2 \pi} \sigma} 6 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_k)^2) 7 | } 8 | {\sum { 9 | \pi_l 10 | \frac {1} {\sqrt{2 \pi} \sigma} 11 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 12 | }} 13 | \\ 14 | p_{yes}(x)= \frac {\pi_{yes} 15 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_{yes})^2) 16 | } 17 | {\sum { 18 | \pi_l 19 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 20 | }} 21 | \\ 22 | = \frac {\pi_{yes} \exp(- \frac {1} {2 \sigma^2} (x - \mu_{yes})^2)} 23 | { 24 | \pi_{yes} \exp(- \frac {1} {2 \sigma^2} (x - \mu_{yes})^2) + 25 | \pi_{no} \exp(- \frac {1} {2 \sigma^2} (x - \mu_{no})^2) 26 | } 27 | \\ 28 | = \frac {0.80 \exp(- \frac {1} {2 * 36} (x - 10)^2)} 29 | { 30 | 0.80 \exp(- \frac {1} {2 * 36} (x - 10)^2) + 31 | 0.20 \exp(- \frac {1} {2 * 36} x^2) 32 | } 33 | \\ 34 | p_{yes}(4) = \frac {0.80 \exp(- \frac {1} {2 * 36} (4 - 10)^2)} 35 | { 36 | 0.80 \exp(- \frac {1} {2 * 36} (4 - 10)^2) + 37 | 0.20 \exp(- \frac {1} {2 * 36} 4^2) 38 | } 39 | = 75.2\% 40 | $$ 41 | -------------------------------------------------------------------------------- /ch4/7.md: -------------------------------------------------------------------------------- 1 | 7 2 | ======================================================== 3 | $$ 4 | p_k(x) = \frac {\pi_k 5 | \frac {1} {\sqrt{2 \pi} \sigma} 6 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_k)^2) 7 | } 8 | {\sum { 9 | \pi_l 10 | \frac {1} {\sqrt{2 \pi} \sigma} 11 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 12 | }} 13 | \\ 14 | = \frac {\pi_{yes} 15 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_{yes})^2) 16 | } 17 | {\sum { 18 | \pi_l 19 | \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) 20 | }} 21 | \\ 22 | = \frac {\pi_{yes} \exp(- \frac {1} {2 \sigma^2} (x - \mu_{yes})^2)} 23 | { 24 | \pi_{yes} \exp(- \frac {1} {2 \sigma^2} (x - \mu_{yes})^2) + 25 | \pi_{no} \exp(- \frac {1} {2 \sigma^2} (x - \mu_{no})^2) 26 | } 27 | \\ 28 | = \frac {0.80 \exp(- \frac {1} {2 * 36} (x - 10)^2)} 29 | { 30 | 0.80 \exp(- \frac {1} {2 * 36} (x - 10)^2) + 31 | 0.20 \exp(- \frac {1} {2 * 36} x^2) 32 | } 33 | \\ 34 | p_{yes}(4) = \frac {0.80 \exp(- \frac {1} {2 * 36} (4 - 10)^2)} 35 | { 36 | 0.80 \exp(- \frac {1} {2 * 36} (4 - 10)^2) + 37 | 0.20 \exp(- \frac {1} {2 * 36} 4^2) 38 | } 39 | = 75.2\% 40 | $$ 41 | -------------------------------------------------------------------------------- /ch4/8.Rmd: -------------------------------------------------------------------------------- 1 | 8 2 | 
======================================================== 3 | 4 | Given: 5 | 6 | Logistic regression: 20% training error rate, 30% test error rate 7 | KNN(K=1): average error rate of 18% 8 | 9 | For KNN with K=1, the training error rate is 0% because for any training 10 | observation, its nearest neighbor will be the response itself. So, KNN has a 11 | test error rate of 36%. I would choose logistic regression because of its lower 12 | test error rate of 30%. -------------------------------------------------------------------------------- /ch4/8.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 8 9 | 10 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 |

156 | 157 | 158 | 159 | 160 | 161 | -------------------------------------------------------------------------------- /ch4/8.md: -------------------------------------------------------------------------------- 1 | 8 2 | ======================================================== 3 | 4 | Given: 5 | 6 | Logistic regression: 20% training error rate, 30% test error rate 7 | KNN(K=1): average error rate of 18% 8 | 9 | For KNN with K=1, the training error rate is 0% because for any training 10 | observation, its nearest neighbor will be the response itself. So, KNN has a 11 | test error rate of 36%. I would choose logistic regression because of its lower 12 | test error rate of 30%. 13 | -------------------------------------------------------------------------------- /ch4/9.Rmd: -------------------------------------------------------------------------------- 1 | 9 2 | ======================================================== 3 | 4 | ### a. 5 | 6 | $$ 7 | \frac {p(X)} {1 - p(X)} = 0.37 8 | \\ 9 | p(X) = 0.37 (1 - p(X)) 10 | \\ 11 | 1.37 p(X) = 0.37 12 | \\ 13 | p(X) = \frac {0.37} {1.37} = 27\% 14 | $$ 15 | 16 | ### b. 17 | $$ 18 | odds = \frac {p(X)} {1 - p(X)} = .16 / .84 = 0.19 19 | $$ -------------------------------------------------------------------------------- /ch4/9.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 8 | 9 | 136 | 137 | 138 | 139 | 140 | 142 | 143 | 144 | 145 | 146 | 147 | 148 |

167 | 168 | 169 | 170 | 171 | 172 | -------------------------------------------------------------------------------- /ch4/9.md: -------------------------------------------------------------------------------- 1 | 9 2 | ======================================================== 3 | 4 | ### a. 5 | 6 | $$ 7 | \frac {p(X)} {1 - p(X)} = 0.37 8 | \\ 9 | p(X) = 0.37 (1 - p(X)) 10 | \\ 11 | 1.37 p(X) = 0.37 12 | \\ 13 | p(X) = \frac {0.37} {1.37} = 27\% 14 | $$ 15 | 16 | ### b. 17 | $$ 18 | odds = \frac {p(X)} {1 - p(X)} = .16 / .84 = 0.19 19 | $$ 20 | -------------------------------------------------------------------------------- /ch4/figure/10a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch4/figure/10a.png -------------------------------------------------------------------------------- /ch4/figure/11b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch4/figure/11b.png -------------------------------------------------------------------------------- /ch4/figure/12e.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch4/figure/12e.png -------------------------------------------------------------------------------- /ch4/figure/12f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch4/figure/12f.png -------------------------------------------------------------------------------- /ch5/1.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 1 2 | ======================================================== 3 | 4 | Using the following rules: 5 | 6 | $$ 7 | Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y) 8 | \\ 9 | Var(cX) = c^2 Var(X) 10 | \\ 11 | Cov(cX,Y) = Cov(X,cY) = c Cov(X,Y) 12 | $$ 13 | 14 | Minimizing two-asset financial portfolio: 15 | $$ 16 | Var(\alpha X + (1 - \alpha)Y) 17 | \\ 18 | = Var(\alpha X) + Var((1 - \alpha) Y) + 2 Cov(\alpha X, (1 - \alpha) Y) 19 | \\ 20 | = \alpha^2 Var(X) + (1 - \alpha)^2 Var(Y) + 2 \alpha (1 - \alpha) Cov(X, Y) 21 | \\ 22 | = \sigma_X^2 \alpha^2 + \sigma_Y^2 (1 - \alpha)^2 + 2 \sigma_{XY} (-\alpha^2 + 23 | \alpha) 24 | $$ 25 | 26 | Take the first derivative to find critical points: 27 | $$ 28 | 0 = \frac {d} {d\alpha} f(\alpha) 29 | \\ 30 | 0 = 2 \sigma_X^2 \alpha + 2 \sigma_Y^2 (1 - \alpha) (-1) + 2 \sigma_{XY} 31 | (-2 \alpha + 1) 32 | \\ 33 | 0 = \sigma_X^2 \alpha + \sigma_Y^2 (\alpha - 1) + \sigma_{XY} (-2 \alpha + 1) 34 | \\ 35 | 0 = (\sigma_X^2 + \sigma_Y^2 - 2 \sigma_{XY}) \alpha - \sigma_Y^2 + \sigma_{XY} 36 | \\ 37 | \alpha = \frac {\sigma_Y^2 - \sigma_{XY}} 38 | {\sigma_X^2 + \sigma_Y^2 - 2 \sigma_{XY}} 39 | $$ 40 | 41 | -------------------------------------------------------------------------------- /ch5/1.md: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 1 2 | ======================================================== 3 | 4 | Using the following rules: 5 | 6 | $$ 7 | Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y) 8 | \\ 9 | Var(cX) = c^2 Var(X) 10 | \\ 11 | Cov(cX,Y) = Cov(X,cY) = c Cov(X,Y) 12 | $$ 13 | 14 | Minimizing 
two-asset financial portfolio: 15 | $$ 16 | Var(\alpha X + (1 - \alpha)Y) 17 | \\ 18 | = Var(\alpha X) + Var((1 - \alpha) Y) + 2 Cov(\alpha X, (1 - \alpha) Y) 19 | \\ 20 | = \alpha^2 Var(X) + (1 - \alpha)^2 Var(Y) + 2 \alpha (1 - \alpha) Cov(X, Y) 21 | \\ 22 | = \sigma_X^2 \alpha^2 + \sigma_Y^2 (1 - \alpha)^2 + 2 \sigma_{XY} (-\alpha^2 + 23 | \alpha) 24 | $$ 25 | 26 | Take the first derivative to find critical points: 27 | $$ 28 | 0 = \frac {d} {d\alpha} f(\alpha) 29 | \\ 30 | 0 = 2 \sigma_X^2 \alpha + 2 \sigma_Y^2 (1 - \alpha) (-1) + 2 \sigma_{XY} 31 | (-2 \alpha + 1) 32 | \\ 33 | 0 = \sigma_X^2 \alpha + \sigma_Y^2 (\alpha - 1) + \sigma_{XY} (-2 \alpha + 1) 34 | \\ 35 | 0 = (\sigma_X^2 + \sigma_Y^2 - 2 \sigma_{XY}) \alpha - \sigma_Y^2 + \sigma_{XY} 36 | \\ 37 | \alpha = \frac {\sigma_Y^2 - \sigma_{XY}} 38 | {\sigma_X^2 + \sigma_Y^2 - 2 \sigma_{XY}} 39 | $$ 40 | 41 | -------------------------------------------------------------------------------- /ch5/2.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 2 2 | ======================================================== 3 | 4 | ### a 5 | $1 - 1/n$ 6 | 7 | ### b 8 | $1 - 1/n$ 9 | 10 | ### c 11 | In bootstrap, we sample with replacement so each observation in the bootstrap 12 | sample has the same 1/n (independent) chance of equaling the jth observation. 13 | Applying the product rule for a total of n observations gives us $(1 - 1/n)^n$. 14 | 15 | ### d 16 | $Pr(in) = 1 - Pr(out) = 1 - (1 - 1/5)^5 = 1 - (4/5)^5 = 67.2\%$ 17 | 18 | ### e 19 | $Pr(in) = 1 - Pr(out) = 1 - (1 - 1/100)^{10} = 1 - (99/100)^{100} = 63.4\%$ 20 | 21 | ### f 22 | $1 - (1 - 1/10000)^{10000} = 63.2\%$ 23 | 24 | ### g 25 | ```{r 2g} 26 | pr = function(n) return(1 - (1 - 1/n)^n) 27 | x = 1:100000 28 | plot(x, pr(x)) 29 | ``` 30 | The plot quickly reaches an asymptote of about 63.2%. 31 | 32 | ### h 33 | ```{r} 34 | set.seed(1) 35 | store = rep(NA, 1e4) 36 | for (i in 1:1e4) { 37 | store[i] = sum(sample(1:100, rep=T) == 4) > 0 38 | } 39 | mean(store) 40 | ``` 41 | The numerical results show an approximate mean probability of 64.1%, close 42 | to our theoretically derived result. 43 | 44 | -------------------------------------------------------------------------------- /ch5/2.md: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 2 2 | ======================================================== 3 | 4 | ### a 5 | $1 - 1/n$ 6 | 7 | ### b 8 | $1 - 1/n$ 9 | 10 | ### c 11 | In bootstrap, we sample with replacement so each observation in the bootstrap 12 | sample has the same 1/n (independent) chance of equaling the jth observation. 13 | Applying the product rule for a total of n observations gives us $(1 - 1/n)^n$. 14 | 15 | ### d 16 | $Pr(in) = 1 - Pr(out) = 1 - (1 - 1/5)^5 = 1 - (4/5)^5 = 67.2\%$ 17 | 18 | ### e 19 | $Pr(in) = 1 - Pr(out) = 1 - (1 - 1/100)^{10} = 1 - (99/100)^{100} = 63.4\%$ 20 | 21 | ### f 22 | $1 - (1 - 1/10000)^{10000} = 63.2\%$ 23 | 24 | ### g 25 | 26 | ```r 27 | pr = function(n) return(1 - (1 - 1/n)^n) 28 | x = 1:1e+05 29 | plot(x, pr(x)) 30 | ``` 31 | 32 | ![plot of chunk 2g](figure/2g.png) 33 | 34 | The plot quickly reaches an asymptote of about 63.2%. 
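The value of that asymptote can be pinned down exactly (a standard limit, not shown in the original answer): since $(1 - 1/n)^n \to e^{-1}$ as $n \to \infty$,
$$
\lim_{n \to \infty} \left( 1 - \left( 1 - \frac{1}{n} \right)^n \right) = 1 - e^{-1} \approx 0.632,
$$
which matches the roughly 63.2% plateau seen in the plot.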
35 | 36 | ### h 37 | 38 | 39 | 40 | -------------------------------------------------------------------------------- /ch5/3.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 3 2 | ======================================================== 3 | 4 | ### a 5 | k-fold cross-validation is implemented by taking the set of n observations and 6 | randomly splitting into k non-overlapping groups. Each of these groups acts as 7 | a validation set and the remainder as a training set. The test error is 8 | estimated by averaging the k resulting MSE estimates. 9 | 10 | ### b 11 | i. The validation set approach is conceptually simple and easily implemented as 12 | you are simply partitioning the existing training data into two sets. However, 13 | there are two drawbacks: (1.) the estimate of the test error rate can be highly 14 | variable depending on which observations are included in the training and 15 | validation sets. (2.) the validation set error rate may tend to overestimate 16 | the test error rate for the model fit on the entire data set. 17 | 18 | ii. LOOCV is a special case of k-fold cross-validation with k = n. Thus, LOOCV 19 | is the most computationally intense method since the model must be fit n times. 20 | Also, LOOCV has higher variance, but lower bias, than k-fold CV. 21 | -------------------------------------------------------------------------------- /ch5/3.md: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 3 2 | ======================================================== 3 | 4 | ### a 5 | k-fold cross-validation is implemented by taking the set of n observations and 6 | randomly splitting into k non-overlapping groups. Each of these groups acts as 7 | a validation set and the remainder as a training set. The test error is 8 | estimated by averaging the k resulting MSE estimates. 9 | 10 | ### b 11 | i. The validation set approach is conceptually simple and easily implemented as 12 | you are simply partitioning the existing training data into two sets. However, 13 | there are two drawbacks: (1.) the estimate of the test error rate can be highly 14 | variable depending on which observations are included in the training and 15 | validation sets. (2.) the validation set error rate may tend to overestimate 16 | the test error rate for the model fit on the entire data set. 17 | 18 | ii. LOOCV is a special case of k-fold cross-validation with k = n. Thus, LOOCV 19 | is the most computationally intense method since the model must be fit n times. 20 | Also, LOOCV has higher variance, but lower bias, than k-fold CV. 21 | -------------------------------------------------------------------------------- /ch5/4.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 4 2 | ======================================================== 3 | 4 | If we suppose using some statistical learning method to make a prediction for 5 | the response $Y$ for a particular value of the predictor $X$ we might estimate 6 | the standard deviation of our prediction by using the bootstrap approach. The 7 | bootstrap approach works by repeatedly sampling observations (with replacement) 8 | from the original data set $B$ times, for some large value of $B$, each time 9 | fitting a new model and subsequently obtaining the RMSE of the estimates for all 10 | $B$ models. 
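To make the procedure concrete, here is a minimal sketch of the bootstrap estimate described above, using the `boot` package the way the chapter's lab does; the `Auto` data, the `mpg ~ horsepower` model, and the prediction point `horsepower = 100` are illustrative assumptions rather than part of the exercise.

```r
# Sketch: bootstrap the standard deviation of a prediction at a chosen x0.
# The data set, formula and x0 below are assumptions for illustration only.
library(ISLR)   # for the Auto data
library(boot)
pred.fn <- function(data, index) {
  fit <- lm(mpg ~ horsepower, data = data, subset = index)
  predict(fit, newdata = data.frame(horsepower = 100))
}
set.seed(1)
boot(Auto, pred.fn, R = 1000)
# the reported std. error estimates the standard deviation of the prediction
```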
11 | -------------------------------------------------------------------------------- /ch5/4.md: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 4 2 | ======================================================== 3 | 4 | If we suppose using some statistical learning method to make a prediction for 5 | the response $Y$ for a particular value of the predictor $X$ we might estimate 6 | the standard deviation of our prediction by using the bootstrap approach. The 7 | bootstrap approach works by repeatedly sampling observations (with replacement) 8 | from the original data set $B$ times, for some large value of $B$, each time 9 | fitting a new model and subsequently obtaining the RMSE of the estimates for all 10 | $B$ models. 11 | -------------------------------------------------------------------------------- /ch5/5.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 5 2 | ======================================================== 3 | 4 | ```{r} 5 | library(ISLR) 6 | summary(Default) 7 | attach(Default) 8 | ``` 9 | 10 | ### a 11 | ```{r} 12 | set.seed(1) 13 | glm.fit = glm(default~income+balance, data=Default, family=binomial) 14 | ``` 15 | 16 | ### b 17 | ```{r} 18 | FiveB = function() { 19 | # i. 20 | train = sample(dim(Default)[1], dim(Default)[1]/2) 21 | # ii. 22 | glm.fit = glm(default~income+balance, data=Default, family=binomial, 23 | subset=train) 24 | # iii. 25 | glm.pred = rep("No", dim(Default)[1]/2) 26 | glm.probs = predict(glm.fit, Default[-train,], type="response") 27 | glm.pred[glm.probs>.5] = "Yes" 28 | # iv. 29 | return(mean(glm.pred != Default[-train,]$default)) 30 | } 31 | FiveB() 32 | ``` 33 | 2.86% test error rate from validation set approach. 34 | 35 | ### c 36 | ```{r} 37 | FiveB() 38 | FiveB() 39 | FiveB() 40 | ``` 41 | It seems to average around 2.6% test error rate. 42 | 43 | ### d 44 | ```{r} 45 | train = sample(dim(Default)[1], dim(Default)[1]/2) 46 | glm.fit = glm(default~income+balance+student, data=Default, family=binomial, 47 | subset=train) 48 | glm.pred = rep("No", dim(Default)[1]/2) 49 | glm.probs = predict(glm.fit, Default[-train,], type="response") 50 | glm.pred[glm.probs>.5] = "Yes" 51 | mean(glm.pred != Default[-train,]$default) 52 | ``` 53 | 2.64% test error rate, with student dummy variable. Using the validation set 54 | approach, it doesn't appear adding the student dummy variable leads to a 55 | reduction in the test error rate. 56 | 57 | -------------------------------------------------------------------------------- /ch5/5.md: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 5 2 | ======================================================== 3 | 4 | 5 | ```r 6 | library(ISLR) 7 | summary(Default) 8 | ``` 9 | 10 | ``` 11 | ## default student balance income 12 | ## No :9667 No :7056 Min. : 0 Min. : 772 13 | ## Yes: 333 Yes:2944 1st Qu.: 482 1st Qu.:21340 14 | ## Median : 824 Median :34553 15 | ## Mean : 835 Mean :33517 16 | ## 3rd Qu.:1166 3rd Qu.:43808 17 | ## Max. :2654 Max. :73554 18 | ``` 19 | 20 | ```r 21 | attach(Default) 22 | ``` 23 | 24 | 25 | ### a 26 | 27 | ```r 28 | set.seed(1) 29 | glm.fit = glm(default ~ income + balance, data = Default, family = binomial) 30 | ``` 31 | 32 | 33 | ### b 34 | 35 | ```r 36 | FiveB = function() { 37 | # i. 38 | train = sample(dim(Default)[1], dim(Default)[1]/2) 39 | # ii. 
40 | glm.fit = glm(default ~ income + balance, data = Default, family = binomial, 41 | subset = train) 42 | # iii. 43 | glm.pred = rep("No", dim(Default)[1]/2) 44 | glm.probs = predict(glm.fit, Default[-train, ], type = "response") 45 | glm.pred[glm.probs > 0.5] = "Yes" 46 | # iv. 47 | return(mean(glm.pred != Default[-train, ]$default)) 48 | } 49 | FiveB() 50 | ``` 51 | 52 | ``` 53 | ## [1] 0.0286 54 | ``` 55 | 56 | 2.86% test error rate from validation set approach. 57 | 58 | ### c 59 | 60 | ```r 61 | FiveB() 62 | ``` 63 | 64 | ``` 65 | ## [1] 0.0236 66 | ``` 67 | 68 | ```r 69 | FiveB() 70 | ``` 71 | 72 | ``` 73 | ## [1] 0.028 74 | ``` 75 | 76 | ```r 77 | FiveB() 78 | ``` 79 | 80 | ``` 81 | ## [1] 0.0268 82 | ``` 83 | 84 | It seems to average around 2.6% test error rate. 85 | 86 | ### d 87 | 88 | ```r 89 | train = sample(dim(Default)[1], dim(Default)[1]/2) 90 | glm.fit = glm(default ~ income + balance + student, data = Default, family = binomial, 91 | subset = train) 92 | glm.pred = rep("No", dim(Default)[1]/2) 93 | glm.probs = predict(glm.fit, Default[-train, ], type = "response") 94 | glm.pred[glm.probs > 0.5] = "Yes" 95 | mean(glm.pred != Default[-train, ]$default) 96 | ``` 97 | 98 | ``` 99 | ## [1] 0.0264 100 | ``` 101 | 102 | 2.64% test error rate, with student dummy variable. Using the validation set 103 | approach, it doesn't appear adding the student dummy variable leads to a 104 | reduction in the test error rate. 105 | 106 | -------------------------------------------------------------------------------- /ch5/6.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 6 2 | ======================================================== 3 | 4 | ```{r} 5 | library(ISLR) 6 | summary(Default) 7 | attach(Default) 8 | ``` 9 | 10 | ### a 11 | ```{r} 12 | set.seed(1) 13 | glm.fit = glm(default~income+balance, data=Default, family=binomial) 14 | summary(glm.fit) 15 | ``` 16 | 17 | ### b 18 | ```{r} 19 | boot.fn = function(data, index) 20 | return(coef(glm(default~income+balance, data=data, family=binomial, 21 | subset=index))) 22 | ``` 23 | 24 | ### c 25 | ```{r} 26 | library(boot) 27 | boot(Default, boot.fn, 50) 28 | ``` 29 | 30 | ### d 31 | Similar answers to the second and third significant digits. 32 | 33 | -------------------------------------------------------------------------------- /ch5/6.md: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 6 2 | ======================================================== 3 | 4 | 5 | ```r 6 | library(ISLR) 7 | summary(Default) 8 | ``` 9 | 10 | ``` 11 | ## default student balance income 12 | ## No :9667 No :7056 Min. : 0 Min. : 772 13 | ## Yes: 333 Yes:2944 1st Qu.: 482 1st Qu.:21340 14 | ## Median : 824 Median :34553 15 | ## Mean : 835 Mean :33517 16 | ## 3rd Qu.:1166 3rd Qu.:43808 17 | ## Max. :2654 Max. :73554 18 | ``` 19 | 20 | ```r 21 | attach(Default) 22 | ``` 23 | 24 | 25 | ### a 26 | 27 | ```r 28 | set.seed(1) 29 | glm.fit = glm(default ~ income + balance, data = Default, family = binomial) 30 | summary(glm.fit) 31 | ``` 32 | 33 | ``` 34 | ## 35 | ## Call: 36 | ## glm(formula = default ~ income + balance, family = binomial, 37 | ## data = Default) 38 | ## 39 | ## Deviance Residuals: 40 | ## Min 1Q Median 3Q Max 41 | ## -2.473 -0.144 -0.057 -0.021 3.724 42 | ## 43 | ## Coefficients: 44 | ## Estimate Std. 
Error z value Pr(>|z|) 45 | ## (Intercept) -1.15e+01 4.35e-01 -26.54 <2e-16 *** 46 | ## income 2.08e-05 4.99e-06 4.17 3e-05 *** 47 | ## balance 5.65e-03 2.27e-04 24.84 <2e-16 *** 48 | ## --- 49 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 50 | ## 51 | ## (Dispersion parameter for binomial family taken to be 1) 52 | ## 53 | ## Null deviance: 2920.6 on 9999 degrees of freedom 54 | ## Residual deviance: 1579.0 on 9997 degrees of freedom 55 | ## AIC: 1585 56 | ## 57 | ## Number of Fisher Scoring iterations: 8 58 | ``` 59 | 60 | 61 | ### b 62 | 63 | ```r 64 | boot.fn = function(data, index) return(coef(glm(default ~ income + balance, 65 | data = data, family = binomial, subset = index))) 66 | ``` 67 | 68 | 69 | ### c 70 | 71 | ```r 72 | library(boot) 73 | boot(Default, boot.fn, 50) 74 | ``` 75 | 76 | ``` 77 | ## 78 | ## ORDINARY NONPARAMETRIC BOOTSTRAP 79 | ## 80 | ## 81 | ## Call: 82 | ## boot(data = Default, statistic = boot.fn, R = 50) 83 | ## 84 | ## 85 | ## Bootstrap Statistics : 86 | ## original bias std. error 87 | ## t1* -1.154e+01 1.181e-01 4.202e-01 88 | ## t2* 2.081e-05 -5.467e-08 4.542e-06 89 | ## t3* 5.647e-03 -6.975e-05 2.283e-04 90 | ``` 91 | 92 | 93 | ### d 94 | Similar answers to the second and third significant digits. 95 | 96 | -------------------------------------------------------------------------------- /ch5/7.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 7 2 | ======================================================== 3 | 4 | ```{r} 5 | library(ISLR) 6 | summary(Weekly) 7 | set.seed(1) 8 | attach(Weekly) 9 | ``` 10 | 11 | ### a 12 | ```{r} 13 | glm.fit = glm(Direction~Lag1+Lag2, data=Weekly, family=binomial) 14 | summary(glm.fit) 15 | ``` 16 | 17 | ### b 18 | ```{r} 19 | glm.fit = glm(Direction~Lag1+Lag2, data=Weekly[-1,], family=binomial) 20 | summary(glm.fit) 21 | ``` 22 | 23 | ### c 24 | ```{r} 25 | predict.glm(glm.fit, Weekly[1,], type="response") > 0.5 26 | ``` 27 | Prediction was UP, true Direction was DOWN. 28 | 29 | ### d 30 | ```{r} 31 | count = rep(0, dim(Weekly)[1]) 32 | for (i in 1:(dim(Weekly)[1])) { 33 | glm.fit = glm(Direction~Lag1+Lag2, data=Weekly[-i,], family=binomial) 34 | is_up = predict.glm(glm.fit, Weekly[i,], type="response") > 0.5 35 | is_true_up = Weekly[i,]$Direction == "Up" 36 | if (is_up != is_true_up) 37 | count[i] = 1 38 | } 39 | sum(count) 40 | ``` 41 | 490 errors. 42 | 43 | ### e 44 | ```{r} 45 | mean(count) 46 | ``` 47 | LOOCV estimates a test error rate of 45%. 48 | -------------------------------------------------------------------------------- /ch5/8.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 8 2 | ======================================================== 3 | 4 | ### a 5 | ```{r} 6 | set.seed(1) 7 | y = rnorm(100) 8 | x = rnorm(100) 9 | y = x - 2*x^2 + rnorm(100) 10 | ``` 11 | 12 | n = 100, p = 2. 13 | 14 | $Y = X - 2 X^2 + \epsilon$. 15 | 16 | ### b 17 | ```{r 8b} 18 | plot(x, y) 19 | ``` 20 | Quadratic plot. $X$ from about -2 to 2. $Y$ from about -8 to 2. 21 | 22 | ### c 23 | ```{r} 24 | library(boot) 25 | Data = data.frame(x,y) 26 | set.seed(1) 27 | # i. 28 | glm.fit = glm(y~x) 29 | cv.glm(Data, glm.fit)$delta 30 | # ii. 31 | glm.fit = glm(y~poly(x,2)) 32 | cv.glm(Data, glm.fit)$delta 33 | # iii. 34 | glm.fit = glm(y~poly(x,3)) 35 | cv.glm(Data, glm.fit)$delta 36 | # iv. 
37 | glm.fit = glm(y~poly(x,4)) 38 | cv.glm(Data, glm.fit)$delta 39 | ``` 40 | 41 | ### d 42 | ```{r} 43 | set.seed(10) 44 | # i. 45 | glm.fit = glm(y~x) 46 | cv.glm(Data, glm.fit)$delta 47 | # ii. 48 | glm.fit = glm(y~poly(x,2)) 49 | cv.glm(Data, glm.fit)$delta 50 | # iii. 51 | glm.fit = glm(y~poly(x,3)) 52 | cv.glm(Data, glm.fit)$delta 53 | # iv. 54 | glm.fit = glm(y~poly(x,4)) 55 | cv.glm(Data, glm.fit)$delta 56 | ``` 57 | Exact same, because LOOCV will be the same since it evaluates n folds of a 58 | single observation. 59 | 60 | ### e 61 | The quadratic polynomial had the lowest LOOCV test error rate. This was 62 | expected because it matches the true form of $Y$. 63 | 64 | ### f 65 | ```{r} 66 | summary(glm.fit) 67 | ``` 68 | p-values show statistical significance of linear and quadratic terms, which 69 | agrees with the CV results. 70 | 71 | 72 | -------------------------------------------------------------------------------- /ch5/8.md: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 8 2 | ======================================================== 3 | 4 | ### a 5 | 6 | ```r 7 | set.seed(1) 8 | y = rnorm(100) 9 | x = rnorm(100) 10 | y = x - 2 * x^2 + rnorm(100) 11 | ``` 12 | 13 | 14 | n = 100, p = 2. 15 | 16 | $Y = X - 2 X^2 + \epsilon$. 17 | 18 | ### b 19 | 20 | ```r 21 | plot(x, y) 22 | ``` 23 | 24 | ![plot of chunk 8b](figure/8b.png) 25 | 26 | Quadratic plot. $X$ from about -2 to 2. $Y$ from about -8 to 2. 27 | 28 | ### c 29 | 30 | ```r 31 | library(boot) 32 | Data = data.frame(x, y) 33 | set.seed(1) 34 | # i. 35 | glm.fit = glm(y ~ x) 36 | cv.glm(Data, glm.fit)$delta 37 | ``` 38 | 39 | ``` 40 | ## [1] 5.891 5.889 41 | ``` 42 | 43 | ```r 44 | # ii. 45 | glm.fit = glm(y ~ poly(x, 2)) 46 | cv.glm(Data, glm.fit)$delta 47 | ``` 48 | 49 | ``` 50 | ## [1] 1.087 1.086 51 | ``` 52 | 53 | ```r 54 | # iii. 55 | glm.fit = glm(y ~ poly(x, 3)) 56 | cv.glm(Data, glm.fit)$delta 57 | ``` 58 | 59 | ``` 60 | ## [1] 1.103 1.102 61 | ``` 62 | 63 | ```r 64 | # iv. 65 | glm.fit = glm(y ~ poly(x, 4)) 66 | cv.glm(Data, glm.fit)$delta 67 | ``` 68 | 69 | ``` 70 | ## [1] 1.115 1.114 71 | ``` 72 | 73 | 74 | ### d 75 | 76 | ```r 77 | set.seed(10) 78 | # i. 79 | glm.fit = glm(y ~ x) 80 | cv.glm(Data, glm.fit)$delta 81 | ``` 82 | 83 | ``` 84 | ## [1] 5.891 5.889 85 | ``` 86 | 87 | ```r 88 | # ii. 89 | glm.fit = glm(y ~ poly(x, 2)) 90 | cv.glm(Data, glm.fit)$delta 91 | ``` 92 | 93 | ``` 94 | ## [1] 1.087 1.086 95 | ``` 96 | 97 | ```r 98 | # iii. 99 | glm.fit = glm(y ~ poly(x, 3)) 100 | cv.glm(Data, glm.fit)$delta 101 | ``` 102 | 103 | ``` 104 | ## [1] 1.103 1.102 105 | ``` 106 | 107 | ```r 108 | # iv. 109 | glm.fit = glm(y ~ poly(x, 4)) 110 | cv.glm(Data, glm.fit)$delta 111 | ``` 112 | 113 | ``` 114 | ## [1] 1.115 1.114 115 | ``` 116 | 117 | Exact same, because LOOCV will be the same since it evaluates n folds of a 118 | single observation. 119 | 120 | ### e 121 | The quadratic polynomial had the lowest LOOCV test error rate. This was 122 | expected because it matches the true form of $Y$. 123 | 124 | ### f 125 | 126 | ```r 127 | summary(glm.fit) 128 | ``` 129 | 130 | ``` 131 | ## 132 | ## Call: 133 | ## glm(formula = y ~ poly(x, 4)) 134 | ## 135 | ## Deviance Residuals: 136 | ## Min 1Q Median 3Q Max 137 | ## -2.8913 -0.5244 0.0749 0.5932 2.7796 138 | ## 139 | ## Coefficients: 140 | ## Estimate Std. 
Error t value Pr(>|t|) 141 | ## (Intercept) -1.828 0.104 -17.55 <2e-16 *** 142 | ## poly(x, 4)1 2.316 1.041 2.22 0.029 * 143 | ## poly(x, 4)2 -21.059 1.041 -20.22 <2e-16 *** 144 | ## poly(x, 4)3 -0.305 1.041 -0.29 0.770 145 | ## poly(x, 4)4 -0.493 1.041 -0.47 0.637 146 | ## --- 147 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 148 | ## 149 | ## (Dispersion parameter for gaussian family taken to be 1.085) 150 | ## 151 | ## Null deviance: 552.21 on 99 degrees of freedom 152 | ## Residual deviance: 103.04 on 95 degrees of freedom 153 | ## AIC: 298.8 154 | ## 155 | ## Number of Fisher Scoring iterations: 2 156 | ``` 157 | 158 | p-values show statistical significance of linear and quadratic terms, which 159 | agrees with the CV results. 160 | 161 | 162 | -------------------------------------------------------------------------------- /ch5/9.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 5: Exercise 9 2 | ======================================================== 3 | 4 | ```{r} 5 | library(MASS) 6 | summary(Boston) 7 | set.seed(1) 8 | attach(Boston) 9 | ``` 10 | 11 | ### a 12 | ```{r} 13 | medv.mean = mean(medv) 14 | medv.mean 15 | ``` 16 | 17 | ### b 18 | ```{r} 19 | medv.err = sd(medv) / sqrt(length(medv)) 20 | medv.err 21 | ``` 22 | 23 | ### c 24 | ```{r} 25 | boot.fn = function(data, index) return(mean(data[index])) 26 | library(boot) 27 | bstrap = boot(medv, boot.fn, 1000) 28 | bstrap 29 | ``` 30 | Similar to answer from (b) up to two significant digits. (0.4119 vs 0.4089) 31 | 32 | ### d 33 | ```{r} 34 | t.test(medv) 35 | c(bstrap$t0 - 2*0.4119, bstrap$t0 + 2*0.4119) 36 | ``` 37 | Bootstrap estimate only 0.02 away for t.test estimate. 38 | 39 | ### e 40 | ```{r} 41 | medv.med = median(medv) 42 | medv.med 43 | ``` 44 | 45 | ### f 46 | ```{r} 47 | boot.fn = function(data, index) return(median(data[index])) 48 | boot(medv, boot.fn, 1000) 49 | ``` 50 | Median of 21.2 with SE of 0.380. Small standard error relative to median value. 51 | 52 | ### g 53 | ```{r} 54 | medv.tenth = quantile(medv, c(0.1)) 55 | medv.tenth 56 | ``` 57 | 58 | ### h 59 | ```{r} 60 | boot.fn = function(data, index) return(quantile(data[index], c(0.1))) 61 | boot(medv, boot.fn, 1000) 62 | ``` 63 | Tenth-percentile of 12.75 with SE of 0.511. Small standard error relative to 64 | tenth-percentile value. 
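As a small follow-up (not part of the original answer), the same rough two-standard-error interval used for the mean in part (d) could be applied to the tenth percentile, using the point estimate and bootstrap SE reported above:

```r
# Rough +/- 2 SE interval for the tenth percentile, mirroring part (d);
# 12.75 and 0.511 are the estimate and bootstrap SE reported in part (h).
c(12.75 - 2 * 0.511, 12.75 + 2 * 0.511)
# [1] 11.728 13.772
```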
65 | -------------------------------------------------------------------------------- /ch5/figure/2g.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch5/figure/2g.png -------------------------------------------------------------------------------- /ch5/figure/8b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch5/figure/8b.png -------------------------------------------------------------------------------- /ch5/lab.R: -------------------------------------------------------------------------------- 1 | # Chaper 5 Lab: Cross-Validation and the Bootstrap 2 | 3 | # The Validation Set Approach 4 | 5 | library(ISLR) 6 | set.seed(1) 7 | train=sample(392,196) 8 | lm.fit=lm(mpg~horsepower,data=Auto,subset=train) 9 | attach(Auto) 10 | mean((mpg-predict(lm.fit,Auto))[-train]^2) 11 | lm.fit2=lm(mpg~poly(horsepower,2),data=Auto,subset=train) 12 | mean((mpg-predict(lm.fit2,Auto))[-train]^2) 13 | lm.fit3=lm(mpg~poly(horsepower,3),data=Auto,subset=train) 14 | mean((mpg-predict(lm.fit3,Auto))[-train]^2) 15 | set.seed(2) 16 | train=sample(392,196) 17 | lm.fit=lm(mpg~horsepower,subset=train) 18 | mean((mpg-predict(lm.fit,Auto))[-train]^2) 19 | lm.fit2=lm(mpg~poly(horsepower,2),data=Auto,subset=train) 20 | mean((mpg-predict(lm.fit2,Auto))[-train]^2) 21 | lm.fit3=lm(mpg~poly(horsepower,3),data=Auto,subset=train) 22 | mean((mpg-predict(lm.fit3,Auto))[-train]^2) 23 | 24 | # Leave-One-Out Cross-Validation 25 | 26 | glm.fit=glm(mpg~horsepower,data=Auto) 27 | coef(glm.fit) 28 | lm.fit=lm(mpg~horsepower,data=Auto) 29 | coef(lm.fit) 30 | library(boot) 31 | glm.fit=glm(mpg~horsepower,data=Auto) 32 | cv.err=cv.glm(Auto,glm.fit) 33 | cv.err$delta 34 | cv.error=rep(0,5) 35 | for (i in 1:5){ 36 | glm.fit=glm(mpg~poly(horsepower,i),data=Auto) 37 | cv.error[i]=cv.glm(Auto,glm.fit)$delta[1] 38 | } 39 | cv.error 40 | 41 | # k-Fold Cross-Validation 42 | 43 | set.seed(17) 44 | cv.error.10=rep(0,10) 45 | for (i in 1:10){ 46 | glm.fit=glm(mpg~poly(horsepower,i),data=Auto) 47 | cv.error.10[i]=cv.glm(Auto,glm.fit,K=10)$delta[1] 48 | } 49 | cv.error.10 50 | 51 | # The Bootstrap 52 | 53 | alpha.fn=function(data,index){ 54 | X=data$X[index] 55 | Y=data$Y[index] 56 | return((var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y))) 57 | } 58 | alpha.fn(Portfolio,1:100) 59 | set.seed(1) 60 | alpha.fn(Portfolio,sample(100,100,replace=T)) 61 | boot(Portfolio,alpha.fn,R=1000) 62 | 63 | # Estimating the Accuracy of a Linear Regression Model 64 | 65 | boot.fn=function(data,index) 66 | return(coef(lm(mpg~horsepower,data=data,subset=index))) 67 | boot.fn(Auto,1:392) 68 | set.seed(1) 69 | boot.fn(Auto,sample(392,392,replace=T)) 70 | boot.fn(Auto,sample(392,392,replace=T)) 71 | boot(Auto,boot.fn,1000) 72 | summary(lm(mpg~horsepower,data=Auto))$coef 73 | boot.fn=function(data,index) 74 | coefficients(lm(mpg~horsepower+I(horsepower^2),data=data,subset=index)) 75 | set.seed(1) 76 | boot(Auto,boot.fn,1000) 77 | summary(lm(mpg~horsepower+I(horsepower^2),data=Auto))$coef 78 | 79 | -------------------------------------------------------------------------------- /ch6/1.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 1 2 | ======================================================== 3 | 4 | ### a 5 | Best subset selection has the smallest 
training RSS because the other two 6 | methods determine models with a path dependency on which predictors they pick 7 | first as they iterate to the k'th model. 8 | 9 | ### b (*) 10 | Best subset selection may have the smallest test RSS because it considers more 11 | models then the other methods. However, the other models might have better luck 12 | picking a model that fits the test data better. 13 | 14 | ### c 15 | i. True. 16 | ii. True. 17 | iii. False. 18 | iv. False. 19 | v. False. 20 | 21 | -------------------------------------------------------------------------------- /ch6/1.md: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 1 2 | ======================================================== 3 | 4 | ### a 5 | Best subset selection has the smallest training RSS because the other two 6 | methods determine models with a path dependency on which predictors they pick 7 | first as they iterate to the k'th model. 8 | 9 | ### b (*) 10 | Best subset selection may have the smallest test RSS because it considers more 11 | models then the other methods. However, the other models might have better luck 12 | picking a model that fits the test data better. 13 | 14 | ### c 15 | i. True. 16 | ii. True. 17 | iii. False. 18 | iv. False. 19 | v. False. 20 | 21 | -------------------------------------------------------------------------------- /ch6/10.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 10 2 | ====================== 3 | 4 | ## a 5 | ```{r} 6 | set.seed(1) 7 | p = 20 8 | n = 1000 9 | x = matrix(rnorm(n*p), n, p) 10 | B = rnorm(p) 11 | B[3] = 0 12 | B[4] = 0 13 | B[9] = 0 14 | B[19] = 0 15 | B[10] = 0 16 | eps = rnorm(p) 17 | y = x %*% B + eps 18 | ``` 19 | 20 | ## b 21 | ```{r} 22 | train = sample(seq(1000), 100, replace = FALSE) 23 | y.train = y[train,] 24 | y.test = y[-train,] 25 | x.train = x[train,] 26 | x.test = x[-train,] 27 | ``` 28 | 29 | ## c 30 | ```{r} 31 | library(leaps) 32 | regfit.full = regsubsets(y~., data=data.frame(x=x.train, y=y.train), nvmax=p) 33 | val.errors = rep(NA, p) 34 | x_cols = colnames(x, do.NULL=FALSE, prefix="x.") 35 | for (i in 1:p) { 36 | coefi = coef(regfit.full, id=i) 37 | pred = as.matrix(x.train[, x_cols %in% names(coefi)]) %*% coefi[names(coefi) %in% x_cols] 38 | val.errors[i] = mean((y.train - pred)^2) 39 | } 40 | plot(val.errors, ylab="Training MSE", pch=19, type="b") 41 | ``` 42 | 43 | ## d 44 | ```{r} 45 | val.errors = rep(NA, p) 46 | for (i in 1:p) { 47 | coefi = coef(regfit.full, id=i) 48 | pred = as.matrix(x.test[, x_cols %in% names(coefi)]) %*% coefi[names(coefi) %in% x_cols] 49 | val.errors[i] = mean((y.test - pred)^2) 50 | } 51 | plot(val.errors, ylab="Test MSE", pch=19, type="b") 52 | ``` 53 | 54 | ## e 55 | ```{r} 56 | which.min(val.errors) 57 | ``` 58 | 16 parameter model has the smallest test MSE. 59 | 60 | ## f 61 | ```{r} 62 | coef(regfit.full, id=16) 63 | ``` 64 | Caught all but one zeroed out coefficient at x.19. 
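To make the claim above concrete, a short check (a sketch assuming `B`, `x_cols` and `regfit.full` from parts (a) and (c) are still in the workspace) compares the predictors whose true coefficients were set to zero against the variables retained by the 16-predictor model:

```r
# Sketch: which of the truly-zero predictors does the 16-variable model keep?
# Assumes B, x_cols and regfit.full from parts (a) and (c) are in scope.
zeroed <- x_cols[B == 0]                     # x.3, x.4, x.9, x.10, x.19 by construction
kept   <- names(coef(regfit.full, id = 16))  # selected variables (plus the intercept)
setdiff(zeroed, kept)                        # truly-zero predictors correctly dropped
intersect(zeroed, kept)                      # truly-zero predictors still retained
```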
65 | 66 | ## g 67 | ```{r} 68 | val.errors = rep(NA, p) 69 | a = rep(NA, p) 70 | b = rep(NA, p) 71 | for (i in 1:p) { 72 | coefi = coef(regfit.full, id=i) 73 | a[i] = length(coefi)-1 74 | b[i] = sqrt( 75 | sum((B[x_cols %in% names(coefi)] - coefi[names(coefi) %in% x_cols])^2) + 76 | sum(B[!(x_cols %in% names(coefi))])^2) 77 | } 78 | plot(x=a, y=b, xlab="number of coefficients", 79 | ylab="error between estimated and true coefficients") 80 | which.min(b) 81 | ``` 82 | Model with 9 coefficients (10 with intercept) minimizes the error between the 83 | estimated and true coefficients. Test error is minimized with 16 parameter model. 84 | A better fit of true coefficients as measured here doesn't mean the model will have 85 | a lower test MSE. 86 | -------------------------------------------------------------------------------- /ch6/11.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 11 2 | ====================== 3 | 4 | ## a 5 | ```{r} 6 | set.seed(1) 7 | library(MASS) 8 | library(leaps) 9 | library(glmnet) 10 | ``` 11 | 12 | ### Best subset selection 13 | ```{r} 14 | predict.regsubsets = function(object, newdata, id, ...) { 15 | form = as.formula(object$call[[2]]) 16 | mat = model.matrix(form, newdata) 17 | coefi = coef(object, id = id) 18 | mat[, names(coefi)] %*% coefi 19 | } 20 | 21 | k = 10 22 | p = ncol(Boston)-1 23 | folds = sample(rep(1:k, length=nrow(Boston))) 24 | cv.errors = matrix(NA, k, p) 25 | for (i in 1:k) { 26 | best.fit = regsubsets(crim~., data=Boston[folds!=i,], nvmax=p) 27 | for (j in 1:p) { 28 | pred = predict(best.fit, Boston[folds==i, ], id=j) 29 | cv.errors[i,j] = mean((Boston$crim[folds==i] - pred)^2) 30 | } 31 | } 32 | rmse.cv = sqrt(apply(cv.errors, 2, mean)) 33 | plot(rmse.cv, pch=19, type="b") 34 | which.min(rmse.cv) 35 | rmse.cv[which.min(rmse.cv)] 36 | ``` 37 | 38 | ### Lasso 39 | ```{r} 40 | x = model.matrix(crim~.-1, data=Boston) 41 | y = Boston$crim 42 | cv.lasso = cv.glmnet(x, y, type.measure="mse") 43 | plot(cv.lasso) 44 | coef(cv.lasso) 45 | sqrt(cv.lasso$cvm[cv.lasso$lambda == cv.lasso$lambda.1se]) 46 | ``` 47 | 48 | ### Ridge regression 49 | ```{r} 50 | x = model.matrix(crim~.-1, data=Boston) 51 | y = Boston$crim 52 | cv.ridge = cv.glmnet(x, y, type.measure="mse", alpha=0) 53 | plot(cv.ridge) 54 | coef(cv.ridge) 55 | sqrt(cv.ridge$cvm[cv.ridge$lambda == cv.ridge$lambda.1se]) 56 | ``` 57 | 58 | ### PCR 59 | ```{r} 60 | library(pls) 61 | pcr.fit = pcr(crim~., data=Boston, scale=TRUE, validation="CV") 62 | summary(pcr.fit) 63 | ``` 64 | 13 component pcr fit has lowest CV/adjCV RMSEP. 65 | 66 | ## b 67 | See above answers for cross-validate mean squared errors of selected models. 68 | 69 | ## c 70 | I would choose the 9 parameter best subset model because it had the best 71 | cross-validated RMSE, next to PCR, but it was simpler model than the 13 72 | component PCR model. -------------------------------------------------------------------------------- /ch6/2.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 2 2 | ======================================================== 3 | 4 | ### a (Lasso) 5 | iii. Less flexible and better predictions because of less variance, more bias 6 | 7 | ### b (Ridge regression) 8 | Same as lasso. iii. 9 | 10 | ### c (Non-linear methods) 11 | ii. 
More flexible, less bias, more variance 12 | -------------------------------------------------------------------------------- /ch6/2.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Chapter 6: Exercise 2 8 | 9 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 |

157 | 158 | 159 | 160 | 161 | 162 | -------------------------------------------------------------------------------- /ch6/2.md: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 2 2 | ======================================================== 3 | 4 | ### a (Lasso) 5 | iii. Less flexible and better predictions because of less variance, more bias 6 | 7 | ### b (Ridge regression) 8 | Same as lasso. iii. 9 | 10 | ### c (Non-linear methods) 11 | ii. More flexible, less bias, more variance 12 | -------------------------------------------------------------------------------- /ch6/3.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 3 2 | ======================================================== 3 | 4 | ### a 5 | (iv) Steadily decreases: As we increase $s$ from $0$, all $\beta$ 's increase from $0$ to their least square estimate values. Training error for $0$ $\beta$ s is the maximum and it steadily decreases to the Ordinary Least Square RSS 6 | 7 | ### b 8 | (ii) Decrease initially, and then eventually start increasing in a U shape: When $s = 0$, all $\beta$ s are $0$, the model is extremely simple and has a high test RSS. As we increase $s$, $beta$ s assume non-zero values and model starts fitting well on test data and so test RSS decreases. Eventually, as $beta$ s approach their full blown OLS values, they start overfitting to the training data, increasing test RSS. 9 | 10 | ### c 11 | (iii) Steadily increase: When $s = 0$, the model effectively predicts a constant and has almost no variance. As we increase $s$, the models includes more $\beta$ s and their values start increasing. At this point, the values of $\beta$ s become highly dependent on training data, thus increasing the variance. 12 | 13 | ### d 14 | (iv) Steadily decrease: When $s = 0$, the model effectively predicts a constant and hence the prediction is far from actual value. Thus bias is high. As $s$ increases, more $\beta$ s become non-zero and thus the model continues to fit training data better. And thus, bias decreases. 15 | 16 | #### e 17 | (v) Remains constant: By definition, irreducible error is model independent and hence irrespective of the choice of $s$, remains constant. -------------------------------------------------------------------------------- /ch6/3.md: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 3 2 | ======================================================== 3 | 4 | ### a 5 | (iv) Steadily decreases: As we increase $s$ from $0$, all $\beta$ 's increase from $0$ to their least square estimate values. Training error for $0$ $\beta$ s is the maximum and it steadily decreases to the Ordinary Least Square RSS 6 | 7 | ### b 8 | (ii) Decrease initially, and then eventually start increasing in a U shape: When $s = 0$, all $\beta$ s are $0$, the model is extremely simple and has a high test RSS. As we increase $s$, $beta$ s assume non-zero values and model starts fitting well on test data and so test RSS decreases. Eventually, as $beta$ s approach their full blown OLS values, they start overfitting to the training data, increasing test RSS. 9 | 10 | ### c 11 | (iii) Steadily increase: When $s = 0$, the model effectively predicts a constant and has almost no variance. As we increase $s$, the models includes more $\beta$ s and their values start increasing. 
At this point, the values of $\beta$ s become highly dependent on training data, thus increasing the variance. 12 | 13 | ### d 14 | (iv) Steadily decrease: When $s = 0$, the model effectively predicts a constant and hence the prediction is far from actual value. Thus bias is high. As $s$ increases, more $\beta$ s become non-zero and thus the model continues to fit training data better. And thus, bias decreases. 15 | 16 | #### e 17 | (v) Remains constant: By definition, irreducible error is model independent and hence irrespective of the choice of $s$, remains constant. 18 | -------------------------------------------------------------------------------- /ch6/4.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 4 2 | ======================================================== 3 | 4 | ### a 5 | (iii) Steadily increase: As we increase $\lambda$ from $0$, all $\beta$ 's decrease from their least square estimate values to $0$. Training error for full-blown-OLS $\beta$ s is the minimum and it steadily increases as $\beta$ s are reduced to $0$. 6 | 7 | ### b 8 | (ii) Decrease initially, and then eventually start increasing in a U shape: When $\lambda = 0$, all $\beta$ s have their least square estimate values. In this case, the model tries to fit hard to training data and hence test RSS is high. As we increase $\lambda$, $beta$ s start reducing to zero and some of the overfitting is reduced. Thus, test RSS initially decreases. Eventually, as $beta$ s approach $0$, the model becomes too simple and test RSS increases. 9 | 10 | ### c 11 | (iv) Steadily decreases: When $\lambda = 0$, the $\beta$ s have their least square estimate values. The actual estimates heavily depend on the training data and hence variance is high. As we increase $\lambda$, $\beta$ s start decreasing and model becomes simpler. In the limiting case of $\lambda$ approaching infinity, all $beta$ s reduce to zero and model predicts a constant and has no variance. 12 | 13 | ### d 14 | (iii) Steadily increases: When $\lambda = 0$, $\beta$ s have their least-square estimate values and hence have the least bias. As $\lambda$ increases, $\beta$ s start reducing towards zero, the model fits less accurately to training data and hence bias increases. In the limiting case of $\lambda$ approaching infinity, the model predicts a constant and hence bias is maximum. 15 | 16 | #### e 17 | (v) Remains constant: By definition, irreducible error is model independent and hence irrespective of the choice of $\lambda$, remains constant. -------------------------------------------------------------------------------- /ch6/4.md: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 4 2 | ======================================================== 3 | 4 | ### a 5 | (iii) Steadily increase: As we increase $\lambda$ from $0$, all $\beta$ 's decrease from their least square estimate values to $0$. Training error for full-blown-OLS $\beta$ s is the minimum and it steadily increases as $\beta$ s are reduced to $0$. 6 | 7 | ### b 8 | (ii) Decrease initially, and then eventually start increasing in a U shape: When $\lambda = 0$, all $\beta$ s have their least square estimate values. In this case, the model tries to fit hard to training data and hence test RSS is high. As we increase $\lambda$, $beta$ s start reducing to zero and some of the overfitting is reduced. Thus, test RSS initially decreases. 
Eventually, as $beta$ s approach $0$, the model becomes too simple and test RSS increases. 9 | 10 | ### c 11 | (iv) Steadily decreases: When $\lambda = 0$, the $\beta$ s have their least square estimate values. The actual estimates heavily depend on the training data and hence variance is high. As we increase $\lambda$, $\beta$ s start decreasing and model becomes simpler. In the limiting case of $\lambda$ approaching infinity, all $beta$ s reduce to zero and model predicts a constant and has no variance. 12 | 13 | ### d 14 | (iii) Steadily increases: When $\lambda = 0$, $\beta$ s have their least-square estimate values and hence have the least bias. As $\lambda$ increases, $\beta$ s start reducing towards zero, the model fits less accurately to training data and hence bias increases. In the limiting case of $\lambda$ approaching infinity, the model predicts a constant and hence bias is maximum. 15 | 16 | #### e 17 | (v) Remains constant: By definition, irreducible error is model independent and hence irrespective of the choice of $\lambda$, remains constant. 18 | -------------------------------------------------------------------------------- /ch6/5.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 5 2 | ======================================================== 3 | 4 | ### a 5 | A general form of Ridge regression optimization looks like 6 | 7 | Minimize: $\sum\limits_{i=1}^n {(y_i - \hat{\beta}_0 - \sum\limits_{j=1}^p {\hat{\beta}_jx_j} )^2} + \lambda \sum\limits_{i=1}^p \hat{\beta}_i^2$ 8 | 9 | In this case, $\hat{\beta}_0 = 0$ and $n = p = 2$. So, the optimization looks like: 10 | 11 | Minimize: $(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2)$ 12 | 13 | ### b 14 | Now we are given that, $x_{11} = x_{12} = x_1$ and $x_{21} = x_{22} = x_2$. We take derivatives of above expression with respect to both $\hat{\beta_1}$ and $\hat{\beta_2}$ and setting them equal to zero find that, 15 | $\hat{\beta^*}_1 = \frac{x_1y_1 + x_2y_2 - \hat{\beta^*}_2(x_1^2 + x_2^2)}{\lambda + x_1^2 + x_2^2}$ and 16 | $\hat{\beta^*}_2 = \frac{x_1y_1 + x_2y_2 - \hat{\beta^*}_1(x_1^2 + x_2^2)}{\lambda + x_1^2 + x_2^2}$ 17 | 18 | Symmetry in these expressions suggests that $\hat{\beta^*}_1 = \hat{\beta^*}_2$ 19 | 20 | ### c 21 | Like Ridge regression, 22 | 23 | Minimize: $(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 + \lambda (| \hat{\beta}_1 | + | \hat{\beta}_2 |)$ 24 | 25 | ### d 26 | Here is a geometric interpretation of the solutions for the equation in *c* above. We use the alternate form of Lasso constraints $| \hat{\beta}_1 | + | \hat{\beta}_2 | < s$. 27 | 28 | The Lasso constraint take the form $| \hat{\beta}_1 | + | \hat{\beta}_2 | < s$, which when plotted take the familiar shape of a diamond centered at origin $(0, 0)$. Next consider the squared optimization constraint $(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2$. We use the facts $x_{11} = x_{12}$, $x_{21} = x_{22}$, $x_{11} + x_{21} = 0$, $x_{12} + x_{22} = 0$ and $y_1 + y_2 = 0$ to simplify it to 29 | 30 | Minimize: $2.(y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_{11})^2$. 31 | 32 | This optimization problem has a simple solution: $\hat{\beta}_1 + \hat{\beta}_2 = \frac{y_1}{x_{11}}$. This is a line parallel to the edge of Lasso-diamond $\hat{\beta}_1 + \hat{\beta}_2 = s$. 
Now solutions to the original Lasso optimization problem are contours of the function $(y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_{11})^2$ that touch the Lasso-diamond $\hat{\beta}_1 + \hat{\beta}_2 = s$. Finally, as $\hat{\beta}_1$ and $\hat{\beta}_2$ very along the line $\hat{\beta}_1 + \hat{\beta}_2 = \frac{y_1}{x_{11}}$, these contours touch the Lasso-diamond edge $\hat{\beta}_1 + \hat{\beta}_2 = s$ at different points. As a result, the entire edge $\hat{\beta}_1 + \hat{\beta}_2 = s$ is a potential solution to the Lasso optimization problem! 33 | 34 | Similar argument can be made for the opposite Lasso-diamond edge: $\hat{\beta}_1 + \hat{\beta}_2 = -s$. 35 | 36 | Thus, the Lasso problem does not have a unique solution. The general form of solution is given by two line segments: 37 | 38 | $\hat{\beta}_1 + \hat{\beta}_2 = s; \hat{\beta}_1 \geq 0; \hat{\beta}_2 \geq 0$ 39 | and 40 | $\hat{\beta}_1 + \hat{\beta}_2 = -s; \hat{\beta}_1 \leq 0; \hat{\beta}_2 \leq 0$ 41 | 42 | -------------------------------------------------------------------------------- /ch6/5.md: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 5 2 | ======================================================== 3 | 4 | ### a 5 | A general form of Ridge regression optimization looks like 6 | 7 | Minimize: $\sum\limits_{i=1}^n {(y_i - \hat{\beta}_0 - \sum\limits_{j=1}^p {\hat{\beta}_jx_j} )^2} + \lambda \sum\limits_{i=1}^p \hat{\beta}_i^2$ 8 | 9 | In this case, $\hat{\beta}_0 = 0$ and $n = p = 2$. So, the optimization looks like: 10 | 11 | Minimize: $(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2)$ 12 | 13 | ### b 14 | Now we are given that, $x_{11} = x_{12} = x_1$ and $x_{21} = x_{22} = x_2$. We take derivatives of above expression with respect to both $\hat{\beta_1}$ and $\hat{\beta_2}$ and setting them equal to zero find that, 15 | $\hat{\beta^*}_1 = \frac{x_1y_1 + x_2y_2 - \hat{\beta^*}_2(x_1^2 + x_2^2)}{\lambda + x_1^2 + x_2^2}$ and 16 | $\hat{\beta^*}_2 = \frac{x_1y_1 + x_2y_2 - \hat{\beta^*}_1(x_1^2 + x_2^2)}{\lambda + x_1^2 + x_2^2}$ 17 | 18 | Symmetry in these expressions suggests that $\hat{\beta^*}_1 = \hat{\beta^*}_2$ 19 | 20 | ### c 21 | Like Ridge regression, 22 | 23 | Minimize: $(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 + \lambda (| \hat{\beta}_1 | + | \hat{\beta}_2 |)$ 24 | 25 | ### d 26 | Here is a geometric interpretation of the solutions for the equation in *c* above. We use the alternate form of Lasso constraints $| \hat{\beta}_1 | + | \hat{\beta}_2 | < s$. 27 | 28 | The Lasso constraint take the form $| \hat{\beta}_1 | + | \hat{\beta}_2 | < s$, which when plotted take the familiar shape of a diamond centered at origin $(0, 0)$. Next consider the squared optimization constraint $(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2$. We use the facts $x_{11} = x_{12}$, $x_{21} = x_{22}$, $x_{11} + x_{21} = 0$, $x_{12} + x_{22} = 0$ and $y_1 + y_2 = 0$ to simplify it to 29 | 30 | Minimize: $2.(y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_{11})^2$. 31 | 32 | This optimization problem has a simple solution: $\hat{\beta}_1 + \hat{\beta}_2 = \frac{y_1}{x_{11}}$. This is a line parallel to the edge of Lasso-diamond $\hat{\beta}_1 + \hat{\beta}_2 = s$. 
Now solutions to the original Lasso optimization problem are contours of the function $(y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_{11})^2$ that touch the Lasso-diamond $\hat{\beta}_1 + \hat{\beta}_2 = s$. Finally, as $\hat{\beta}_1$ and $\hat{\beta}_2$ very along the line $\hat{\beta}_1 + \hat{\beta}_2 = \frac{y_1}{x_{11}}$, these contours touch the Lasso-diamond edge $\hat{\beta}_1 + \hat{\beta}_2 = s$ at different points. As a result, the entire edge $\hat{\beta}_1 + \hat{\beta}_2 = s$ is a potential solution to the Lasso optimization problem! 33 | 34 | Similar argument can be made for the opposite Lasso-diamond edge: $\hat{\beta}_1 + \hat{\beta}_2 = -s$. 35 | 36 | Thus, the Lasso problem does not have a unique solution. The general form of solution is given by two line segments: 37 | 38 | $\hat{\beta}_1 + \hat{\beta}_2 = s; \hat{\beta}_1 \geq 0; \hat{\beta}_2 \geq 0$ 39 | and 40 | $\hat{\beta}_1 + \hat{\beta}_2 = -s; \hat{\beta}_1 \leq 0; \hat{\beta}_2 \leq 0$ 41 | 42 | -------------------------------------------------------------------------------- /ch6/6.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 6 2 | ======================================================== 3 | 4 | ### a 5 | For $p=1$, (6.12) takes the form $(y - \beta)^2 + \lambda\beta^2$. We plot this function for $y = 2, \lambda = 2$. 6 | ```{r 6a} 7 | y = 2 8 | lambda = 2 9 | betas = seq(-10, 10, 0.1) 10 | func = (y - betas)^2 + lambda * betas^2 11 | plot(betas, func, pch=20, xlab="beta", ylab="Ridge optimization") 12 | est.beta = y / (1+lambda) 13 | est.func = (y - est.beta)^2 + lambda * est.beta^2 14 | points(est.beta, est.func, col="red", pch=4, lwd=5, cex=est.beta) 15 | ``` 16 | The red cross shows that function is indeed minimized at $\beta = y / (1 + \lambda)$. 17 | 18 | ### b 19 | For $p=1$, (6.13) takes the form $(y - \beta)^2 + \lambda | \beta |$. We plot this function for $y = 2, \lambda = 2$. 20 | ```{r 6b} 21 | y = 2 22 | lambda = 2 23 | betas = seq(-3, 3, 0.01) 24 | func = (y - betas)^2 + lambda * abs(betas) 25 | plot(betas, func, pch=20, xlab="beta", ylab="Lasso optimization") 26 | est.beta = y - lambda / 2 27 | est.func = (y - est.beta)^2 + lambda * abs(est.beta) 28 | points(est.beta, est.func, col="red", pch=4, lwd=5, cex=est.beta) 29 | ``` 30 | The red cross shows that function is indeed minimized at $\beta = y - \lambda / 2$. 31 | -------------------------------------------------------------------------------- /ch6/6.md: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 6 2 | ======================================================== 3 | 4 | ### a 5 | For $p=1$, (6.12) takes the form $(y - \beta)^2 + \lambda\beta^2$. We plot this function for $y = 2, \lambda = 2$. 6 | 7 | ```r 8 | y = 2 9 | lambda = 2 10 | betas = seq(-10, 10, 0.1) 11 | func = (y - betas)^2 + lambda * betas^2 12 | plot(betas, func, pch = 20, xlab = "beta", ylab = "Ridge optimization") 13 | est.beta = y/(1 + lambda) 14 | est.func = (y - est.beta)^2 + lambda * est.beta^2 15 | points(est.beta, est.func, col = "red", pch = 4, lwd = 5, cex = est.beta) 16 | ``` 17 | 18 | ![plot of chunk 6a](figure/6a.png) 19 | 20 | The red cross shows that function is indeed minimized at $\beta = y / (1 + \lambda)$. 21 | 22 | ### b 23 | For $p=1$, (6.13) takes the form $(y - \beta)^2 + \lambda | \beta |$. We plot this function for $y = 2, \lambda = 2$. 
24 | 25 | ```r 26 | y = 2 27 | lambda = 2 28 | betas = seq(-3, 3, 0.01) 29 | func = (y - betas)^2 + lambda * abs(betas) 30 | plot(betas, func, pch = 20, xlab = "beta", ylab = "Lasso optimization") 31 | est.beta = y - lambda/2 32 | est.func = (y - est.beta)^2 + lambda * abs(est.beta) 33 | points(est.beta, est.func, col = "red", pch = 4, lwd = 5, cex = est.beta) 34 | ``` 35 | 36 | ![plot of chunk 6b](figure/6b.png) 37 | 38 | The red cross shows that function is indeed minimized at $\beta = y - \lambda / 2$. 39 | -------------------------------------------------------------------------------- /ch6/figure/6a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/6a.png -------------------------------------------------------------------------------- /ch6/figure/6b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/6b.png -------------------------------------------------------------------------------- /ch6/figure/8c1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/8c1.png -------------------------------------------------------------------------------- /ch6/figure/8c2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/8c2.png -------------------------------------------------------------------------------- /ch6/figure/8c3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/8c3.png -------------------------------------------------------------------------------- /ch6/figure/8d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/8d.png -------------------------------------------------------------------------------- /ch6/figure/8e.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/8e.png -------------------------------------------------------------------------------- /ch6/figure/9e.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/9e.png -------------------------------------------------------------------------------- /ch6/figure/9f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/9f.png -------------------------------------------------------------------------------- /ch6/figure/9g.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/9g.png 
-------------------------------------------------------------------------------- /ch6/figure/unnamed-chunk-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/unnamed-chunk-1.png -------------------------------------------------------------------------------- /ch6/figure/unnamed-chunk-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/unnamed-chunk-2.png -------------------------------------------------------------------------------- /ch6/figure/unnamed-chunk-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/unnamed-chunk-3.png -------------------------------------------------------------------------------- /ch6/figure/unnamed-chunk-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch6/figure/unnamed-chunk-4.png -------------------------------------------------------------------------------- /ch7/1.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 1 2 | ===================== 3 | 4 | ## a 5 | For $x \le \xi$, $f_1(x)$ has coefficients 6 | $a_1 = \beta_0, b_1 = \beta_1, c_1 = \beta_2, d_1 = \beta_3$. 7 | 8 | ## b 9 | For $x \gt \xi$, $f(x)$ has the form of: 10 | $$ 11 | \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)^3 12 | \\ 13 | = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x^3 - 3 x^2 \xi + 3 x \xi^2 - \xi^3) 14 | \\ 15 | = (\beta_0 - \beta_4 \xi^3) + (\beta_1 + 3 \beta_4 \xi^2) x + (\beta_2 - 3 \beta_4 \xi) x^2 + (\beta_3 + \beta_4) x^3 16 | $$ 17 | 18 | Thus, $a_2 = \beta_0 - \beta_4 \xi^3, b_2 = \beta_1 + 3 \beta_4 \xi^2, c_2 = \beta_2 - 3 \beta_4 \xi, d_2 = \beta_3 + \beta_4$. 
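As a quick numerical sanity check (an added note, using arbitrary illustrative values for the $\beta$'s and the knot $\xi$, none of which come from the exercise), we can verify that the expanded coefficients $a_2, b_2, c_2, d_2$ reproduce $f(x)$ for $x > \xi$:

```{r}
# Sanity check (not part of the original answer): pick arbitrary beta values
# and a knot xi, and confirm the two forms of f agree to the right of the knot.
beta0 = 1; beta1 = -2; beta2 = 0.5; beta3 = 3; beta4 = -1.5; xi = 2
f = function(x) beta0 + beta1*x + beta2*x^2 + beta3*x^3 + beta4*pmax(x - xi, 0)^3
a2 = beta0 - beta4*xi^3
b2 = beta1 + 3*beta4*xi^2
c2 = beta2 - 3*beta4*xi
d2 = beta3 + beta4
f2 = function(x) a2 + b2*x + c2*x^2 + d2*x^3
x = seq(xi + 0.1, xi + 5, length.out=5)
all.equal(f(x), f2(x))  # should be TRUE
```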
19 | 20 | ## c 21 | $$ 22 | f_1(\xi) = \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + \beta_3 \xi^3 23 | \\ 24 | f_2(\xi) = (\beta_0 - \beta_4 \xi^3) + (\beta_1 + 3 \beta_4 \xi^2) \xi + (\beta_2 - 3 \beta_4 \xi) \xi^2 + (\beta_3 + \beta_4) \xi^3 25 | \\ 26 | = \beta_0 - \beta_4 \xi^3 + \beta_1 \xi + 3 \beta_4 \xi^3 + \beta_2 \xi^2 - 3 \beta_4 \xi^3 + \beta_3 \xi^3 + \beta_4 \xi^3 27 | \\ 28 | = \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + 3 \beta_4 \xi^3 - 3 \beta_4 \xi^3 + \beta_3 \xi^3 + \beta_4 \xi^3 - \beta_4 \xi^3 29 | \\ 30 | = \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + \beta_3 \xi^3 31 | $$ 32 | 33 | ## d 34 | $$ 35 | f'(x) = b_1 + 2 c_1 x + 3 d_1 x^2 36 | \\ 37 | f_1'(\xi) = \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 38 | \\ 39 | f_2'(\xi) = \beta_1 + 3 \beta_4 \xi^2 + 2 (\beta_2 - 3 \beta_4 \xi) \xi + 3 (\beta_3 + \beta_4) \xi^2 40 | \\ 41 | = \beta_1 + 3 \beta_4 \xi^2 + 2 \beta_2 \xi - 6 \beta_4 \xi^2 + 3 \beta_3 \xi^2 + 3 \beta_4 \xi^2 42 | \\ 43 | = \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 + 3 \beta_4 \xi^2 + 3 \beta_4 \xi^2 - 6 \beta_4 \xi^2 44 | \\ 45 | = \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 46 | $$ 47 | 48 | 49 | ## e 50 | $$ 51 | f''(x) = 2 c_1 + 6 d_1 x 52 | \\ 53 | f_1''(\xi) = 2 \beta_2 + 6 \beta_3 \xi 54 | \\ 55 | f_2''(\xi) = 2 (\beta_2 - 3 \beta_4 \xi) + 6 (\beta_3 + \beta_4) \xi 56 | \\ 57 | = 2 \beta_2 + 6 \beta_3 \xi 58 | $$ 59 | -------------------------------------------------------------------------------- /ch7/1.md: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 1 2 | ===================== 3 | 4 | ## a 5 | For $x \le \xi$, $f_1(x)$ has coefficients 6 | $a_1 = \beta_0, b_1 = \beta_1, c_1 = \beta_2, d_1 = \beta_3$. 7 | 8 | ## b 9 | For $x \gt \xi$, $f(x)$ has the form of: 10 | $$ 11 | \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)^3 12 | \\ 13 | = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x^3 - 3 x^2 \xi + 3 x \xi^2 - \xi^3) 14 | \\ 15 | = (\beta_0 - \beta_4 \xi^3) + (\beta_1 + 3 \beta_4 \xi^2) x + (\beta_2 - 3 \beta_4 \xi) x^2 + (\beta_3 + \beta_4) x^3 16 | $$ 17 | 18 | Thus, $a_2 = \beta_0 - \beta_4 \xi^3, b_2 = \beta_1 + 3 \beta_4 \xi^2, c_2 = \beta_2 - 3 \beta_4 \xi, d_2 = \beta_3 + \beta_4$. 
19 | 20 | ## c 21 | $$ 22 | f_1(\xi) = \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + \beta_3 \xi^3 23 | \\ 24 | f_2(\xi) = (\beta_0 - \beta_4 \xi^3) + (\beta_1 + 3 \beta_4 \xi^2) \xi + (\beta_2 - 3 \beta_4 \xi) \xi^2 + (\beta_3 + \beta_4) \xi^3 25 | \\ 26 | = \beta_0 - \beta_4 \xi^3 + \beta_1 \xi + 3 \beta_4 \xi^3 + \beta_2 \xi^2 - 3 \beta_4 \xi^3 + \beta_3 \xi^3 + \beta_4 \xi^3 27 | \\ 28 | = \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + 3 \beta_4 \xi^3 - 3 \beta_4 \xi^3 + \beta_3 \xi^3 + \beta_4 \xi^3 - \beta_4 \xi^3 29 | \\ 30 | = \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + \beta_3 \xi^3 31 | $$ 32 | 33 | ## d 34 | $$ 35 | f'(x) = b_1 + 2 c_1 x + 3 d_1 x^2 36 | \\ 37 | f_1'(\xi) = \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 38 | \\ 39 | f_2'(\xi) = \beta_1 + 3 \beta_4 \xi^2 + 2 (\beta_2 - 3 \beta_4 \xi) \xi + 3 (\beta_3 + \beta_4) \xi^2 40 | \\ 41 | = \beta_1 + 3 \beta_4 \xi^2 + 2 \beta_2 \xi - 6 \beta_4 \xi^2 + 3 \beta_3 \xi^2 + 3 \beta_4 \xi^2 42 | \\ 43 | = \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 + 3 \beta_4 \xi^2 + 3 \beta_4 \xi^2 - 6 \beta_4 \xi^2 44 | \\ 45 | = \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 46 | $$ 47 | 48 | 49 | ## e 50 | $$ 51 | f''(x) = 2 c_1 + 6 d_1 x 52 | \\ 53 | f_1''(\xi) = 2 \beta_2 + 6 \beta_3 \xi 54 | \\ 55 | f_2''(\xi) = 2 (\beta_2 - 3 \beta_4 \xi) + 6 (\beta_3 + \beta_4) \xi 56 | \\ 57 | = 2 \beta_2 + 6 \beta_3 \xi 58 | $$ 59 | -------------------------------------------------------------------------------- /ch7/10.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 10 2 | ======================================================== 3 | 4 | ### a 5 | ```{r 10a} 6 | set.seed(1) 7 | library(ISLR) 8 | library(leaps) 9 | attach(College) 10 | train = sample(length(Outstate), length(Outstate)/2) 11 | test = -train 12 | College.train = College[train, ] 13 | College.test = College[test, ] 14 | reg.fit = regsubsets(Outstate~., data=College.train, nvmax=17, method="forward") 15 | reg.summary = summary(reg.fit) 16 | par(mfrow=c(1, 3)) 17 | plot(reg.summary$cp,xlab="Number of Variables",ylab="Cp",type='l') 18 | min.cp = min(reg.summary$cp) 19 | std.cp = sd(reg.summary$cp) 20 | abline(h=min.cp+0.2*std.cp, col="red", lty=2) 21 | abline(h=min.cp-0.2*std.cp, col="red", lty=2) 22 | plot(reg.summary$bic,xlab="Number of Variables",ylab="BIC",type='l') 23 | min.bic = min(reg.summary$bic) 24 | std.bic = sd(reg.summary$bic) 25 | abline(h=min.bic+0.2*std.bic, col="red", lty=2) 26 | abline(h=min.bic-0.2*std.bic, col="red", lty=2) 27 | plot(reg.summary$adjr2,xlab="Number of Variables",ylab="Adjusted R2",type='l', ylim=c(0.4, 0.84)) 28 | max.adjr2 = max(reg.summary$adjr2) 29 | std.adjr2 = sd(reg.summary$adjr2) 30 | abline(h=max.adjr2+0.2*std.adjr2, col="red", lty=2) 31 | abline(h=max.adjr2-0.2*std.adjr2, col="red", lty=2) 32 | ``` 33 | All cp, BIC and adjr2 scores show that size 6 is the minimum size for the subset for which the scores are withing 0.2 standard deviations of optimum. We pick 6 as the best subset size and find best 6 variables using entire data. 
34 | ```{r} 35 | reg.fit = regsubsets(Outstate~., data=College, method="forward") 36 | coefi = coef(reg.fit, id=6) 37 | names(coefi) 38 | ``` 39 | 40 | ### b 41 | ```{r 10b} 42 | library(gam) 43 | gam.fit = gam(Outstate ~ Private + s(Room.Board, df=2) + s(PhD, df=2) + s(perc.alumni, df=2) + s(Expend, df=5) + s(Grad.Rate, df=2), data=College.train) 44 | par(mfrow=c(2, 3)) 45 | plot(gam.fit, se=T, col="blue") 46 | ``` 47 | 48 | ### c 49 | ```{r} 50 | gam.pred = predict(gam.fit, College.test) 51 | gam.err = mean((College.test$Outstate - gam.pred)^2) 52 | gam.err 53 | gam.tss = mean((College.test$Outstate - mean(College.test$Outstate))^2) 54 | test.rss = 1 - gam.err / gam.tss 55 | test.rss 56 | ``` 57 | We obtain a test R-squared of 0.77 using GAM with 6 predictors. This is a slight improvement over a test RSS of 0.74 obtained using OLS. 58 | 59 | ### d 60 | ```{r} 61 | summary(gam.fit) 62 | ``` 63 | Non-parametric Anova test shows a strong evidence of non-linear relationship between response and Expend, and a moderately strong non-linear relationship (using p value of 0.05) between response and Grad.Rate or PhD. -------------------------------------------------------------------------------- /ch7/11.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 11 2 | ======================================================== 3 | 4 | ### a 5 | We create variables according to the equation $Y = -2.1 + 1.3X_1 + 0.54X_2$. 6 | ```{r} 7 | set.seed(1) 8 | X1 = rnorm(100) 9 | X2 = rnorm(100) 10 | eps = rnorm(100, sd=0.1) 11 | Y = -2.1 + 1.3 * X1 + 0.54 * X2 + eps 12 | ``` 13 | 14 | ### b 15 | Create a list of 1000 $\hat{beta}_0$, $\hat{beta}_1$ and $\hat{beta}_2$. Initialize first of the $\hat{\beta}_1$ to 10. 16 | 17 | ```{r} 18 | beta0 = rep(NA, 1000) 19 | beta1 = rep(NA, 1000) 20 | beta2 = rep(NA, 1000) 21 | beta1[1] = 10 22 | ``` 23 | 24 | ### c, d, e 25 | Accumulate results of 1000 iterations in the beta arrays. 26 | ```{r 11e} 27 | for (i in 1:1000) { 28 | a = Y - beta1[i] * X1 29 | beta2[i] = lm(a~X2)$coef[2] 30 | a = Y - beta2[i] * X2 31 | lm.fit = lm(a~X1) 32 | if (i < 1000) { 33 | beta1[i+1] = lm.fit$coef[2] 34 | } 35 | beta0[i] = lm.fit$coef[1] 36 | } 37 | plot(1:1000, beta0, type="l", xlab="iteration", ylab="betas", ylim=c(-2.2, 1.6), col="green") 38 | lines(1:1000, beta1, col="red") 39 | lines(1:1000, beta2, col="blue") 40 | legend('center', c("beta0","beta1","beta2"), lty=1, col=c("green","red","blue")) 41 | ``` 42 | The coefficients quickly attain their least square values. 43 | 44 | ### f 45 | ```{r 11f} 46 | lm.fit = lm(Y~X1+X2) 47 | plot(1:1000, beta0, type="l", xlab="iteration", ylab="betas", ylim=c(-2.2, 1.6), col="green") 48 | lines(1:1000, beta1, col="red") 49 | lines(1:1000, beta2, col="blue") 50 | abline(h=lm.fit$coef[1], lty="dashed", lwd=3, col=rgb(0, 0, 0, alpha=0.4)) 51 | abline(h=lm.fit$coef[2], lty="dashed", lwd=3, col=rgb(0, 0, 0, alpha=0.4)) 52 | abline(h=lm.fit$coef[3], lty="dashed", lwd=3, col=rgb(0, 0, 0, alpha=0.4)) 53 | legend('center', c("beta0","beta1","beta2", "multiple regression"), lty=c(1, 1, 1, 2), col=c("green","red","blue", "black")) 54 | ``` 55 | Dotted lines show that the estimated multiple regression coefficients match exactly with the coefficients obtained using backfitting. 56 | 57 | ### g 58 | When the relationship between $Y$ and $X$'s is linear, one iteration is sufficient to attain a good approximation of true regression coefficients. 
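To put a rough number on this claim (an added check, which assumes the `beta0`, `beta1` and `beta2` vectors from the loop above are still in the workspace), we can compare the backfitting estimates after the first and second passes with the multiple regression coefficients:

```{r}
# Distance of the backfitting estimates from the least-squares coefficients.
# After pass i, the current estimates are beta0[i], beta1[i+1], beta2[i].
ls.coef = coef(lm(Y ~ X1 + X2))
rbind(pass1 = c(beta0[1], beta1[2], beta2[1]) - ls.coef,
      pass2 = c(beta0[2], beta1[3], beta2[2]) - ls.coef)
```

This makes it easy to see how quickly the estimates settle down to the least-squares values.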
-------------------------------------------------------------------------------- /ch7/11.md: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 11 2 | ======================================================== 3 | 4 | ### a 5 | We create variables according to the equation $Y = -2.1 + 1.3X_1 + 0.54X_2$. 6 | 7 | ```r 8 | set.seed(1) 9 | X1 = rnorm(100) 10 | X2 = rnorm(100) 11 | eps = rnorm(100, sd = 0.1) 12 | Y = -2.1 + 1.3 * X1 + 0.54 * X2 + eps 13 | ``` 14 | 15 | 16 | ### b 17 | Create a list of 1000 $\hat{beta}_0$, $\hat{beta}_1$ and $\hat{beta}_2$. Initialize first of the $\hat{\beta}_1$ to 10. 18 | 19 | 20 | ```r 21 | beta0 = rep(NA, 1000) 22 | beta1 = rep(NA, 1000) 23 | beta2 = rep(NA, 1000) 24 | beta1[1] = 10 25 | ``` 26 | 27 | 28 | ### c, d, e 29 | Accumulate results of 1000 iterations in the beta arrays. 30 | 31 | ```r 32 | for (i in 1:1000) { 33 | a = Y - beta1[i] * X1 34 | beta2[i] = lm(a ~ X2)$coef[2] 35 | a = Y - beta2[i] * X2 36 | lm.fit = lm(a ~ X1) 37 | if (i < 1000) { 38 | beta1[i + 1] = lm.fit$coef[2] 39 | } 40 | beta0[i] = lm.fit$coef[1] 41 | } 42 | plot(1:1000, beta0, type = "l", xlab = "iteration", ylab = "betas", ylim = c(-2.2, 43 | 1.6), col = "green") 44 | lines(1:1000, beta1, col = "red") 45 | lines(1:1000, beta2, col = "blue") 46 | legend("center", c("beta0", "beta1", "beta2"), lty = 1, col = c("green", "red", 47 | "blue")) 48 | ``` 49 | 50 | ![plot of chunk 11e](figure/11e.png) 51 | 52 | The coefficients quickly attain their least square values. 53 | 54 | ### f 55 | 56 | ```r 57 | lm.fit = lm(Y ~ X1 + X2) 58 | plot(1:1000, beta0, type = "l", xlab = "iteration", ylab = "betas", ylim = c(-2.2, 59 | 1.6), col = "green") 60 | lines(1:1000, beta1, col = "red") 61 | lines(1:1000, beta2, col = "blue") 62 | abline(h = lm.fit$coef[1], lty = "dashed", lwd = 3, col = rgb(0, 0, 0, alpha = 0.4)) 63 | abline(h = lm.fit$coef[2], lty = "dashed", lwd = 3, col = rgb(0, 0, 0, alpha = 0.4)) 64 | abline(h = lm.fit$coef[3], lty = "dashed", lwd = 3, col = rgb(0, 0, 0, alpha = 0.4)) 65 | legend("center", c("beta0", "beta1", "beta2", "multiple regression"), lty = c(1, 66 | 1, 1, 2), col = c("green", "red", "blue", "black")) 67 | ``` 68 | 69 | ![plot of chunk 11f](figure/11f.png) 70 | 71 | Dotted lines show that the estimated multiple regression coefficients match exactly with the coefficients obtained using backfitting. 72 | 73 | ### g 74 | When the relationship between $Y$ and $X$'s is linear, one iteration is sufficient to attain a good approximation of true regression coefficients. 
75 | -------------------------------------------------------------------------------- /ch7/12.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 12 2 | ====================== 3 | 4 | ```{r} 5 | set.seed(1) 6 | p = 100 7 | n = 1000 8 | x = matrix(ncol=p, nrow=n) 9 | coefi = rep(NA, p) 10 | for (i in 1:p) { 11 | x[,i] = rnorm(n) 12 | coefi[i] = rnorm(1) * 100 13 | } 14 | y = x %*% coefi + rnorm(n) 15 | ``` 16 | 17 | ```{r} 18 | beta = rep(0, p) 19 | max_iterations = 1000 20 | errors = rep(NA, max_iterations + 1) 21 | iter = 2 22 | errors[1] = Inf 23 | errors[2] = sum((y - x %*% beta)^2) 24 | threshold = 0.0001 25 | while (iter < max_iterations && errors[iter-1] - errors[iter] > threshold) { 26 | for (i in 1:p) { 27 | a = y - x %*% beta + beta[i] * x[,i] 28 | beta[i] = lm(a~x[,i])$coef[2] 29 | } 30 | iter = iter + 1 31 | errors[iter] = sum((y - x %*% beta)^2) 32 | print(c(iter-2, errors[iter-1], errors[iter])) 33 | } 34 | ``` 35 | 10 iterations to get to a "good" approximation defined by the threshold on sum of squared errors between subsequent iterations. The error increases on the 11th 36 | iteration. 37 | 38 | ```{r} 39 | plot(1:11, errors[3:13]) 40 | ``` 41 | -------------------------------------------------------------------------------- /ch7/12.md: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 12 2 | ====================== 3 | 4 | 5 | ```r 6 | set.seed(1) 7 | p = 100 8 | n = 1000 9 | x = matrix(ncol = p, nrow = n) 10 | coefi = rep(NA, p) 11 | for (i in 1:p) { 12 | x[, i] = rnorm(n) 13 | coefi[i] = rnorm(1) * 100 14 | } 15 | y = x %*% coefi + rnorm(n) 16 | ``` 17 | 18 | 19 | 20 | ```r 21 | beta = rep(0, p) 22 | max_iterations = 1000 23 | errors = rep(NA, max_iterations + 1) 24 | iter = 2 25 | errors[1] = Inf 26 | errors[2] = sum((y - x %*% beta)^2) 27 | threshold = 1e-04 28 | while (iter < max_iterations && errors[iter - 1] - errors[iter] > threshold) { 29 | for (i in 1:p) { 30 | a = y - x %*% beta + beta[i] * x[, i] 31 | beta[i] = lm(a ~ x[, i])$coef[2] 32 | } 33 | iter = iter + 1 34 | errors[iter] = sum((y - x %*% beta)^2) 35 | print(c(iter - 2, errors[iter - 1], errors[iter])) 36 | } 37 | ``` 38 | 39 | ``` 40 | ## [1] 1.000e+00 1.016e+09 3.747e+07 41 | ## [1] 2 37472751 1669889 42 | ## [1] 3 1669889 77924 43 | ## [1] 4 77924 6157 44 | ## [1] 5 6157 1277 45 | ## [1] 6.0 1277.0 928.3 46 | ## [1] 7.0 928.3 904.8 47 | ## [1] 8.0 904.8 903.2 48 | ## [1] 9.0 903.2 903.1 49 | ## [1] 10.0 903.1 903.1 50 | ## [1] 11.0 903.1 903.1 51 | ``` 52 | 53 | 10 iterations to get to a "good" approximation defined by the threshold on sum of squared errors between subsequent iterations. The error increases on the 11th 54 | iteration. 55 | 56 | 57 | ```r 58 | plot(1:11, errors[3:13]) 59 | ``` 60 | 61 | ![plot of chunk unnamed-chunk-3](figure/unnamed-chunk-3.png) 62 | 63 | -------------------------------------------------------------------------------- /ch7/2.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 2 2 | ===================== 3 | 4 | ## a. 5 | $g(x) = k$ because RSS term is ignored and $g(x) = k$ would minimize the area 6 | under the curve of $g^{(0)}$. 7 | 8 | ## b. 9 | $g(x) \alpha x^2$. $g(x)$ would be quadratic to minimize the area under the curve 10 | of its first derivative. 11 | 12 | ## c. 13 | $g(x) \alpha x^3$. $g(x)$ would be cubic to minimize the area under the curve 14 | of its second derivative. See Eqn 7.11. 
15 | 16 | ## d. 17 | $g(x) \alpha x^4$. $g(x)$ would be quartic to minimize the area under the curve 18 | of its third derivative. 19 | 20 | ## e. 21 | The penalty term no longer matters. This is the formula for linear regression, 22 | to choose g based on minimizing RSS. 23 | 24 | -------------------------------------------------------------------------------- /ch7/2.md: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 2 2 | ===================== 3 | 4 | ## a. 5 | $g(x) = k$ because RSS term is ignored and $g(x) = k$ would minimize the area 6 | under the curve of $g^{(0)}$. 7 | 8 | ## b. 9 | $g(x) \alpha x^2$. $g(x)$ would be quadratic to minimize the area under the curve 10 | of its first derivative. 11 | 12 | ## c. 13 | $g(x) \alpha x^3$. $g(x)$ would be cubic to minimize the area under the curve 14 | of its second derivative. See Eqn 7.11. 15 | 16 | ## d. 17 | $g(x) \alpha x^4$. $g(x)$ would be quartic to minimize the area under the curve 18 | of its third derivative. 19 | 20 | ## e. 21 | The penalty term no longer matters. This is the formula for linear regression, 22 | to choose g based on minimizing RSS. 23 | 24 | -------------------------------------------------------------------------------- /ch7/3.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 3 2 | ===================== 3 | 4 | ```{r} 5 | x = -2:2 6 | y = 1 + x + -2 * (x-1)^2 * I(x>1) 7 | plot(x, y) 8 | ``` 9 | -------------------------------------------------------------------------------- /ch7/4.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 4 2 | ===================== 3 | 4 | ```{r} 5 | x = -2:2 6 | y = c(1 + 0 + 0, # x = -2 7 | 1 + 0 + 0, # x = -1 8 | 1 + 1 + 0, # x = 0 9 | 1 + (1-0) + 0, # x = 1 10 | 1 + (1-1) + 0 # x =2 11 | ) 12 | plot(x,y) 13 | ``` 14 | -------------------------------------------------------------------------------- /ch7/4.md: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 4 2 | ===================== 3 | 4 | 5 | ```r 6 | x = -2:2 7 | y = c(1 + 0 + 0, # x = -2 8 | 1 + 0 + 0, # x = -1 9 | 1 + 1 + 0, # x = 0 10 | 1 + (1-0) + 0, # x = 1 11 | 1 + (1-1) + 0 # x =2 12 | ) 13 | plot(x,y) 14 | ``` 15 | 16 | ![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1.png) 17 | 18 | -------------------------------------------------------------------------------- /ch7/5.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 5 2 | ===================== 3 | 4 | ## a 5 | We'd expect $\hat{g_2}$ to have the smaller training RSS because it will be a 6 | higher order polynomial due to the order of the derivative penalty function. 7 | 8 | ## b 9 | We'd expect $\hat{g_1}$ to have the smaller test RSS because $\hat{g_2}$ could 10 | overfit with the extra degree of freedom. 11 | 12 | ## c 13 | Trick question. $\hat{g_1} = \hat{g_2}$ when $\lambda = 0$. -------------------------------------------------------------------------------- /ch7/5.md: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 5 2 | ===================== 3 | 4 | ## a 5 | We'd expect $\hat{g_2}$ to have the smaller training RSS because it will be a 6 | higher order polynomial due to the order of the derivative penalty function. 
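To spell the reasoning out a little (an added note, not part of the original answer): as $\lambda \to \infty$ the penalty term dominates, so each fit must drive its penalized derivative to zero,

$$
\int \big[\hat{g}_1^{(3)}(x)\big]^2 dx \to 0 \Rightarrow \hat{g}_1^{(3)} \equiv 0 \Rightarrow \hat{g}_1 \text{ is at most quadratic},
$$
$$
\int \big[\hat{g}_2^{(4)}(x)\big]^2 dx \to 0 \Rightarrow \hat{g}_2^{(4)} \equiv 0 \Rightarrow \hat{g}_2 \text{ is at most cubic}.
$$

Every quadratic is also a cubic, so $\hat{g}_2$ minimizes the RSS over a strictly larger class of functions, and its training RSS can be no larger than that of $\hat{g}_1$.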
7 | 8 | ## b 9 | We'd expect $\hat{g_1}$ to have the smaller test RSS because $\hat{g_2}$ could 10 | overfit with the extra degree of freedom. 11 | 12 | ## c 13 | Trick question. $\hat{g_1} = \hat{g_2}$ when $\lambda = 0$. 14 | -------------------------------------------------------------------------------- /ch7/6.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 6 2 | ======================================================== 3 | 4 | ### a 5 | Load $Wage$ dataset. Keep an array of all cross-validation errors. We are performing K-fold cross validation with $K=10$. 6 | ```{r 6a} 7 | set.seed(1) 8 | library(ISLR) 9 | library(boot) 10 | all.deltas = rep(NA, 10) 11 | for (i in 1:10) { 12 | glm.fit = glm(wage~poly(age, i), data=Wage) 13 | all.deltas[i] = cv.glm(Wage, glm.fit, K=10)$delta[2] 14 | } 15 | plot(1:10, all.deltas, xlab="Degree", ylab="CV error", type="l", pch=20, lwd=2, ylim=c(1590, 1700)) 16 | min.point = min(all.deltas) 17 | sd.points = sd(all.deltas) 18 | abline(h=min.point + 0.2 * sd.points, col="red", lty="dashed") 19 | abline(h=min.point - 0.2 * sd.points, col="red", lty="dashed") 20 | legend("topright", "0.2-standard deviation lines", lty="dashed", col="red") 21 | ``` 22 | The cv-plot with standard deviation lines show that $d=3$ is the smallest degree giving reasonably small cross-validation error. 23 | 24 | We now find best degree using Anova. 25 | 26 | ```{r} 27 | fit.1 = lm(wage~poly(age, 1), data=Wage) 28 | fit.2 = lm(wage~poly(age, 2), data=Wage) 29 | fit.3 = lm(wage~poly(age, 3), data=Wage) 30 | fit.4 = lm(wage~poly(age, 4), data=Wage) 31 | fit.5 = lm(wage~poly(age, 5), data=Wage) 32 | fit.6 = lm(wage~poly(age, 6), data=Wage) 33 | fit.7 = lm(wage~poly(age, 7), data=Wage) 34 | fit.8 = lm(wage~poly(age, 8), data=Wage) 35 | fit.9 = lm(wage~poly(age, 9), data=Wage) 36 | fit.10 = lm(wage~poly(age, 10), data=Wage) 37 | anova(fit.1, fit.2, fit.3, fit.4, fit.5, fit.6, fit.7, fit.8, fit.9, fit.10) 38 | ``` 39 | Anova shows that all polynomials above degree $3$ are insignificant at $1%$ significance level. 40 | 41 | We now plot the polynomial prediction on the data 42 | ```{r 6aa} 43 | plot(wage~age, data=Wage, col="darkgrey") 44 | agelims = range(Wage$age) 45 | age.grid = seq(from=agelims[1], to=agelims[2]) 46 | lm.fit = lm(wage~poly(age, 3), data=Wage) 47 | lm.pred = predict(lm.fit, data.frame(age=age.grid)) 48 | lines(age.grid, lm.pred, col="blue", lwd=2) 49 | ``` 50 | 51 | ### b 52 | We use cut points of up to 10. 53 | ```{r 6b} 54 | all.cvs = rep(NA, 10) 55 | for (i in 2:10) { 56 | Wage$age.cut = cut(Wage$age, i) 57 | lm.fit = glm(wage~age.cut, data=Wage) 58 | all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2] 59 | } 60 | plot(2:10, all.cvs[-1], xlab="Number of cuts", ylab="CV error", type="l", pch=20, lwd=2) 61 | ``` 62 | The cross validation shows that test error is minimum for $k=8$ cuts. 63 | 64 | We now train the entire data with step function using $8$ cuts and plot it. 
65 | ```{r 6bb} 66 | lm.fit = glm(wage~cut(age, 8), data=Wage) 67 | agelims = range(Wage$age) 68 | age.grid = seq(from=agelims[1], to=agelims[2]) 69 | lm.pred = predict(lm.fit, data.frame(age=age.grid)) 70 | plot(wage~age, data=Wage, col="darkgrey") 71 | lines(age.grid, lm.pred, col="red", lwd=2) 72 | ``` -------------------------------------------------------------------------------- /ch7/7.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 7 2 | ===================== 3 | 4 | ```{r} 5 | library(ISLR) 6 | set.seed(1) 7 | ``` 8 | 9 | ```{r 7.1,fig.width=16} 10 | summary(Wage$maritl) 11 | summary(Wage$jobclass) 12 | par(mfrow=c(1,2)) 13 | plot(Wage$maritl, Wage$wage) 14 | plot(Wage$jobclass, Wage$wage) 15 | ``` 16 | It appears a married couple makes more money on average than other groups. It 17 | also appears that Informational jobs are higher-wage than Industrial jobs on 18 | average. 19 | 20 | ## Polynomial and Step functions 21 | ```{r 7.2} 22 | fit = lm(wage~maritl, data=Wage) 23 | deviance(fit) 24 | fit = lm(wage~jobclass, data=Wage) 25 | deviance(fit) 26 | fit = lm(wage~maritl+jobclass, data=Wage) 27 | deviance(fit) 28 | ``` 29 | 30 | ## Splines 31 | Unable to fit splines on categorical variables. 32 | 33 | ## GAMs 34 | ```{r} 35 | library(gam) 36 | fit = gam(wage~maritl+jobclass+s(age,4), data=Wage) 37 | deviance(fit) 38 | ``` 39 | 40 | Without more advanced techniques, we cannot fit splines to categorical 41 | variables (factors). `maritl` and `jobclass` do add statistically significant 42 | improvements to the previously discussed models. 43 | 44 | -------------------------------------------------------------------------------- /ch7/7.md: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 7 2 | ===================== 3 | 4 | 5 | ```r 6 | library(ISLR) 7 | set.seed(1) 8 | ``` 9 | 10 | 11 | 12 | ```r 13 | summary(Wage$maritl) 14 | ``` 15 | 16 | ``` 17 | ## 1. Never Married 2. Married 3. Widowed 4. Divorced 18 | ## 648 2074 19 204 19 | ## 5. Separated 20 | ## 55 21 | ``` 22 | 23 | ```r 24 | summary(Wage$jobclass) 25 | ``` 26 | 27 | ``` 28 | ## 1. Industrial 2. Information 29 | ## 1544 1456 30 | ``` 31 | 32 | ```r 33 | par(mfrow = c(1, 2)) 34 | plot(Wage$maritl, Wage$wage) 35 | plot(Wage$jobclass, Wage$wage) 36 | ``` 37 | 38 | ![plot of chunk 7.1](figure/7_1.png) 39 | 40 | It appears a married couple makes more money on average than other groups. It 41 | also appears that Informational jobs are higher-wage than Industrial jobs on 42 | average. 43 | 44 | ## Polynomial and Step functions 45 | 46 | ```r 47 | fit = lm(wage ~ maritl, data = Wage) 48 | deviance(fit) 49 | ``` 50 | 51 | ``` 52 | ## [1] 4858941 53 | ``` 54 | 55 | ```r 56 | fit = lm(wage ~ jobclass, data = Wage) 57 | deviance(fit) 58 | ``` 59 | 60 | ``` 61 | ## [1] 4998547 62 | ``` 63 | 64 | ```r 65 | fit = lm(wage ~ maritl + jobclass, data = Wage) 66 | deviance(fit) 67 | ``` 68 | 69 | ``` 70 | ## [1] 4654752 71 | ``` 72 | 73 | 74 | ## Splines 75 | Unable to fit splines on categorical variables. 76 | 77 | ## GAMs 78 | 79 | ```r 80 | library(gam) 81 | ``` 82 | 83 | ``` 84 | ## Loading required package: splines Loaded gam 1.09 85 | ``` 86 | 87 | ```r 88 | fit = gam(wage ~ maritl + jobclass + s(age, 4), data = Wage) 89 | deviance(fit) 90 | ``` 91 | 92 | ``` 93 | ## [1] 4476501 94 | ``` 95 | 96 | 97 | Without more advanced techniques, we cannot fit splines to categorical 98 | variables (factors). 
`maritl` and `jobclass` do add statistically significant 99 | improvements to the previously discussed models. 100 | 101 | -------------------------------------------------------------------------------- /ch7/8.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 7: Exercise 8 2 | ===================== 3 | 4 | ```{r 8.1} 5 | library(ISLR) 6 | set.seed(1) 7 | pairs(Auto) 8 | ``` 9 | mpg appears inversely proportional to cylinders, displacement, horsepower, 10 | weight. 11 | 12 | ## Polynomial 13 | ```{r} 14 | rss = rep(NA, 10) 15 | fits = list() 16 | for (d in 1:10) { 17 | fits[[d]] = lm(mpg~poly(displacement, d), data=Auto) 18 | rss[d] = deviance(fits[[d]]) 19 | } 20 | rss 21 | anova(fits[[1]], fits[[2]], fits[[3]], fits[[4]]) 22 | ``` 23 | Training RSS decreases over time. Quadratic polynomic sufficient from 24 | ANOVA-perspective. 25 | 26 | ```{r} 27 | library(glmnet) 28 | library(boot) 29 | cv.errs = rep(NA, 15) 30 | for (d in 1:15) { 31 | fit = glm(mpg~poly(displacement, d), data=Auto) 32 | cv.errs[d] = cv.glm(Auto, fit, K=10)$delta[2] 33 | } 34 | which.min(cv.errs) 35 | cv.errs 36 | ``` 37 | Surprisingly, cross-validation selected a 10th-degree polynomial. 38 | 39 | ## Step functions 40 | ```{r 8.step} 41 | cv.errs = rep(NA,10) 42 | for (c in 2:10) { 43 | Auto$dis.cut = cut(Auto$displacement, c) 44 | fit = glm(mpg~dis.cut, data=Auto) 45 | cv.errs[c] = cv.glm(Auto, fit, K=10)$delta[2] 46 | } 47 | which.min(cv.errs) 48 | cv.errs 49 | ``` 50 | 51 | ## Splines 52 | ```{r} 53 | library(splines) 54 | cv.errs = rep(NA,10) 55 | for (df in 3:10) { 56 | fit = glm(mpg~ns(displacement, df=df), data=Auto) 57 | cv.errs[df] = cv.glm(Auto, fit, K=10)$delta[2] 58 | } 59 | which.min(cv.errs) 60 | cv.errs 61 | ``` 62 | 63 | ## GAMs 64 | ```{r} 65 | library(gam) 66 | fit = gam(mpg~s(displacement, 4) + s(horsepower, 4), data=Auto) 67 | summary(fit) 68 | ``` 69 | -------------------------------------------------------------------------------- /ch7/9.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 6: Exercise 9 2 | ======================================================== 3 | 4 | Load the Boston dataset 5 | ```{r} 6 | set.seed(1) 7 | library(MASS) 8 | attach(Boston) 9 | ``` 10 | 11 | ### a 12 | 13 | ```{r 9a} 14 | lm.fit = lm(nox~poly(dis, 3), data=Boston) 15 | summary(lm.fit) 16 | dislim = range(dis) 17 | dis.grid = seq(from=dislim[1], to=dislim[2], by=0.1) 18 | lm.pred = predict(lm.fit, list(dis=dis.grid)) 19 | plot(nox~dis, data=Boston, col="darkgrey") 20 | lines(dis.grid, lm.pred, col="red", lwd=2) 21 | ``` 22 | 23 | Summary shows that all polynomial terms are significant while predicting nox using dis. Plot shows a smooth curve fitting the data fairly well. 24 | 25 | 26 | ### b 27 | We plot polynomials of degrees 1 to 10 and save train RSS. 28 | ```{r} 29 | all.rss = rep(NA, 10) 30 | for (i in 1:10) { 31 | lm.fit = lm(nox~poly(dis, i), data=Boston) 32 | all.rss[i] = sum(lm.fit$residuals^2) 33 | } 34 | all.rss 35 | ``` 36 | As expected, train RSS monotonically decreases with degree of polynomial. 37 | 38 | ### c 39 | We use a 10-fold cross validation to pick the best polynomial degree. 
40 | ```{r 9c} 41 | library(boot) 42 | all.deltas = rep(NA, 10) 43 | for (i in 1:10) { 44 | glm.fit = glm(nox~poly(dis, i), data=Boston) 45 | all.deltas[i] = cv.glm(Boston, glm.fit, K=10)$delta[2] 46 | } 47 | plot(1:10, all.deltas, xlab="Degree", ylab="CV error", type="l", pch=20, lwd=2) 48 | ``` 49 | A 10-fold CV shows that the CV error decreases as we increase the degree from 1 to 3, stays almost constant up to degree 5, and then starts increasing for higher degrees. We pick 4 as the best polynomial degree. 50 | 51 | ### d 52 | We see that dis ranges from about 1 to 13. We split this range into 4 roughly equal intervals and establish knots at $[4, 7, 11]$. Note: the bs function in R expects either a df or a knots argument; if both are specified, the supplied knots are used and df is effectively ignored. 53 | ```{r 9d} 54 | library(splines) 55 | sp.fit = lm(nox~bs(dis, df=4, knots=c(4, 7, 11)), data=Boston) 56 | summary(sp.fit) 57 | sp.pred = predict(sp.fit, list(dis=dis.grid)) 58 | plot(nox~dis, data=Boston, col="darkgrey") 59 | lines(dis.grid, sp.pred, col="red", lwd=2) 60 | ``` 61 | The summary shows that all terms in the spline fit are significant. The plot shows that the spline fits the data well except at the extreme values of $dis$ (especially $dis > 10$). 62 | 63 | ### e 64 | We fit regression splines with dfs between 3 and 16. 65 | ```{r} 66 | all.cv = rep(NA, 16) 67 | for (i in 3:16) { 68 | lm.fit = lm(nox~bs(dis, df=i), data=Boston) 69 | all.cv[i] = sum(lm.fit$residuals^2) 70 | } 71 | all.cv[-c(1, 2)] 72 | ``` 73 | Train RSS monotonically decreases until df=14 and then slightly increases for df=15 and df=16. 74 | 75 | ### f 76 | Finally, we use a 10-fold cross validation to find the best df. We try all integer values of df between 3 and 16. 77 | ```{r 9f} 78 | all.cv = rep(NA, 16) 79 | for (i in 3:16) { 80 | lm.fit = glm(nox~bs(dis, df=i), data=Boston) 81 | all.cv[i] = cv.glm(Boston, lm.fit, K=10)$delta[2] 82 | } 83 | plot(3:16, all.cv[-c(1, 2)], lwd=2, type="l", xlab="df", ylab="CV error") 84 | ``` 85 | CV error is more jumpy in this case, but attains its minimum at df=10. We pick $10$ as the optimal degrees of freedom.
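As a final visual check (an addition to the original answer, reusing `dis.grid` from part (a)), we refit the chosen df=10 spline on the full data set and overlay the fitted curve on the scatterplot:

```{r}
# Refit the selected regression spline (df = 10) on all of Boston and plot it.
# Assumes the Boston data and dis.grid from part (a) are still available.
best.fit = lm(nox~bs(dis, df=10), data=Boston)
best.pred = predict(best.fit, list(dis=dis.grid))
plot(nox~dis, data=Boston, col="darkgrey")
lines(dis.grid, best.pred, col="blue", lwd=2)
```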
-------------------------------------------------------------------------------- /ch7/figure/10a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/10a.png -------------------------------------------------------------------------------- /ch7/figure/10b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/10b.png -------------------------------------------------------------------------------- /ch7/figure/11e.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/11e.png -------------------------------------------------------------------------------- /ch7/figure/11f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/11f.png -------------------------------------------------------------------------------- /ch7/figure/6a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/6a.png -------------------------------------------------------------------------------- /ch7/figure/6aa.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/6aa.png -------------------------------------------------------------------------------- /ch7/figure/6b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/6b.png -------------------------------------------------------------------------------- /ch7/figure/6bb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/6bb.png -------------------------------------------------------------------------------- /ch7/figure/7_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/7_1.png -------------------------------------------------------------------------------- /ch7/figure/8_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/8_1.png -------------------------------------------------------------------------------- /ch7/figure/9a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/9a.png -------------------------------------------------------------------------------- /ch7/figure/9c.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/9c.png -------------------------------------------------------------------------------- /ch7/figure/9d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/9d.png -------------------------------------------------------------------------------- /ch7/figure/9f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/9f.png -------------------------------------------------------------------------------- /ch7/figure/unnamed-chunk-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch7/figure/unnamed-chunk-1.png -------------------------------------------------------------------------------- /ch8/1.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 1 2 | ======================================================== 3 | 4 | ```{r label="1"} 5 | par(xpd=NA) 6 | plot(NA, NA, type="n", xlim=c(0,100), ylim=c(0,100), xlab="X", ylab="Y") 7 | # t1: x = 40; (40, 0) (40, 100) 8 | lines(x=c(40,40),y=c(0,100)) 9 | text(x=40, y=108, labels=c("t1"), col="red") 10 | # t2: y = 75; (0, 75) (40, 75) 11 | lines(x=c(0,40), y=c(75,75)) 12 | text(x=-8, y=75, labels=c("t2"), col="red") 13 | # t3: x = 75; (75,0) (75, 100) 14 | lines(x=c(75,75),y=c(0,100)) 15 | text(x=75, y=108, labels=c("t3"), col="red") 16 | # t4: x = 20; (20,0) (20, 75) 17 | lines(x=c(20,20),y=c(0,75)) 18 | text(x=20, y=80, labels=c("t4"), col="red") 19 | # t5: y=25; (75,25) (100,25) 20 | lines(x=c(75,100),y=c(25,25)) 21 | text(x=70, y=25, labels=c("t5"), col="red") 22 | 23 | text(x=(40+75)/2, y=50, labels=c("R1")) 24 | text(x=20, y=(100+75)/2, labels=c("R2")) 25 | text(x=(75+100)/2, y=(100+25)/2, labels=c("R3")) 26 | text(x=(75+100)/2, y=25/2, labels=c("R4")) 27 | text(x=30, y=75/2, labels=c("R5")) 28 | text(x=10, y=75/2, labels=c("R6")) 29 | ``` 30 | 31 | ``` 32 | [ X<40 ] 33 | | | 34 | [Y<75] [X<75] 35 | | | | | 36 | [X<20] R2 R1 [Y<25] 37 | | | | | 38 | R6 R5 R4 R3 39 | ``` 40 | 41 | 42 | -------------------------------------------------------------------------------- /ch8/1.md: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 1 2 | ======================================================== 3 | 4 | 5 | ```r 6 | par(xpd = NA) 7 | plot(NA, NA, type = "n", xlim = c(0, 100), ylim = c(0, 100), xlab = "X", ylab = "Y") 8 | # t1: x = 40; (40, 0) (40, 100) 9 | lines(x = c(40, 40), y = c(0, 100)) 10 | text(x = 40, y = 108, labels = c("t1"), col = "red") 11 | # t2: y = 75; (0, 75) (40, 75) 12 | lines(x = c(0, 40), y = c(75, 75)) 13 | text(x = -8, y = 75, labels = c("t2"), col = "red") 14 | # t3: x = 75; (75,0) (75, 100) 15 | lines(x = c(75, 75), y = c(0, 100)) 16 | text(x = 75, y = 108, labels = c("t3"), col = "red") 17 | # t4: x = 20; (20,0) (20, 75) 18 | lines(x = c(20, 20), y = c(0, 75)) 19 | text(x = 20, y = 80, labels = c("t4"), col = "red") 20 | # t5: y=25; (75,25) (100,25) 21 | lines(x = c(75, 100), y = c(25, 25)) 22 | text(x = 70, y = 25, labels = c("t5"), col = "red") 23 | 24 | text(x = (40 + 75)/2, y = 50, labels = c("R1")) 
25 | text(x = 20, y = (100 + 75)/2, labels = c("R2")) 26 | text(x = (75 + 100)/2, y = (100 + 25)/2, labels = c("R3")) 27 | text(x = (75 + 100)/2, y = 25/2, labels = c("R4")) 28 | text(x = 30, y = 75/2, labels = c("R5")) 29 | text(x = 10, y = 75/2, labels = c("R6")) 30 | ``` 31 | 32 | ![plot of chunk 1](figure/1.png) 33 | 34 | 35 | ``` 36 | [ X<40 ] 37 | | | 38 | [Y<75] [X<75] 39 | | | | | 40 | [X<20] R2 R1 [Y<25] 41 | | | | | 42 | R6 R5 R4 R3 43 | ``` 44 | 45 | 46 | -------------------------------------------------------------------------------- /ch8/10.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 10 2 | ======================================================== 3 | 4 | ### a 5 | ```{r 10a} 6 | library(ISLR) 7 | sum(is.na(Hitters$Salary)) 8 | Hitters = Hitters[-which(is.na(Hitters$Salary)), ] 9 | sum(is.na(Hitters$Salary)) 10 | Hitters$Salary = log(Hitters$Salary) 11 | ``` 12 | 13 | ### b 14 | ```{r 10b} 15 | train = 1:200 16 | Hitters.train = Hitters[train, ] 17 | Hitters.test = Hitters[-train, ] 18 | ``` 19 | 20 | ### c 21 | ```{r 10c} 22 | library(gbm) 23 | set.seed(103) 24 | pows = seq(-10, -0.2, by=0.1) 25 | lambdas = 10 ^ pows 26 | length.lambdas = length(lambdas) 27 | train.errors = rep(NA, length.lambdas) 28 | test.errors = rep(NA, length.lambdas) 29 | for (i in 1:length.lambdas) { 30 | boost.hitters = gbm(Salary~., data=Hitters.train, distribution="gaussian", n.trees=1000, shrinkage=lambdas[i]) 31 | train.pred = predict(boost.hitters, Hitters.train, n.trees=1000) 32 | test.pred = predict(boost.hitters, Hitters.test, n.trees=1000) 33 | train.errors[i] = mean((Hitters.train$Salary - train.pred)^2) 34 | test.errors[i] = mean((Hitters.test$Salary - test.pred)^2) 35 | } 36 | 37 | plot(lambdas, train.errors, type="b", xlab="Shrinkage", ylab="Train MSE", col="blue", pch=20) 38 | ``` 39 | 40 | ### d 41 | ```{r 10d} 42 | plot(lambdas, test.errors, type="b", xlab="Shrinkage", ylab="Test MSE", col="red", pch=20) 43 | min(test.errors) 44 | lambdas[which.min(test.errors)] 45 | ``` 46 | Minimum test error is obtained at $\lambda = 0.05$. 47 | 48 | ### e 49 | ```{r 10e} 50 | lm.fit = lm(Salary~., data=Hitters.train) 51 | lm.pred = predict(lm.fit, Hitters.test) 52 | mean((Hitters.test$Salary - lm.pred)^2) 53 | library(glmnet) 54 | set.seed(134) 55 | x = model.matrix(Salary~., data=Hitters.train) 56 | y = Hitters.train$Salary 57 | x.test = model.matrix(Salary~., data=Hitters.test) 58 | lasso.fit = glmnet(x, y, alpha=1) 59 | lasso.pred = predict(lasso.fit, s=0.01, newx=x.test) 60 | mean((Hitters.test$Salary - lasso.pred)^2) 61 | ``` 62 | Both linear model and regularization like Lasso have higher test MSE than boosting. 63 | 64 | ### f 65 | ```{r 10f} 66 | boost.best = gbm(Salary~., data=Hitters.train, distribution="gaussian", n.trees=1000, shrinkage=lambdas[which.min(test.errors)]) 67 | summary(boost.best) 68 | ``` 69 | $\tt{CAtBat}$, $\tt{CRBI}$ and $\tt{CWalks}$ are three most important variables in that order. 70 | 71 | ### g 72 | ```{r 10g} 73 | library(randomForest) 74 | set.seed(21) 75 | rf.hitters = randomForest(Salary~., data=Hitters.train, ntree=500, mtry=19) 76 | rf.pred = predict(rf.hitters, Hitters.test) 77 | mean((Hitters.test$Salary - rf.pred)^2) 78 | ``` 79 | Test MSE for bagging is about $0.23$, which is slightly lower than the best test MSE for boosting. 
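As an optional extra comparison (not part of the original answer), we can also fit a random forest with a smaller `mtry` (here 6, roughly $p/3$, close to the `randomForest` default for regression) and compare its test MSE with the bagging result above:

```{r}
# Random forest with decorrelated trees (mtry = 6) for comparison with
# bagging (mtry = 19) fit above; reuses Hitters.train and Hitters.test.
set.seed(21)
rf2.hitters = randomForest(Salary~., data=Hitters.train, ntree=500, mtry=6)
rf2.pred = predict(rf2.hitters, Hitters.test)
mean((Hitters.test$Salary - rf2.pred)^2)
```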
-------------------------------------------------------------------------------- /ch8/11.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 11 2 | ======================================================== 3 | 4 | ### a 5 | ```{r 11a} 6 | library(ISLR) 7 | train = 1:1000 8 | Caravan$Purchase = ifelse(Caravan$Purchase == "Yes", 1, 0) 9 | Caravan.train = Caravan[train, ] 10 | Caravan.test = Caravan[-train, ] 11 | ``` 12 | 13 | ### b 14 | ```{r 11b} 15 | library(gbm) 16 | set.seed(342) 17 | boost.caravan = gbm(Purchase~., data=Caravan.train, n.trees=1000, shrinkage=0.01, distribution="bernoulli") 18 | summary(boost.caravan) 19 | ``` 20 | $\tt{PPERSAUT}$, $\tt{MKOOPKLA}$ and $\tt{MOPLHOOG}$ are the three most important variables, in that order. 21 | 22 | ### c 23 | ```{r 11c} 24 | boost.prob = predict(boost.caravan, Caravan.test, n.trees=1000, type="response") 25 | boost.pred = ifelse(boost.prob > 0.2, 1, 0) 26 | table(Caravan.test$Purchase, boost.pred) 27 | 34 / (137 + 34) 28 | ``` 29 | About $20$% of the people predicted to make a purchase actually end up making one. 30 | ```{r} 31 | lm.caravan = glm(Purchase~., data=Caravan.train, family=binomial) 32 | lm.prob = predict(lm.caravan, Caravan.test, type="response") 33 | lm.pred = ifelse(lm.prob > 0.2, 1, 0) 34 | table(Caravan.test$Purchase, lm.pred) 35 | 58 / (350 + 58) 36 | ``` 37 | About $14$% of the people predicted to make a purchase using logistic regression actually end up making one. This is lower than with boosting. -------------------------------------------------------------------------------- /ch8/12.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 12 2 | ====================== 3 | 4 | In this exercise I chose to examine the `Weekly` stock market data from the ISLR 5 | package.
6 | 7 | ```{r} 8 | set.seed(1) 9 | library(ISLR) 10 | summary(Weekly) 11 | train = sample(nrow(Weekly), 2/3 * nrow(Weekly)) 12 | test = -train 13 | ``` 14 | 15 | ## Logistic regression 16 | ```{r} 17 | glm.fit = glm(Direction~.-Year-Today, data=Weekly[train,], family="binomial") 18 | glm.probs = predict(glm.fit, newdata=Weekly[test, ], type = "response") 19 | glm.pred = rep("Down", length(glm.probs)) 20 | glm.pred[glm.probs > 0.5] = "Up" 21 | table(glm.pred, Weekly$Direction[test]) 22 | mean(glm.pred != Weekly$Direction[test]) 23 | ``` 24 | 25 | ## Boosting 26 | ```{r} 27 | library(gbm) 28 | Weekly$BinomialDirection = ifelse(Weekly$Direction == "Up", 1, 0) 29 | boost.weekly = gbm(BinomialDirection~.-Year-Today-Direction, 30 | data=Weekly[train,], 31 | distribution="bernoulli", 32 | n.trees=5000) 33 | yhat.boost = predict(boost.weekly, newdata=Weekly[test,], n.trees=5000) 34 | yhat.pred = rep(0, length(yhat.boost)) 35 | yhat.pred[yhat.boost > 0.5] = 1 36 | table(yhat.pred, Weekly$BinomialDirection[test]) 37 | mean(yhat.pred != Weekly$BinomialDirection[test]) 38 | ``` 39 | 40 | ## Bagging 41 | ```{r} 42 | Weekly = Weekly[,!(names(Weekly) %in% c("BinomialDirection"))] 43 | library(randomForest) 44 | bag.weekly = randomForest(Direction~.-Year-Today, 45 | data=Weekly, 46 | subset=train, 47 | mtry=6) 48 | yhat.bag = predict(bag.weekly, newdata=Weekly[test,]) 49 | table(yhat.bag, Weekly$Direction[test]) 50 | mean(yhat.bag != Weekly$Direction[test]) 51 | ``` 52 | 53 | ## Random forests 54 | ```{r} 55 | rf.weekly = randomForest(Direction~.-Year-Today, 56 | data=Weekly, 57 | subset=train, 58 | mtry=2) 59 | yhat.bag = predict(rf.weekly, newdata=Weekly[test,]) 60 | table(yhat.bag, Weekly$Direction[test]) 61 | mean(yhat.bag != Weekly$Direction[test]) 62 | ``` 63 | 64 | ## Best performance summary 65 | Boosting resulted in the lowest validation set test error rate. 66 | -------------------------------------------------------------------------------- /ch8/2.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 2 2 | ===================== 3 | 4 | Based on Algorithm 8.2, the first stump will consist of a split on a single 5 | variable. By induction, the residuals of that first fit will result in a second 6 | stump fit to a another distinct, single variable. (* This is my intuition, not 7 | sure if my proof is rigorous enough to support that claim). 8 | 9 | $f(X) = \sum_{j=1}^{p} f_j(X_j)$ 10 | 11 | 0) $\hat{f}(x) = 0, r_i = y_i$ 12 | 13 | 1) a) $\hat{f}^1(x) = \beta_{1_1} I(X_1 < t_1) + \beta_{0_1}$ 14 | 15 | 1) b) $\hat{f}(x) = \lambda\hat{f}^1(x)$ 16 | 17 | 1) c) $r_i = y_i - \lambda\hat{f}^1(x_i)$ 18 | 19 | To maximize the fit to the residuals, another distinct stump must be fit in the 20 | next and subsequent iterations will each fit $X_j$-distinct stumps. The 21 | following is the jth iteration, where $b=j$: 22 | 23 | j) a) $\hat{f}^j(x) = \beta_{1_j} I(X_j < t_j) + \beta_{0_j}$ 24 | 25 | j) b) $\hat{f}(x) = \lambda\hat{f}^1(X_1) + \dots + \hat{f}^j(X_j) + \dots + 26 | \hat{f}^{p-1}(X_{p-1}) + \hat{f}^p(X_p)$ 27 | 28 | Since each iteration's fit is a distinct variable stump, there are only $p$ 29 | fits based on "j) b)". 
30 | 31 | $$f(X) = \sum_{j=1}^{p} f_j(X_j)$$ 32 | -------------------------------------------------------------------------------- /ch8/2.md: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 2 2 | ===================== 3 | 4 | Based on Algorithm 8.2, the first stump will consist of a split on a single 5 | variable. By induction, the residuals of that first fit will result in a second 6 | stump fit to a another distinct, single variable. (* This is my intuition, not 7 | sure if my proof is rigorous enough to support that claim). 8 | 9 | $f(X) = \sum_{j=1}^{p} f_j(X_j)$ 10 | 11 | 0) $\hat{f}(x) = 0, r_i = y_i$ 12 | 13 | 1) a) $\hat{f}^1(x) = \beta_{1_1} I(X_1 < t_1) + \beta_{0_1}$ 14 | 15 | 1) b) $\hat{f}(x) = \lambda\hat{f}^1(x)$ 16 | 17 | 1) c) $r_i = y_i - \lambda\hat{f}^1(x_i)$ 18 | 19 | To maximize the fit to the residuals, another distinct stump must be fit in the 20 | next and subsequent iterations will each fit $X_j$-distinct stumps. The 21 | following is the jth iteration, where $b=j$: 22 | 23 | j) a) $\hat{f}^j(x) = \beta_{1_j} I(X_j < t_j) + \beta_{0_j}$ 24 | 25 | j) b) $\hat{f}(x) = \lambda\hat{f}^1(X_1) + \dots + \hat{f}^j(X_j) + \dots + 26 | \hat{f}^{p-1}(X_{p-1}) + \hat{f}^p(X_p)$ 27 | 28 | Since each iteration's fit is a distinct variable stump, there are only $p$ 29 | fits based on "j) b)". 30 | 31 | $$f(X) = \sum_{j=1}^{p} f_j(X_j)$$ 32 | -------------------------------------------------------------------------------- /ch8/3.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 3 2 | ===================== 3 | 4 | ```{r label="3"} 5 | p = seq(0, 1, .01) 6 | gini = p * (1-p) * 2 7 | entropy = - (p * log(p) + (1-p) * log(1-p)) 8 | class.err = 1 - pmax(p, 1-p) 9 | matplot(p, cbind(gini, entropy, class.err), col=c("red", "green", "blue")) 10 | ``` 11 | -------------------------------------------------------------------------------- /ch8/3.md: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 3 2 | ===================== 3 | 4 | 5 | ```r 6 | p = seq(0, 1, 0.01) 7 | gini = p * (1 - p) * 2 8 | entropy = -(p * log(p) + (1 - p) * log(1 - p)) 9 | class.err = 1 - pmax(p, 1 - p) 10 | matplot(p, cbind(gini, entropy, class.err), col = c("red", "green", "blue")) 11 | ``` 12 | 13 | ![plot of chunk 3](figure/3.png) 14 | 15 | -------------------------------------------------------------------------------- /ch8/4.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 4 2 | ===================== 3 | 4 | ## a 5 | ``` 6 | [X1 < 1] 7 | | | 8 | [X2 < 1] 5 9 | | | 10 | [X1 < 0] 15 11 | | | 12 | 3 [X2<0] 13 | | | 14 | 10 0 15 | ``` 16 | 17 | ## b 18 | ```{r label="4b"} 19 | par(xpd=NA) 20 | plot(NA, NA, type="n", xlim=c(-2,2), ylim=c(-3,3), xlab="X1", ylab="X2") 21 | # X2 < 1 22 | lines(x=c(-2,2), y=c(1,1)) 23 | # X1 < 1 with X2 < 1 24 | lines(x=c(1,1), y=c(-3,1)) 25 | text(x=(-2+1)/2, y=-1, labels=c(-1.80)) 26 | text(x=1.5, y=-1, labels=c(0.63)) 27 | # X2 < 2 with X2 >= 1 28 | lines(x=c(-2,2), y=c(2,2)) 29 | text(x=0, y=2.5, labels=c(2.49)) 30 | # X1 < 0 with X2<2 and X2>=1 31 | lines(x=c(0,0), y=c(1,2)) 32 | text(x=-1, y=1.5, labels=c(-1.06)) 33 | text(x=1, y=1.5, labels=c(0.21)) 34 | ``` -------------------------------------------------------------------------------- /ch8/4.md: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 
4 2 | ===================== 3 | 4 | ## a 5 | ``` 6 | [X1 < 1] 7 | | | 8 | [X2 < 1] 5 9 | | | 10 | [X1 < 0] 15 11 | | | 12 | 3 [X2<0] 13 | | | 14 | 10 0 15 | ``` 16 | 17 | ## b 18 | 19 | ```r 20 | par(xpd = NA) 21 | plot(NA, NA, type = "n", xlim = c(-2, 2), ylim = c(-3, 3), xlab = "X1", ylab = "X2") 22 | # X2 < 1 23 | lines(x = c(-2, 2), y = c(1, 1)) 24 | # X1 < 1 with X2 < 1 25 | lines(x = c(1, 1), y = c(-3, 1)) 26 | text(x = (-2 + 1)/2, y = -1, labels = c(-1.8)) 27 | text(x = 1.5, y = -1, labels = c(0.63)) 28 | # X2 < 2 with X2 >= 1 29 | lines(x = c(-2, 2), y = c(2, 2)) 30 | text(x = 0, y = 2.5, labels = c(2.49)) 31 | # X1 < 0 with X2<2 and X2>=1 32 | lines(x = c(0, 0), y = c(1, 2)) 33 | text(x = -1, y = 1.5, labels = c(-1.06)) 34 | text(x = 1, y = 1.5, labels = c(0.21)) 35 | ``` 36 | 37 | ![plot of chunk 4b](figure/4b.png) 38 | 39 | -------------------------------------------------------------------------------- /ch8/5.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 5 2 | ===================== 3 | 4 | ```{r} 5 | p = c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75) 6 | ``` 7 | 8 | ## Majority approach 9 | ```{r} 10 | sum(p>=0.5) > sum(p<0.5) 11 | ``` 12 | The number of red predictions is greater than the number of green predictions 13 | based on a 50% threshold, thus RED. 14 | 15 | ## Average approach 16 | ```{r} 17 | mean(p) 18 | ``` 19 | The average of the probabilities is less than the 50% threshold, thus GREEN. 20 | -------------------------------------------------------------------------------- /ch8/5.md: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 5 2 | ===================== 3 | 4 | 5 | ```r 6 | p = c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75) 7 | ``` 8 | 9 | 10 | ## Majority approach 11 | 12 | ```r 13 | sum(p >= 0.5) > sum(p < 0.5) 14 | ``` 15 | 16 | ``` 17 | ## [1] TRUE 18 | ``` 19 | 20 | The number of red predictions is greater than the number of green predictions 21 | based on a 50% threshold, thus RED. 22 | 23 | ## Average approach 24 | 25 | ```r 26 | mean(p) 27 | ``` 28 | 29 | ``` 30 | ## [1] 0.45 31 | ``` 32 | 33 | The average of the probabilities is less than the 50% threshold, thus GREEN. 34 | -------------------------------------------------------------------------------- /ch8/6.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 6 2 | ===================== 3 | 4 | ### Provide a detailed explanation of the algorithm that is used to fit a regression tree. 5 | 6 | Read section 8.1.1, including Algorithm 8.1. 7 | -------------------------------------------------------------------------------- /ch8/6.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Chapter 8: Exercise 6 9 | 10 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 |

Chapter 8: Exercise 6

Provide a detailed explanation of the algorithm that is used to fit a regression tree.

Read section 8.1.1, including Algorithm 8.1.

150 | 151 | 152 | 153 | 154 | 155 | -------------------------------------------------------------------------------- /ch8/6.md: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 6 2 | ===================== 3 | 4 | ### Provide a detailed explanation of the algorithm that is used to fit a regression tree. 5 | 6 | Read section 8.1.1, including Algorithm 8.1. 7 | -------------------------------------------------------------------------------- /ch8/7.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 7 2 | ======================================================== 3 | 4 | We will try a range of $\tt{ntree}$ from 1 to 500 and $\tt{mtry}$ taking typical values of $p$, $p/2$, $\sqrt{p}$. For Boston data, $p = 13$. We use an alternate call to $\tt{randomForest}$ which takes $\tt{xtest}$ and $\tt{ytest}$ as additional arguments and computes test MSE on-the-fly. Test MSE of all tree sizes can be obtained by accessing $\tt{mse}$ list member of $\tt{test}$ list member of the model. 5 | 6 | ```{r 9a} 7 | library(MASS) 8 | library(randomForest) 9 | set.seed(1101) 10 | 11 | # Construct the train and test matrices 12 | train = sample(dim(Boston)[1], dim(Boston)[1] / 2) 13 | X.train = Boston[train, -14] 14 | X.test = Boston[-train, -14] 15 | Y.train = Boston[train, 14] 16 | Y.test = Boston[-train, 14] 17 | 18 | p = dim(Boston)[2] - 1 19 | p.2 = p / 2 20 | p.sq = sqrt(p) 21 | 22 | rf.boston.p = randomForest(X.train, Y.train, xtest=X.test, ytest=Y.test, mtry=p, ntree=500) 23 | rf.boston.p.2 = randomForest(X.train, Y.train, xtest=X.test, ytest=Y.test, mtry=p.2, ntree=500) 24 | rf.boston.p.sq = randomForest(X.train, Y.train, xtest=X.test, ytest=Y.test, mtry=p.sq, ntree=500) 25 | 26 | plot(1:500, rf.boston.p$test$mse, col="green", type="l", xlab="Number of Trees", ylab="Test MSE", ylim=c(10, 19)) 27 | lines(1:500, rf.boston.p.2$test$mse, col="red", type="l") 28 | lines(1:500, rf.boston.p.sq$test$mse, col="blue", type="l") 29 | legend("topright", c("m=p", "m=p/2", "m=sqrt(p)"), col=c("green", "red", "blue"), cex=1, lty=1) 30 | ``` 31 | The plot shows that test MSE for single tree is quite high (around 18). It is reduced by adding more trees to the model and stabilizes around a few hundred trees. Test MSE for including all variables at split is slightly higher (around 11) as compared to both using half or square-root number of variables (both slightly less than 10). 32 | 33 | -------------------------------------------------------------------------------- /ch8/7.md: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 7 2 | ======================================================== 3 | 4 | We will try a range of $\tt{ntree}$ from 1 to 500 and $\tt{mtry}$ taking typical values of $p$, $p/2$, $\sqrt{p}$. For Boston data, $p = 13$. We use an alternate call to $\tt{randomForest}$ which takes $\tt{xtest}$ and $\tt{ytest}$ as additional arguments and computes test MSE on-the-fly. Test MSE of all tree sizes can be obtained by accessing $\tt{mse}$ list member of $\tt{test}$ list member of the model. 5 | 6 | 7 | ```r 8 | library(MASS) 9 | library(randomForest) 10 | ``` 11 | 12 | ``` 13 | ## randomForest 4.6-7 14 | ## Type rfNews() to see new features/changes/bug fixes. 
15 | ``` 16 | 17 | ```r 18 | set.seed(1101) 19 | 20 | # Construct the train and test matrices 21 | train = sample(dim(Boston)[1], dim(Boston)[1]/2) 22 | X.train = Boston[train, -14] 23 | X.test = Boston[-train, -14] 24 | Y.train = Boston[train, 14] 25 | Y.test = Boston[-train, 14] 26 | 27 | p = dim(Boston)[2] - 1 28 | p.2 = p/2 29 | p.sq = sqrt(p) 30 | 31 | rf.boston.p = randomForest(X.train, Y.train, xtest = X.test, ytest = Y.test, 32 | mtry = p, ntree = 500) 33 | rf.boston.p.2 = randomForest(X.train, Y.train, xtest = X.test, ytest = Y.test, 34 | mtry = p.2, ntree = 500) 35 | rf.boston.p.sq = randomForest(X.train, Y.train, xtest = X.test, ytest = Y.test, 36 | mtry = p.sq, ntree = 500) 37 | 38 | plot(1:500, rf.boston.p$test$mse, col = "green", type = "l", xlab = "Number of Trees", 39 | ylab = "Test MSE", ylim = c(10, 19)) 40 | lines(1:500, rf.boston.p.2$test$mse, col = "red", type = "l") 41 | lines(1:500, rf.boston.p.sq$test$mse, col = "blue", type = "l") 42 | legend("topright", c("m=p", "m=p/2", "m=sqrt(p)"), col = c("green", "red", "blue"), 43 | cex = 1, lty = 1) 44 | ``` 45 | 46 | ![plot of chunk 9a](figure/9a.png) 47 | 48 | The plot shows that test MSE for single tree is quite high (around 18). It is reduced by adding more trees to the model and stabilizes around a few hundred trees. Test MSE for including all variables at split is slightly higher (around 11) as compared to both using half or square-root number of variables (both slightly less than 10). 49 | 50 | -------------------------------------------------------------------------------- /ch8/8.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 8 2 | ======================================================== 3 | 4 | ### a 5 | ```{r 8a} 6 | library(ISLR) 7 | attach(Carseats) 8 | set.seed(1) 9 | 10 | train = sample(dim(Carseats)[1], dim(Carseats)[1] / 2) 11 | Carseats.train = Carseats[train, ] 12 | Carseats.test = Carseats[-train, ] 13 | ``` 14 | 15 | ### b 16 | ```{r b8} 17 | library(tree) 18 | tree.carseats = tree(Sales~., data=Carseats.train) 19 | summary(tree.carseats) 20 | plot(tree.carseats) 21 | text(tree.carseats, pretty=0) 22 | pred.carseats = predict(tree.carseats, Carseats.test) 23 | mean((Carseats.test$Sales - pred.carseats)^2) 24 | ``` 25 | The test MSE is about $4.15$. 26 | 27 | ### c 28 | ```{r 8c} 29 | cv.carseats = cv.tree(tree.carseats, FUN=prune.tree) 30 | par(mfrow=c(1, 2)) 31 | plot(cv.carseats$size, cv.carseats$dev, type="b") 32 | plot(cv.carseats$k, cv.carseats$dev, type="b") 33 | 34 | # Best size = 9 35 | pruned.carseats = prune.tree(tree.carseats, best=9) 36 | par(mfrow=c(1, 1)) 37 | plot(pruned.carseats) 38 | text(pruned.carseats, pretty=0) 39 | pred.pruned = predict(pruned.carseats, Carseats.test) 40 | mean((Carseats.test$Sales - pred.pruned)^2) 41 | ``` 42 | Pruning the tree in this case increases the test MSE to $4.99$. 43 | 44 | ### d 45 | ```{r 9d} 46 | library(randomForest) 47 | bag.carseats = randomForest(Sales~., data=Carseats.train, mtry=10, ntree=500, importance=T) 48 | bag.pred = predict(bag.carseats, Carseats.test) 49 | mean((Carseats.test$Sales - bag.pred)^2) 50 | importance(bag.carseats) 51 | ``` 52 | Bagging improves the test MSE to $2.58$. We also see that $\tt{Price}$, $\tt{ShelveLoc}$ and $\tt{Age}$ are three most important predictors of $\tt{Sale}$. 
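As a small optional addition (not in the original answer), the same importance information can be viewed graphically:

```r
# Hypothetical companion to the importance() table above: a dotchart of
# variable importance for the bagged model fit in part d.
varImpPlot(bag.carseats, main = "Bagging on Carseats")
```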
53 | 54 | ### e 55 | ```{r 9e} 56 | rf.carseats = randomForest(Sales~., data=Carseats.train, mtry=5, ntree=500, importance=T) 57 | rf.pred = predict(rf.carseats, Carseats.test) 58 | mean((Carseats.test$Sales - rf.pred)^2) 59 | importance(rf.carseats) 60 | ``` 61 | In this case, random forest worsens the MSE on test set to $2.87$. Changing $m$ varies test MSE between $2.6$ to $3$. We again see that $\tt{Price}$, $\tt{ShelveLoc}$ and $\tt{Age}$ are three most important predictors of $\tt{Sale}$. -------------------------------------------------------------------------------- /ch8/9.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 8: Exercise 9 2 | ======================================================== 3 | 4 | ### a 5 | ```{r 9a} 6 | library(ISLR) 7 | attach(OJ) 8 | set.seed(1013) 9 | 10 | train = sample(dim(OJ)[1], 800) 11 | OJ.train = OJ[train, ] 12 | OJ.test = OJ[-train, ] 13 | ``` 14 | 15 | ### b 16 | ```{r 9b} 17 | library(tree) 18 | oj.tree = tree(Purchase~., data=OJ.train) 19 | summary(oj.tree) 20 | ``` 21 | The tree only uses two variables: $\tt{LoyalCH}$ and $\tt{PriceDiff}$. It has $7$ terminal nodes. Training error rate (misclassification error) for the tree is $0.155$. 22 | 23 | ### c 24 | ```{r 9c} 25 | oj.tree 26 | ``` 27 | Let's pick terminal node labeled "10)". The splitting variable at this node is $\tt{PriceDiff}$. The splitting value of this node is $0.05$. There are $79$ points in the subtree below this node. The deviance for all points contained in region below this node is $80$. A * in the line denotes that this is in fact a terminal node. The prediction at this node is $\tt{Sales}$ = $\tt{MM}$. About $19$% points in this node have $\tt{CH}$ as value of $\tt{Sales}$. Remaining $81$% points have $\tt{MM}$ as value of $\tt{Sales}$. 28 | 29 | ### d 30 | ```{r 9d} 31 | plot(oj.tree) 32 | text(oj.tree, pretty=0) 33 | ``` 34 | $\tt{LoyalCH}$ is the most important variable of the tree, in fact top 3 nodes contain $\tt{LoyalCH}$. If $\tt{LoyalCH} < 0.27$, the tree predicts $\tt{MM}$. If $\tt{LoyalCH} > 0.76$, the tree predicts $\tt{CH}$. For intermediate values of $\tt{LoyalCH}$, the decision also depends on the value of $\tt{PriceDiff}$. 35 | 36 | ### e 37 | ```{r 9e} 38 | oj.pred = predict(oj.tree, OJ.test, type="class") 39 | table(OJ.test$Purchase, oj.pred) 40 | ``` 41 | 42 | ### f 43 | ```{r 9f} 44 | cv.oj = cv.tree(oj.tree, FUN=prune.tree) 45 | ``` 46 | 47 | ### g 48 | ```{r 9g} 49 | plot(cv.oj$size, cv.oj$dev, type="b", xlab="Tree Size", ylab="Deviance") 50 | ``` 51 | 52 | ### h 53 | Size of 6 gives lowest cross-validation error. 54 | 55 | ### i 56 | ```{r 9i} 57 | oj.pruned = prune.tree(oj.tree, best=6) 58 | ``` 59 | 60 | ### j 61 | ```{r 9j} 62 | summary(oj.pruned) 63 | ``` 64 | Misclassification error of pruned tree is exactly same as that of original tree --- $0.155$. 65 | 66 | ### k 67 | ```{r 9k} 68 | pred.unpruned = predict(oj.tree, OJ.test, type="class") 69 | misclass.unpruned = sum(OJ.test$Purchase != pred.unpruned) 70 | misclass.unpruned / length(pred.unpruned) 71 | pred.pruned = predict(oj.pruned, OJ.test, type="class") 72 | misclass.pruned = sum(OJ.test$Purchase != pred.pruned) 73 | misclass.pruned / length(pred.pruned) 74 | ``` 75 | Pruned and unpruned trees have same test error rate of $0.189$. 
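An optional variation (not part of the original answer) is to cross-validate on the misclassification rate rather than deviance, which may select a different tree size. A sketch, assuming `oj.tree` from part (b) is still available:

```r
# Hypothetical variation: prune on classification error instead of deviance.
cv.oj.class = cv.tree(oj.tree, FUN = prune.misclass)
plot(cv.oj.class$size, cv.oj.class$dev, type = "b",
     xlab = "Tree Size", ylab = "CV Classification Errors")
cv.oj.class$size[which.min(cv.oj.class$dev)]
```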
-------------------------------------------------------------------------------- /ch8/figure/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/1.png -------------------------------------------------------------------------------- /ch8/figure/10c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/10c.png -------------------------------------------------------------------------------- /ch8/figure/10d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/10d.png -------------------------------------------------------------------------------- /ch8/figure/10f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/10f.png -------------------------------------------------------------------------------- /ch8/figure/11b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/11b.png -------------------------------------------------------------------------------- /ch8/figure/3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/3.png -------------------------------------------------------------------------------- /ch8/figure/4b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/4b.png -------------------------------------------------------------------------------- /ch8/figure/8c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/8c.png -------------------------------------------------------------------------------- /ch8/figure/8c1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/8c1.png -------------------------------------------------------------------------------- /ch8/figure/8c2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/8c2.png -------------------------------------------------------------------------------- /ch8/figure/9a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/9a.png -------------------------------------------------------------------------------- /ch8/figure/9d.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/9d.png -------------------------------------------------------------------------------- /ch8/figure/9g.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/9g.png -------------------------------------------------------------------------------- /ch8/figure/b8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/b8.png -------------------------------------------------------------------------------- /ch8/figure/f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch8/figure/f.png -------------------------------------------------------------------------------- /ch8/lab.R: -------------------------------------------------------------------------------- 1 | # Chapter 8 Lab: Decision Trees 2 | 3 | # Fitting Classification Trees 4 | 5 | library(tree) 6 | library(ISLR) 7 | attach(Carseats) 8 | High=ifelse(Sales<=8,"No","Yes") 9 | Carseats=data.frame(Carseats,High) 10 | tree.carseats=tree(High~.-Sales,Carseats) 11 | summary(tree.carseats) 12 | plot(tree.carseats) 13 | text(tree.carseats,pretty=0) 14 | tree.carseats 15 | set.seed(2) 16 | train=sample(1:nrow(Carseats), 200) 17 | Carseats.test=Carseats[-train,] 18 | High.test=High[-train] 19 | tree.carseats=tree(High~.-Sales,Carseats,subset=train) 20 | tree.pred=predict(tree.carseats,Carseats.test,type="class") 21 | table(tree.pred,High.test) 22 | (86+57)/200 23 | set.seed(3) 24 | cv.carseats=cv.tree(tree.carseats,FUN=prune.misclass) 25 | names(cv.carseats) 26 | cv.carseats 27 | par(mfrow=c(1,2)) 28 | plot(cv.carseats$size,cv.carseats$dev,type="b") 29 | plot(cv.carseats$k,cv.carseats$dev,type="b") 30 | prune.carseats=prune.misclass(tree.carseats,best=9) 31 | plot(prune.carseats) 32 | text(prune.carseats,pretty=0) 33 | tree.pred=predict(prune.carseats,Carseats.test,type="class") 34 | table(tree.pred,High.test) 35 | (94+60)/200 36 | prune.carseats=prune.misclass(tree.carseats,best=15) 37 | plot(prune.carseats) 38 | text(prune.carseats,pretty=0) 39 | tree.pred=predict(prune.carseats,Carseats.test,type="class") 40 | table(tree.pred,High.test) 41 | (86+62)/200 42 | 43 | # Fitting Regression Trees 44 | 45 | library(MASS) 46 | set.seed(1) 47 | train = sample(1:nrow(Boston), nrow(Boston)/2) 48 | tree.boston=tree(medv~.,Boston,subset=train) 49 | summary(tree.boston) 50 | plot(tree.boston) 51 | text(tree.boston,pretty=0) 52 | cv.boston=cv.tree(tree.boston) 53 | plot(cv.boston$size,cv.boston$dev,type='b') 54 | prune.boston=prune.tree(tree.boston,best=5) 55 | plot(prune.boston) 56 | text(prune.boston,pretty=0) 57 | yhat=predict(tree.boston,newdata=Boston[-train,]) 58 | boston.test=Boston[-train,"medv"] 59 | plot(yhat,boston.test) 60 | abline(0,1) 61 | mean((yhat-boston.test)^2) 62 | 63 | # Bagging and Random Forests 64 | 65 | library(randomForest) 66 | set.seed(1) 67 | bag.boston=randomForest(medv~.,data=Boston,subset=train,mtry=13,importance=TRUE) 68 | bag.boston 69 | yhat.bag = predict(bag.boston,newdata=Boston[-train,]) 70 | plot(yhat.bag, boston.test) 71 | abline(0,1) 72 | mean((yhat.bag-boston.test)^2) 73 | 
bag.boston=randomForest(medv~.,data=Boston,subset=train,mtry=13,ntree=25) 74 | yhat.bag = predict(bag.boston,newdata=Boston[-train,]) 75 | mean((yhat.bag-boston.test)^2) 76 | set.seed(1) 77 | rf.boston=randomForest(medv~.,data=Boston,subset=train,mtry=6,importance=TRUE) 78 | yhat.rf = predict(rf.boston,newdata=Boston[-train,]) 79 | mean((yhat.rf-boston.test)^2) 80 | importance(rf.boston) 81 | varImpPlot(rf.boston) 82 | 83 | # Boosting 84 | 85 | library(gbm) 86 | set.seed(1) 87 | boost.boston=gbm(medv~.,data=Boston[train,],distribution="gaussian",n.trees=5000,interaction.depth=4) 88 | summary(boost.boston) 89 | par(mfrow=c(1,2)) 90 | plot(boost.boston,i="rm") 91 | plot(boost.boston,i="lstat") 92 | yhat.boost=predict(boost.boston,newdata=Boston[-train,],n.trees=5000) 93 | mean((yhat.boost-boston.test)^2) 94 | boost.boston=gbm(medv~.,data=Boston[train,],distribution="gaussian",n.trees=5000,interaction.depth=4,shrinkage=0.2,verbose=F) 95 | yhat.boost=predict(boost.boston,newdata=Boston[-train,],n.trees=5000) 96 | mean((yhat.boost-boston.test)^2) 97 | 98 | -------------------------------------------------------------------------------- /ch9/1.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 9: Exercise 1 2 | ===================== 3 | 4 | ```{r 1} 5 | x1 = -10:10 6 | x2 = 1 + 3 * x1 7 | plot(x1, x2, type="l", col="red") 8 | text(c(0), c(-20), "greater than 0", col="red") 9 | text(c(0), c(20), "less than 0", col="red") 10 | lines(x1, 1 - x1/2) 11 | text(c(0), c(-15), "less than 0") 12 | text(c(0), c(15), "greater than 0") 13 | ``` -------------------------------------------------------------------------------- /ch9/1.md: -------------------------------------------------------------------------------- 1 | Chapter 9: Exercise 1 2 | ===================== 3 | 4 | 5 | ```r 6 | x1 = -10:10 7 | x2 = 1 + 3 * x1 8 | plot(x1, x2, type = "l", col = "red") 9 | text(c(0), c(-20), "greater than 0", col = "red") 10 | text(c(0), c(20), "less than 0", col = "red") 11 | lines(x1, 1 - x1/2) 12 | text(c(0), c(-15), "less than 0") 13 | text(c(0), c(15), "greater than 0") 14 | ``` 15 | 16 | ![plot of chunk 1](figure/1.png) 17 | 18 | -------------------------------------------------------------------------------- /ch9/2.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 9: Exercise 2 2 | ===================== 3 | 4 | $(1+X_1)^2 + (2-X_2)^2 = 4$ is a circle with radius 2 and center (-1, 2). 5 | 6 | ## a 7 | ```{r 2a} 8 | radius = 2 9 | plot(NA, NA, type="n", xlim=c(-4,2), ylim=c(-1,5), asp=1, xlab="X1", ylab="X2") 10 | symbols(c(-1), c(2), circles=c(radius), add=TRUE, inches=FALSE) 11 | ``` 12 | 13 | ## b 14 | ```{r 2b} 15 | radius = 2 16 | plot(NA, NA, type="n", xlim=c(-4,2), ylim=c(-1,5), asp=1, xlab="X1", ylab="X2") 17 | symbols(c(-1), c(2), circles=c(radius), add=TRUE, inches=FALSE) 18 | text(c(-1), c(2), "< 4") 19 | text(c(-4), c(2), "> 4") 20 | ``` 21 | 22 | ## c 23 | To restate the boundary, outside the circle is blue, inside and on is red. 24 | 25 | ```{r 2c} 26 | radius = 2 27 | plot(c(0, -1, 2, 3), c(0, 1, 2, 8), col=c("blue", "red", "blue", "blue"), 28 | type="p", asp=1, xlab="X1", ylab="X2") 29 | symbols(c(-1), c(2), circles=c(radius), add=TRUE, inches=FALSE) 30 | ``` 31 | 32 | ## d 33 | The decision boundary is a sum of quadratic terms when expanded. 
34 | 35 | $$ 36 | (1+X_1)^2 + (2-X_2)^2 > 4 \\ 37 | 1 + 2 X_1 + X_1^2 + 4 - 4 X_2 + X_2^2 > 4 \\ 38 | 5 + 2 X_1 - 4 X_2 + X_1^2 + X_2^2 > 4 39 | $$ -------------------------------------------------------------------------------- /ch9/2.md: -------------------------------------------------------------------------------- 1 | Chapter 9: Exercise 2 2 | ===================== 3 | 4 | $(1+X_1)^2 + (2-X_2)^2 = 4$ is a circle with radius 2 and center (-1, 2). 5 | 6 | ## a 7 | 8 | ```r 9 | radius = 2 10 | plot(NA, NA, type = "n", xlim = c(-4, 2), ylim = c(-1, 5), asp = 1, xlab = "X1", 11 | ylab = "X2") 12 | symbols(c(-1), c(2), circles = c(radius), add = TRUE, inches = FALSE) 13 | ``` 14 | 15 | ![plot of chunk 2a](figure/2a.png) 16 | 17 | 18 | ## b 19 | 20 | ```r 21 | radius = 2 22 | plot(NA, NA, type = "n", xlim = c(-4, 2), ylim = c(-1, 5), asp = 1, xlab = "X1", 23 | ylab = "X2") 24 | symbols(c(-1), c(2), circles = c(radius), add = TRUE, inches = FALSE) 25 | text(c(-1), c(2), "< 4") 26 | text(c(-4), c(2), "> 4") 27 | ``` 28 | 29 | ![plot of chunk 2b](figure/2b.png) 30 | 31 | 32 | ## c 33 | To restate the boundary, outside the circle is blue, inside and on is red. 34 | 35 | 36 | ```r 37 | radius = 2 38 | plot(c(0, -1, 2, 3), c(0, 1, 2, 8), col = c("blue", "red", "blue", "blue"), 39 | type = "p", asp = 1, xlab = "X1", ylab = "X2") 40 | symbols(c(-1), c(2), circles = c(radius), add = TRUE, inches = FALSE) 41 | ``` 42 | 43 | ![plot of chunk 2c](figure/2c.png) 44 | 45 | 46 | ## d 47 | The decision boundary is a sum of quadratic terms when expanded. 48 | 49 | $$ 50 | (1+X_1)^2 + (2-X_2)^2 > 4 \\ 51 | 1 + 2 X_1 + X_1^2 + 4 - 4 X_2 + X_2^2 > 4 \\ 52 | 5 + 2 X_1 - 4 X_2 + X_1^2 + X_2^2 > 4 53 | $$ 54 | -------------------------------------------------------------------------------- /ch9/3.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 9: Exercise 3 2 | ===================== 3 | 4 | ## a 5 | 6 | ```{r 3a} 7 | x1 = c(3,2,4,1,2,4,4) 8 | x2 = c(4,2,4,4,1,3,1) 9 | colors = c("red", "red", "red", "red", "blue", "blue", "blue") 10 | plot(x1,x2,col=colors,xlim=c(0,5),ylim=c(0,5)) 11 | ``` 12 | 13 | ## b 14 | The maximal margin classifier has to be in between observations #2, #3 and #5, #6. 15 | 16 | $$ 17 | (2,2), (4,4) \\ 18 | (2,1), (4,3) \\ 19 | => (2,1.5), (4,3.5) \\ 20 | b = (3.5 - 1.5) / (4 - 2) = 1 \\ 21 | a = X_2 - X_1 = 1.5 - 2 = -0.5 22 | $$ 23 | 24 | ```{r 3b} 25 | plot(x1,x2,col=colors,xlim=c(0,5),ylim=c(0,5)) 26 | abline(-0.5, 1) 27 | ``` 28 | 29 | ## c 30 | $0.5 - X_1 + X_2 > 0$ 31 | 32 | ## d 33 | ```{r 3d} 34 | plot(x1,x2,col=colors,xlim=c(0,5),ylim=c(0,5)) 35 | abline(-0.5, 1) 36 | abline(-1, 1, lty=2) 37 | abline(0, 1, lty=2) 38 | ``` 39 | 40 | ## e 41 | ```{r 3e} 42 | plot(x1,x2,col=colors,xlim=c(0,5),ylim=c(0,5)) 43 | abline(-0.5, 1) 44 | arrows(2,1,2,1.5) 45 | arrows(2,2,2,1.5) 46 | arrows(4,4,4,3.5) 47 | arrows(4,3,4,3.5) 48 | ``` 49 | 50 | ## f 51 | A slight movement of observation #7 (4,1) blue would not have an effect on the 52 | maximal margin hyperplane since its movement would be outside of the margin. 
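A quick numerical check of parts (b) through (f) is possible with `e1071` (an optional aside, not part of the original answer): a linear SVM with a very large `cost` approximates the maximal margin classifier, and its support vectors should be observations 2, 3, 5 and 6, but not observation 7.

```r
# Hypothetical check: approximate the maximal margin classifier with a
# (nearly) hard-margin linear SVM and list its support vectors.
library(e1071)
dat = data.frame(x1 = x1, x2 = x2, y = as.factor(colors))
svm.fit = svm(y ~ ., data = dat, kernel = "linear", cost = 1e5, scale = FALSE)
svm.fit$index
```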
53 | 54 | ## g 55 | ```{r 3g} 56 | plot(x1,x2,col=colors,xlim=c(0,5),ylim=c(0,5)) 57 | abline(-0.8, 1) 58 | ``` 59 | $-0.8 - X_1 + X_2 > 0$ 60 | 61 | ## h 62 | ```{r 3h} 63 | plot(x1,x2,col=colors,xlim=c(0,5),ylim=c(0,5)) 64 | points(c(4), c(2), col=c("red")) 65 | ``` 66 | -------------------------------------------------------------------------------- /ch9/3.md: -------------------------------------------------------------------------------- 1 | Chapter 9: Exercise 3 2 | ===================== 3 | 4 | ## a 5 | 6 | 7 | ```r 8 | x1 = c(3, 2, 4, 1, 2, 4, 4) 9 | x2 = c(4, 2, 4, 4, 1, 3, 1) 10 | colors = c("red", "red", "red", "red", "blue", "blue", "blue") 11 | plot(x1, x2, col = colors, xlim = c(0, 5), ylim = c(0, 5)) 12 | ``` 13 | 14 | ![plot of chunk 3a](figure/3a.png) 15 | 16 | 17 | ## b 18 | The maximal margin classifier has to be in between observations #2, #3 and #5, #6. 19 | 20 | $$ 21 | (2,2), (4,4) \\ 22 | (2,1), (4,3) \\ 23 | => (2,1.5), (4,3.5) \\ 24 | b = (3.5 - 1.5) / (4 - 2) = 1 \\ 25 | a = X_2 - X_1 = 1.5 - 2 = -0.5 26 | $$ 27 | 28 | 29 | ```r 30 | plot(x1, x2, col = colors, xlim = c(0, 5), ylim = c(0, 5)) 31 | abline(-0.5, 1) 32 | ``` 33 | 34 | ![plot of chunk 3b](figure/3b.png) 35 | 36 | 37 | ## c 38 | $0.5 - X_1 + X_2 > 0$ 39 | 40 | ## d 41 | 42 | ```r 43 | plot(x1, x2, col = colors, xlim = c(0, 5), ylim = c(0, 5)) 44 | abline(-0.5, 1) 45 | abline(-1, 1, lty = 2) 46 | abline(0, 1, lty = 2) 47 | ``` 48 | 49 | ![plot of chunk 3d](figure/3d.png) 50 | 51 | 52 | ## e 53 | 54 | ```r 55 | plot(x1, x2, col = colors, xlim = c(0, 5), ylim = c(0, 5)) 56 | abline(-0.5, 1) 57 | arrows(2, 1, 2, 1.5) 58 | arrows(2, 2, 2, 1.5) 59 | arrows(4, 4, 4, 3.5) 60 | arrows(4, 3, 4, 3.5) 61 | ``` 62 | 63 | ![plot of chunk 3e](figure/3e.png) 64 | 65 | 66 | ## f 67 | A slight movement of observation #7 (4,1) blue would not have an effect on the 68 | maximal margin hyperplane since its movement would be outside of the margin. 69 | 70 | ## g 71 | 72 | ```r 73 | plot(x1, x2, col = colors, xlim = c(0, 5), ylim = c(0, 5)) 74 | abline(-0.8, 1) 75 | ``` 76 | 77 | ![plot of chunk 3g](figure/3g.png) 78 | 79 | $-0.8 - X_1 + X_2 > 0$ 80 | 81 | ## h 82 | 83 | ```r 84 | plot(x1, x2, col = colors, xlim = c(0, 5), ylim = c(0, 5)) 85 | points(c(4), c(2), col = c("red")) 86 | ``` 87 | 88 | ![plot of chunk 3h](figure/3h.png) 89 | 90 | -------------------------------------------------------------------------------- /ch9/4.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 9: Exercise 4 2 | ======================================================== 3 | 4 | We create a random initial dataset which lies along the parabola $y = 3*x^2 + 4$. We then separate the two classes by translating them along Y-axis. 5 | 6 | ```{r 4a} 7 | set.seed(131) 8 | x = rnorm(100) 9 | y = 3 * x^2 + 4 + rnorm(100) 10 | train = sample(100, 50) 11 | y[train] = y[train] + 3 12 | y[-train] = y[-train] - 3 13 | # Plot using different colors 14 | plot(x[train], y[train], pch="+", lwd=4, col="red", ylim=c(-4, 20), xlab="X", ylab="Y") 15 | points(x[-train], y[-train], pch="o", lwd=4, col="blue") 16 | ``` 17 | 18 | The plot clearly shows non-linear separation. We now create both train and test dataframes by taking half of positive and negative classes and creating a new `z` vector of 0 and 1 for classes. 
19 | ```{r 4b} 20 | set.seed(315) 21 | z = rep(0, 100) 22 | z[train] = 1 23 | # Take 25 observations each from train and -train 24 | final.train = c(sample(train, 25), sample(setdiff(1:100, train), 25)) 25 | data.train = data.frame(x=x[final.train], y=y[final.train], z=as.factor(z[final.train])) 26 | data.test = data.frame(x=x[-final.train], y=y[-final.train], z=as.factor(z[-final.train])) 27 | library(e1071) 28 | svm.linear = svm(z~., data=data.train, kernel="linear", cost=10) 29 | plot(svm.linear, data.train) 30 | table(z[final.train], predict(svm.linear, data.train)) 31 | ``` 32 | The plot shows the linear boundary. The classifier makes $10$ classification errors on train data. 33 | 34 | Next, we train an SVM with polynomial kernel 35 | ```{r 4c} 36 | set.seed(32545) 37 | svm.poly = svm(z~., data=data.train, kernel="polynomial", cost=10) 38 | plot(svm.poly, data.train) 39 | table(z[final.train], predict(svm.poly, data.train)) 40 | ``` 41 | This is a default polynomial kernel with degree 3. It makes $15$ errors on train data. 42 | 43 | Finally, we train an SVM with radial basis kernel with gamma of 1. 44 | ```{r 4d} 45 | set.seed(996) 46 | svm.radial = svm(z~., data=data.train, kernel="radial", gamma=1, cost=10) 47 | plot(svm.radial, data.train) 48 | table(z[final.train], predict(svm.radial, data.train)) 49 | ``` 50 | This classifier perfectly classifies train data!. 51 | 52 | Here are how the test errors look like. 53 | ```{r 4e} 54 | plot(svm.linear, data.test) 55 | plot(svm.poly, data.test) 56 | plot(svm.radial, data.test) 57 | table(z[-final.train], predict(svm.linear, data.test)) 58 | table(z[-final.train], predict(svm.poly, data.test)) 59 | table(z[-final.train], predict(svm.radial, data.test)) 60 | ``` 61 | The tables show that linear, polynomial and radial basis kernels classify 6, 14, and 0 test points incorrectly respectively. Radial basis kernel is the best and has a zero test misclassification error. -------------------------------------------------------------------------------- /ch9/7.Rmd: -------------------------------------------------------------------------------- 1 | Chapter 9: Exercise 7 2 | ======================================================== 3 | 4 | ### a 5 | ```{r} 6 | library(ISLR) 7 | gas.med = median(Auto$mpg) 8 | new.var = ifelse(Auto$mpg > gas.med, 1, 0) 9 | Auto$mpglevel = as.factor(new.var) 10 | ``` 11 | 12 | ### b 13 | ```{r} 14 | library(e1071) 15 | set.seed(3255) 16 | tune.out = tune(svm, mpglevel~., data=Auto, kernel="linear", ranges=list(cost=c(0.01, 0.1, 1, 5, 10, 100))) 17 | summary(tune.out) 18 | ``` 19 | We see that cross-validation error is minimized for $\tt{cost}=1$. 20 | 21 | ### c 22 | ```{r} 23 | set.seed(21) 24 | tune.out = tune(svm, mpglevel~., data=Auto, kernel="polynomial", ranges=list(cost=c(0.1, 1, 5, 10), degree=c(2, 3, 4))) 25 | summary(tune.out) 26 | ``` 27 | The lowest cross-validation error is obtained for $\tt{cost} = 10$ and $\tt{degree} = 2$. 28 | 29 | ```{r} 30 | set.seed(463) 31 | tune.out = tune(svm, mpglevel~., data=Auto, kernel="radial", ranges=list(cost=c(0.1, 1, 5, 10), gamma=c(0.01, 0.1, 1, 5, 10, 100))) 32 | summary(tune.out) 33 | ``` 34 | Finally, for radial basis kernel, $\tt{cost} = 10$ and $\tt{gamma} = 0.01$. 
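To summarize parts (b) and (c), one optional step (not in the original answer) is to re-run `tune()` once at each selected setting and compare the cross-validation errors directly; this assumes `e1071` and the `mpglevel` variable from part (a) are still loaded, and the exact values vary a little between runs because the folds are random:

```r
# Hypothetical summary: CV error of each kernel at its chosen parameters.
linear.cv = tune(svm, mpglevel ~ ., data = Auto, kernel = "linear",
                 ranges = list(cost = 1))
poly.cv = tune(svm, mpglevel ~ ., data = Auto, kernel = "polynomial",
               ranges = list(cost = 10, degree = 2))
radial.cv = tune(svm, mpglevel ~ ., data = Auto, kernel = "radial",
                 ranges = list(cost = 10, gamma = 0.01))
c(linear = linear.cv$best.performance,
  polynomial = poly.cv$best.performance,
  radial = radial.cv$best.performance)
```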
35 | 36 | ### d 37 | ```{r 7d} 38 | svm.linear = svm(mpglevel~., data=Auto, kernel="linear", cost=1) 39 | svm.poly = svm(mpglevel~., data=Auto, kernel="polynomial", cost=10, degree=2) 40 | svm.radial = svm(mpglevel~., data=Auto, kernel="radial", cost=10, gamma=0.01) 41 | plotpairs = function(fit){ 42 | for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel","name"))]) { 43 | plot(fit, Auto, as.formula(paste("mpg~", name, sep=""))) 44 | } 45 | } 46 | plotpairs(svm.linear) 47 | plotpairs(svm.poly) 48 | plotpairs(svm.radial) 49 | ``` 50 | -------------------------------------------------------------------------------- /ch9/9.R: -------------------------------------------------------------------------------- 1 | ## Online course quiz question: 9.R 2 | ## Explanation: Logistic regression is similar to SVM with a linear kernel. 3 | 4 | library(MASS) 5 | svm_error <- function() { 6 | # 1) generate a random training sample to train on + fit 7 | 8 | # build training set 9 | x0 = mvrnorm(50,rep(0,10),diag(10)) 10 | x1 = mvrnorm(50,rep(c(1,0),c(5,5)),diag(10)) 11 | train = rbind(x0,x1) 12 | classes = rep(c(0,1),c(50,50)) 13 | dat=data.frame(train,classes=as.factor(classes)) 14 | 15 | # fit 16 | # svmfit=svm(classes~.,data=dat,kernel="linear") 17 | svmfit = glm(classes~., data=dat, family="binomial") 18 | 19 | # 2) evaluate the number of mistakes we make on a large test set = 1000 samples 20 | test_x0 = mvrnorm(500,rep(0,10),diag(10)) 21 | test_x1 = mvrnorm(500,rep(c(1,0),c(5,5)),diag(10)) 22 | test = rbind(test_x0,test_x1) 23 | test_classes = rep(c(0,1),c(500,500)) 24 | test_dat = data.frame(test,test_classes=as.factor(test_classes)) 25 | fit = predict(svmfit,test_dat) 26 | fit = ifelse(fit < 0.5, 0, 1) 27 | error = sum(fit != test_dat$test_classes)/1000 28 | 29 | return(error) 30 | } 31 | 32 | # 3) repeat (1-2) many times and averaging the error rate for each trial 33 | errors = replicate(1000, svm_error()) 34 | 35 | print(mean(errors)) 36 | -------------------------------------------------------------------------------- /ch9/9.R.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/9.R.png -------------------------------------------------------------------------------- /ch9/R_exercise_video.R: -------------------------------------------------------------------------------- 1 | n.samples = 300 2 | y = sample(c(0, 1), n.samples, replace=T) 3 | x = matrix(rep(0, n.samples * 10), ncol=10) 4 | 5 | for (i in 1:n.samples) { 6 | if (y[i] == 0) 7 | x[i, ] = rnorm(10) 8 | else 9 | x[i, ] = rnorm(10, mean=c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)) 10 | } 11 | 12 | total.0 = seq(1, n.samples)[y == 0] 13 | total.1 = seq(1, n.samples)[y == 1] 14 | train.0 = sample(total.0, 50) 15 | train.1 = sample(total.1, 50) 16 | train = c(train.0, train.1) 17 | 18 | library(e1071) 19 | 20 | dat = data.frame(x=x, y=as.factor(y)) 21 | svm.fit = svm(y~., data=dat[train, ], kernel="linear") 22 | svm.pred = predict(svm.fit, dat[-train, ]) 23 | mean(dat[-train, "y"] != svm.pred) 24 | 25 | glm.fit = glm(y~., dat[train, ], family=binomial) 26 | glm.prob = predict(glm.fit, dat[-train, ], type="response") 27 | glm.pred = ifelse(glm.prob > 0.5, 1, 0) 28 | sum(dat[-train, "y"] != glm.pred) -------------------------------------------------------------------------------- /ch9/figure/1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/1.png -------------------------------------------------------------------------------- /ch9/figure/2a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/2a.png -------------------------------------------------------------------------------- /ch9/figure/2b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/2b.png -------------------------------------------------------------------------------- /ch9/figure/2c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/2c.png -------------------------------------------------------------------------------- /ch9/figure/3a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/3a.png -------------------------------------------------------------------------------- /ch9/figure/3b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/3b.png -------------------------------------------------------------------------------- /ch9/figure/3d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/3d.png -------------------------------------------------------------------------------- /ch9/figure/3e.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/3e.png -------------------------------------------------------------------------------- /ch9/figure/3g.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/3g.png -------------------------------------------------------------------------------- /ch9/figure/3h.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/3h.png -------------------------------------------------------------------------------- /ch9/figure/4a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/4a.png -------------------------------------------------------------------------------- /ch9/figure/4b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/4b.png -------------------------------------------------------------------------------- /ch9/figure/4c.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/4c.png -------------------------------------------------------------------------------- /ch9/figure/4d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/4d.png -------------------------------------------------------------------------------- /ch9/figure/4e1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/4e1.png -------------------------------------------------------------------------------- /ch9/figure/4e2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/4e2.png -------------------------------------------------------------------------------- /ch9/figure/4e3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/4e3.png -------------------------------------------------------------------------------- /ch9/figure/5b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/5b.png -------------------------------------------------------------------------------- /ch9/figure/5d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/5d.png -------------------------------------------------------------------------------- /ch9/figure/5f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/5f.png -------------------------------------------------------------------------------- /ch9/figure/5g.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/5g.png -------------------------------------------------------------------------------- /ch9/figure/5h.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/5h.png -------------------------------------------------------------------------------- /ch9/figure/6a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/6a.png -------------------------------------------------------------------------------- /ch9/figure/6c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/6c.png 
-------------------------------------------------------------------------------- /ch9/figure/7d1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d1.png -------------------------------------------------------------------------------- /ch9/figure/7d10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d10.png -------------------------------------------------------------------------------- /ch9/figure/7d11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d11.png -------------------------------------------------------------------------------- /ch9/figure/7d12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d12.png -------------------------------------------------------------------------------- /ch9/figure/7d13.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d13.png -------------------------------------------------------------------------------- /ch9/figure/7d14.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d14.png -------------------------------------------------------------------------------- /ch9/figure/7d15.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d15.png -------------------------------------------------------------------------------- /ch9/figure/7d16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d16.png -------------------------------------------------------------------------------- /ch9/figure/7d17.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d17.png -------------------------------------------------------------------------------- /ch9/figure/7d18.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d18.png -------------------------------------------------------------------------------- /ch9/figure/7d19.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d19.png -------------------------------------------------------------------------------- /ch9/figure/7d2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d2.png -------------------------------------------------------------------------------- /ch9/figure/7d20.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d20.png -------------------------------------------------------------------------------- /ch9/figure/7d21.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d21.png -------------------------------------------------------------------------------- /ch9/figure/7d3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d3.png -------------------------------------------------------------------------------- /ch9/figure/7d4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d4.png -------------------------------------------------------------------------------- /ch9/figure/7d5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d5.png -------------------------------------------------------------------------------- /ch9/figure/7d6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d6.png -------------------------------------------------------------------------------- /ch9/figure/7d7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d7.png -------------------------------------------------------------------------------- /ch9/figure/7d8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d8.png -------------------------------------------------------------------------------- /ch9/figure/7d9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/7d9.png -------------------------------------------------------------------------------- /ch9/figure/9a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/asadoughi/stat-learning/8ccf543576abdca49132c36cd6a990c88c2e6490/ch9/figure/9a.png -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |

An Introduction to Statistical Learning Unofficial Solutions

Fork the solutions!
Twitter me @princehonest
Official book website

Check out Github issues and repo for the latest updates.

--------------------------------------------------------------------------------