├── Convergence ├── Convergence.Rmd └── Convergence.nb.html ├── Gaussian_processes_I ├── .DS_Store ├── .Rapp.history ├── .Rhistory ├── GPs.Rmd ├── GPs.nb.html ├── Visualisations.Rmd ├── kplr003632418-2009131105131_llc.csv ├── method1.stan ├── method2.stan ├── predict_gauss.stan └── sim_gp.stan ├── Heteroskedasticity ├── .DS_Store ├── .Rapp.history ├── EuStockMarket.csv ├── Heteroskedasticity.Rmd ├── Heteroskedasticity.nb.html ├── arch.stan ├── garch.stan ├── heto_model.stan └── homo_model.stan ├── Hierarchical ├── .Rapp.history ├── .Rhistory ├── Hierarchical.Rmd ├── Hierarchical.nb.html ├── hier_model.stan ├── non_hier_model.stan ├── non_hier_model_pool.stan └── premiereleague.csv ├── Linear_regression_I ├── .ipynb_checkpoints │ └── Linear_regression_I-checkpoint.ipynb ├── Linear_regression_I.Rmd ├── Linear_regression_I.ipynb ├── Linear_regression_I.nb.html ├── data.csv └── model.stan ├── README.md └── Split_testing_I ├── .Rhistory ├── Split_testing_I.Rmd ├── Split_testing_I.nb.html ├── data.txt ├── data2.txt └── my_model.stan /Convergence/Convergence.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Have I converged" 3 | output: html_notebook 4 | --- 5 | 6 | 7 | This notebook accompanies the youtube tutorial for Stan: https://youtu.be/0FdMZwIbJ_4 8 | 9 | ```{r} 10 | require(rstan) 11 | ``` 12 | 13 | ### Law of large numbers (LLN) 14 | The law of large numbers says that the sample mean will approach the expectation value as the sample size approaches large values. 
For example, consider a coin toss where heads is 0 and tails is 1: for a small sample of 10 tosses, we don't necessarily expect the mean to be 0.5: 15 | 16 | ```{r} 17 | tosses = rbinom(n=10, size=1, prob=0.5) 18 | mean(tosses) 19 | ``` 20 | 21 | But with more flips, the sample mean will approach the expectation: 22 | 23 | ```{r} 24 | tosses = rbinom(5000, 1, 0.5) 25 | means = sapply(1:5000, function(x) mean(tosses[1:x])) 26 | plot(means, ty='l', ylim=c(0,1), xlab='toss', ylab='mean of tosses') 27 | abline(h=0.5, lty='dashed') 28 | ``` 29 | 30 | ### The central limit theorem (CLT) 31 | The central limit theorem says that for sufficiently large independent random samples, the distribution of sample means will be approximately normally distributed, which justifies the common use of a Gaussian distribution for the mean of a large collection of data. For example, let's look at 10,000 sets of 1,000 samples each from a Poisson and an exponential distribution. You will find that the sample means are Gaussian distributed about the expectation, with a standard deviation equal to the population standard deviation divided by the square root of N.
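The 1/sqrt(N) scaling of the sample-mean spread can be checked directly with a quick simulation. Here is a standalone sketch (written in Python rather than R so it can be run outside this notebook; the rate 0.6 mirrors the exponential example used here):

```python
import numpy as np

rng = np.random.default_rng(0)

rate = 0.6       # exponential rate, as in the R example
N = 1_000        # draws per set
n_sets = 5_000   # number of sets

# a single exponential draw has mean 1/rate and sd 1/rate
mu = sigma = 1 / rate

# sample mean of each set
means = rng.exponential(scale=1 / rate, size=(n_sets, N)).mean(axis=1)

print(means.mean())  # close to mu (about 1.67)
print(means.std())   # close to sigma / sqrt(N) (about 0.053)
```

The spread of the sample means shrinks like 1/sqrt(N), which is exactly the `sd(...)/sqrt(1000)` factor used in the Gaussian overlays below.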
32 | 33 | ```{r,fig.width=5, fig.height=2.5} 34 | par(mfrow=c(1,2)) 35 | 36 | #get samples 37 | sample_pois = sapply(1:10000, function(x) rpois(1000, 0.6)) 38 | 39 | sample_exp = sapply(1:10000, function(x) rexp(1000, 0.6)) 40 | 41 | #calculate means of samples 42 | mean_pois = colMeans(sample_pois) 43 | mean_exp = colMeans(sample_exp) 44 | 45 | #plot distributions 46 | plot(density(mean_pois), xlab='sample mean', ylab='density', main='Poisson distribution') 47 | lines(x = seq(0.5,0.8,0.005), dnorm(x = seq(0.5,0.8,0.005), mean=mean(sample_pois), sd=sd(sample_pois)/sqrt(1000)), col='red', lty='dotted') #plot the gaussian approximation 48 | legend('topright', legend = c('true', 'gaussian'), col=c('black', 'red'), lty=c('solid', 'dotted'), bty='n') 49 | 50 | 51 | plot(density(mean_exp), xlab='sample mean', ylab='', main='Exponential distribution') 52 | lines(x = seq(1.5,1.9,0.01), dnorm(x = seq(1.5,1.9,0.01), mean=mean(sample_exp), sd=sd(sample_exp)/sqrt(1000)), col='red', lty='dotted') #plot the gaussian approximation 53 | ``` 54 | 55 | 56 | As an example, let's look at the 8 schools dataset from Section 5.5 of Gelman et al. (2003). Here we have J=8 schools that took part in a study of the effect of coaching on students' performance in an aptitude test. Each school has an estimated treatment effect y and a standard error sigma on that estimate. 57 | 58 | ```{r} 59 | schools_dat <- list(J = 8, 60 | y = c(28, 8, -3, 7, -1, 1, 18, 12), 61 | sigma = c(15, 10, 16, 11, 9, 11, 10, 18)) 62 | 63 | ``` 64 | 65 | The model is a hierarchical model. We won't be discussing hierarchical models in this video, so I won't go into detail, but as usual you have a data block with the number of schools, the estimated treatment effects, and the errors on those estimates. We're interested in the true treatment effect of each individual school and the population treatment effect across all of the schools.
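One way to read the hierarchical model that follows is generatively: population parameters mu and tau produce a true effect theta_j for each school, and the observed y_j is that effect blurred by the known measurement error sigma_j. A quick forward simulation of that story (a Python sketch; the values of mu and tau here are made up for illustration, not fitted):

```python
import numpy as np

rng = np.random.default_rng(1)

J = 8
# known per-school standard errors, as in schools_dat
sigma = np.array([15, 10, 16, 11, 9, 11, 10, 18], dtype=float)

mu, tau = 8.0, 6.0                   # illustrative population parameters

theta = rng.normal(mu, tau, size=J)  # true per-school treatment effects
y = rng.normal(theta, sigma)         # observed effect estimates

print(np.round(theta, 1))
print(np.round(y, 1))
```

Because the sigma_j are large relative to tau, the simulated y_j scatter far more widely than the theta_j — which is exactly why the model pools information across schools.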
66 | 67 | ```{r} 68 | schools_code = " 69 | data { 70 | int J; // number of schools 71 | vector[J] y; // observed treatment effect 72 | vector[J] sigma; // uncertainty on observed effect 73 | } 74 | 75 | parameters { 76 | vector[J] theta; // true treatment effect 77 | real mu; // population treatment effect 78 | real<lower=0> tau; // population scale (a scale must be positive) 79 | } 80 | 81 | model { 82 | //(hyper) priors 83 | mu ~ normal(0,5); 84 | tau ~ cauchy(0,5); 85 | 86 | theta ~ normal(mu,tau); 87 | 88 | //likelihood 89 | y ~ normal(theta, sigma); 90 | } 91 | " 92 | ``` 93 | 94 | 95 | Let's first fit the model with 4 chains: 96 | ```{r} 97 | fit = stan(model_code = schools_code, data=schools_dat, chains = 4, iter = 20) 98 | ``` 99 | 100 | ## Scatter plots 101 | 102 | Extract the parameters of the fit and plot them: 103 | ```{r} 104 | params=extract(fit, permuted=FALSE, inc_warmup=TRUE) 105 | ``` 106 | 107 | ```{r} 108 | plot(c(-5,25), c(-5,25), ty='n', xlab='mu', ylab='tau') 109 | lines(params[,'chain:1','mu'], params[,'chain:1','tau'], col='black',ty='o', pch=20) 110 | lines(params[,'chain:2','mu'], params[,'chain:2','tau'], col='orange',ty='o', pch=20) 111 | lines(params[,'chain:3','mu'], params[,'chain:3','tau'], col='red',ty='o', pch=20) 112 | lines(params[,'chain:4','mu'], params[,'chain:4','tau'], col='gray',ty='o', pch=20) 113 | legend('topright', legend=c('chain1', 'chain2', 'chain3', 'chain4'), col=c('black', 'orange', 'red', 'gray'), lty='solid', bty='n') 114 | ``` 115 | 116 | ## Trace plots 117 | 118 | From the scatter plot and the trace plots, we can see that the chains are not well mixed: their distributions are very different. 119 | 120 | ```{r} 121 | traceplot(fit, pars=c('mu','tau')) 122 | ``` 123 | 124 | By increasing the number of iterations and throwing away the warmup, we can improve the mixing.
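Mixing can also be quantified rather than eyeballed: compare the variance between chains to the variance within each chain — this is the idea behind the Rhat statistic that print(fit) reports. Below is a standalone sketch of a simplified (non-split) version of the calculation (Python; Stan's actual estimator additionally splits each chain in half and, in recent versions, rank-normalises the draws):

```python
import numpy as np

def rhat(chains):
    """Simplified (non-split) Rhat for an (m, n) array of m chains of n draws."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(2)
mixed = rng.normal(0, 1, size=(4, 1000))            # 4 chains on the same target
stuck = mixed + np.array([[0.], [2.], [4.], [6.]])  # chains stuck at different levels

print(rhat(mixed))  # approximately 1
print(rhat(stuck))  # well above 1
```

When all chains explore the same distribution, the pooled and within-chain variances agree and the ratio is close to 1; chains stuck in different regions inflate the between-chain term and push Rhat up.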
125 | 126 | ```{r} 127 | fit = stan(model_code = schools_code, data=schools_dat, chains = 4, iter = 1000) 128 | traceplot(fit, pars=c('mu','tau')) 129 | ``` 130 | 131 | ## Rhat 132 | The Rhat statistic is given in the last column of the print() output. It compares the variance of the pooled chains to that of the individual chains, for each parameter. Ideally Rhat should be less than 1.1. 133 | ```{r} 134 | print(fit) 135 | ``` 136 | 137 | ## Neff 138 | Neff is the effective sample size: the number of samples after accounting for the correlations between them. Ideally the number of effective samples divided by the total number of post-warmup draws should be greater than 0.01. 139 | ```{r} 140 | neff = summary(fit)$summary[,'n_eff'] 141 | neff/2000 # 4 chains x 500 post-warmup draws 142 | ``` 143 | 144 | Sometimes the number of effective samples can be different in the bulk of the posterior compared to the tails. It's useful to check these values too, with the monitor() function. Bulk_ESS and Tail_ESS should be greater than 100 per chain. 145 | ```{r} 146 | monitor(extract(fit, permuted = FALSE, inc_warmup = FALSE)) 147 | ``` 148 | 149 | Running longer chains, and thinning them, can sometimes help improve the effective sample size. 150 | ```{r} 151 | fit = stan(model_code = schools_code, data=schools_dat, chains = 4, iter = 10000, warmup = 500) 152 | ``` 153 | 154 | ## Divergences 155 | 156 | Divergences are indicators of regions of high curvature that are not well explored by Stan. They show up as red points in the pairs() plot of the parameters.
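The idea behind n_eff above can also be made concrete: autocorrelated draws carry less information than independent ones, and the effective sample size discounts the chain length by the summed autocorrelation. A rough sketch of the estimate (Python; Stan's actual estimator truncates the autocorrelation sum more carefully, e.g. via Geyer's initial monotone sequence):

```python
import numpy as np

def ess(x):
    """Crude effective sample size of a single chain x."""
    n = len(x)
    x = x - x.mean()
    # autocovariance at lags 0..n-1, then normalise to autocorrelation
    acov = np.correlate(x, x, mode='full')[n - 1:] / n
    rho = acov / acov[0]
    # sum positive-lag correlations until they first drop below zero
    s = 0.0
    for r in rho[1:]:
        if r < 0:
            break
        s += r
    return n / (1 + 2 * s)

rng = np.random.default_rng(3)
iid = rng.normal(size=2000)           # independent draws

phi = 0.9                             # AR(1) chain: highly autocorrelated
ar = np.zeros(2000)
for t in range(1, 2000):
    ar[t] = phi * ar[t - 1] + rng.normal() * np.sqrt(1 - phi**2)

print(ess(iid))  # of the order of the 2000 draws
print(ess(ar))   # far fewer effective samples
```

The independent chain keeps essentially all of its draws as effective samples, while the strongly autocorrelated chain retains only a small fraction — the same effect that makes n_eff much smaller than the number of iterations for poorly mixing Stan fits.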
157 | ```{r} 158 | pairs(fit, pars=c('mu', 'tau', 'lp__')) 159 | ``` 160 | 161 | Increasing adapt_delta can sometimes reduce the number of divergences. 162 | 163 | ```{r} 164 | 165 | adapt_delta = c(0.8, 0.85, 0.9, 0.95) 166 | ndiv = get_num_divergent(fit) 167 | 168 | for(i in 2:4){ 169 | fit = stan(model_code = schools_code, data=schools_dat, chains = 4, iter = 10000, warmup = 500, control=list(adapt_delta=adapt_delta[i])) 170 | ndiv = c(ndiv, get_num_divergent(fit)) 171 | } 172 | ``` 173 | 174 | However, if the number of divergences does not decrease as adapt_delta increases, your model requires reparameterisation. 175 | 176 | 177 | ```{r} 178 | plot(adapt_delta, ndiv, ty='o', pch=20, xlab='adapt_delta', ylab='number of divergences') 179 | ``` 180 | 181 | Here we use a non-centered parameterisation: instead of sampling theta directly, we sample a standardised deviation eta ~ normal(0,1) and reconstruct theta = mu + tau * eta, which removes the problematic dependence between theta and tau. 182 | ```{r} 183 | schools_code = " 184 | data { 185 | int J; // number of schools 186 | vector[J] y; // observed treatment effect 187 | vector[J] sigma; // uncertainty on observed effect 188 | } 189 | 190 | parameters { 191 | real mu; // population treatment effect 192 | real<lower=0> tau; // population scale (must be positive) 193 | vector[J] eta; // unscaled deviation from mu by school 194 | } 195 | 196 | transformed parameters { 197 | vector[J] theta; // true treatment effect 198 | 199 | theta = mu + tau * eta; 200 | } 201 | 202 | model { 203 | //(hyper) priors 204 | mu ~ normal(0,5); 205 | tau ~ cauchy(0,5); 206 | eta ~ normal(0,1); 207 | 208 | //likelihood 209 | y ~ normal(theta, sigma); 210 | } 211 | " 212 | ``` 213 | 214 | ```{r} 215 | fit = stan(model_code = schools_code, data=schools_dat, chains = 4, iter = 10000, warmup = 500, control = list(adapt_delta=0.95)) 216 | ``` 217 | ```{r} 218 | print(fit) 219 | ``` 220 | 221 | 222 | 223 | 224 | 225 | 226 | -------------------------------------------------------------------------------- /Gaussian_processes_I/.DS_Store: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/MaggieLieu/STAN_tutorials/6616b30827530521e2d5f9cf3f3e2fccc761c27d/Gaussian_processes_I/.DS_Store -------------------------------------------------------------------------------- /Gaussian_processes_I/GPs.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Gaussian Processes in
Stan" 3 | output: html_notebook 4 | runtime: shiny 5 | --- 6 | 7 | This notebook is a tutorial on Gaussian Processes using Stan that accompanies the YouTube video https://youtu.be/132s2B-mzBg 8 | ```{r} 9 | require(rstan) 10 | ``` 11 | # Simulating from a Gaussian process 12 | ```{r} 13 | N=100 14 | 15 | data = list( 16 | N = N, 17 | x = seq(-10, 10, length.out=N), #equally spaced points 18 | alpha = 1, 19 | rho = 1 20 | ) 21 | ``` 22 | 23 | ```{r} 24 | ndraws = 50 # number of draws 25 | fit = stan(file='sim_gp.stan', data=data, algorithm ='Fixed_param', warmup = 0, chains = 1, iter=ndraws) 26 | ``` 27 | ```{r} 28 | params = extract(fit) 29 | ``` 30 | 31 | ```{r} 32 | xr=c(-10,10) 33 | yr=c(-5,5) 34 | plot(xr, yr, ty='n') 35 | for(i in 1:ndraws){ 36 | lines(data$x, params$f[i,], pch=20, col=rgb(0,0.55,0.5, 0.2), cex=0.5) 37 | } 38 | points(data$x, params$f[ndraws,], pch=20, col='black', cex=1) 39 | lines(data$x, colMeans(params$f),lwd=3) 40 | 41 | ``` 42 | ```{r} 43 | yconf = sapply(1:N, function(x) quantile(params$f[,x], probs=c(0.003,0.05,0.32,0.68,0.95,0.997))) 44 | 45 | plot(xr,yr, ty='n', xlab='t', ylab='y') 46 | for(i in 1:3){ 47 | polygon(x=c(data$x, rev(data$x)), y=c(yconf[i,], rev(yconf[i+3,])), border=NA, col=rgb(0,0,0,0.1)) 48 | } 49 | lines(data$x, colMeans(params$f),lwd=3) 50 | ``` 51 | # Kepler data 52 | Read in the Kepler light curve for object kplr003632418 53 | ```{r} 54 | lcdata = read.csv('kplr003632418-2009131105131_llc.csv') 55 | ``` 56 | 57 | ```{r} 58 | plot(lcdata$TIME, lcdata$SAP_FLUX, pch=20, xlab='Time [days]', ylab='Flux [e-/s]', cex=0.5) 59 | ``` 60 | ```{r} 61 | bad = which(is.na(lcdata$SAP_FLUX)) #remove missing data 62 | x = lcdata$TIME[-bad] 63 | y = (lcdata$SAP_FLUX[-bad] - mean(lcdata$SAP_FLUX[-bad]))/sd(lcdata$SAP_FLUX[-bad]) #Normalise flux to have mean 0 and sd 1 64 | ``` 65 | 66 | ```{r} 67 | plot(x,y, pch=20, cex=0.5) 68 | ``` 69 | 70 | ```{r} 71 | data = list( 72 | N = length(x), 73 | x = x, 74 | y = y 75 | ) 76 | ``` 77 | 78 |
```{r} 80 | fit = stan(file='method2.stan', data=data,chains = 1, iter=2000) 81 | ``` 82 | 83 | ```{r} 84 | params = extract(fit) 85 | ``` 86 | 87 | ```{r} 88 | pairs(fit) 89 | ``` 90 | ```{r} 91 | mean(params$rho) 92 | mean(params$alpha) 93 | mean(params$sigma) 94 | ``` 95 | 96 | ```{r} 97 | N =500 98 | gen_data = list( 99 | N = N, 100 | x = seq(120, 132, length.out=N), #equally spaced points 101 | alpha = mean(params$alpha), 102 | rho = mean(params$rho), 103 | sigma = mean(params$sigma) 104 | ) 105 | 106 | ndraws = 50 # number of draws 107 | gen_sam = stan(file = 'sim_gp.stan', data=gen_data, algorithm ='Fixed_param', warmup = 0, chains = 1, iter=ndraws) 108 | ``` 109 | 110 | ```{r} 111 | gen_params = extract(gen_sam) 112 | xr = c(120,132) 113 | yr = c(-4,4) 114 | plot(xr, yr, ty='n') 115 | for(i in 1:ndraws){ 116 | lines(gen_data$x, gen_params$f[i,], pch=20, col=rgb(0,0.55,0.5, 0.2)) 117 | } 118 | points(x, y, pch=20, col='orange', cex=0.5) 119 | legend('topright', y = c('predicted','data'), col = c(rgb(0,0.55,0.5, 0.2),'orange'), bty='n', pch=c(NA,20), lty=c(1,NA)) 120 | ``` 121 | # Predictive inference 122 | 123 | ```{r} 124 | N_predict = 100 125 | x_predict = seq(range(data$x)[1], 132, length.out=N_predict) 126 | N_obs = 200 127 | Obs = sort(sample(1:length(x), N_obs)) 128 | pred_data <- list(N1=N_obs, x1=x[Obs], y1=y[Obs], N2=N_predict, x2=x_predict) 129 | 130 | pred_fit = stan(file='method1.stan', data =pred_data, iter=2000, chains=1) 131 | ``` 132 | ```{r} 133 | pred_params = extract(pred_fit) 134 | ``` 135 | 136 | ```{r} 137 | yr = c(-2,2) 138 | xr = c(120, 132) 139 | plot(xr, yr, ty='n') 140 | for( i in 1:100){ 141 | lines(x_predict, pred_params$f[i,(N_obs+1):(N_obs+N_predict)], col=rgb(0,0,0,0.1)) 142 | } 143 | points(data$x, data$y, pch=20, col='orange', cex=0.3) 144 | ``` 145 | ```{r} 146 | N_predict = 100 147 | x_predict = seq(range(x)[1], 132, length.out=N_predict) 148 | N_obs = 3 149 | Obs = sort(sample(1:length(x), N_obs)) 150 | pred_data <- 
list(rho=mean(params$rho),alpha=mean(params$alpha), sigma=mean(params$sigma), N=N_obs, x=x[Obs], y=y[Obs], N_predict=N_predict, x_predict=x_predict) 151 | 152 | pred_opt_fit <- stan(file='predict_gauss.stan', data=pred_data, iter=1000, warmup=0, 153 | chains=1, algorithm="Fixed_param") 154 | ``` 155 | 156 | ```{r} 157 | pred_params = extract(pred_opt_fit) 158 | ``` 159 | 160 | ```{r} 161 | yr = c(-2,2) 162 | xr = c(120, 132) 163 | plot(xr, yr, ty='n') 164 | for( i in 1:100){ 165 | lines(x_predict, pred_params$f_predict[i,], col=rgb(0,0,0,0.1)) 166 | } 167 | points(x, y, pch=20, col='orange', cex=0.3) 168 | points(pred_data$x, pred_data$y, pch=20, col='orange', cex=2) 169 | points(pred_data$x, pred_data$y, cex=2) 170 | ``` 171 | ```{r} 172 | yconf = sapply(1:N_predict, function(x) quantile(pred_params$f[,x], probs=c(0.003,0.05,0.32,0.68,0.95,0.997))) 173 | 174 | plot(xr,yr, ty='n', xlab='t', ylab='y') 175 | for(i in 1:3){ 176 | polygon(x=c(x_predict, rev(x_predict)), y=c(yconf[i,], rev(yconf[i+3,])), border=NA, col=rgb(0,0,0,0.1)) 177 | } 178 | lines(x_predict, colMeans(pred_params$f),lwd=3) 179 | 180 | ``` 181 | 182 | 183 | -------------------------------------------------------------------------------- /Gaussian_processes_I/Visualisations.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Visualisations of Gaussian Process Kernels" 3 | output: html_notebook 4 | runtime: shiny 5 | --- 6 | 7 | ```{r} 8 | require(shiny) 9 | ``` 10 | 11 | ```{r setup, include=FALSE} 12 | knitr::opts_chunk$set(echo = TRUE) 13 | ``` 14 | 15 | # Covariance kernels 16 | ### RBF kernel 17 | $$ 18 | \alpha ^2 \exp\left(-\frac{(x - x')^2}{2\rho^2}\right) 19 | $$ 20 | 21 | ```{r RBFkernel, echo=FALSE, fig.width=4, fig.height=4} 22 | inputPanel( 23 | sliderInput("alpha", label = "alpha:", 24 | min = 0, max = 1, value = 1, step = 0.1), 25 | 26 | sliderInput("rho", label = "rho:", 27 | min = 0.1, max = 2, value = 1, step = 0.2) 28 | ) 29 | 
30 | renderPlot({ 31 | par(oma=c(0,0,0,0),mar=c(0,0,0,0)) 32 | N=10 33 | x=seq(-5,5,length.out=N) 34 | cov = matrix(nrow = N, ncol=N) 35 | for(i in 1:N){ 36 | for(j in 1:N){ 37 | cov[i,j] = input$alpha^2 * exp(-0.5 * (x[i]-x[j])^2 / (input$rho^2)) 38 | } 39 | } 40 | image(cov, ylim=c(1.1,-0.1), axes=FALSE, col=gray.colors(50, start=0, end=input$alpha)) 41 | 42 | }) 43 | ``` 44 | 45 | ### Periodic Kernel 46 | $$ 47 | \alpha^2 \exp\left(-\frac{2 \sin^2(\pi|x-x'|/p)}{\rho^2}\right) 48 | $$ 49 | 50 | ```{r periodickernel, echo=FALSE, fig.width=3, fig.height=3} 51 | inputPanel( 52 | sliderInput("alphapk", label = "alpha:", 53 | min = 0, max = 1, value = 1, step = 0.1), 54 | 55 | sliderInput("rhopk", label = "rho:", 56 | min = 0.1, max = 2, value = 1, step = 0.2), 57 | 58 | sliderInput("p", label = "p:", #periodicity 59 | min = 0, max = 3.14, value = 1, step = 0.1) 60 | ) 61 | 62 | renderPlot({ 63 | par(oma=c(0,0,0,0),mar=c(0,0,0,0)) 64 | N=10 65 | x=seq(-5,5,length.out=N) 66 | cov = matrix(nrow = N, ncol=N) 67 | for(i in 1:N){ 68 | for(j in 1:N){ 69 | cov[i,j] = input$alphapk^2 * exp(-2.0 * sin(pi* abs(x[i]-x[j]) / input$p)^2 / (input$rhopk^2)) 70 | } 71 | } 72 | 73 | image(cov, ylim=c(1.1,-0.1), axes=FALSE, col=gray.colors(50, start=0, end=input$alphapk)) 74 | 75 | }) 76 | ``` -------------------------------------------------------------------------------- /Gaussian_processes_I/method1.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int N1; // number of observed data points 3 | real x1[N1]; 4 | vector[N1] y1; 5 | int N2; // number of points to be predicted 6 | real x2[N2]; 7 | } 8 | 9 | transformed data{ 10 | int N = N1 + N2; // total number of points 11 | real x[N]; 12 | for (n1 in 1:N1) x[n1] = x1[n1]; 13 | for (n2 in 1:N2) x[N1 + n2] = x2[n2]; 14 | } 15 | 16 | parameters { 17 | real<lower=0> rho; // length scale (must be positive) 18 | real<lower=0> alpha; // marginal standard deviation 19 | real<lower=0> sigma; // observation noise 20 | vector[N] eta; 21 | } 22 | 23 | transformed parameters { 24 | vector[N] f; 25 | { 26 | matrix[N,N] K =
cov_exp_quad(x,alpha,rho) + diag_matrix(rep_vector(1e-9, N)); 27 | matrix[N,N] L_K = cholesky_decompose(K); 28 | f = L_K * eta; 29 | } 30 | } 31 | 32 | model { 33 | rho ~ normal(0,3); 34 | alpha ~ normal(0,1); 35 | sigma ~ normal(0,1); 36 | eta ~ normal(0,1); 37 | 38 | y1 ~ normal(f[1:N1], sigma); 39 | } 40 | 41 | generated quantities { 42 | vector[N2] y2; 43 | for(n2 in 1:N2) 44 | y2[n2] = normal_rng(f[N1 + n2], sigma); 45 | } 46 | -------------------------------------------------------------------------------- /Gaussian_processes_I/method2.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int N; 3 | real x[N]; 4 | vector[N] y; 5 | } 6 | 7 | transformed data { 8 | vector[N] mu = rep_vector(0,N); 9 | } 10 | 11 | parameters { 12 | real<lower=0> rho; // length scale (must be positive) 13 | real<lower=0> alpha; // marginal standard deviation 14 | real<lower=0> sigma; // observation noise 15 | } 16 | 17 | model { 18 | matrix[N,N] K = cov_exp_quad(x, alpha, rho) + diag_matrix(rep_vector(square(sigma),N)); 19 | matrix[N,N] L_K = cholesky_decompose(K); 20 | 21 | rho ~ normal(0, 3); 22 | alpha ~ normal(0, 1); 23 | sigma ~ normal(0, 1); 24 | 25 | y ~ multi_normal_cholesky(mu, L_K); 26 | } 27 | -------------------------------------------------------------------------------- /Gaussian_processes_I/predict_gauss.stan: -------------------------------------------------------------------------------- 1 | functions { 2 | vector gp_pred_rng(real[] x2, 3 | vector y1, real[] x1, 4 | real alpha, real rho, real sigma, real delta) { 5 | int N1 = rows(y1); 6 | int N2 = size(x2); 7 | vector[N2] f2; 8 | { 9 | matrix[N1, N1] K = cov_exp_quad(x1, alpha, rho) 10 | + diag_matrix(rep_vector(square(sigma), N1)); 11 | matrix[N1, N1] L_K = cholesky_decompose(K); 12 | 13 | vector[N1] L_K_div_y1 = mdivide_left_tri_low(L_K, y1); 14 | vector[N1] K_div_y1 = mdivide_right_tri_low(L_K_div_y1', L_K)'; 15 | matrix[N1, N2] k_x1_x2 = cov_exp_quad(x1, x2, alpha, rho); 16 | vector[N2] f2_mu = (k_x1_x2' * K_div_y1); 17 | matrix[N1, N2] v_pred = mdivide_left_tri_low(L_K,
k_x1_x2); 18 | matrix[N2, N2] cov_f2 = cov_exp_quad(x2, alpha, rho) - v_pred' * v_pred 19 | + diag_matrix(rep_vector(delta, N2)); 20 | f2 = multi_normal_rng(f2_mu, cov_f2); 21 | } 22 | return f2; 23 | } 24 | } 25 | 26 | data { 27 | int N; 28 | real x[N]; 29 | vector[N] y; 30 | 31 | int N_predict; 32 | real x_predict[N_predict]; 33 | 34 | real rho; 35 | real alpha; 36 | real sigma; 37 | } 38 | 39 | transformed data { 40 | matrix[N, N] cov = cov_exp_quad(x, alpha, rho) 41 | + diag_matrix(rep_vector(1e-10, N)); 42 | matrix[N, N] L_cov = cholesky_decompose(cov); 43 | } 44 | 45 | parameters {} 46 | model {} 47 | 48 | generated quantities { 49 | vector[N_predict] f_predict = gp_pred_rng(x_predict, y, x, alpha, rho, sigma, 1e-10); 50 | vector[N_predict] y_predict; 51 | 52 | for (n in 1:N_predict) 53 | y_predict[n] = normal_rng(f_predict[n], sigma); 54 | } 55 | -------------------------------------------------------------------------------- /Gaussian_processes_I/sim_gp.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int N; //number of data points 3 | real x[N]; //data 4 | real alpha; 5 | real rho; 6 | } 7 | 8 | transformed data { 9 | matrix[N,N] K = cov_exp_quad(x, alpha, rho) + diag_matrix(rep_vector(1e-9,N)); // Covariance function 10 | vector[N] mu = rep_vector(0,N); //mean 11 | } 12 | 13 | generated quantities { 14 | vector[N] f = multi_normal_rng(mu,K); 15 | } 16 | -------------------------------------------------------------------------------- /Heteroskedasticity/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MaggieLieu/STAN_tutorials/6616b30827530521e2d5f9cf3f3e2fccc761c27d/Heteroskedasticity/.DS_Store -------------------------------------------------------------------------------- /Heteroskedasticity/.Rapp.history: -------------------------------------------------------------------------------- 1 | 1113.51/12 2 | 95*12 3 | 
data(mtcars) 4 | mtcars 5 | plot(mtcars$mpg,mtcars$cyl) 6 | plot(mtcars$mpg,mtcars$drat) 7 | plot(mtcars$mpg,mtcars$hp) 8 | plot(mtcars$mpg,mtcars$hp, log='xy') 9 | plot(mtcars$mpg,mtcars$hp) 10 | rbinom(1,2,0.5) 11 | rbinom(100, 2, 0.5) 12 | tosses = rbinom(100, 1, 0.5) 13 | length(which(tosses ==0)) 14 | length(which(tosses ==1)) 15 | length(which(tosses ==0))/length(which(tosses=1)) 16 | length(which(tosses ==0))/length(which(tosses==1)) 17 | tosses = rbinom(100, 1, 0.5) 18 | length(which(tosses ==0))/length(which(tosses==1)) 19 | N = c(10, 100, 1000, 10000)# 20 | Ratios = c()# 21 | For (n in N){# 22 | Tosses = rbinom(n=n, size=1, prob=0.5)# 23 | Ratios = c(Ratios, length(which(tosses ==0))/length(which(tosses==1)))# 24 | } 25 | N = c(10, 100, 1000, 10000)# 26 | Ratios = c()# 27 | for(n in N){# 28 | Tosses = rbinom(n=n, size=1, prob=0.5)# 29 | Ratios = c(Ratios, length(which(tosses ==0))/length(which(tosses==1)))# 30 | } 31 | plot(N, Ratios) 32 | N 33 | n 34 | N = c(10, 100, 1000, 10000)# 35 | ratios = c()# 36 | for (n in N){# 37 | tosses = rbinom(n=n, size=1, prob=0.5)# 38 | ratios = c(ratios, length(which(tosses ==0))/length(which(tosses==1)))# 39 | } 40 | plot(N, Ratios) 41 | plot(N, ratios) 42 | N = c(10, 100, 1000, 10000)# 43 | ratios = c()# 44 | for (n in N){# 45 | tosses = rbinom(n=n, size=1, prob=0.5)# 46 | ratios = c(ratios, length(which(tosses ==0))/length(which(tosses==1)))# 47 | } 48 | plot(N, ratios) 49 | abline(h=1) 50 | N = c(10, 100, 1000, 10000)# 51 | ratios = c()# 52 | for (n in N){# 53 | tosses = rbinom(n=n, size=1, prob=0.5)# 54 | ratios = c(ratios, length(which(tosses ==0))/length(which(tosses==1)))# 55 | }; plot(N, ratios); abline(h=1) 56 | sapply(1:2, function(x) rbinom(10,1,0.5)) 57 | mean(5) 58 | tosses = rbinom(1000, 1, 0.5) # 59 | means = sapply(1:1000, function(x) mean(tosses[1:x])) 60 | plot(means) 61 | plot(means, ty='l') 62 | abline(h=0.5) 63 | mean(c(0,1)) 64 | plot(means, ty='l', ylim=c(0,1)) 65 | tosses[1:10] 66 | 
mean(tosses[1:10]) 67 | abline(h=0.5) 68 | mean(tosses[1:3]) 69 | tosses = rbinom(10000, 1, 0.5) # 70 | means = sapply(1:10000, function(x) mean(tosses[1:x]))# 71 | plot(means, ty=’l’, ylim=c(0,1), xlab=’toss’, ylab=’mean of tosses’)# 72 | abline(h=0.5, lty=’dashed’) 73 | tosses = rbinom(10000, 1, 0.5) # 74 | means = sapply(1:10000, function(x) mean(tosses[1:x]))# 75 | plot(means, ty='l', ylim=c(0,1), xlab='toss, ylab=’mean of tosses’)# 76 | abline(h=0.5, lty=’dashed’) 77 | ' 78 | '' 79 | tosses = rbinom(10000, 1, 0.5) # 80 | means = sapply(1:10000, function(x) mean(tosses[1:x]))# 81 | plot(means, ty='l', ylim=c(0,1), xlab='toss', ylab='mean of tosses')# 82 | abline(h=0.5, lty=’dashed’) 83 | tosses = rbinom(10000, 1, 0.5) # 84 | means = sapply(1:10000, function(x) mean(tosses[1:x]))# 85 | plot(means, ty='l', ylim=c(0,1), xlab='toss', ylab='mean of tosses')# 86 | abline(h=0.5, lty='dashed') 87 | tosses = rbinom(5000, 1, 0.5) # 88 | means = sapply(1:5000, function(x) mean(tosses[1:x]))# 89 | plot(means, ty='l', ylim=c(0,1), xlab='toss', ylab='mean of tosses')# 90 | abline(h=0.5, lty='dashed') 91 | library(help = "datasets") 92 | HairEyeColor 93 | EuStockMarkets 94 | head(EuStockMarkets) 95 | plot(EuStockMarkets) 96 | require(graphics)# 97 | pairs(mtcars, main = "mtcars data", gap = 1/4) 98 | require(graphics)# 99 | pairs(mtcars, main = "mtcars data", gap = 1/4)# 100 | coplot(mpg ~ disp | as.factor(cyl), data = mtcars,# 101 | panel = panel.smooth, rows = 1) 102 | mtcars2 <- within(mtcars, {# 103 | vs <- factor(vs, labels = c("V", "S"))# 104 | am <- factor(am, labels = c("automatic", "manual"))# 105 | cyl <- ordered(cyl)# 106 | gear <- ordered(gear)# 107 | carb <- ordered(carb)# 108 | })# 109 | summary(mtcars2) 110 | data(schools) 111 | schoolls 112 | schools 113 | require(rstan) 114 | a = matrix(data = 1, nrow=3, ncol=3) 115 | a 116 | b = matrix(data = 2, nrow=3, ncol=3) 117 | a 118 | v 119 | c(a,b) 120 | tosses = rbinom(n=10, size=1,prob=0.5) 121 | mean(tosses) 122 | 
tosses = rbinom(n=5000, size=1, prob=0.5) 123 | means = sapply(1:5000, function(x) mean(tosses[1:x])) 124 | plot(means, ty='l', ylim = c(0,1), xlab='toss', ylab='mean of tosses') 125 | abline(h=0.5, lty='dashed') 126 | require(magaxis) 127 | require(magixaxis) 128 | require(magicaxis) 129 | plot(means, ty='l', ylim = c(0,1), xlab='', ylab='' axes=FALSE) 130 | plot(means, ty='l', ylim = c(0,1), xlab='', ylab='', axes=FALSE) 131 | magaxis(1:2) 132 | magaxis(3:4, labels=FALSE) 133 | mtext('tosses', side=1, line=2) 134 | mtext('mean of tosses', side=2, line=2) 135 | plot(means, ty='l', ylim = c(0,1), xlab='', ylab='', axes=FALSE, lwd=2) 136 | magaxis(1:2) 137 | magaxis(3:4, labels=FALSE) 138 | mtext('tosses', side=1, line=2) 139 | mtext('mean of tosses', side=2, line=2) 140 | abline(h=0.5, lty='dashed') 141 | plot(means, ty='n', ylim = c(0,1), xlab='', ylab='', axes=FALSE, lwd=2) 142 | magaxis(1:2) 143 | magaxis(3:4, labels=FALSE) 144 | mtext('tosses', side=1, line=2) 145 | mtext('mean of tosses', side=2, line=2) 146 | abline(h=0.5, lty='dashed') 147 | sample_pois = sapply(1:10000, function(x) rpois(1000,0.6)) 148 | sample_exp = sapply(1:10000, function(x) rexp(1000,0.6)) 149 | mean_pois = colMeans(sample_pois) 150 | mean_exp = colMeans(sample_exp) 151 | plot(density(mean_pois)) 152 | plot(density(mean_exp)) 153 | acf 154 | schools_dat = list( J = 8, y = c(28,8,-3,7,-1,1,18,12), sigma = c(15,10,16,11,9,11,10,18)) 155 | require(rstan) 156 | schools_dat = list( J = 8, y = c(28,8,-3,7,-1,1,18,12), sigma = c(15,10,16,11,9,11,10,18)) 157 | require(rstan) 158 | fit = stan('model.stan', data=schools_dat, chains=4, iter=20) 159 | schools_dat = list( J = 8, y = c(28,8,-3,7,-1,1,18,12), sigma = c(15,10,16,11,9,11,10,18)) 160 | require(rstan) 161 | fit = stan('model.stan', data = schools_dat, chains=4, iter=20) 162 | params = extract(fit, permuted=FALSE, inc_warmup=TRUE) 163 | plot(c(-5,25), c(-5,25), ty='n', xlab='mu', ylab='tau') 164 | lines(params[,'chain:1', 'mu'], 
params[,'chain:1','tau'], col='black', ty='o', pch=20) 165 | lines(params[,'chain:2', 'mu'], params[,'chain:2','tau'], col='orange', ty='o', pch=20) 166 | lines(params[,'chain:3', 'mu'], params[,'chain:3','tau'], col='red', ty='o', pch=20) 167 | lines(params[,'chain:4', 'mu'], params[,'chain:4','tau'], col='gray', ty='o', pch=20) 168 | traceplot(fit,pars=c('mu','tau')) 169 | fit = stan('model.stan', data=schools_dat, chains=4, iter=1000) 170 | traceplot(fit, pars=c('mu','tau')) 171 | fit = stan('model.stan', data=schools_dat, chains=4, iter=1000) 172 | traceplot(fit, pars=c('mu','tau')) 173 | fit = stan('model.stan', data=schools_dat, chains=4, iter=1000) 174 | traceplot(fit, pars=c('mu','tau')) 175 | print(fit) 176 | neff = summary(fit)$summary[,'n_eff'] 177 | neff/2000 178 | monitor(extract(fit, permuted=FALSE, inc_warmup=FALSE)) 179 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 180 | pairs(fit, pars=c('mu', 'tau','lp__')) 181 | fit = stan('model.stan', data=schools_dat, chains=4, iter=1000, warmup=500, control=list(adapt_delta=0.85)) 182 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 183 | pairs(fit, pars=c('mu', 'tau','lp__')) 184 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 185 | pairs(fit, pars=c('mu', 'tau','lp__')) 186 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 187 | get_num_divergent(fit) 188 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 189 | print(fit) 190 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 191 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=50) 192 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 193 | pairs(fit, pars=c('mu', 'tau','lp__')) 194 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 195 | 
get_num_divergent(fit) 196 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 197 | get_num_divergent(fit) 198 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 199 | get_num_divergent(fit) 200 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 201 | get_num_divergent(fit) 202 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 203 | get_num_divergent(fit) 204 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 205 | get_num_divergent(fit) 206 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 207 | get_num_divergent(fit) 208 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 209 | get_num_divergent(fit) 210 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.95)) 211 | get_num_divergent(fit) 212 | print(fit) 213 | get_num_divergent(fit) 214 | fit = stan('model.stan', data=schools_dat, chains =4, iter=10000, warmup=500, control=list(adapt_delta=0.95)) 215 | 82.24*12 216 | 30000/(82.24*12) 217 | 21414+9500 218 | (21414+9500-12500)*0.2 219 | 8084-3682 220 | 21414-(21414+9500-12500)*0.2 221 | (7238+9500-12500)*0.2 222 | (14333+9500-12500)*0.2 223 | 9449+18507-3682 224 | 9449+18507-847.6-2266.6 225 | (14333-12500)*0.2 226 | 9449+18507-847.6-366.6 227 | 10000*7*8.5 228 | 28*86\ 229 | 28*86 230 | 2000+(9*150)+(9*350)+2000+(4*100) 231 | 60652/157695 232 | 60652/(60652+157695) 233 | 50/(180) 234 | 54/(180) 235 | 12+3+7 236 | 90/22 237 | 2000+(9*150)+9*350+2000+100*4 238 | x = runif(n,0,10)# 239 | y = a + b*x + rnorm(n, 0, 0.3)*x# 240 | plot(x,y, pch=20) 241 | n=100 242 | x = runif(n,0,10)# 243 | y = a + b*x + 
rnorm(n, 0, 0.3)*x# 244 | plot(x,y, pch=20) 245 | a = 0.4; b=1.3 246 | x = runif(n,0,10)# 247 | y = a + b*x + rnorm(n, 0, 0.3)*x# 248 | plot(x,y, pch=20) 249 | library(animation)# 250 | saveGIF({# 251 | ani.options(nmax = 100)# 252 | par(oma=c())# 253 | #x=seq(1,9, length.out = ani.options('nmax'))# 254 | n=100# 255 | a = 0.4; b=1.3# 256 | x = runif(n,0,10)# 257 | y = a + b*x + rnorm(n, 0, 0.3)*x# 258 | plot(x,y, pch=20, ty='n')# 259 | ## use a loop to create images one by one# 260 | for (i in 1:n) {# 261 | points(x[n], y[n])# 262 | }# 263 | # 264 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 600, ani.height = 600, ani.res=300) 265 | saveGIF({# 266 | ani.options(nmax = 100)# 267 | #par(oma=c())# 268 | #x=seq(1,9, length.out = ani.options('nmax'))# 269 | n=100# 270 | a = 0.4; b=1.3# 271 | x = runif(n,0,10)# 272 | y = a + b*x + rnorm(n, 0, 0.3)*x# 273 | ## use a loop to create images one by one# 274 | for (i in 1:n) {# 275 | plot(x,y, pch=20, ty='n')# 276 | points(x[1:n], y[1:n])# 277 | }# 278 | # 279 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 600, ani.height = 600, ani.res=300) 280 | saveGIF({# 281 | ani.options(nmax = 100)# 282 | #par(oma=c())# 283 | #x=seq(1,9, length.out = ani.options('nmax'))# 284 | n=100# 285 | a = 0.4; b=1.3# 286 | x = runif(n,0,10)# 287 | y = a + b*x + rnorm(n, 0, 0.3)*x# 288 | ## use a loop to create images one by one# 289 | for (i in 1:n) {# 290 | plot(x,y, pch=20, ty='n')# 291 | points(x[1:i], y[1:i])# 292 | }# 293 | # 294 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 6, ani.height = 6, ani.res=300) 295 | saveGIF({# 296 | ani.options(nmax = 100)# 297 | #par(oma=c())# 298 | #x=seq(1,9, length.out = ani.options('nmax'))# 299 | n=100# 300 | a = 0.4; b=1.3# 301 | x = runif(n,0,10)# 302 | y = a + b*x + rnorm(n, 0, 0.3)*x# 303 | ## use a loop to create images one by one# 304 | for (i in 1:n) {# 305 | plot(x,y, pch=20, ty='n')# 306 | points(x[1:i], y[1:i])# 307 | }# 
308 | # 309 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 6, ani.height = 6, ani.res=300) 310 | library(animation)# 311 | saveGIF({# 312 | ani.options(nmax = 100)# 313 | #par(oma=c())# 314 | #x=seq(1,9, length.out = ani.options('nmax'))# 315 | n=100# 316 | a = 0.4; b=1.3# 317 | x = runif(n,0,10)# 318 | y = a + b*x + rnorm(n, 0, 0.3)*x# 319 | ## use a loop to create images one by one# 320 | for (i in 1:ani.options('nmax')) {# 321 | plot(x,y, pch=20, ty='n')# 322 | points(x[1:i], y[1:i])# 323 | }# 324 | # 325 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 6, ani.height = 6, ani.res=300) 326 | library(animation)# 327 | saveGIF({# 328 | n=100# 329 | a = 0.4; b=1.3# 330 | ani.options(nmax = n)# 331 | par(oma=c())# 332 | #x=seq(1,9, length.out = ani.options('nmax'))# 333 | x = runif(n,0,10)# 334 | y = a + b*x + rnorm(n, 0, 0.3)*x# 335 | ## use a loop to create images one by one# 336 | for (i in 1:ani.options('nmax')) {# 337 | plot(x,y, pch=20, ty='n')# 338 | points(x[1:i], y=[1:i], pch=20)# 339 | #plot(x[1:i], dnorm(x[1:i],5,1) ,ty='l', lwd=2, xlim=c(1,9),ylim=c(0,0.45), xlab='X', ylab='density', cex.lab=2, cex.axis=2)# 340 | }# 341 | # 342 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 600, ani.height = 600) 343 | saveGIF({# 344 | n=100# 345 | a = 0.4; b=1.3# 346 | ani.options(nmax = n)# 347 | par(oma=c())# 348 | #x=seq(1,9, length.out = ani.options('nmax'))# 349 | x = runif(n,0,10)# 350 | y = a + b*x + rnorm(n, 0, 0.3)*x# 351 | ## use a loop to create images one by one# 352 | for (i in 1:ani.options('nmax')) {# 353 | plot(x,y, pch=20, ty='n')# 354 | points(x[1:i], y=[1:i], pch=20)# 355 | #plot(x[1:i], dnorm(x[1:i],5,1) ,ty='l', lwd=2, xlim=c(1,9),ylim=c(0,0.45), xlab='X', ylab='density', cex.lab=2, cex.axis=2)# 356 | }# 357 | # 358 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 600, ani.height = 600) 359 | saveGIF({# 360 | n=100# 361 | a = 0.4; b=1.3# 362 | 
ani.options(nmax = n)# 363 | par(oma=c())# 364 | #x=seq(1,9, length.out = ani.options('nmax'))# 365 | x = runif(n,0,10)# 366 | y = a + b*x + rnorm(n, 0, 0.3)*x# 367 | ## use a loop to create images one by one# 368 | for (i in 1:ani.options('nmax')) {# 369 | plot(x,y, pch=20, ty='n')# 370 | #points(x[1:i], y=[1:i], pch=20)# 371 | #plot(x[1:i], dnorm(x[1:i],5,1) ,ty='l', lwd=2, xlim=c(1,9),ylim=c(0,0.45), xlab='X', ylab='density', cex.lab=2, cex.axis=2)# 372 | }# 373 | # 374 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 600, ani.height = 600) 375 | plot(x,y) 376 | plot(x,y, ty='n') 377 | points(x[1:4], y[1:4], pch=20) 378 | saveGIF({# 379 | n=100# 380 | a = 0.4; b=1.3# 381 | ani.options(nmax = n)# 382 | par(oma=c())# 383 | #x=seq(1,9, length.out = ani.options('nmax'))# 384 | x = runif(n,0,10)# 385 | y = a + b*x + rnorm(n, 0, 0.3)*x# 386 | ## use a loop to create images one by one# 387 | for (i in 1:ani.options('nmax')) {# 388 | plot(x,y, pch=20, ty='n')# 389 | points(x[1:i], y=[1:i], pch=20)# 390 | #plot(x[1:i], dnorm(x[1:i],5,1) ,ty='l', lwd=2, xlim=c(1,9),ylim=c(0,0.45), xlab='X', ylab='density', cex.lab=2, cex.axis=2)# 391 | }# 392 | # 393 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 600, ani.height = 600) 394 | saveGIF({# 395 | n=100# 396 | a = 0.4; b=1.3# 397 | ani.options(nmax = n)# 398 | par(oma=c())# 399 | #x=seq(1,9, length.out = ani.options('nmax'))# 400 | x = runif(n,0,10)# 401 | y = a + b*x + rnorm(n, 0, 0.3)*x# 402 | ## use a loop to create images one by one# 403 | for (i in 1:ani.options('nmax')) {# 404 | plot(x,y, pch=20, ty='n');# 405 | points(x[1:i], y=[1:i], pch=20)# 406 | #plot(x[1:i], dnorm(x[1:i],5,1) ,ty='l', lwd=2, xlim=c(1,9),ylim=c(0,0.45), xlab='X', ylab='density', cex.lab=2, cex.axis=2)# 407 | }# 408 | # 409 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 600, ani.height = 600) 410 | saveGIF({# 411 | n=100# 412 | a = 0.4; b=1.3# 413 | ani.options(nmax = n)# 414 | 
par(oma=c())# 415 | #x=seq(1,9, length.out = ani.options('nmax'))# 416 | x = runif(n,0,10)# 417 | y = a + b*x + rnorm(n, 0, 0.3)*x# 418 | ## use a loop to create images one by one# 419 | for (i in 1:ani.options('nmax')) {# 420 | plot(x,y, pch=20, ty='n');# 421 | points(x[1:i], y[1:i], pch=20)# 422 | #plot(x[1:i], dnorm(x[1:i],5,1) ,ty='l', lwd=2, xlim=c(1,9),ylim=c(0,0.45), xlab='X', ylab='density', cex.lab=2, cex.axis=2)# 423 | }# 424 | # 425 | }, interval = 0.05, movie.name = '~/Desktop/bm_demo.gif', ani.width = 600, ani.height = 600) 426 | require(rstan)# 427 | data = list(n = length(y),# 428 | y = y, # 429 | X = x)# 430 | # 431 | fit = stan(model_code=my_model, data=data) 432 | require(rstan)# 433 | data = list(n = length(y),# 434 | y = y, # 435 | X = x)# 436 | # 437 | fit = stan(file='my_model.stan', data=data) 438 | require(rstan)# 439 | data = list(n = length(y),# 440 | y = y, # 441 | X = x)# 442 | # 443 | fit = stan(file='my_model.stan', data=data) 444 | setwd('Desktop/STAN/GITHUB/STAN_tutorials/Heteroskedasticity /') 445 | require(rstan)# 446 | data = list(n = length(y),# 447 | y = y, # 448 | X = x)# 449 | # 450 | fit = stan(file='my_model.stan', data=data) 451 | pairs(fit, pars=c('alpha','beta','sigma')) 452 | a 453 | b 454 | require(rstan)# 455 | data = list(n = length(y),# 456 | y = y, # 457 | X = x)# 458 | # 459 | fit = stan(file='heto_model.stan', data=data) 460 | pairs(fit, pars=c('alpha','beta','sigma')) 461 | ts = data.frame(year = time(EuStockMarkets))# 462 | stocks = data.frame(EuStockMarkets)# 463 | stocks 464 | ts = data.frame(year = time(EuStockMarkets))# 465 | stocks = data.frame(EuStockMarkets)# 466 | stocks 467 | t = ts$year# 468 | y = stocks$FTSE# 469 | pc_dif = diff(y)/y[-length(y)]*100 #calculate percentage change # 470 | plot(t, y, ty='l', xlab='year', ylab='price')# 471 | plot(t[-1], pc_dif, ty='l', xlab='year', ylab='percentage change') 472 | plot(t, y, ty='l', xlab='year', ylab='price') 473 | data = list(# 474 | T = length(pc_dif),# 
475 | r = pc_dif# 476 | ) 477 | fit = stan(file='arch.stan', data=data) 478 | print(fit) 479 | params = extract(fit)# 480 | mu = mean(params$mu)# 481 | alpha0 = mean(params$alpha0)# 482 | alpha1 = mean(params$alpha1) 483 | pred = sapply(2:1860, function(x) mu + sqrt(alpha0 + alpha1*(pc_dif[x-1] - mu)^2) ) 484 | plot(t[-1], pc_dif, ty='l', xlab='time', ylab='percentage change') 485 | lines(t[-1], pred, lty='solid', col='red')# 486 | lines(t[-1], -pred, lty='solid', col='red') 487 | legend('topright', legend=c('Truth', 'Predicted Volatility'), col=c('black','red'), bty='n', lty='solid') 488 | 3*8 489 | 3*8*5 490 | 3*8*5*4*12 491 | 3370/(8*5*4*12) 492 | 33700/(8*5*4*12) 493 | 1000/30 494 | mail 495 | 10000/(30*2) 496 | 10000/(30*12) 497 | data = list(# 498 | T = length(pc_dif),# 499 | r = pc_dif,# 500 | sigma1 = 0.1# 501 | )# 502 | # 503 | fit = stan(file='garch.stan', data=data) 504 | params = extract(fit)# 505 | mu = mean(params$mu)# 506 | alpha0 = mean(params$alpha0)# 507 | alpha1 = mean(params$alpha1)# 508 | beta1 = mean(params$beta1)# 509 | sigma = colMeans(params$sigma) 510 | pred = sapply(2:1860, function(x) mu + sqrt(alpha0 # 511 | + alpha1 * (pc_dif[x-1] - mu)^2# 512 | + beta1 * (sigma[x-1])^2) ) 513 | plot(t[-1], pc_dif, ty='l', xlab='time', ylab='percentage change', main='FTSE') 514 | lines(t[-1], pred, lty='solid', col='red', lwd=1) 515 | lines(t[-1], -pred, lty='solid', col='red', lwd=1) 516 | legend('topright', legend=c('Truth', 'Predicted Volatility'), col=c('black','red'), bty='n', lty='solid') 517 | yCI = sapply(2:1860, function(x) quantile(params$mu + sqrt(params$alpha0 # 518 | + params$alpha1 * (pc_dif[x-1] - params$mu)^2# 519 | + params$beta1 * (params$sigma[,x-1])^2) , probs=c(0.05,0.95) )) 520 | polygon(x=c(t[2:1860], rev(t[2:1860]), t[2]), y=c(yCI[1,], rev(yCI[2,]),yCI[1,1]), col=rgb(1,0,0,0.1), border = NA) 521 | plot(t[-1], pc_dif, ty='l', xlab='time', ylab='percentage change', main='FTSE') 522 | lines(t[-1], pred, lty='solid', col='red', 
lwd=1) 523 | lines(t[-1], -pred, lty='solid', col='red', lwd=1)
524 | polygon(x=c(t[2:1860], rev(t[2:1860]), t[2]), y=c(yCI[1,], rev(yCI[2,]),yCI[1,1]), col=rgb(0,0,1,0.3), border = NA)
525 | polygon(x=c(t[2:1860], rev(t[2:1860]), t[2]), y=c(-yCI[1,], rev(-yCI[2,]),-yCI[1,1]), col=rgb(0,0,1,0.3), border = NA)
526 | legend('topright', legend=c('Truth', 'Predicted Volatility'), col=c('black','red'), bty='n', lty='solid')
527 | -------------------------------------------------------------------------------- /Heteroskedasticity/Heteroskedasticity.Rmd: -------------------------------------------------------------------------------- 1 | ---
2 | title: "Modelling Heteroscedasticity"
3 | output: html_notebook
4 | ---
5 |
6 | This notebook accompanies the Stan youtube tutorial introducing heterogeneity and how to model heteroscedasticity (https://youtu.be/nwuU-KEKXhU).
7 | Let's make a plot to show the difference between homogeneous variance and heterogeneous variance.
8 |
9 | ```{r, fig.height=2, fig.width=5}
10 | par(mfrow=c(1,2))
11 | n=100
12 | x = runif(n,0,10)
13 | a = 0.4; b=1.3
14 | y = a + b*x + rnorm(n,0,1)
15 |
16 | plot(x,y, pch=20, main='homoscedasticity')
17 | abline(a=a, b=b, col='red', lwd=2)
18 |
19 | y = a + b * x + rnorm(n, 0, sqrt(x))
20 | plot(x,y, pch=20, main='heteroscedasticity')
21 | abline(a=a, b=b, col='red', lwd=2)
22 |
23 | ```
24 |
25 | Say we have some heteroscedastic data:
26 | ```{r, fig.height=2, fig.width=5}
27 | par(mfrow=c(1,2))
28 | x = runif(n,0,10)
29 | y = a + b*x + rnorm(n, 0, 0.3)*x
30 | plot(x,y, pch=20)
31 | ```
32 |
33 |
34 | We fit the data assuming homoscedasticity.
35 | ```{r}
36 | require(rstan)
37 | data = list(n = length(y),
38 | y = y,
39 | X = x)
40 |
41 | fit = stan(file='homo_model.stan', data=data)
42 | ```
43 |
44 | ```{r}
45 | pairs(fit, pars=c('alpha','beta','sigma'))
46 | ```
47 | Note that the parameters are not well recovered.
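A quick way to see the problem with the homoscedastic fit is to look at the residuals of an ordinary least-squares fit to the same simulated data: under heteroscedasticity their spread grows with x. (This check is an illustrative addition, not part of the notebook; the split at x = 5 is an arbitrary choice.)

```r
# Illustrative check (not from the original notebook): with noise
# sd = 0.3*x, residuals in the upper half of x should be much wider.
set.seed(1)
n <- 100
a <- 0.4; b <- 1.3
x <- runif(n, 0, 10)
y <- a + b * x + rnorm(n, 0, 0.3) * x   # noise sd scales with x

res <- residuals(lm(y ~ x))             # OLS residuals

sd_low  <- sd(res[x < 5])               # spread in the lower half of x
sd_high <- sd(res[x >= 5])              # spread in the upper half of x
sd_high / sd_low                        # typically well above 1
```

A ratio near 1 would instead support the homoscedastic model.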
Instead we should use a model that properly accounts for the heteroscedastic variance.
48 |
49 | ```{r}
50 | fit = stan(file = 'heto_model.stan', data=data)
51 | ```
52 |
53 | ```{r}
54 | pairs(fit, pars=c('alpha','beta','sigma'))
55 | ```
56 | With a heteroscedastic model, we recover the parameters well.
57 |
58 | ## Financial data
59 | Now let's assume we are interested in buying stocks on the stock market. Ideally we want to purchase during a time of low volatility, and this is one scenario where modelling heteroscedasticity is important.
60 | ```{r}
61 | ts = data.frame(year = time(EuStockMarkets))
62 | stocks = data.frame(EuStockMarkets)
63 | stocks
64 | ```
65 |
66 | For modelling heteroscedasticity we work with the percentage change of the stock price rather than the stock price itself.
67 | ```{r}
68 | par(mfrow=c(1,2))
69 | t = ts$year
70 | y = stocks$FTSE
71 | pc_dif = diff(y)/y[-length(y)]*100 #calculate percentage change
72 | plot(t, y, ty='l', xlab='year', ylab='price')
73 | plot(t[-1], pc_dif, ty='l', xlab='year', ylab='percentage change')
74 | ```
75 |
76 | We fit the data using an ARCH(1) model, which uses the previous value to predict the next value.
77 | ```{r}
78 | data = list(
79 | T = length(pc_dif),
80 | r = pc_dif
81 | )
82 |
83 | fit = stan(file='arch.stan', data=data)
84 | ```
85 |
86 | ```{r}
87 | #extract the parameters
88 | params = extract(fit)
89 | mu = mean(params$mu)
90 | alpha0 = mean(params$alpha0)
91 | alpha1 = mean(params$alpha1)
92 | ```
93 |
94 | ```{r}
95 | # Let's use our best fit model to predict the volatility from the previous day.
96 | pred = sapply(2:1860, function(x) mu + sqrt(alpha0 + alpha1*(pc_dif[x-1] - mu)^2) )
97 | ```
98 |
99 | ```{r}
100 | plot(t[-1], pc_dif, ty='l', xlab='time', ylab='percentage change')
101 | lines(t[-1], pred, lty='solid', col='red')
102 | lines(t[-1], -pred, lty='solid', col='red')
103 | legend('topright', legend=c('Truth', 'Predicted Volatility'), col=c('black','red'), bty='n', lty='solid')
104 | ```
105 | We don't expect the prediction to match the truth, because we are fitting the volatility, not the percentage change itself. We do, however, expect the predicted volatility to be larger where the percentage changes are larger, and that is what we see.
106 |
107 | The ARCH model was nice, but we can do better with a generalised ARCH, or GARCH, model. The GARCH model takes into account not only the previous data point but also the volatility at that point, which makes it even more robust. The downside is that you need to provide not just the first data point but also the volatility at that point.
108 |
109 |
110 |
111 | ```{r}
112 | data = list(
113 | T = length(pc_dif),
114 | r = pc_dif,
115 | sigma1 = 0.1
116 | )
117 |
118 | fit = stan(file='garch.stan', data=data)
119 | ```
120 |
121 | ```{r}
122 | #extract parameters from fit
123 | params = extract(fit)
124 | mu = mean(params$mu)
125 | alpha0 = mean(params$alpha0)
126 | alpha1 = mean(params$alpha1)
127 | beta1 = mean(params$beta1)
128 | sigma = colMeans(params$sigma)
129 | ```
130 |
131 |
132 | ```{r}
133 | # Let's use our best fit model to predict the volatility from the previous day.
134 | pred = sapply(2:1860, function(x) mu + sqrt(alpha0
135 | + alpha1 * (pc_dif[x-1] - mu)^2
136 | + beta1 * (sigma[x-1])^2) )
137 | ```
138 |
139 |
140 | ```{r}
141 | # get uncertainties
142 | yCI = sapply(2:1860, function(x) quantile(params$mu + sqrt(params$alpha0
143 | + params$alpha1 * (pc_dif[x-1] - params$mu)^2
144 | + params$beta1 * (params$sigma[,x-1])^2) , probs=c(0.05,0.95) ))
145 | ```
146 |
147 | ```{r}
148 | plot(t[-1], pc_dif, ty='l', xlab='time', ylab='percentage change', main='FTSE')
149 | legend('topright', legend=c('Truth', 'Predicted Volatility'), col=c('black','red'), bty='n', lty='solid')
150 | polygon(x=c(t[2:1860], rev(t[2:1860]), t[2]), y=c(yCI[1,], rev(yCI[2,]),yCI[1,1]), col=rgb(1,0,0,0.1), border = NA) #plot envelope of uncertainties
151 | polygon(x=c(t[2:1860], rev(t[2:1860]), t[2]), y=c(-yCI[1,], rev(-yCI[2,]),-yCI[1,1]), col=rgb(1,0,0,0.1), border = NA) #plot envelope of uncertainties
152 | lines(t[-1], pred, lty='solid', col='red', lwd=1)
153 | lines(t[-1], -pred, lty='solid', col='red', lwd=1)
154 | ```
155 | Notice how the predictions are much smoother now. The uncertainty envelope is so narrow that it is barely visible, even when zooming in.
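The volatility recursion used above can also be run outside Stan. The sketch below implements the GARCH(1,1) update from `garch.stan`, sigma[t] = sqrt(alpha0 + alpha1*(r[t-1] - mu)^2 + beta1*sigma[t-1]^2), with made-up parameter values for illustration, not the fitted posterior means.

```r
# Minimal sketch of the GARCH(1,1) volatility recursion from garch.stan.
# Parameter values here are made up for illustration only.
garch_sigma <- function(r, mu, alpha0, alpha1, beta1, sigma1) {
  T <- length(r)
  sigma <- numeric(T)
  sigma[1] <- sigma1                       # volatility at t=1 must be supplied
  for (t in 2:T) {
    sigma[t] <- sqrt(alpha0 +
                     alpha1 * (r[t - 1] - mu)^2 +   # ARCH term: previous return
                     beta1  * sigma[t - 1]^2)       # GARCH term: previous volatility
  }
  sigma
}

set.seed(42)
r <- rnorm(10)                             # toy return series
s <- garch_sigma(r, mu = 0, alpha0 = 0.1,
                 alpha1 = 0.2, beta1 = 0.3, sigma1 = 0.1)
round(s, 3)
```

Setting alpha1 = beta1 = 0 collapses the recursion to a constant volatility sqrt(alpha0), recovering the homoscedastic model; setting beta1 = 0 alone recovers ARCH(1).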
156 | -------------------------------------------------------------------------------- /Heteroskedasticity/arch.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int T; 3 | real r[T]; 4 | } 5 | 6 | parameters { 7 | real mu; //average return 8 | real alpha0; // noise intercept 9 | real alpha1; //noise slope 10 | } 11 | 12 | model { 13 | //priors 14 | mu ~ normal(0,10); 15 | alpha0 ~ normal(0,10); 16 | alpha1 ~ normal(0,10); 17 | 18 | for(t in 2:T){ 19 | r[t] ~ normal(mu, sqrt(alpha0 + alpha1 * pow(r[t-1] - mu, 2))); //likelihood 20 | } 21 | } 22 | -------------------------------------------------------------------------------- /Heteroskedasticity/garch.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int T; 3 | real r[T]; 4 | real sigma1; //scale of noise at t=1 5 | } 6 | 7 | parameters { 8 | real mu; 9 | real alpha0; 10 | real alpha1; 11 | real beta1; 12 | } 13 | 14 | transformed parameters{ 15 | real sigma[T]; 16 | sigma[1] = sigma1; 17 | for (t in 2:T){ 18 | sigma[t] = sqrt(alpha0 19 | + alpha1 * pow(r[t-1] - mu, 2) 20 | + beta1 * pow(sigma[t-1],2)); 21 | } 22 | } 23 | 24 | model { 25 | //priors 26 | alpha0 ~ normal(0,10); 27 | alpha1 ~ normal(0,10); 28 | beta1 ~ normal(0,10); 29 | 30 | //likelihood 31 | r ~ normal(mu, sigma); 32 | } 33 | -------------------------------------------------------------------------------- /Heteroskedasticity/heto_model.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int n; 3 | vector[n] y; 4 | vector[n] X; 5 | } 6 | 7 | parameters { 8 | real alpha; 9 | real beta; 10 | real sigma; 11 | } 12 | 13 | model { 14 | //priors 15 | alpha ~ normal(0,10); 16 | beta ~ normal(0,10); 17 | sigma ~ cauchy(0,1); 18 | 19 | //likelihood 20 | y~normal(alpha + beta*X, sigma*X); 21 | } 22 | -------------------------------------------------------------------------------- 
/Heteroskedasticity/homo_model.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int n; 3 | vector[n] y; 4 | vector[n] X; 5 | } 6 | 7 | parameters { 8 | real alpha; 9 | real beta; 10 | real sigma; 11 | } 12 | 13 | model { 14 | //priors 15 | alpha ~ normal(0,10); 16 | beta ~ normal(0,10); 17 | sigma ~ cauchy(0,1); 18 | 19 | //likelihood 20 | y~normal(alpha + beta*X, sigma); 21 | } 22 | -------------------------------------------------------------------------------- /Hierarchical/.Rapp.history: -------------------------------------------------------------------------------- 1 | tosses = rbinom(10000, 1, 0.5) # 2 | means = sapply(1:10000, function(x) mean(tosses[1:x]))# 3 | plot(means, ty='l', ylim=c(0,1), xlab='toss', ylab='mean of tosses')# 4 | abline(h=0.5, lty='dashed') 5 | tosses = rbinom(5000, 1, 0.5) # 6 | means = sapply(1:5000, function(x) mean(tosses[1:x]))# 7 | plot(means, ty='l', ylim=c(0,1), xlab='toss', ylab='mean of tosses')# 8 | abline(h=0.5, lty='dashed') 9 | library(help = "datasets") 10 | HairEyeColor 11 | EuStockMarkets 12 | head(EuStockMarkets) 13 | plot(EuStockMarkets) 14 | require(graphics)# 15 | pairs(mtcars, main = "mtcars data", gap = 1/4) 16 | require(graphics)# 17 | pairs(mtcars, main = "mtcars data", gap = 1/4)# 18 | coplot(mpg ~ disp | as.factor(cyl), data = mtcars,# 19 | panel = panel.smooth, rows = 1) 20 | mtcars2 <- within(mtcars, {# 21 | vs <- factor(vs, labels = c("V", "S"))# 22 | am <- factor(am, labels = c("automatic", "manual"))# 23 | cyl <- ordered(cyl)# 24 | gear <- ordered(gear)# 25 | carb <- ordered(carb)# 26 | })# 27 | summary(mtcars2) 28 | data(schools) 29 | schoolls 30 | schools 31 | require(rstan) 32 | a = matrix(data = 1, nrow=3, ncol=3) 33 | a 34 | b = matrix(data = 2, nrow=3, ncol=3) 35 | a 36 | v 37 | c(a,b) 38 | tosses = rbinom(n=10, size=1,prob=0.5) 39 | mean(tosses) 40 | tosses = rbinom(n=5000, size=1, prob=0.5) 41 | means = sapply(1:5000, function(x) 
mean(tosses[1:x])) 42 | plot(means, ty='l', ylim = c(0,1), xlab='toss', ylab='mean of tosses') 43 | abline(h=0.5, lty='dashed') 44 | require(magaxis) 45 | require(magixaxis) 46 | require(magicaxis) 47 | plot(means, ty='l', ylim = c(0,1), xlab='', ylab='' axes=FALSE) 48 | plot(means, ty='l', ylim = c(0,1), xlab='', ylab='', axes=FALSE) 49 | magaxis(1:2) 50 | magaxis(3:4, labels=FALSE) 51 | mtext('tosses', side=1, line=2) 52 | mtext('mean of tosses', side=2, line=2) 53 | plot(means, ty='l', ylim = c(0,1), xlab='', ylab='', axes=FALSE, lwd=2) 54 | magaxis(1:2) 55 | magaxis(3:4, labels=FALSE) 56 | mtext('tosses', side=1, line=2) 57 | mtext('mean of tosses', side=2, line=2) 58 | abline(h=0.5, lty='dashed') 59 | plot(means, ty='n', ylim = c(0,1), xlab='', ylab='', axes=FALSE, lwd=2) 60 | magaxis(1:2) 61 | magaxis(3:4, labels=FALSE) 62 | mtext('tosses', side=1, line=2) 63 | mtext('mean of tosses', side=2, line=2) 64 | abline(h=0.5, lty='dashed') 65 | sample_pois = sapply(1:10000, function(x) rpois(1000,0.6)) 66 | sample_exp = sapply(1:10000, function(x) rexp(1000,0.6)) 67 | mean_pois = colMeans(sample_pois) 68 | mean_exp = colMeans(sample_exp) 69 | plot(density(mean_pois)) 70 | plot(density(mean_exp)) 71 | acf 72 | schools_dat = list( J = 8, y = c(28,8,-3,7,-1,1,18,12), sigma = c(15,10,16,11,9,11,10,18)) 73 | require(rstan) 74 | schools_dat = list( J = 8, y = c(28,8,-3,7,-1,1,18,12), sigma = c(15,10,16,11,9,11,10,18)) 75 | require(rstan) 76 | fit = stan('model.stan', data=schools_dat, chains=4, iter=20) 77 | schools_dat = list( J = 8, y = c(28,8,-3,7,-1,1,18,12), sigma = c(15,10,16,11,9,11,10,18)) 78 | require(rstan) 79 | fit = stan('model.stan', data = schools_dat, chains=4, iter=20) 80 | params = extract(fit, permuted=FALSE, inc_warmup=TRUE) 81 | plot(c(-5,25), c(-5,25), ty='n', xlab='mu', ylab='tau') 82 | lines(params[,'chain:1', 'mu'], params[,'chain:1','tau'], col='black', ty='o', pch=20) 83 | lines(params[,'chain:2', 'mu'], params[,'chain:2','tau'], col='orange', 
ty='o', pch=20) 84 | lines(params[,'chain:3', 'mu'], params[,'chain:3','tau'], col='red', ty='o', pch=20) 85 | lines(params[,'chain:4', 'mu'], params[,'chain:4','tau'], col='gray', ty='o', pch=20) 86 | traceplot(fit,pars=c('mu','tau')) 87 | fit = stan('model.stan', data=schools_dat, chains=4, iter=1000) 88 | traceplot(fit, pars=c('mu','tau')) 89 | fit = stan('model.stan', data=schools_dat, chains=4, iter=1000) 90 | traceplot(fit, pars=c('mu','tau')) 91 | fit = stan('model.stan', data=schools_dat, chains=4, iter=1000) 92 | traceplot(fit, pars=c('mu','tau')) 93 | print(fit) 94 | neff = summary(fit)$summary[,'n_eff'] 95 | neff/2000 96 | monitor(extract(fit, permuted=FALSE, inc_warmup=FALSE)) 97 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 98 | pairs(fit, pars=c('mu', 'tau','lp__')) 99 | fit = stan('model.stan', data=schools_dat, chains=4, iter=1000, warmup=500, control=list(adapt_delta=0.85)) 100 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 101 | pairs(fit, pars=c('mu', 'tau','lp__')) 102 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 103 | pairs(fit, pars=c('mu', 'tau','lp__')) 104 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 105 | get_num_divergent(fit) 106 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 107 | print(fit) 108 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 109 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=50) 110 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500) 111 | pairs(fit, pars=c('mu', 'tau','lp__')) 112 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 113 | get_num_divergent(fit) 114 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 115 | 
get_num_divergent(fit) 116 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 117 | get_num_divergent(fit) 118 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 119 | get_num_divergent(fit) 120 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 121 | get_num_divergent(fit) 122 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 123 | get_num_divergent(fit) 124 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 125 | get_num_divergent(fit) 126 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.85)) 127 | get_num_divergent(fit) 128 | fit = stan('model.stan', data=schools_dat, chains=4, iter=10000, warmup=500, control=list(adapt_delta=0.95)) 129 | get_num_divergent(fit) 130 | print(fit) 131 | get_num_divergent(fit) 132 | fit = stan('model.stan', data=schools_dat, chains =4, iter=10000, warmup=500, control=list(adapt_delta=0.95)) 133 | 82.24*12 134 | 30000/(82.24*12) 135 | 21414+9500 136 | (21414+9500-12500)*0.2 137 | 8084-3682 138 | 21414-(21414+9500-12500)*0.2 139 | (7238+9500-12500)*0.2 140 | (14333+9500-12500)*0.2 141 | 9449+18507-3682 142 | 9449+18507-847.6-2266.6 143 | (14333-12500)*0.2 144 | 9449+18507-847.6-366.6 145 | 10000*7*8.5 146 | 28*86\ 147 | 28*86 148 | 2000+(9*150)+(9*350)+2000+(4*100) 149 | 60652/157695 150 | 60652/(60652+157695) 151 | 50/(180) 152 | 54/(180) 153 | 12+3+7 154 | 90/22 155 | 2000+(9*150)+9*350+2000+100*4 156 | datasets(E) 157 | datasets() 158 | data(EUStockMarkets) 159 | data(EUStockmarkets) 160 | data() 161 | data(EuStockMarkets) 162 | EuStockMarkets 163 | write.csv('EuStockMarkets.csv', EuStockMarkets) 164 | time(EuStockMarkets) 165 | time(EuStockMarkets)$time 166 | 
time(EuStockMarkets)$t 167 | EuStockMarkets$FTSE 168 | EuStockMarkets['FTSE'] 169 | t=time(EuStockMarkets) 170 | t 171 | EuStockMarkets 172 | EuStockMarkets['Stocks'] 173 | EuStockMarkets['Stock'] 174 | names(EuStockMarkets) 175 | head(EuStockMarkets) 176 | EuStockMarkets[,'FTSE'] 177 | head(EuStockMarkets) 178 | ftse= EuStockMarkets[,'FTSE'] 179 | out = data.frame('t':t, 'FTSE':ftse) 180 | dim(t) 181 | t['time'] 182 | head(t) 183 | t 184 | ftse 185 | ts = data.frame(year=time(EuStockMarkets)) 186 | stocks=data.frame(EuStockMarkets) 187 | ts 188 | stocks=data.frame(year = time(EuStockMarkets), EuStockMarkets) 189 | stocks 190 | write.csv('EuStockMarket', stocks, quote=FALSE, row.names=FALSE) 191 | write.csv(stocks,'EuStockMarket',quote=FALSE, row.names=FALSE) 192 | data() 193 | rgamma(100,0.1,0.1) 194 | plot(density(rgamma(1000,0.1,0.1))) 195 | plot(density(rgamma(1000,0.1,0.1)), xlim=x(0,1)) 196 | plot(density(rgamma(1000,0.1,0.1)), xlim=c(0,1)) 197 | plot(density(rgamma(1000,0.1,0.01)), xlim=c(0,1)) 198 | plot(density(rgamma(1000,0.1,0.1)), xlim=c(0,1)) 199 | plot(density(rgamma(1000,0.3,0.3)), xlim=c(0,1)) 200 | 186.19-9.15 201 | (186.19-9.15)/24 202 | 7.37/10000 203 | 5/50000 204 | 7*5 205 | 35/5 206 | 430000-60000 207 | 43000*0.2 208 | 430000*0.2 209 | 86000+37000 210 | 60000+180000 211 | (60000+180000)/0.8*1 212 | 335*0.2 213 | 335-67-60 214 | x=seq(-10,10,0.1) 215 | y=dpois(x, 0.2) 216 | x=seq(0,10,0.1) 217 | y=dpois(x, 0.2) 218 | warnings() 219 | x=seq(-10,10,1) 220 | y=dpois(x, 0.2) 221 | plot(x,y) 222 | plot(x,y, ty='l') 223 | x=seq(0,20,1) 224 | y=dpois(x, 0.2) 225 | plot(x,y, ty='l') 226 | y=dpois(x, 0.5) 227 | lines(x,y) 228 | y=dpois(x, 4) 229 | lines(x,y) 230 | y=dpois(x, 8) 231 | lines(x,y) 232 | y=dpois(x, 0.8) 233 | lines(x,y) 234 | plot(x,y, ty='l', xlab=x, ylab='density') 235 | plot(x,y, ty='l', xlab=, ylab='density') 236 | plot(x,y, ty='l', xlab='x', ylab='density') 237 | y=dpois(x, 5) 238 | lines(x,y) 239 | y=dpois(x, 12) 240 | lines(x,y) 241 
| y=dpois(x, 0.8) 242 | plot(x,y, ty='l', xlab='x', ylab='density') 243 | y=dpois(x, 5) 244 | lines(x,y) 245 | hist(y) 246 | plot(x,y,ty='s') 247 | plot(x,y,ty='o') 248 | plot(x,y,ty='h') 249 | plot(x,y,ty='h', lwd=3) 250 | y=dpois(x, 0.8) 251 | plot(x,y,ty='h', lwd=5, xlab='x', ylab='density') 252 | y=dpois(x, 5) 253 | points(x,y, ty='h', col='red',lwd=3) 254 | y=dpois(x, 12) 255 | points(x,y, ty='h', col='orange',lwd=1) 256 | points(x,y, ty='h', col='gray',lwd=1) 257 | points(x,y, ty='h', col='blue',lwd=1) 258 | y=dpois(x, 0.8) 259 | plot(x,y,ty='h', lwd=5, xlab='x', ylab='density') 260 | y=dpois(x, 5) 261 | points(x,y, ty='h', col='orange',lwd=3) 262 | y=dpois(x, 12) 263 | points(x,y, ty='h', col='red',lwd=1) 264 | 2e6*0.04 265 | setwd('Desktop/STAN/GITHUB/STAN_tutorials/Hierarchical/') 266 | data = read.csv('/Users/maggielieu/Desktop/STAN/GITHUB/STAN_tutorials/Hierarchical/premiereleague.csv',col.names = c('Home','score1', 'score2', 'Away'), stringsAsFactors = FALSE) 267 | head(data) 268 | -------------------------------------------------------------------------------- /Hierarchical/.Rhistory: -------------------------------------------------------------------------------- 1 | require(rstan) 2 | set.seed(1) #set seed 3 | data = read.csv('/Users/maggielieu/Desktop/STAN/GITHUB/STAN_tutorials/Hierarchical/premiereleague.csv',col.names = c('Home','score1', 'score2', 'Away'), stringsAsFactors = FALSE) 4 | data 5 | ng = nrow(data) 6 | cat('we have G =', ng, 'games \n') 7 | nt = length(unique(data$Home)) 8 | cat('we have T = ', nt, 'teams \n') 9 | teams = unique(data$Home) 10 | ht = unlist(sapply(1:ng, function(g) which(teams == data$Home[g]))) 11 | at = unlist(sapply(1:ng, function(g) which(teams == data$Away[g]))) 12 | # we will save the last 5 games to predict 13 | np=5 14 | ngob = ng-np 15 | my_data = list( 16 | nt = nt, 17 | ng = ngob, 18 | ht = ht[1:ngob], 19 | at = at[1:ngob], 20 | s1 = data$score1[1:ngob], 21 | s2 = data$score2[1:ngob], 22 | np = np, 23 | 
htnew = ht[(ngob+1):ng], 24 | atnew = at[(ngob+1):ng] 25 | ) 26 | # we will save the last 5 games to predict 27 | np=5 28 | ngob = ng-np 29 | my_data = list( 30 | nt = nt, 31 | ng = ngob, 32 | ht = ht[1:ngob], 33 | at = at[1:ngob], 34 | s1 = data$score1[1:ngob], 35 | s2 = data$score2[1:ngob], 36 | np = np, 37 | htnew = ht[(ngob+1):ng], 38 | atnew = at[(ngob+1):ng] 39 | ) 40 | nhfit = stan(file = 'non_hier_model.stan', data = my_data) 41 | par(mfrow=c(5,2), mar=c(0,0,0,0), oma=c(4,4,1,1)) 42 | nhparams = extract(nhfit) 43 | lb = c(FALSE,FALSE,FALSE,FALSE,TRUE) 44 | for(i in 1:5){ 45 | plot(density(nhparams$s1new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 46 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 47 | abline(v=data$score1[ngob+i],col='red') 48 | legend('topright', legend=c(data$Home[ngob+i]),bty='n') 49 | plot(density(nhparams$s2new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 50 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 51 | abline(v=data$score2[ngob+i],col='red') 52 | legend('topright', legend=c(data$Away[ngob+i]),bty='n') 53 | } 54 | mtext('score',side=1, outer=TRUE, line=2) 55 | mtext('density',side=2, outer=TRUE, line=2) 56 | colMeans(nhparams$s1new) 57 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 58 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 59 | plot(true_scores, pred_scores) 60 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 61 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 62 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20) 63 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 64 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 65 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20) 66 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 67 | true_scores = 
c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 68 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20) 69 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 70 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 71 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20) 72 | abline(a=0, b=1, lty='dashed) 73 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 74 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 75 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20) 76 | abline(a=0, b=1, lty='dashed') 77 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 78 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 79 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20, ylab='predicted scores', xlab='true scores') 80 | abline(a=0, b=1, lty='dashed') 81 | nhfit = stan(file = 'non_hier_model.stan', data = my_data) 82 | nhparams = extract(nhfit) 83 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 84 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 85 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20, ylab='predicted scores', xlab='true scores') 86 | abline(a=0, b=1, lty='dashed') 87 | nhparams = extract(nhfit) 88 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 89 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 90 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 91 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20, ylab='predicted scores', xlab='true scores') 92 | abline(a=0, b=1, lty='dashed') 93 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 94 | nhparams = extract(nhfit) 95 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 96 | 
pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 97 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 98 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 99 | abline(a=0, b=1, lty='dashed') 100 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 101 | nhparams = extract(nhfit) 102 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 103 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 104 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 105 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 106 | abline(a=0, b=1, lty='dashed') 107 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 108 | nhfit = stan(file = 'non_hier_model.stan', data = my_data) 109 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 110 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 111 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20, ylab='predicted scores', xlab='true scores') 112 | abline(a=0, b=1, lty='dashed') 113 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 114 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 115 | par(mfrow=c(5,2), mar=c(0,0,0,0), oma=c(4,4,1,1)) 116 | nhparams = extract(nhfit) 117 | lb = c(FALSE,FALSE,FALSE,FALSE,TRUE) 118 | for(i in 1:5){ 119 | plot(density(nhparams$s1new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 120 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 121 | 
abline(v=data$score1[ngob+i],col='red') 122 | legend('topright', legend=c(data$Home[ngob+i]),bty='n') 123 | plot(density(nhparams$s2new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 124 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 125 | abline(v=data$score2[ngob+i],col='red') 126 | legend('topright', legend=c(data$Away[ngob+i]),bty='n') 127 | } 128 | mtext('score',side=1, outer=TRUE, line=2) 129 | mtext('density',side=2, outer=TRUE, line=2) 130 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 131 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 132 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20, ylab='predicted scores', xlab='true scores') 133 | abline(a=0, b=1, lty='dashed') 134 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 135 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 136 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 137 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 138 | plot(true_scores, pred_scores, xlim=c(0,6), ylim=c(0,6), pch=20, ylab='predicted scores', xlab='true scores') 139 | abline(a=0, b=1, lty='dashed') 140 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 141 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, col=rgb(0,0,0,0.3)) 142 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 143 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 144 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 145 | abline(a=0, b=1, lty='dashed') 146 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) 
sd(nhparams$s1new[,x]))) 147 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, col=rgb(0,0,0,0.3)) 148 | nhparams = extract(nhfit) 149 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 150 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 151 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 152 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 153 | abline(a=0, b=1, lty='dashed') 154 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 155 | nhfit = stan(file = 'non_hier_model_pool.stan', data = my_data) 156 | nhparams = extract(nhfit) 157 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 158 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 159 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 160 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 161 | abline(a=0, b=1, lty='dashed') 162 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 163 | attack = colMeans(nhparams$att) 164 | defense = colMeans(nhparams$def) 165 | plot(attack,defense,xlim=c(-0.4,1.1)) 166 | abline(h=0) 167 | abline(v=0) 168 | text(attack,defense, labels=teams, cex=0.7, pos=4) 169 | hfit = stan(file = 'hier_model.stan', data = my_data) 170 | pairs(hfit, pars=c('mu_att', 'tau_att', 'mu_def', 'tau_def')) 171 | hparams = extract(hfit) 172 | par(mfrow=c(5,2), mar=c(0,0,0,0), oma=c(4,4,1,1)) 173 | lb = c(FALSE,FALSE,FALSE,FALSE,TRUE) 174 | for(i in 1:5){ 175 | plot(density(hparams$s1new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', 
ann=FALSE) 176 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 177 | abline(v=data$score1[ngob+i],col='red') 178 | legend('topright', legend=c(data$Home[ngob+i]),bty='n') 179 | plot(density(hparams$s2new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 180 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 181 | abline(v=data$score2[ngob+i],col='red') 182 | legend('topright', legend=c(data$Away[ngob+i]),bty='n') 183 | } 184 | mtext('score',side=1, outer=TRUE, line=2) 185 | mtext('density',side=2, outer=TRUE, line=2) 186 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 187 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 188 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 189 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 190 | abline(a=0, b=1, lty='dashed') 191 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 192 | hfit = stan(file = 'hier_model.stan', data = my_data) 193 | hfit = stan(file = 'hier_model.stan', data = my_data) 194 | hfit = stan(file = 'hier_model.stan', data = my_data) 195 | hfit = stan(file = 'hier_model.stan', data = my_data) 196 | hfit = stan(file = 'hier_model.stan', data = my_data) 197 | hfit = stan(file = 'hier_model.stan', data = my_data) 198 | print(hfit) 199 | pairs(hfit, pars=c('mu_att', 'tau_att', 'mu_def', 'tau_def')) 200 | hparams = extract(hfit) 201 | par(mfrow=c(5,2), mar=c(0,0,0,0), oma=c(4,4,1,1)) 202 | lb = c(FALSE,FALSE,FALSE,FALSE,TRUE) 203 | for(i in 1:5){ 204 | plot(density(hparams$s1new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 205 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 206 | abline(v=data$score1[ngob+i],col='red') 207 | legend('topright', legend=c(data$Home[ngob+i]),bty='n') 208 | 
plot(density(hparams$s2new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 209 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 210 | abline(v=data$score2[ngob+i],col='red') 211 | legend('topright', legend=c(data$Away[ngob+i]),bty='n') 212 | } 213 | mtext('score',side=1, outer=TRUE, line=2) 214 | mtext('density',side=2, outer=TRUE, line=2) 215 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 216 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 217 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 218 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 219 | abline(a=0, b=1, lty='dashed') 220 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 221 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 222 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 223 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 224 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 225 | abline(a=0, b=1, lty='dashed') 226 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 227 | hfit = stan(file = 'hier_model.stan', data = my_data) 228 | hfit = stan(file = 'hier_model.stan', data = my_data) 229 | hfit = stan(file = 'hier_model.stan', data = my_data) 230 | hfit = stan(file = 'hier_model.stan', data = my_data) 231 | hfit = stan(file = 'hier_model.stan', data = my_data) 232 | hfit = stan(file = 'hier_model.stan', data = my_data) 233 | hfit = stan(file = 'hier_model.stan', data = my_data) 234 | hfit = stan(file = 'hier_model.stan', data = my_data) 235 | hfit = stan(file = 'hier_model.stan', data = 
my_data) 236 | hfit = stan(file = 'hier_model.stan', data = my_data) 237 | hfit = stan(file = 'hier_model.stan', data = my_data) 238 | hfit = stan(file = 'hier_model.stan', data = my_data) 239 | hfit = stan(file = 'hier_model.stan', data = my_data) 240 | hfit = stan(file = 'hier_model.stan', data = my_data) 241 | hfit = stan(file = 'hier_model.stan', data = my_data) 242 | pairs(hfit, pars=c('mu_att', 'tau_att', 'mu_def', 'tau_def')) 243 | hparams = extract(hfit) 244 | par(mfrow=c(5,2), mar=c(0,0,0,0), oma=c(4,4,1,1)) 245 | lb = c(FALSE,FALSE,FALSE,FALSE,TRUE) 246 | for(i in 1:5){ 247 | plot(density(hparams$s1new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 248 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 249 | abline(v=data$score1[ngob+i],col='red') 250 | legend('topright', legend=c(data$Home[ngob+i]),bty='n') 251 | plot(density(hparams$s2new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 252 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 253 | abline(v=data$score2[ngob+i],col='red') 254 | legend('topright', legend=c(data$Away[ngob+i]),bty='n') 255 | } 256 | mtext('score',side=1, outer=TRUE, line=2) 257 | mtext('density',side=2, outer=TRUE, line=2) 258 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 259 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 260 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 261 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 262 | abline(a=0, b=1, lty='dashed') 263 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 264 | attack = colMeans(hparams$att) 265 | attacksd = sapply(1:nt, function(x) sd(hparams$att[,x])) 266 | defense = colMeans(hparams$def) 267 | defensesd = sapply(1:nt, function(x) sd(hparams$def[,x])) 268 | 
plot(attack,defense, xlim=c(-0.4,1), ylim=c(-0.45,0.3), pch=20) 269 | arrows(attack-attacksd, defense, attack+attacksd, defense, code=3, angle = 90, length = 0.04, col=rgb(0,0,0,0.2)) 270 | arrows(attack, defense-defensesd, attack, defense+defensesd, code=3, angle = 90, length = 0.04,col=rgb(0,0,0,0.2)) 271 | #abline(h=0) 272 | #abline(v=0) 273 | text(attack,defense, labels=teams, cex=0.7, adj=c(-0.05,-0.8) ) 274 | homescores = sapply(1:nt, function(x) sum(data$score1[which(ht == x)])) 275 | awayscores = sapply(1:nt, function(x) sum(data$score2[which(at == x)])) 276 | homelosses = sapply(1:nt, function(x) sum(data$score2[which(ht == x)])) 277 | awaylosses = sapply(1:nt, function(x) sum(data$score1[which(at == x)])) 278 | totalscores = homescores+awayscores 279 | totallosses = homelosses+awaylosses 280 | plot(totalscores,-totallosses,xlim=c(20,100)) 281 | text(totalscores,-totallosses, labels=teams, cex=0.7, pos=4, adj=c(-0.4)) 282 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 283 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 284 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 285 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 286 | abline(a=0, b=1, lty='dashed') 287 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 288 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 289 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 290 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 291 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 292 | abline(a=0, b=1, lty='dashed') 293 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 
0.05, angle = 90, code = 3, rgb(0,0,0,0.3)) 294 | hfit = stan(file = 'hier_model.stan', data = my_data) 295 | pairs(hfit, pars=c('mu_att', 'tau_att', 'mu_def', 'tau_def')) 296 | hparams = extract(hfit) 297 | par(mfrow=c(5,2), mar=c(0,0,0,0), oma=c(4,4,1,1)) 298 | lb = c(FALSE,FALSE,FALSE,FALSE,TRUE) 299 | for(i in 1:5){ 300 | plot(density(hparams$s1new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 301 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 302 | abline(v=data$score1[ngob+i],col='red') 303 | legend('topright', legend=c(data$Home[ngob+i]),bty='n') 304 | plot(density(hparams$s2new[,i]), main='', xlab = '', ylab='', xlim=c(0,10),xaxt='n',yaxt='n', ann=FALSE) 305 | axis(side=1, tck=0.05,labels=lb[i], at = seq(0,10)) 306 | abline(v=data$score2[ngob+i],col='red') 307 | legend('topright', legend=c(data$Away[ngob+i]),bty='n') 308 | } 309 | mtext('score',side=1, outer=TRUE, line=2) 310 | mtext('density',side=2, outer=TRUE, line=2) 311 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 312 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 313 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 314 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 315 | abline(a=0, b=1, lty='dashed') 316 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, rgb(0,0,0,0.3)) 317 | require(rstan) 318 | set.seed(1) #set seed 319 | data = read.csv('/Users/maggielieu/Desktop/STAN/GITHUB/STAN_tutorials/Hierarchical/premiereleague.csv',col.names = c('Home','score1', 'score2', 'Away'), stringsAsFactors = FALSE) 320 | data 321 | ng = nrow(data) 322 | cat('we have G =', ng, 'games \n') 323 | nt = length(unique(data$Home)) 324 | cat('we have T = ', nt, 'teams \n') 325 | teams = unique(data$Home) 326 | ht = unlist(sapply(1:ng, 
function(g) which(teams == data$Home[g]))) 327 | at = unlist(sapply(1:ng, function(g) which(teams == data$Away[g]))) 328 | # we will save the last 5 games to predict 329 | np=5 330 | ngob = ng-np 331 | my_data = list( 332 | nt = nt, 333 | ng = ngob, 334 | ht = ht[1:ngob], 335 | at = at[1:ngob], 336 | s1 = data$score1[1:ngob], 337 | s2 = data$score2[1:ngob], 338 | np = np, 339 | htnew = ht[(ngob+1):ng], 340 | atnew = at[(ngob+1):ng] 341 | ) 342 | nhfit = stan(file = 'non_hier_model.stan', data = my_data) 343 | nhfit = stan(file = 'non_hier_model.stan', data = my_data) 344 | print(nhfit) 345 | nhparams = extract(nhfit) 346 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 347 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 348 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 349 | abline(a=0, b=1, lty='dashed') 350 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 351 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, col=rgb(0,0,0,0.3)) 352 | nhpoolfit = stan(file = 'non_hier_model_pool.stan', data = my_data) 353 | nhpoolparams = extract(nhpoolfit) 354 | pred_scores = c(colMeans(nhpoolparams$s1new),colMeans(nhpoolparams$s2new)) 355 | pred_errors = c(sapply(1:np, function(x) sd(nhpoolparams$s1new[,x])),sapply(1:np, function(x) sd(nhpoolparams$s1new[,x]))) 356 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 357 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 358 | abline(a=0, b=1, lty='dashed') 359 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 360 | attack = colMeans(nhparams$att) 361 | defense = colMeans(nhparams$def) 362 | plot(attack,defense,xlim=c(-0.4,1.1)) 363 | 
abline(h=0) 364 | abline(v=0) 365 | text(attack,defense, labels=teams, cex=0.7, pos=4) 366 | attack = colMeans(nhparams$att) 367 | defense = colMeans(nhparams$def) 368 | plot(attack,defense,xlim=c(-0.4,1.1)) 369 | abline(h=0) 370 | abline(v=0) 371 | text(attack,defense, labels=teams, cex=0.7, pos=4) 372 | nhpoolparams = extract(nhpoolfit) 373 | pred_scores = c(colMeans(nhpoolparams$s1new),colMeans(nhpoolparams$s2new)) 374 | pred_errors = c(sapply(1:np, function(x) sd(nhpoolparams$s1new[,x])),sapply(1:np, function(x) sd(nhpoolparams$s1new[,x]))) 375 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 376 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 377 | abline(a=0, b=1, lty='dashed') 378 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 379 | attack = colMeans(nhparams$att) 380 | defense = colMeans(nhparams$def) 381 | plot(attack,defense,xlim=c(-0.4,1.1)) 382 | abline(h=0) 383 | abline(v=0) 384 | text(attack,defense, labels=teams, cex=0.7, pos=4) 385 | attack = colMeans(nhpoolparams$att) 386 | defense = colMeans(nhpoolparams$def) 387 | plot(attack,defense,xlim=c(-0.4,1.1)) 388 | abline(h=0) 389 | abline(v=0) 390 | text(attack,defense, labels=teams, cex=0.7, pos=4) 391 | hfit = stan(file = 'hier_model.stan', data = my_data) 392 | hfit = stan(file = 'hier_model.stan', data = my_data) 393 | pairs(hfit, pars=c('mu_att', 'tau_att', 'mu_def', 'tau_def')) 394 | hparams = extract(hfit) 395 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 396 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 397 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 398 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 399 | abline(a=0, b=1, lty='dashed') 400 | 
arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, rgb(0,0,0,0.3)) 401 | hfit = stan(file = 'hier_model.stan', data = my_data) 402 | hfit = stan(file = 'hier_model.stan', data = my_data) 403 | hparams = extract(hfit) 404 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 405 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s1new[,x]))) 406 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 407 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 408 | abline(a=0, b=1, lty='dashed') 409 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, rgb(0,0,0,0.3)) 410 | data = read.csv('/Users/maggielieu/Desktop/STAN/GITHUB/STAN_tutorials/Hierarchical/premiereleague.csv',col.names = c('Home','score1', 'score2', 'Away'), stringsAsFactors = FALSE) 411 | require(rstan) 412 | set.seed(1) #set seed 413 | data = read.csv('/Users/maggielieu/Desktop/STAN/GITHUB/STAN_tutorials/Hierarchical/premiereleague.csv',col.names = c('Home','score1', 'score2', 'Away'), stringsAsFactors = FALSE) 414 | data 415 | ng = nrow(data) 416 | cat('we have G =', ng, 'games \n') 417 | nt = length(unique(data$Home)) 418 | cat('we have T = ', nt, 'teams \n') 419 | teams = unique(data$Home) 420 | ht = unlist(sapply(1:ng, function(g) which(teams == data$Home[g]))) 421 | at = unlist(sapply(1:ng, function(g) which(teams == data$Away[g]))) 422 | # we will save the last 5 games to predict 423 | np=5 424 | ngob = ng-np 425 | my_data = list( 426 | nt = nt, 427 | ng = ngob, 428 | ht = ht[1:ngob], 429 | at = at[1:ngob], 430 | s1 = data$score1[1:ngob], 431 | s2 = data$score2[1:ngob], 432 | np = np, 433 | htnew = ht[(ngob+1):ng], 434 | atnew = at[(ngob+1):ng] 435 | ) 436 | nhfit = stan(file = 'non_hier_model.stan', data = my_data) 437 | 
nhparams = extract(nhfit) 438 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 439 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 440 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 441 | abline(a=0, b=1, lty='dashed') 442 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s1new[,x]))) 443 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, col=rgb(0,0,0,0.3)) 444 | attack = colMeans(nhparams$att) 445 | defense = colMeans(nhparams$def) 446 | plot(attack,defense,xlim=c(-0.4,1.1)) 447 | abline(h=0) 448 | abline(v=0) 449 | text(attack,defense, labels=teams, cex=0.7, pos=4) 450 | nhpoolfit = stan(file = 'non_hier_model_pool.stan', data = my_data) 451 | nhpoolparams = extract(nhpoolfit) 452 | pred_scores = c(colMeans(nhpoolparams$s1new),colMeans(nhpoolparams$s2new)) 453 | pred_errors = c(sapply(1:np, function(x) sd(nhpoolparams$s1new[,x])),sapply(1:np, function(x) sd(nhpoolparams$s1new[,x]))) 454 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng] ) 455 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 456 | abline(a=0, b=1, lty='dashed') 457 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 458 | attack = colMeans(nhpoolparams$att) 459 | defense = colMeans(nhpoolparams$def) 460 | plot(attack,defense,xlim=c(-0.4,1.1)) 461 | abline(h=0) 462 | abline(v=0) 463 | text(attack,defense, labels=teams, cex=0.7, pos=4) 464 | attack = colMeans(hparams$att) 465 | attacksd = sapply(1:nt, function(x) sd(hparams$att[,x])) 466 | defense = colMeans(hparams$def) 467 | defensesd = sapply(1:nt, function(x) sd(hparams$def[,x])) 468 | plot(attack,defense, xlim=c(-0.4,1), ylim=c(-0.45,0.3), pch=20) 469 | 
arrows(attack-attacksd, defense, attack+attacksd, defense, code=3, angle = 90, length = 0.04, col=rgb(0,0,0,0.2)) 470 | arrows(attack, defense-defensesd, attack, defense+defensesd, code=3, angle = 90, length = 0.04,col=rgb(0,0,0,0.2)) 471 | #abline(h=0) 472 | #abline(v=0) 473 | text(attack,defense, labels=teams, cex=0.7, adj=c(-0.05,-0.8) ) 474 | -------------------------------------------------------------------------------- /Hierarchical/Hierarchical.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Hierarchical modelling tutorial" 3 | output: html_notebook 4 | --- 5 | 6 | This notebook is a tutorial on Hierarchical modelling using Stan that accompanies the youtube video https://youtu.be/dNZQrcAjgXQ 7 | 8 | ```{r} 9 | require(rstan) 10 | set.seed(1) #set seed 11 | ``` 12 | 13 | First we read in the data (the csv file sits in the same folder as this notebook) 14 | ```{r} 15 | data = read.csv('premiereleague.csv', col.names = c('Home','score1', 'score2', 'Away'), stringsAsFactors = FALSE) 16 | ``` 17 | 18 | ```{r} 19 | data 20 | ``` 21 | 22 | ```{r} 23 | ng = nrow(data) 24 | cat('we have G =', ng, 'games \n') 25 | ``` 26 | 27 | ```{r} 28 | nt = length(unique(data$Home)) 29 | cat('we have T = ', nt, 'teams \n') 30 | ``` 31 | 32 | # Traditional method 33 | 34 | We will assume that the goals scored come from a Poisson distribution: 35 | s1 | theta_g1 ~ Poisson(theta_g1) #game g score by home team 36 | s2 | theta_g2 ~ Poisson(theta_g2) #game g score by away team 37 | 38 | Assuming a log-linear random effect model: 39 | log(theta_g1) = home + att_ht - def_at 40 | log(theta_g2) = att_at - def_ht 41 | 42 | where home is a constant for the advantage of the team hosting the game, and 43 | att and def are the attack and defence abilities of the teams, where the indices ht, at correspond to the t = 1,...,20 teams.
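To build intuition for this log-linear Poisson likelihood, we can simulate a single game's scoreline. The ability values below are made up for illustration (not fitted values); the signs follow the Stan models used in this tutorial, where a stronger defence reduces the opponent's scoring rate:

```{r}
# illustrative abilities for one hypothetical match (not fitted values)
home   <- 0.3                  # home advantage
att_ht <- 0.2; def_ht <- 0.1   # home team attack/defence
att_at <- 0.0; def_at <- -0.1  # away team attack/defence
theta1 <- exp(home + att_ht - def_at)  # expected goals for the home team
theta2 <- exp(att_at - def_ht)         # expected goals for the away team
set.seed(42)
c(rpois(1, theta1), rpois(1, theta2))  # one simulated scoreline
```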
44 | 45 | The priors we will use for the attack and defence abilities are very wide, so the teams' performances are essentially independent of each other: 46 | home ~ normal(0,0.0001) 47 | att[t] ~ normal(0, 2) 48 | def[t] ~ normal(0, 2) 49 | 50 | Now convert the team names for each match into numbers 51 | ```{r} 52 | teams = unique(data$Home) 53 | ht = unlist(sapply(1:ng, function(g) which(teams == data$Home[g]))) 54 | at = unlist(sapply(1:ng, function(g) which(teams == data$Away[g]))) 55 | ``` 56 | 57 | ```{r} 58 | # we will save the last 5 games to predict 59 | np=5 60 | ngob = ng-np #number of games to fit 61 | my_data = list( 62 | nt = nt, 63 | ng = ngob, 64 | ht = ht[1:ngob], 65 | at = at[1:ngob], 66 | s1 = data$score1[1:ngob], 67 | s2 = data$score2[1:ngob], 68 | np = np, 69 | htnew = ht[(ngob+1):ng], 70 | atnew = at[(ngob+1):ng] 71 | ) 72 | ``` 73 | 74 | ```{r} 75 | nhfit = stan(file = 'non_hier_model.stan', data = my_data) 76 | ``` 77 | ```{r} 78 | print(nhfit) 79 | ``` 80 | Plot the predicted scores of the last 5 matches 81 | ```{r, fig.height=5, fig.width=5} 82 | nhparams = extract(nhfit) 83 | pred_scores = c(colMeans(nhparams$s1new),colMeans(nhparams$s2new)) 84 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng]) 85 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 86 | abline(a=0, b=1, lty='dashed') 87 | 88 | pred_errors = c(sapply(1:np, function(x) sd(nhparams$s1new[,x])),sapply(1:np, function(x) sd(nhparams$s2new[,x]))) 89 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, col=rgb(0,0,0,0.3)) 90 | ``` 91 | 92 | We can also look at the attack/defence abilities of the teams: 93 | ```{r} 94 | attack = colMeans(nhparams$att) 95 | defense = colMeans(nhparams$def) 96 | 97 | plot(attack,defense,xlim=c(-0.4,1.1)) 98 | abline(h=0) 99 | abline(v=0) 100 | text(attack,defense, labels=teams, cex=0.7, pos=4) 101 | ``` 102 | 103 | 104 |
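Before comparing models it is also worth checking that the sampler has converged; a minimal sketch using rstan's built-in diagnostics on the fit above:

```{r}
# trace plots: the chains should overlap and look stationary
traceplot(nhfit, pars = c('home'))
# Rhat (potential scale reduction) should be close to 1 for all parameters
rhats <- summary(nhfit)$summary[, 'Rhat']
max(rhats, na.rm = TRUE)
```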
105 | Alternatively, we can assume all of the abilities are the same, or at least very similar, by setting a very tight prior N(0, 0.0001) on the attack and defence abilities, essentially restricting the teams to have the same performance. 106 | 107 | ```{r} 108 | nhpoolfit = stan(file = 'non_hier_model_pool.stan', data = my_data) 109 | ``` 110 | 111 | Again we can plot the predicted scores of the last 5 matches; note how the predicted scores are now all the same. 112 | ```{r, fig.width=5, fig.height=5} 113 | nhpoolparams = extract(nhpoolfit) 114 | pred_scores = c(colMeans(nhpoolparams$s1new),colMeans(nhpoolparams$s2new)) 115 | pred_errors = c(sapply(1:np, function(x) sd(nhpoolparams$s1new[,x])),sapply(1:np, function(x) sd(nhpoolparams$s2new[,x]))) 116 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng]) 117 | plot(true_scores, pred_scores, xlim=c(0,4), ylim=c(0,4), pch=20, ylab='predicted scores', xlab='true scores') 118 | abline(a=0, b=1, lty='dashed') 119 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3) 120 | ``` 121 | 122 | Similarly, the teams' attack and defence abilities are all close to 0. 123 | ```{r} 124 | attack = colMeans(nhpoolparams$att) 125 | defense = colMeans(nhpoolparams$def) 126 | 127 | plot(attack,defense,xlim=c(-0.4,1.1)) 128 | abline(h=0) 129 | abline(v=0) 130 | text(attack,defense, labels=teams, cex=0.7, pos=4) 131 | ``` 132 | 133 | # Hierarchical model 134 | In a hierarchical model, the parameters of interest, in our case the attack and defence abilities, are drawn from a common population distribution: 135 | att[t] ~ normal(mu_att, tau_att) 136 | def[t] ~ normal(mu_def, tau_def) 137 | 138 | Instead of fixed priors on the individual abilities, we define priors on the population parameters, known as hyperpriors.
139 | mu_att ~ normal(0, 0.1) 140 | tau_att ~ normal(0, 1) 141 | mu_def ~ normal(0, 0.1) 142 | tau_def ~ normal(0, 1) 143 | 144 | ```{r} 145 | hfit = stan(file = 'hier_model.stan', data = my_data) 146 | ``` 147 | 148 | ```{r} 149 | print(hfit) 150 | ``` 151 | 152 | ```{r} 153 | pairs(hfit, pars=c('mu_att', 'tau_att', 'mu_def', 'tau_def')) 154 | ``` 155 | 156 | ```{r, fig.height=5, fig.width=5} 157 | hparams = extract(hfit) 158 | pred_scores = c(colMeans(hparams$s1new),colMeans(hparams$s2new)) 159 | pred_errors = c(sapply(1:np, function(x) sd(hparams$s1new[,x])),sapply(1:np, function(x) sd(hparams$s2new[,x]))) 160 | true_scores = c(data$score1[(ngob+1):ng],data$score2[(ngob+1):ng]) 161 | plot(true_scores, pred_scores, xlim=c(0,5), ylim=c(0,5), pch=20, ylab='predicted scores', xlab='true scores') 162 | abline(a=0, b=1, lty='dashed') 163 | arrows(true_scores, pred_scores+pred_errors, true_scores, pred_scores-pred_errors, length = 0.05, angle = 90, code = 3, col=rgb(0,0,0,0.3)) 164 | ``` 165 | 166 | 167 | ```{r} 168 | attack = colMeans(hparams$att) 169 | attacksd = sapply(1:nt, function(x) sd(hparams$att[,x])) 170 | defense = colMeans(hparams$def) 171 | defensesd = sapply(1:nt, function(x) sd(hparams$def[,x])) 172 | 173 | plot(attack,defense, xlim=c(-0.4,1), ylim=c(-0.45,0.3), pch=20) 174 | arrows(attack-attacksd, defense, attack+attacksd, defense, code=3, angle = 90, length = 0.04, col=rgb(0,0,0,0.2)) 175 | arrows(attack, defense-defensesd, attack, defense+defensesd, code=3, angle = 90, length = 0.04, col=rgb(0,0,0,0.2)) 176 | #abline(h=0) 177 | #abline(v=0) 178 | text(attack,defense, labels=teams, cex=0.7, adj=c(-0.05,-0.8) ) 179 | ``` 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 | 191 | 192 | 193 | -------------------------------------------------------------------------------- /Hierarchical/hier_model.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int nt; //number of teams 3 | int ng;
//number of games 4 | int ht[ng]; //home team index 5 | int at[ng]; //away team index 6 | int s1[ng]; //score home team 7 | int s2[ng]; //score away team 8 | int np; //number of predicted games 9 | int htnew[np]; //home team index for prediction 10 | int atnew[np]; //away team index for prediction 11 | } 12 | 13 | parameters { 14 | real home; //home advantage 15 | vector[nt] att; //attack ability of each team 16 | vector[nt] def; //defence ability of each team 17 | //hyper parameters 18 | real mu_att; 19 | real<lower=0> tau_att; //scale parameters must be positive 20 | real mu_def; 21 | real<lower=0> tau_def; 22 | } 23 | 24 | transformed parameters { 25 | vector[ng] theta1; //score probability of home team 26 | vector[ng] theta2; //score probability of away team 27 | 28 | theta1 = exp(home + att[ht] - def[at]); 29 | theta2 = exp(att[at] - def[ht]); 30 | 31 | } 32 | 33 | model { 34 | //hyper priors 35 | mu_att ~ normal(0,0.1); 36 | tau_att ~ normal(0,1); 37 | mu_def ~ normal(0,0.1); 38 | tau_def ~ normal(0,1); 39 | 40 | //priors 41 | att ~ normal(mu_att, tau_att); 42 | def ~ normal(mu_def, tau_def); 43 | home ~ normal(0,0.0001); 44 | 45 | 46 | //likelihood 47 | s1 ~ poisson(theta1); 48 | s2 ~ poisson(theta2); 49 | } 50 | 51 | generated quantities { 52 | //generate predictions 53 | vector[np] theta1new; //score probability of home team 54 | vector[np] theta2new; //score probability of away team 55 | real s1new[np]; //predicted score 56 | real s2new[np]; //predicted score 57 | 58 | 59 | theta1new = exp(home + att[htnew] - def[atnew]); 60 | theta2new = exp(att[atnew] - def[htnew]); 61 | 62 | s1new = poisson_rng(theta1new); 63 | s2new = poisson_rng(theta2new); 64 | } 65 | 66 | -------------------------------------------------------------------------------- /Hierarchical/non_hier_model.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int nt; //number of teams 3 | int ng; //number of games 4 | int ht[ng]; //home team index 5 | int at[ng]; //away team index 6 | int s1[ng];
//score home team 7 | int s2[ng]; //score away team 8 | int np; //number of predicted games 9 | int htnew[np]; //home team index for prediction 10 | int atnew[np]; //away team index for prediction 11 | } 12 | 13 | parameters { 14 | real home; //home advantage 15 | vector[nt] att; //attack ability of each team 16 | vector[nt] def; //defence ability of each team 17 | } 18 | 19 | transformed parameters { 20 | vector[ng] theta1; //score probability of home team 21 | vector[ng] theta2; //score probability of away team 22 | 23 | theta1 = exp(home + att[ht] - def[at]); 24 | theta2 = exp(att[at] - def[ht]); 25 | } 26 | 27 | model { 28 | //priors 29 | att ~ normal(0, 2); 30 | def ~ normal(0, 2); 31 | home ~ normal(0,0.0001); 32 | 33 | //likelihood 34 | s1 ~ poisson(theta1); 35 | s2 ~ poisson(theta2); 36 | } 37 | 38 | generated quantities { 39 | //generate predictions 40 | vector[np] theta1new; //score probability of home team 41 | vector[np] theta2new; //score probability of away team 42 | real s1new[np]; //predicted score 43 | real s2new[np]; //predicted score 44 | 45 | theta1new = exp(home + att[htnew] - def[atnew]); 46 | theta2new = exp(att[atnew] - def[htnew]); 47 | s1new = poisson_rng(theta1new); 48 | s2new = poisson_rng(theta2new); 49 | } 50 | -------------------------------------------------------------------------------- /Hierarchical/non_hier_model_pool.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int nt; //number of teams 3 | int ng; //number of games 4 | int ht[ng]; //home team index 5 | int at[ng]; //away team index 6 | int s1[ng]; //score home team 7 | int s2[ng]; //score away team 8 | int np; //number of predicted games 9 | int htnew[np]; //home team index for prediction 10 | int atnew[np]; //away team index for prediction 11 | } 12 | 13 | parameters { 14 | real home; //home advantage 15 | vector[nt] att; //attack ability of each team 16 | vector[nt] def; //defence ability of each team 17 | } 18 | 19 | 
transformed parameters { 20 | vector[ng] theta1; //score probability of home team 21 | vector[ng] theta2; //score probability of away team 22 | 23 | theta1 = exp(home + att[ht] - def[at]); 24 | theta2 = exp(att[at] - def[ht]); 25 | } 26 | 27 | model { 28 | //priors 29 | att ~ normal(0, 0.0001); 30 | def ~ normal(0, 0.0001); 31 | home ~ normal(0,0.0001); 32 | 33 | //likelihood 34 | s1 ~ poisson(theta1); 35 | s2 ~ poisson(theta2); 36 | } 37 | 38 | generated quantities { 39 | //generate predictions 40 | vector[np] theta1new; //score probability of home team 41 | vector[np] theta2new; //score probability of away team 42 | real s1new[np]; //predicted score 43 | real s2new[np]; //predicted score 44 | 45 | theta1new = exp(home + att[htnew] - def[atnew]); 46 | theta2new = exp(att[atnew] - def[htnew]); 47 | s1new = poisson_rng(theta1new); 48 | s2new = poisson_rng(theta2new); 49 | } 50 | -------------------------------------------------------------------------------- /Hierarchical/premiereleague.csv: -------------------------------------------------------------------------------- 1 | Liverpool, 4,1,Norwich City 2 | West Ham United, 0,5,Manchester City 3 | Burnley, 3,0,Southampton 4 | Crystal Palace, 0,0,Everton 5 | Watford, 0,3,Brighton & Hove Albion 6 | AFC Bournemouth, 1,1,Sheffield United 7 | Tottenham Hotspur, 3,1,Aston Villa 8 | Leicester City, 0,0,Wolverhampton Wanderers 9 | Newcastle United, 0,1,Arsenal 10 | Manchester United, 4,0,Chelsea 11 | Arsenal, 2,1,Burnley 12 | Aston Villa, 1,2,AFC Bournemouth 13 | Everton, 1,0,Watford 14 | Norwich City, 3,1,Newcastle United 15 | Southampton, 1,2,Liverpool 16 | Brighton & Hove Albion,1,1,West Ham United 17 | Manchester City, 2,2,Tottenham Hotspur 18 | Sheffield United, 1,0,Crystal Palace 19 | Chelsea, 1,1,Leicester City 20 | Wolverhampton Wanderers,1,1,Manchester United 21 | Aston Villa, 2,0,Everton 22 | Norwich City, 2,3,Chelsea 23 | Manchester United, 1,2,Crystal Palace 24 | Sheffield United, 1,2,Leicester City 25 | Brighton 
& Hove Albion,0,2,Southampton 26 | Watford, 1,3,West Ham United 27 | Liverpool, 3,1,Arsenal 28 | AFC Bournemouth, 1,3,Manchester City 29 | Wolverhampton Wanderers,1,1,Burnley 30 | Tottenham Hotspur, 0,1,Newcastle United 31 | Southampton, 1,1,Manchester United 32 | Manchester City, 4,0,Brighton & Hove Albion 33 | Newcastle United, 1,1,Watford 34 | Chelsea, 2,2,Sheffield United 35 | Leicester City, 3,1,AFC Bournemouth 36 | West Ham United, 2,0,Norwich City 37 | Crystal Palace, 1,0,Aston Villa 38 | Burnley, 0,3,Liverpool 39 | Everton, 3,2,Wolverhampton Wanderers 40 | Arsenal, 2,2,Tottenham Hotspur 41 | Liverpool, 3,1,Newcastle United 42 | Wolverhampton Wanderers,2,5,Chelsea 43 | Tottenham Hotspur, 4,0,Crystal Palace 44 | Manchester United, 1,0,Leicester City 45 | Sheffield United, 0,1,Southampton 46 | Brighton & Hove Albion,1,1,Burnley 47 | Norwich City, 3,2,Manchester City 48 | AFC Bournemouth, 3,1,Everton 49 | Watford, 2,2,Arsenal 50 | Aston Villa, 0,0,West Ham United 51 | Southampton, 1,3,AFC Bournemouth 52 | Leicester City, 2,1,Tottenham Hotspur 53 | Manchester City, 8,0,Watford 54 | Everton, 0,2,Sheffield United 55 | Burnley, 2,0,Norwich City 56 | Newcastle United, 0,0,Brighton & Hove Albion 57 | West Ham United, 2,0,Manchester United 58 | Crystal Palace, 1,1,Wolverhampton Wanderers 59 | Arsenal, 3,2,Aston Villa 60 | Chelsea, 1,2,Liverpool 61 | Sheffield United, 0,1,Liverpool 62 | Wolverhampton Wanderers,2,0,Watford 63 | Aston Villa, 2,2,Burnley 64 | Chelsea, 2,0,Brighton & Hove Albion 65 | Tottenham Hotspur, 2,1,Southampton 66 | Crystal Palace, 2,0,Norwich City 67 | AFC Bournemouth, 2,2,West Ham United 68 | Everton, 1,3,Manchester City 69 | Leicester City, 5,0,Newcastle United 70 | Manchester United, 1,1,Arsenal 71 | Brighton & Hove Albion,3,0,Tottenham Hotspur 72 | Liverpool, 2,1,Leicester City 73 | Norwich City, 1,5,Aston Villa 74 | Burnley, 1,0,Everton 75 | Watford, 0,0,Sheffield United 76 | West Ham United, 1,2,Crystal Palace 77 | Manchester City, 
0,2,Wolverhampton Wanderers 78 | Arsenal, 1,0,AFC Bournemouth 79 | Southampton, 1,4,Chelsea 80 | Newcastle United, 1,0,Manchester United 81 | Everton, 2,0,West Ham United 82 | Wolverhampton Wanderers,1,1,Southampton 83 | Aston Villa, 2,1,Brighton & Hove Albion 84 | Chelsea, 1,0,Newcastle United 85 | Tottenham Hotspur, 1,1,Watford 86 | Leicester City, 2,1,Burnley 87 | AFC Bournemouth, 0,0,Norwich City 88 | Crystal Palace, 0,2,Manchester City 89 | Manchester United, 1,1,Liverpool 90 | Sheffield United, 1,0,Arsenal 91 | Southampton, 0,9,Leicester City 92 | Manchester City, 3,0,Aston Villa 93 | West Ham United, 1,1,Sheffield United 94 | Brighton & Hove Albion,3,2,Everton 95 | Watford, 0,0,AFC Bournemouth 96 | Burnley, 2,4,Chelsea 97 | Newcastle United, 1,1,Wolverhampton Wanderers 98 | Liverpool, 2,1,Tottenham Hotspur 99 | Arsenal, 2,2,Crystal Palace 100 | Norwich City, 1,3,Manchester United 101 | AFC Bournemouth, 1,0,Manchester United 102 | Brighton & Hove Albion,2,0,Norwich City 103 | Aston Villa, 1,2,Liverpool 104 | Manchester City, 2,1,Southampton 105 | Arsenal, 1,1,Wolverhampton Wanderers 106 | Sheffield United, 3,0,Burnley 107 | West Ham United, 2,3,Newcastle United 108 | Watford, 1,2,Chelsea 109 | Crystal Palace, 0,2,Leicester City 110 | Everton, 1,1,Tottenham Hotspur 111 | Norwich City, 0,2,Watford 112 | Chelsea, 2,0,Crystal Palace 113 | Newcastle United, 2,1,AFC Bournemouth 114 | Tottenham Hotspur, 1,1,Sheffield United 115 | Burnley, 3,0,West Ham United 116 | Southampton, 1,2,Everton 117 | Leicester City, 2,0,Arsenal 118 | Wolverhampton Wanderers,2,1,Aston Villa 119 | Manchester United, 3,1,Brighton & Hove Albion 120 | Liverpool, 3,1,Manchester City 121 | West Ham United, 2,3,Tottenham Hotspur 122 | Arsenal, 2,2,Southampton 123 | Everton, 0,2,Norwich City 124 | Crystal Palace, 1,2,Liverpool 125 | Brighton & Hove Albion,0,2,Leicester City 126 | Watford, 0,3,Burnley 127 | AFC Bournemouth, 1,2,Wolverhampton Wanderers 128 | Manchester City, 2,1,Chelsea 129 | 
Sheffield United, 3,3,Manchester United 130 | Aston Villa, 2,0,Newcastle United 131 | Newcastle United, 2,2,Manchester City 132 | Liverpool, 2,1,Brighton & Hove Albion 133 | Chelsea, 0,1,West Ham United 134 | Tottenham Hotspur, 3,2,AFC Bournemouth 135 | Burnley, 0,2,Crystal Palace 136 | Southampton, 2,1,Watford 137 | Wolverhampton Wanderers,1,1,Sheffield United 138 | Norwich City, 2,2,Arsenal 139 | Manchester United, 2,2,Aston Villa 140 | Leicester City, 2,1,Everton 141 | Crystal Palace, 1,0,AFC Bournemouth 142 | Burnley, 1,4,Manchester City 143 | Wolverhampton Wanderers,2,0,West Ham United 144 | Manchester United, 2,1,Tottenham Hotspur 145 | Leicester City, 2,0,Watford 146 | Chelsea, 2,1,Aston Villa 147 | Southampton, 2,1,Norwich City 148 | Liverpool, 5,2,Everton 149 | Sheffield United, 0,2,Newcastle United 150 | Arsenal, 1,2,Brighton & Hove Albion 151 | Everton, 3,1,Chelsea 152 | Tottenham Hotspur, 5,0,Burnley 153 | Watford, 0,0,Crystal Palace 154 | AFC Bournemouth, 0,3,Liverpool 155 | Manchester City, 1,2,Manchester United 156 | Aston Villa, 1,4,Leicester City 157 | Newcastle United, 2,1,Southampton 158 | Norwich City, 1,2,Sheffield United 159 | Brighton & Hove Albion,2,2,Wolverhampton Wanderers 160 | West Ham United, 1,3,Arsenal 161 | Liverpool, 2,0,Watford 162 | Chelsea, 0,1,AFC Bournemouth 163 | Sheffield United, 2,0,Aston Villa 164 | Leicester City, 1,1,Norwich City 165 | Burnley, 1,0,Newcastle United 166 | Southampton, 0,1,West Ham United 167 | Wolverhampton Wanderers,1,2,Tottenham Hotspur 168 | Manchester United, 1,1,Everton 169 | Arsenal, 0,3,Manchester City 170 | Crystal Palace, 1,1,Brighton & Hove Albion 171 | Everton, 0,0,Arsenal 172 | Aston Villa, 1,3,Southampton 173 | Newcastle United, 1,0,Crystal Palace 174 | Norwich City, 1,2,Wolverhampton Wanderers 175 | Brighton & Hove Albion,0,1,Sheffield United 176 | AFC Bournemouth, 0,1,Burnley 177 | Manchester City, 3,1,Leicester City 178 | Watford, 2,0,Manchester United 179 | Tottenham Hotspur, 0,2,Chelsea 
180 | Tottenham Hotspur, 2,1,Brighton & Hove Albion 181 | Aston Villa, 1,0,Norwich City 182 | Chelsea, 0,2,Southampton 183 | Everton, 1,0,Burnley 184 | Sheffield United, 1,1,Watford 185 | Crystal Palace, 2,1,West Ham United 186 | AFC Bournemouth, 1,1,Arsenal 187 | Manchester United, 4,1,Newcastle United 188 | Leicester City, 0,4,Liverpool 189 | Wolverhampton Wanderers,3,2,Manchester City 190 | Brighton & Hove Albion,2,0,AFC Bournemouth 191 | Newcastle United, 1,2,Everton 192 | Southampton, 1,1,Crystal Palace 193 | Watford, 3,0,Aston Villa 194 | Norwich City, 2,2,Tottenham Hotspur 195 | West Ham United, 1,2,Leicester City 196 | Burnley, 0,2,Manchester United 197 | Arsenal, 1,2,Chelsea 198 | Liverpool, 1,0,Wolverhampton Wanderers 199 | Manchester City, 2,0,Sheffield United 200 | Burnley, 1,2,Aston Villa 201 | Brighton & Hove Albion,1,1,Chelsea 202 | Newcastle United, 0,3,Leicester City 203 | Southampton, 1,0,Tottenham Hotspur 204 | Watford, 2,1,Wolverhampton Wanderers 205 | Manchester City, 2,1,Everton 206 | Norwich City, 1,1,Crystal Palace 207 | West Ham United, 4,0,AFC Bournemouth 208 | Arsenal, 2,0,Manchester United 209 | Liverpool, 2,0,Sheffield United 210 | Sheffield United, 1,0,West Ham United 211 | Crystal Palace, 1,1,Arsenal 212 | Wolverhampton Wanderers,1,1,Newcastle United 213 | Chelsea, 3,0,Burnley 214 | Everton, 1,0,Brighton & Hove Albion 215 | Manchester United, 4,0,Norwich City 216 | Leicester City, 1,2,Southampton 217 | Tottenham Hotspur, 0,1,Liverpool 218 | AFC Bournemouth, 0,3,Watford 219 | Aston Villa, 1,6,Manchester City 220 | Watford, 0,0,Tottenham Hotspur 221 | Manchester City, 2,2,Crystal Palace 222 | Arsenal, 1,1,Sheffield United 223 | Norwich City, 1,0,AFC Bournemouth 224 | West Ham United, 1,1,Everton 225 | Southampton, 2,3,Wolverhampton Wanderers 226 | Brighton & Hove Albion,1,1,Aston Villa 227 | Newcastle United, 1,0,Chelsea 228 | Burnley, 2,1,Leicester City 229 | Liverpool, 2,0,Manchester United 230 | Aston Villa, 2,1,Watford 231 | 
Everton, 2,2,Newcastle United 232 | Sheffield United, 0,1,Manchester City 233 | AFC Bournemouth, 3,1,Brighton & Hove Albion 234 | Crystal Palace, 0,2,Southampton 235 | Chelsea, 2,2,Arsenal 236 | Leicester City, 4,1,West Ham United 237 | Tottenham Hotspur, 2,1,Norwich City 238 | Manchester United, 0,2,Burnley 239 | Wolverhampton Wanderers,1,2,Liverpool 240 | West Ham United, 0,2,Liverpool 241 | Leicester City, 2,2,Chelsea 242 | Liverpool, 4,0,Southampton 243 | Newcastle United, 0,0,Norwich City 244 | West Ham United, 3,3,Brighton & Hove Albion 245 | Crystal Palace, 0,1,Sheffield United 246 | Watford, 2,3,Everton 247 | AFC Bournemouth, 2,1,Aston Villa 248 | Manchester United, 0,0,Wolverhampton Wanderers 249 | Burnley, 0,0,Arsenal 250 | Tottenham Hotspur, 2,0,Manchester City 251 | Everton, 3,1,Crystal Palace 252 | Brighton & Hove Albion,1,1,Watford 253 | Sheffield United, 2,1,AFC Bournemouth 254 | Wolverhampton Wanderers,0,0,Leicester City 255 | Southampton, 1,2,Burnley 256 | Norwich City, 0,1,Liverpool 257 | Aston Villa, 2,3,Tottenham Hotspur 258 | Arsenal, 4,0,Newcastle United 259 | Chelsea, 0,2,Manchester United 260 | Manchester City, 2,0,West Ham United 261 | Chelsea, 2,1,Tottenham Hotspur 262 | Sheffield United, 1,1,Brighton & Hove Albion 263 | Burnley, 3,0,AFC Bournemouth 264 | Southampton, 2,0,Aston Villa 265 | Crystal Palace, 1,0,Newcastle United 266 | Leicester City, 0,1,Manchester City 267 | Wolverhampton Wanderers,3,0,Norwich City 268 | Manchester United, 3,0,Watford 269 | Arsenal, 3,2,Everton 270 | Liverpool, 3,2,West Ham United 271 | Norwich City, 1,0,Leicester City 272 | Brighton & Hove Albion,0,1,Crystal Palace 273 | Newcastle United, 0,0,Burnley 274 | West Ham United, 3,1,Southampton 275 | AFC Bournemouth, 2,2,Chelsea 276 | Watford, 3,0,Liverpool 277 | Tottenham Hotspur, 2,3,Wolverhampton Wanderers 278 | Everton, 1,1,Manchester United 279 | Liverpool, 2,1,AFC Bournemouth 280 | Wolverhampton Wanderers,0,0,Brighton & Hove Albion 281 | Arsenal, 1,0,West 
Ham United 282 | Sheffield United, 1,0,Norwich City 283 | Southampton, 0,1,Newcastle United 284 | Crystal Palace, 1,0,Watford 285 | Burnley, 1,1,Tottenham Hotspur 286 | Chelsea, 4,0,Everton 287 | Manchester United, 2,0,Manchester City 288 | Leicester City, 4,0,Aston Villa 289 | Aston Villa, 0,0,Sheffield United 290 | Manchester City, 3,0,Arsenal 291 | Norwich City, 0,3,Southampton 292 | Tottenham Hotspur, 1,1,Manchester United 293 | Watford, 1,1,Leicester City 294 | Brighton & Hove Albion,2,1,Arsenal 295 | West Ham United, 0,2,Wolverhampton Wanderers 296 | AFC Bournemouth, 0,2,Crystal Palace 297 | Newcastle United, 3,0,Sheffield United 298 | Aston Villa, 1,2,Chelsea 299 | Everton, 0,0,Liverpool 300 | Manchester City, 5,0,Burnley 301 | Leicester City, 0,0,Brighton & Hove Albion 302 | Tottenham Hotspur, 2,0,West Ham United 303 | Wolverhampton Wanderers,1,0,AFC Bournemouth 304 | Newcastle United, 1,1,Aston Villa 305 | Manchester United, 3,0,Sheffield United 306 | Norwich City, 0,1,Everton 307 | Liverpool, 4,0,Crystal Palace 308 | Burnley, 1,0,Watford 309 | Southampton, 0,2,Arsenal 310 | Chelsea, 2,1,Manchester City 311 | Aston Villa, 0,1,Wolverhampton Wanderers 312 | Watford, 1,3,Southampton 313 | Crystal Palace, 0,1,Burnley 314 | Brighton & Hove Albion,0,3,Manchester United 315 | Arsenal, 4,0,Norwich City 316 | Everton, 2,1,Leicester City 317 | AFC Bournemouth, 1,4,Newcastle United 318 | West Ham United, 3,2,Chelsea 319 | Sheffield United, 3,1,Tottenham Hotspur 320 | Manchester City, 4,0,Liverpool 321 | Norwich City, 0,1,Brighton & Hove Albion 322 | Manchester United, 5,2,AFC Bournemouth 323 | Leicester City, 3,0,Crystal Palace 324 | Wolverhampton Wanderers,0,2,Arsenal 325 | Chelsea, 3,0,Watford 326 | Burnley, 1,1,Sheffield United 327 | Newcastle United, 2,2,West Ham United 328 | Liverpool, 2,0,Aston Villa 329 | Southampton, 1,0,Manchester City 330 | -------------------------------------------------------------------------------- 
/Linear_regression_I/Linear_regression_I.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Linear_regression_I" 3 | output: html_notebook 4 | --- 5 | 6 | This is a tutorial on simple linear regression with Stan using the iris dataset. 7 | 8 | ### First get the data 9 | ```{r} 10 | data(iris) 11 | head(iris) 12 | ``` 13 | 14 | 15 | We use only the versicolor flowers, and define the covariate as the sepal length and the variate as the petal length. 16 | ```{r} 17 | versicolor = which(iris$Species == 'versicolor') 18 | x = iris$Sepal.Length[versicolor] 19 | y = iris$Petal.Length[versicolor] 20 | ``` 21 | 22 | We prepare the data for Stan in a list format 23 | ```{r} 24 | data = list( 25 | N = length(x), 26 | x = x, 27 | y = y 28 | ) 29 | ``` 30 | 31 | ### Fit the data 32 | ```{r} 33 | require(rstan) 34 | fit = stan(file = 'model.stan',data=data) 35 | ``` 36 | 37 | ```{r} 38 | print(fit) 39 | ``` 40 | 41 | Extract the best-fit parameters and visualise the fit on the data 42 | ```{r} 43 | plot(x,y) 44 | params = extract(fit) 45 | alpha = mean(params$alpha) 46 | beta = mean(params$beta) 47 | abline(a=alpha, b=beta) 48 | ``` 49 | 50 | ### Posterior intervals 51 | The posterior draws can be used to obtain posterior intervals on the fit 52 | ```{r} 53 | plot(x,y) 54 | params = extract(fit) 55 | alpha = mean(params$alpha) 56 | beta = mean(params$beta) 57 | abline(a=alpha, b=beta) 58 | xr=seq(4,7.5,0.1) 59 | yCI = sapply(xr, function(x) quantile(params$beta*x + params$alpha, probs=c(0.05,0.95) )) #5% and 95% quantiles (a 90% interval) 60 | lines(xr, yCI[1,], col='red') 61 | lines(xr, yCI[2,], col='red') 62 | ``` 63 | 64 | ### Simulated data 65 | Simulated data from the posterior can be used as a sanity check 66 | ```{r} 67 | plot(density(y), xlim=c(2,6.5), ylim=c(0,1.4)) 68 | for(i in 1:10){lines(density(params$y_sim[i,]), col='red')} 69 | ``` 70 | 71 | Using the simulated data, we should recover similar parameters to the real data.
To check this we can re-run the model with a simulated dataset. 72 | ```{r} 73 | y_new = params$y_sim[20, ] #20th simulated dataset 74 | data_new = list( 75 | N = length(x), 76 | x = x, 77 | y = y_new 78 | ) 79 | fit_new = stan(file ='model.stan', data=data_new) 80 | ``` 81 | 82 | 83 | 84 | Plot the posteriors on the parameters of the simulated and real data 85 | ```{r} 86 | par(mfrow=c(1,2)) 87 | params_new=extract(fit_new) 88 | plot(density(params$alpha)) 89 | lines(density(params_new$alpha),col='red') 90 | plot(density(params$beta)) 91 | lines(density(params_new$beta),col='red') 92 | ``` 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | -------------------------------------------------------------------------------- /Linear_regression_I/data.csv: -------------------------------------------------------------------------------- 1 | Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species 2 | 5.1,3.5,1.4,0.2,setosa 3 | 4.9,3,1.4,0.2,setosa 4 | 4.7,3.2,1.3,0.2,setosa 5 | 4.6,3.1,1.5,0.2,setosa 6 | 5,3.6,1.4,0.2,setosa 7 | 5.4,3.9,1.7,0.4,setosa 8 | 4.6,3.4,1.4,0.3,setosa 9 | 5,3.4,1.5,0.2,setosa 10 | 4.4,2.9,1.4,0.2,setosa 11 | 4.9,3.1,1.5,0.1,setosa 12 | 5.4,3.7,1.5,0.2,setosa 13 | 4.8,3.4,1.6,0.2,setosa 14 | 4.8,3,1.4,0.1,setosa 15 | 4.3,3,1.1,0.1,setosa 16 | 5.8,4,1.2,0.2,setosa 17 | 5.7,4.4,1.5,0.4,setosa 18 | 5.4,3.9,1.3,0.4,setosa 19 | 5.1,3.5,1.4,0.3,setosa 20 | 5.7,3.8,1.7,0.3,setosa 21 | 5.1,3.8,1.5,0.3,setosa 22 | 5.4,3.4,1.7,0.2,setosa 23 | 5.1,3.7,1.5,0.4,setosa 24 | 4.6,3.6,1,0.2,setosa 25 | 5.1,3.3,1.7,0.5,setosa 26 | 4.8,3.4,1.9,0.2,setosa 27 | 5,3,1.6,0.2,setosa 28 | 5,3.4,1.6,0.4,setosa 29 | 5.2,3.5,1.5,0.2,setosa 30 | 5.2,3.4,1.4,0.2,setosa 31 | 4.7,3.2,1.6,0.2,setosa 32 | 4.8,3.1,1.6,0.2,setosa 33 | 5.4,3.4,1.5,0.4,setosa 34 | 5.2,4.1,1.5,0.1,setosa 35 | 5.5,4.2,1.4,0.2,setosa 36 | 4.9,3.1,1.5,0.2,setosa 37 | 5,3.2,1.2,0.2,setosa 38 | 5.5,3.5,1.3,0.2,setosa 39 
| 4.9,3.6,1.4,0.1,setosa 40 | 4.4,3,1.3,0.2,setosa 41 | 5.1,3.4,1.5,0.2,setosa 42 | 5,3.5,1.3,0.3,setosa 43 | 4.5,2.3,1.3,0.3,setosa 44 | 4.4,3.2,1.3,0.2,setosa 45 | 5,3.5,1.6,0.6,setosa 46 | 5.1,3.8,1.9,0.4,setosa 47 | 4.8,3,1.4,0.3,setosa 48 | 5.1,3.8,1.6,0.2,setosa 49 | 4.6,3.2,1.4,0.2,setosa 50 | 5.3,3.7,1.5,0.2,setosa 51 | 5,3.3,1.4,0.2,setosa 52 | 7,3.2,4.7,1.4,versicolor 53 | 6.4,3.2,4.5,1.5,versicolor 54 | 6.9,3.1,4.9,1.5,versicolor 55 | 5.5,2.3,4,1.3,versicolor 56 | 6.5,2.8,4.6,1.5,versicolor 57 | 5.7,2.8,4.5,1.3,versicolor 58 | 6.3,3.3,4.7,1.6,versicolor 59 | 4.9,2.4,3.3,1,versicolor 60 | 6.6,2.9,4.6,1.3,versicolor 61 | 5.2,2.7,3.9,1.4,versicolor 62 | 5,2,3.5,1,versicolor 63 | 5.9,3,4.2,1.5,versicolor 64 | 6,2.2,4,1,versicolor 65 | 6.1,2.9,4.7,1.4,versicolor 66 | 5.6,2.9,3.6,1.3,versicolor 67 | 6.7,3.1,4.4,1.4,versicolor 68 | 5.6,3,4.5,1.5,versicolor 69 | 5.8,2.7,4.1,1,versicolor 70 | 6.2,2.2,4.5,1.5,versicolor 71 | 5.6,2.5,3.9,1.1,versicolor 72 | 5.9,3.2,4.8,1.8,versicolor 73 | 6.1,2.8,4,1.3,versicolor 74 | 6.3,2.5,4.9,1.5,versicolor 75 | 6.1,2.8,4.7,1.2,versicolor 76 | 6.4,2.9,4.3,1.3,versicolor 77 | 6.6,3,4.4,1.4,versicolor 78 | 6.8,2.8,4.8,1.4,versicolor 79 | 6.7,3,5,1.7,versicolor 80 | 6,2.9,4.5,1.5,versicolor 81 | 5.7,2.6,3.5,1,versicolor 82 | 5.5,2.4,3.8,1.1,versicolor 83 | 5.5,2.4,3.7,1,versicolor 84 | 5.8,2.7,3.9,1.2,versicolor 85 | 6,2.7,5.1,1.6,versicolor 86 | 5.4,3,4.5,1.5,versicolor 87 | 6,3.4,4.5,1.6,versicolor 88 | 6.7,3.1,4.7,1.5,versicolor 89 | 6.3,2.3,4.4,1.3,versicolor 90 | 5.6,3,4.1,1.3,versicolor 91 | 5.5,2.5,4,1.3,versicolor 92 | 5.5,2.6,4.4,1.2,versicolor 93 | 6.1,3,4.6,1.4,versicolor 94 | 5.8,2.6,4,1.2,versicolor 95 | 5,2.3,3.3,1,versicolor 96 | 5.6,2.7,4.2,1.3,versicolor 97 | 5.7,3,4.2,1.2,versicolor 98 | 5.7,2.9,4.2,1.3,versicolor 99 | 6.2,2.9,4.3,1.3,versicolor 100 | 5.1,2.5,3,1.1,versicolor 101 | 5.7,2.8,4.1,1.3,versicolor 102 | 6.3,3.3,6,2.5,virginica 103 | 5.8,2.7,5.1,1.9,virginica 104 | 7.1,3,5.9,2.1,virginica 105 | 
6.3,2.9,5.6,1.8,virginica 106 | 6.5,3,5.8,2.2,virginica 107 | 7.6,3,6.6,2.1,virginica 108 | 4.9,2.5,4.5,1.7,virginica 109 | 7.3,2.9,6.3,1.8,virginica 110 | 6.7,2.5,5.8,1.8,virginica 111 | 7.2,3.6,6.1,2.5,virginica 112 | 6.5,3.2,5.1,2,virginica 113 | 6.4,2.7,5.3,1.9,virginica 114 | 6.8,3,5.5,2.1,virginica 115 | 5.7,2.5,5,2,virginica 116 | 5.8,2.8,5.1,2.4,virginica 117 | 6.4,3.2,5.3,2.3,virginica 118 | 6.5,3,5.5,1.8,virginica 119 | 7.7,3.8,6.7,2.2,virginica 120 | 7.7,2.6,6.9,2.3,virginica 121 | 6,2.2,5,1.5,virginica 122 | 6.9,3.2,5.7,2.3,virginica 123 | 5.6,2.8,4.9,2,virginica 124 | 7.7,2.8,6.7,2,virginica 125 | 6.3,2.7,4.9,1.8,virginica 126 | 6.7,3.3,5.7,2.1,virginica 127 | 7.2,3.2,6,1.8,virginica 128 | 6.2,2.8,4.8,1.8,virginica 129 | 6.1,3,4.9,1.8,virginica 130 | 6.4,2.8,5.6,2.1,virginica 131 | 7.2,3,5.8,1.6,virginica 132 | 7.4,2.8,6.1,1.9,virginica 133 | 7.9,3.8,6.4,2,virginica 134 | 6.4,2.8,5.6,2.2,virginica 135 | 6.3,2.8,5.1,1.5,virginica 136 | 6.1,2.6,5.6,1.4,virginica 137 | 7.7,3,6.1,2.3,virginica 138 | 6.3,3.4,5.6,2.4,virginica 139 | 6.4,3.1,5.5,1.8,virginica 140 | 6,3,4.8,1.8,virginica 141 | 6.9,3.1,5.4,2.1,virginica 142 | 6.7,3.1,5.6,2.4,virginica 143 | 6.9,3.1,5.1,2.3,virginica 144 | 5.8,2.7,5.1,1.9,virginica 145 | 6.8,3.2,5.9,2.3,virginica 146 | 6.7,3.3,5.7,2.5,virginica 147 | 6.7,3,5.2,2.3,virginica 148 | 6.3,2.5,5,1.9,virginica 149 | 6.5,3,5.2,2,virginica 150 | 6.2,3.4,5.4,2.3,virginica 151 | 5.9,3,5.1,1.8,virginica 152 | -------------------------------------------------------------------------------- /Linear_regression_I/model.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int N; //number of data points 3 | vector[N] x; //covariates 4 | vector[N] y; //variates 5 | } 6 | 7 | parameters { 8 | real alpha; //intercept 9 | real beta; //slope 10 | real<lower=0> sigma; //scatter (constrained positive) 11 | } 12 | 13 | model { 14 | //priors 15 | alpha ~ normal(0, 10); 16 | beta ~ normal(0, 10); 17 | sigma ~ normal(0, 1); //half-normal, given the lower bound 18 | 19 | y ~ normal(alpha
+ beta * x, sigma); //likelihood 20 | } 21 | 22 | generated quantities { 23 | vector[N] y_sim; //simulated data from posterior 24 | 25 | for(i in 1:N) 26 | y_sim[i] = normal_rng(alpha + beta * x[i], sigma); 27 | } 28 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Stan tutorials 2 | 3 | This folder contains codes and models used in the Stan youtube channel tutorials. 4 | 5 | 6 | [https://www.youtube.com/channel/UCwgN5srGpBH4M-Zc2cAluOA](https://www.youtube.com/channel/UCwgN5srGpBH4M-Zc2cAluOA) 7 | -------------------------------------------------------------------------------- /Split_testing_I/.Rhistory: -------------------------------------------------------------------------------- 1 | plot(seq(0,10,0.1), dnorm(seq(0,10,0.1),1,1), cex.axis=2) 2 | plot(seq(0,10,0.1), dnorm(seq(0,10,0.1),1,1), cex.axis=2,cex.labels=2) 3 | plot(seq(0,10,0.1), dnorm(seq(0,10,0.1),1,1), cex.axis=2,cex.lab=2) 4 | plot(seq(0,10,0.1), dnorm(seq(0,10,0.1),1,1), cex.axis=2,cex.lab=2, cex.yaxis=2) 5 | plot(seq(0,10,0.1), dnorm(seq(0,10,0.1),1,1), cex.axis=2,cex.lab=2, cex.axes=2) 6 | source('~/.active-rstudio-document', echo=TRUE) 7 | par(oma=c(2,2,1,1)) 8 | plot(seq(0,10,0.1), dnorm(seq(0,10,0.1),1,1), cex.axis=2,cex.lab=2, cex.axes=2) 9 | par(oma=c(2,2,1,1),mar=c(4,4,1,1)) 10 | plot(seq(0,10,0.1), dnorm(seq(0,10,0.1),1,1), cex.axis=2,cex.lab=2, cex.axes=2) 11 | lodds = 1/c(45057474,7509579, 144415, 2180, 97, 10.3) 12 | lodds 13 | lprizes 14 | source('~/Desktop/lotto_v_euromillions.R', echo=TRUE) 15 | source('~/Desktop/lotto_v_euromillions.R', echo=TRUE) 16 | source('~/Desktop/lotto_v_euromillions.R', echo=TRUE) 17 | source('~/Desktop/lotto_v_euromillions.R', echo=TRUE) 18 | plot(lodds, lprizes, ty='l', log='xy', xlab='prize [£]', ylab='odds') 19 | y 20 | lprizes 21 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds') 22 | 
legend('topright', legend = c('lotto'), col=c('black'), lty=c('solid ')) 23 | legend('topright', legend = c('lotto'), col=c('black'), lty=c('solid'), bty='n') 24 | legend('topright', legend = c('lotto'), col=c('black'), ty=c('solid'), bty='n') 25 | legend('topright', legend = c('lotto'), color=c('black'), ty=c('solid'), bty='n') 26 | legend('topright', legend = c('lotto'), c=c('black'), ty=c('solid'), bty='n') 27 | legend('topright', legend = c('lotto'), col=c('black'), ty=c('solid'), bty='n') 28 | legend('topright', legend = c('lotto'), col=c('black'), lty=c('solid'), bty='n') 29 | legend('topright', legend = c('lotto'), col=c('black'), lty=c('solid'), bty='n') 30 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds') 31 | eprizes = c(44236509, 315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 32 | length(eprizes) 33 | eodds = 1/c(139838160, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 34 | length(eodds) 35 | millionaireodds = mean(1/c(3.5e6, 5e6, 4.6e6, 6.4e6)) 36 | millionaireodds 37 | eodds 38 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 39 | # Should you play lotto or euromillions? 
40 | #lotto odds 41 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 42 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 43 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 44 | #euromillions odds 45 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 46 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 47 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 48 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 49 | lines(eodds, eprizes, ty='l', log='xy', col='orange',lwd=2) 50 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 51 | require(magicaxis) 52 | #lotto odds 53 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 54 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 55 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 56 | #euromillions odds 57 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 58 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 59 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 60 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2, axes=FALSE) 61 | lines(eodds, eprizes, ty='l', log='xy', col='orange',lwd=2) 62 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 63 | magaxis(1:2); magaxis(3:4, labels=FALSE) 64 | install.packages('sm') 65 | #lotto odds 66 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 67 | # Should you play lotto or euromillions? 
68 | #lotto odds 69 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 70 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 71 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 72 | #euromillions odds 73 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 74 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 75 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 76 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 77 | lines(eodds, eprizes, ty='l', log='xy', col='orange',lwd=2) 78 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 79 | 1/103 80 | 1/104 81 | abline(v=1/104) 82 | for(i in 1:10){print(i)} 83 | plays=1e-3;lines(c(plays, plays), c(0,10)) 84 | plays=1e-3;lines(c(plays, plays), c(0,100)) 85 | plays=1e-3;lines(c(plays, plays), c(0,100)) 86 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 87 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 88 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 89 | #euromillions odds 90 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 91 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 92 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 93 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 94 | lines(eodds, eprizes, ty='l', log='xy', col='orange',lwd=2) 95 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 96 | for(i in 1:20){ 97 | plays = i*104 #104 draws per year 98 | lines(c(plays, plays), c(0,50), lwd=2) 99 | } 100 | 1e2 101 | # Should you play lotto or euromillions? 
102 | #lotto odds 103 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 104 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 105 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 106 | #euromillions odds 107 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 108 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 109 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 110 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 111 | lines(eodds, eprizes, ty='l', log='xy', col='orange',lwd=2) 112 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 113 | for(i in 1:20){ 114 | plays = i*104 #104 draws per year 115 | lines(c(plays, plays), c(0,100), lwd=2) 116 | } 117 | # Should you play lotto or euromillions? 118 | #lotto odds 119 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 120 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 121 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 122 | #euromillions odds 123 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 124 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 125 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 126 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 127 | lines(eodds, eprizes, ty='l', log='xy', col='orange',lwd=2) 128 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 129 | for(i in 1:20){ 130 | plays = i*104 #104 draws per year 131 | lines(c(plays, plays), c(0,100), lwd=2) 132 | } 133 | # Should you play lotto or euromillions? 
134 | #lotto odds 135 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 136 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 137 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 138 | #euromillions odds 139 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 140 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 141 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 142 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 143 | lines(eodds, eprizes, ty='l', col='orange',lwd=2) 144 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 145 | for(i in 1:20){ 146 | plays = i*104 #104 draws per year 147 | lines(c(plays, plays), c(0,100), lwd=2) 148 | } 149 | # Should you play lotto or euromillions? 150 | #lotto odds 151 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 152 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 153 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 154 | #euromillions odds 155 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 156 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 157 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 158 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 159 | lines(eodds, eprizes, ty='l', col='orange',lwd=2) 160 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 161 | for(i in 1:20){ 162 | plays = 1/(i*104) #104 draws per year 163 | lines(c(plays, plays), c(0,100), lwd=2) 164 | } 165 | plays 166 | lines(c(1e-3,1e-3), c(0,100)) 167 | lines(c(1e-3,1e-3), c(0,1000)) 168 | # Should you play lotto or euromillions? 
169 | #lotto odds 170 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 171 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 172 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 173 | #euromillions odds 174 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 175 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 176 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 177 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 178 | lines(eodds, eprizes, ty='l', col='orange',lwd=2) 179 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 180 | for(i in 1:20){ 181 | plays = 1/(i*104) #104 draws per year 182 | lines(c(plays, plays), c(0,100), lwd=2) 183 | } 184 | # Should you play lotto or euromillions? 185 | #lotto odds 186 | lodds = 1/c(45057474,7509579, 144415, 2180, 97) 187 | ljackpot = mean(c(5078822, 3800000, 12634862, 10884211, 8609036)) 188 | lprizes = c(ljackpot, 1e6, 1750, 140, 30) 189 | #euromillions odds 190 | millionaireodds = mean(c(3.5e6, 5e6, 4.6e6, 6.4e6)) 191 | eodds = 1/c(139838160, millionaireodds, 6991908, 3107515, 621503, 31075, 14125, 13811, 985,706,314,188,49,22) 192 | eprizes = c(44236509, 1000000,315019, 50213,2944, 135, 54, 61, 13, 10, 8, 7, 5, 3) 193 | plot(lodds, lprizes, ty='l', log='xy', ylab='prize [£]', xlab='odds', lwd=2) 194 | lines(eodds, eprizes, ty='l', col='orange',lwd=2) 195 | legend('topright', legend = c('lotto', 'euromil'), col=c('black','orange'), lty=c('solid'), bty='n') 196 | for(i in 1:20){ 197 | plays = 1/(i*104) #104 draws per year 198 | lines(c(plays, plays), c(0.1,10), lwd=2) 199 | } 200 | 45057474/104 201 | install.packages('digest', 'httr', 'jsonlite') 202 | shiny::runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 203 | if (interactive()) { 204 | ui <- fluidPage( 205 | actionButton("update", "Update other buttons"), 206 | 
br(), 207 | actionButton("goButton", "Go"), 208 | br(), 209 | actionButton("goButton2", "Go 2", icon = icon("area-chart")), 210 | br(), 211 | actionButton("goButton3", "Go 3") 212 | ) 213 | server <- function(input, output, session) { 214 | observe({ 215 | req(input$update) 216 | # Updates goButton's label and icon 217 | updateActionButton(session, "goButton", 218 | label = "New label", 219 | icon = icon("calendar")) 220 | # Leaves goButton2's label unchaged and 221 | # removes its icon 222 | updateActionButton(session, "goButton2", 223 | icon = character(0)) 224 | # Leaves goButton3's icon, if it exists, 225 | # unchaged and changes its label 226 | updateActionButton(session, "goButton3", 227 | label = "New label 3") 228 | }) 229 | } 230 | shinyApp(ui, server) 231 | } 232 | if (interactive()) { 233 | library(shiny) 234 | library(shinyMobile) 235 | shiny::shinyApp( 236 | ui = f7Page( 237 | title = "Update f7Button", 238 | init = f7Init(theme = "light", skin = "ios"), 239 | f7SingleLayout( 240 | navbar = f7Navbar(title = "Update f7Button"), 241 | f7Button( 242 | "test", 243 | "Test", 244 | color = "orange", 245 | outline = FALSE, 246 | fill = TRUE, 247 | shadow = FALSE, 248 | rounded = FALSE, 249 | size = NULL), 250 | f7Toggle("prout", "Update Button") 251 | ) 252 | ), 253 | server = function(input, output, session) { 254 | observe(print(input$test)) 255 | observeEvent(input$prout, { 256 | if (input$prout) { 257 | updateF7Button( 258 | inputId = "test", 259 | label = "Updated", 260 | color = "purple", 261 | shadow = TRUE, 262 | rounded = TRUE, 263 | size = "large" 264 | ) 265 | } 266 | }) 267 | } 268 | ) 269 | } 270 | if (interactive()) { 271 | library(shiny) 272 | library(shinyMobile) 273 | shiny::shinyApp( 274 | ui = f7Page( 275 | title = "Update f7Button", 276 | init = f7Init(theme = "light", skin = "ios"), 277 | f7SingleLayout( 278 | navbar = f7Navbar(title = "Update f7Button"), 279 | f7Button( 280 | "test", 281 | "Test", 282 | color = "orange", 283 | outline = 
FALSE, 284 | fill = TRUE, 285 | shadow = FALSE, 286 | rounded = FALSE, 287 | size = NULL), 288 | f7Toggle("prout", "Update Button") 289 | ) 290 | ), 291 | server = function(input, output, session) { 292 | observe(print(input$test)) 293 | observeEvent(input$prout, { 294 | if (input$prout) { 295 | updateF7Button( 296 | inputId = "test", 297 | label = "Updated", 298 | color = "purple", 299 | shadow = TRUE, 300 | rounded = TRUE, 301 | size = "large" 302 | ) 303 | } 304 | }) 305 | } 306 | ) 307 | } 308 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 309 | runApp() 310 | runApp() 311 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 312 | if (interactive()) { 313 | library(shiny) 314 | shinyApp( 315 | ui = fluidPage( 316 | useShinyjs(), # Set up shinyjs 317 | actionButton("btn", "Click me"), 318 | p(id = "element", "Watch what happens to me") 319 | ), 320 | server = function(input, output) { 321 | observeEvent(input$btn, { 322 | # Change the following line for more examples 323 | html("element", paste0("The date is ", date())) 324 | }) 325 | } 326 | ) 327 | } 328 | ## Not run: 329 | # The shinyjs function call in the above app can be replaced by 330 | # any of the following examples to produce similar Shiny apps 331 | html("element", "Hello!") 332 | html("element", " Hello!", TRUE) 333 | html("element", "bold that was achieved with HTML") 334 | local({val <- "some text"; html("element", val)}) 335 | html(id = "element", add = TRUE, html = input$btn) 336 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 337 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 338 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 339 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 340 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 341 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 342 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 343 | 
runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 344 | if (interactive()) { 345 | library(shiny) 346 | library(shinyMobile) 347 | ui <- f7Page( 348 | f7SingleLayout( 349 | navbar = f7Navbar(title = "updateF7Text"), 350 | f7Fab("trigger", "Click me"), 351 | f7Text( 352 | inputId = "text", 353 | label = "Caption", 354 | value = "Some text", 355 | placeholder = "Your text here" 356 | ), 357 | verbatimTextOutput("value") 358 | ) 359 | ) 360 | server <- function(input, output, session) { 361 | output$value <- renderPrint(input$text) 362 | observeEvent(input$trigger, { 363 | updateF7Text(session, "text", value = "Updated Text") 364 | }) 365 | } 366 | shinyApp(ui, server) 367 | } 368 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 369 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 370 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 371 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 372 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 373 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 374 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 375 | runApp() 376 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 377 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 378 | runApp() 379 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 380 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 381 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 382 | runApp() 383 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 384 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 385 | if (interactive()) { 386 | shiny::shinyApp( 387 | ui = shiny::fluidPage( 388 | useShinyjs(), # Set up shinyjs 389 | shiny::actionButton("btn", "Click me"), 390 | p(id = "element", "Click me to change my text"), 391 | verbatimTextOutput("element") 392 | ), 393 | server = function(input, 
output) { 394 | # Change the following lines for more examples 395 | onclick("btn", info(date())) 396 | onclick("element", output$element <- renderPrint({runif(n = 1,0,1000)})) 397 | } 398 | ) 399 | } 400 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 401 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 402 | runApp() 403 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 404 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 405 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 406 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 407 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 408 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 409 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 410 | runApp('Documents/Projects/LoveMatchCalc/LoveMatchCalculator') 411 | data(iris) 412 | head(iris) 413 | versicolor = which(iris$Species == 'versicolor') 414 | x = iris$Sepal.Length[versicolor] 415 | y = iris$Petal.Length[versicolor] 416 | data = list( 417 | N = length(x), 418 | x = x, 419 | y = y 420 | ) 421 | data = list( 422 | N = length(x), 423 | x = x, 424 | y = y 425 | ) 426 | require(rstan) 427 | fit = stan(file = 'model.stan',data=data) 428 | print(fit) 429 | plot(x,y) 430 | params = extract(fit) 431 | alpha = mean(params$alpha) 432 | beta = mean(params$beta) 433 | abline(a=alpha, b=beta) 434 | xr=seq(4,7.5,0.1) 435 | yCI = sapply(xr, function(x) quantile(params$beta*x + params$alpha, probs=c(0.05,0.95) )) 436 | lines(xr, yCI[1,], col='red') 437 | plot(x,y) 438 | params = extract(fit) 439 | alpha = mean(params$alpha) 440 | beta = mean(params$beta) 441 | abline(a=alpha, b=beta) 442 | xr=seq(4,7.5,0.1) 443 | yCI = sapply(xr, function(x) quantile(params$beta*x + params$alpha, probs=c(0.05,0.95) )) 444 | lines(xr, yCI[1,], col='red') 445 | lines(xr, yCI[2,], col='red') 446 | plot(density(y), xlim=c(2,6.5), ylim=c(0,1.4)) 447 | for(i in 
1:10){lines(density(params$y_sim[i,]), col='red')} 448 | y_new = params$y_sim[20, ] #20th simulated dataset 449 | data_new = list( 450 | N = length(x), 451 | x = x, 452 | y = y_new 453 | ) 454 | fit_new = stan(file ='model.stan', data=data_new) 455 | par(mfrow=c(1,2)) 456 | params_new=extract(fit_new) 457 | plot(density(params$alpha)) 458 | lines(density(params_new$alpha),col='red') 459 | plot(density(params$beta)) 460 | lines(density(params_new$beta),col='red') 461 | #get packages 462 | require(rstan) 463 | #get data 464 | input = read.table('data.txt') 465 | setwd('Desktop/STAN/GITHUB/STAN_tutorials/Split_testing_I/') 466 | #get data 467 | input = read.table('data.txt') 468 | #get data 469 | input = read.table('data.txt', header=TRUE) 470 | data = list(y1 = input$facebook, y2 = input$twitter, length = length(input$facebook)) 471 | #get data 472 | input = read.table('data.txt', header=TRUE) 473 | data = list(y1 = input$facebook, y2 = input$twitter, length = length(input$facebook)) 474 | print(data) 475 | #get data 476 | input = read.csv('data.txt') 477 | data = list(y1 = input$facebook, y2 = input$twitter, length = length(input$facebook)) 478 | print(data) 479 | #get data 480 | input = read.csv('data.txt') 481 | data = list(y1 = input$facebook, y2 = input$twitter, length = length(input$facebook)) 482 | print(data) 483 | #fit stan model 484 | fit = stan(file='my_model.stan', data=data) 485 | #fit stan model 486 | fit = stan(file='my_model.stan', data=data) 487 | #fit stan model 488 | fit = stan(file='my_model.stan', data=data) 489 | summary(fit) 490 | #get data 491 | input = read.csv('data.txt') 492 | data = list(y1 = input$facebook, y2 = input$twitter, n = length(input$facebook)) 493 | print(data) 494 | #fit stan model 495 | fit = stan(file='my_model.stan', data=data) 496 | summary(fit) 497 | #extract the parameters and plot the marginalised posteriors 498 | params = extract(fit) 499 | plot(density(params$theta1), main='', xlab='Click-through-rate', 
ylab='Probability', lwd=2, xlim=c(0,0.9)) 500 | lines(density(params$theta2), lty='dashed', col='gray',lwd=2) 501 | legend('topright', legend=c('theta1', 'theta2'), lty=c('solid', 'dashed'), col=c('black', 'gray'),lwd=2, bty='n') 502 | # plot the generated quantity, the difference between the rates 503 | plot(density(params$Delta_theta), main='', xlab=expression(Delta*theta), ylab='Probability',lwd=2) 504 | fit2 = stan(file='my_model.stan', data=data2) 505 | input2 = read.csv('data2.txt') 506 | data2 = list(y1 = input2$facebook, y2 = input2$twitter, n = length(input2$facebook)) 507 | fit2 = stan(file='my_model.stan', data=data2) 508 | summary(fit) 509 | params=extract(fit) 510 | plot(density(params$theta1), main='', xlab='Click-through-rate', ylab='Probability', lwd=2, xlim=c(0,0.9), ylim=c(0,8.2)) 511 | lines(density(params$theta2), lty='dashed', col='gray',lwd=2) 512 | legend('topright', legend=c(paste(5, ' samples', sep=''),'theta1', 'theta2'), lty=c('solid','solid', 'dashed'), col=c('white', 'black', 'gray'),lwd=2, bty='n') 513 | -------------------------------------------------------------------------------- /Split_testing_I/Split_testing_I.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Split testing I" 3 | output: html_notebook 4 | --- 5 | 6 | ### Split testing I 7 | This notebook accompanies the youtube tutorial "Split testing in Stan: Should I make a Twitter ad or a Facebook ad?" 
https://youtu.be/Z1IB4FT_Yog 8 | 9 | ```{r} 10 | #get packages 11 | require(rstan) 12 | ``` 13 | 14 | ```{r} 15 | #get data 16 | input = read.csv('data.txt') 17 | data = list(y1 = input$facebook, y2 = input$twitter, n = length(input$facebook)) 18 | print(data) 19 | ``` 20 | 21 | ```{r} 22 | #fit stan model 23 | fit = stan(file='my_model.stan', data=data) 24 | summary(fit) 25 | ``` 26 | 27 | ```{r} 28 | #extract the parameters and plot the marginalised posteriors 29 | params = extract(fit) 30 | plot(density(params$theta1), main='', xlab='Click-through-rate', ylab='Probability', lwd=2, xlim=c(0,0.9)) 31 | lines(density(params$theta2), lty='dashed', col='gray',lwd=2) 32 | legend('topright', legend=c('theta1', 'theta2'), lty=c('solid', 'dashed'), col=c('black', 'gray'),lwd=2, bty='n') 33 | ``` 34 | 35 | ```{r} 36 | # plot the generated quantity, the difference between the rates 37 | plot(density(params$Delta_theta), main='', xlab=expression(Delta*theta), ylab='Probability',lwd=2) 38 | ``` 39 | 40 | ### Now try with more data 41 | ```{r} 42 | input2 = read.csv('data2.txt') 43 | data2 = list(y1 = input2$facebook, y2 = input2$twitter, n = length(input2$facebook)) 44 | ``` 45 | 46 | ```{r} 47 | fit2 = stan(file='my_model.stan', data=data2) 48 | summary(fit2) 49 | ``` 50 | 51 | ```{r} 52 | params2 = extract(fit2) 53 | plot(density(params2$theta1), main='', xlab='Click-through-rate', ylab='Probability', lwd=2, xlim=c(0,0.9), ylim=c(0,8.2)) 54 | lines(density(params2$theta2), lty='dashed', col='gray',lwd=2) 55 | legend('topright', legend=c(paste(data2$n, ' samples', sep=''),'theta1', 'theta2'), lty=c('solid','solid', 'dashed'), col=c('white', 'black', 'gray'),lwd=2, bty='n') 56 | ``` 57 | 58 | -------------------------------------------------------------------------------- /Split_testing_I/data.txt: -------------------------------------------------------------------------------- 1 | facebook,twitter 2 | 0,1 3 | 0,1 4 | 0,0 5 | 0,0 6 | 0,0 7 | 1,1 8 | 0,0 9 | 0,0 10 | 0,0 11 | 1,1 12 |
-------------------------------------------------------------------------------- /Split_testing_I/data2.txt: -------------------------------------------------------------------------------- 1 | facebook, twitter 2 | 0,1 3 | 0,1 4 | 0,0 5 | 0,0 6 | 0,0 7 | 1,1 8 | 0,0 9 | 0,0 10 | 0,0 11 | 1,1 12 | 1,0 13 | 0,0 14 | 1,0 15 | 0,1 16 | 0,0 17 | 1,0 18 | 0,0 19 | 0,1 20 | 1,1 21 | 1,0 22 | 0,0 23 | 0,0 24 | 1,1 25 | 1,0 26 | 1,0 27 | 0,0 28 | 0,1 29 | 1,0 30 | 1,1 31 | 0,0 32 | 0,0 33 | 1,1 34 | 1,0 35 | 0,1 36 | 1,0 37 | 1,0 38 | 0,1 39 | 1,0 40 | 0,0 41 | 1,1 42 | 0,1 43 | 1,0 44 | 1,0 45 | 0,1 46 | 1,1 47 | 0,0 48 | 0,0 49 | 1,0 50 | 0,1 51 | 0,1 52 | 1,0 53 | 1,0 54 | 0,0 55 | 0,1 56 | 0,1 57 | 1,0 58 | 0,0 59 | 1,0 60 | 1,1 61 | 1,0 62 | 0,1 63 | 0,0 64 | 1,0 65 | 1,1 66 | 0,0 67 | 1,0 68 | 0,0 69 | 1,1 70 | 0,0 71 | 1,0 72 | 0,0 73 | 0,1 74 | 1,0 75 | 1,1 76 | 0,0 77 | 1,0 78 | 0,1 79 | 1,0 80 | 1,0 81 | 1,0 82 | 1,0 83 | 0,0 84 | 1,1 85 | 0,0 86 | 1,0 87 | 1,1 88 | 0,0 89 | 1,0 90 | 0,1 91 | 1,0 92 | 1,0 93 | 0,1 94 | 1,0 95 | 1,0 96 | 1,0 97 | 0,0 98 | 1,1 99 | 0,0 100 | 1,0 101 | 0,1 102 | -------------------------------------------------------------------------------- /Split_testing_I/my_model.stan: -------------------------------------------------------------------------------- 1 | data { 2 | int n; // Number of data points 3 | int y1[n]; // number of facebook clicks 4 | int y2[n]; // number of twitter clicks 5 | } 6 | 7 | parameters { 8 | real theta1; 9 | real theta2; 10 | } 11 | 12 | model { 13 | theta1 ~ beta(1,1); // uniform prior 14 | theta2 ~ beta(1,1); // uniform prior 15 | y1 ~ bernoulli(theta1); // likelihood 16 | y2 ~ bernoulli(theta2); // likelihood 17 | } 18 | 19 | generated quantities { 20 | real Delta_theta; 21 | Delta_theta = theta1 - theta2; 22 | } 23 | --------------------------------------------------------------------------------
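A note on the split-test model above: with Beta(1,1) (uniform) priors and Bernoulli likelihoods, each click-through rate has an exact conjugate posterior, Beta(1 + clicks, 1 + misses), so the MCMC output can be sanity-checked against closed-form draws. Below is a minimal sketch of that check — in Python with NumPy rather than the notebooks' R, purely for illustration; the counts 2/10 and 4/10 are the facebook and twitter columns of `data.txt`:

```python
import numpy as np

# Conjugate sanity check for the Bernoulli split-test model:
# with a Beta(1,1) prior, the posterior for a click-through rate
# after `clicks` successes in `n` trials is Beta(1+clicks, 1+n-clicks).

rng = np.random.default_rng(42)

def posterior_draws(clicks, n, size=100_000):
    """Exact posterior draws for theta under a Beta(1,1) prior."""
    return rng.beta(1 + clicks, 1 + n - clicks, size=size)

# Counts from data.txt: 2/10 facebook clicks, 4/10 twitter clicks
theta1 = posterior_draws(clicks=2, n=10)  # facebook
theta2 = posterior_draws(clicks=4, n=10)  # twitter
delta = theta1 - theta2                   # analogue of Delta_theta

print("posterior mean theta1:", theta1.mean())  # ~ 3/12 = 0.25
print("posterior mean theta2:", theta2.mean())  # ~ 5/12 ≈ 0.417
print("P(facebook beats twitter):", (delta > 0).mean())
```

These exact posteriors should agree closely with the densities Stan produces from `my_model.stan`; with the larger `data2.txt` the same check applies and the posteriors simply tighten, which is what the final plot in the notebook is meant to show.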