The samples in this dataset correspond to 30×30m patches of forest in the US, collected for the task of predicting each patch's cover type, i.e. the dominant species of tree. We use the LIBSVM version of the dataset, which converts the problem from multiclass to binary classification.
covertype
Format: A matrix with 581012 rows and 55 columns. The first column contains the classification labels; the remaining 54 columns are the explanatory variables.
Source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
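As a short sketch of how the columns split into labels and features (this assumes the covertype object has already been loaded into the session; the loading step depends on the package version):

    library(sgmcmc)
    # With covertype available in the workspace:
    y = covertype[, 1]    # binary classification labels (first column)
    X = covertype[, -1]   # the 54 explanatory variables
    dim(X)                # 581012 x 54
    table(y)              # class balance of the binary labels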
Install the Python packages required by sgmcmc, including TensorFlow and TensorFlow Probability. Uses the tensorflow::install_tensorflow function.
installTF()
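For example, as a one-off setup step after installing the package:

    library(sgmcmc)
    # Installs TensorFlow and TensorFlow Probability into the Python
    # environment used by the tensorflow R package
    installTF()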
The MNIST dataset is a dataset of handwritten digits from 0-9. Each image is 28x28 pixels. We can interpret this as a large matrix of numbers, representing the value at each pixel. These 28x28 matrices are then flattened into vectors of length 784. For each image, there is an associated label, which determines which digit the image is of. The label is encoded as a vector of length 10, where element i is 1 if the digit is i-1 and 0 otherwise. The dataset is split into two parts: 55,000 data points of training data and 10,000 points of test data.
mnist
Format: A list with two elements, train and test. The training set mnist$train is a list with two entries, images and labels, located at mnist$train$images and mnist$train$labels respectively; mnist$train$images is a matrix of size 55000x784 and mnist$train$labels is a matrix of size 55000x10. The test set mnist$test has the same structure: mnist$test$images is a matrix of size 10000x784 and mnist$test$labels is a matrix of size 10000x10.
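A brief sketch of navigating this structure (again assuming the mnist object has already been loaded into the session):

    library(sgmcmc)
    # With mnist available in the workspace:
    dim(mnist$train$images)   # 55000 x 784
    dim(mnist$train$labels)   # 55000 x 10

    # Decode the one-hot label of the first training image:
    # element i is 1 when the digit is i - 1
    which(mnist$train$labels[1, ] == 1) - 1

    # Reshape the flattened 784-vector back into its 28x28 pixel grid
    img = matrix(mnist$train$images[1, ], nrow = 28, ncol = 28)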
The sgmcmc package implements some of the most popular stochastic gradient MCMC methods, including SGLD, SGHMC and SGNHT. It also implements control variates as a way to increase the efficiency of these methods. The algorithms are implemented using TensorFlow, which means no gradients need to be specified by the user, as these are calculated automatically. It also means the algorithms are efficient.
The main functions of the package are sgld, sghmc and sgnht, which implement stochastic gradient Langevin dynamics, stochastic gradient Hamiltonian Monte Carlo and the stochastic gradient Nosé-Hoover thermostat respectively. Also included are control variate versions of these algorithms, which use control variates to increase their efficiency: these are the functions sgldcv, sghmccv and sgnhtcv. A minimal example of the shared interface is sketched below.
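As a minimal sketch of that interface, here is SGLD targeting the posterior of a Gaussian mean, following the pattern of the package's worked examples. The tf$distributions$Normal call assumes a TensorFlow 1.x-style API, and the exact argument names should be checked against the installed version of the package:

    library(sgmcmc)  # attaches tensorflow, providing the tf object

    # Simulate 10^4 observations from N(0, 1); we infer the mean theta
    dataset = list("x" = rnorm(10^4))
    params = list("theta" = 0)   # starting value for the chain

    # Log-likelihood of a minibatch, written with TensorFlow ops so that
    # gradients are derived automatically
    logLik = function(params, dataset) {
        distn = tf$distributions$Normal(params$theta, 1)
        tf$reduce_sum(distn$log_prob(dataset$x))
    }

    # Weakly informative N(0, 10) prior on theta
    logPrior = function(params) {
        distn = tf$distributions$Normal(0, 10)
        distn$log_prob(params$theta)
    }

    stepsize = list("theta" = 1e-4)
    output = sgld(logLik, dataset, params, stepsize, logPrior = logPrior,
                  minibatchSize = 100, nIters = 10^4)
    # output$theta contains the sampled values of theta

The control variate versions follow the same pattern but additionally require a stepsize for the initial optimisation step that fits the control variates.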
References:

Baker, J., Fearnhead, P., Fox, E. B., & Nemeth, C. (2017). Control variates for stochastic gradient Langevin dynamics. Preprint.

Welling, M., & Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. ICML (pp. 681-688).

Chen, T., Fox, E. B., & Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. ICML (pp. 1683-1691).

Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., & Neven, H. (2014). Bayesian sampling using stochastic gradient thermostats. NIPS (pp. 3203-3211).