Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org
--------------------------------------------------------------------------------
/docs/reference/find_inverted_chains-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/find_inverted_chains-1.png
--------------------------------------------------------------------------------
/docs/reference/find_swans-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/find_swans-1.png
--------------------------------------------------------------------------------
/docs/reference/find_swans-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/find_swans-2.png
--------------------------------------------------------------------------------
/docs/reference/find_swans-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/find_swans-3.png
--------------------------------------------------------------------------------
/docs/reference/plot_fitted-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_fitted-1.png
--------------------------------------------------------------------------------
/docs/reference/plot_fitted-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_fitted-2.png
--------------------------------------------------------------------------------
/docs/reference/plot_loadings-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_loadings-1.png
--------------------------------------------------------------------------------
/docs/reference/plot_loadings-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_loadings-2.png
--------------------------------------------------------------------------------
/docs/reference/plot_loadings-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_loadings-3.png
--------------------------------------------------------------------------------
/docs/reference/plot_loadings-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_loadings-4.png
--------------------------------------------------------------------------------
/docs/reference/plot_regime_model-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_regime_model-1.png
--------------------------------------------------------------------------------
/docs/reference/plot_regime_model-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_regime_model-2.png
--------------------------------------------------------------------------------
/docs/reference/plot_regime_model-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_regime_model-3.png
--------------------------------------------------------------------------------
/docs/reference/plot_trends-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/plot_trends-1.png
--------------------------------------------------------------------------------
/docs/reference/rotate_trends-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/rotate_trends-1.png
--------------------------------------------------------------------------------
/docs/reference/sim_dfa-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/sim_dfa-1.png
--------------------------------------------------------------------------------
/docs/reference/sim_dfa-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/sim_dfa-2.png
--------------------------------------------------------------------------------
/docs/reference/sim_dfa-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/sim_dfa-3.png
--------------------------------------------------------------------------------
/docs/reference/sim_dfa-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/sim_dfa-4.png
--------------------------------------------------------------------------------
/docs/reference/sim_dfa-5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/sim_dfa-5.png
--------------------------------------------------------------------------------
/docs/reference/sim_dfa-6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/sim_dfa-6.png
--------------------------------------------------------------------------------
/docs/reference/trend_cor-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/trend_cor-1.png
--------------------------------------------------------------------------------
/docs/reference/trend_cor-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/docs/reference/trend_cor-2.png
--------------------------------------------------------------------------------
/docs/sitemap.xml:
--------------------------------------------------------------------------------
/404.html
/articles/a1_bayesdfa.html
/articles/a2_combining_data.html
/articles/a3_covariates.html
/articles/a4_smooth.html
/articles/a5_estimate_process_sigma.html
/articles/a6_compositional.html
/articles/a7_bigdata.html
/articles/index.html
/authors.html
/index.html
/news/index.html
/reference/bayesdfa-package.html
/reference/dfa_cv.html
/reference/dfa_fitted.html
/reference/dfa_loadings.html
/reference/dfa_trends.html
/reference/find_dfa_trends.html
/reference/find_inverted_chains.html
/reference/find_regimes.html
/reference/find_swans.html
/reference/fit_dfa.html
/reference/fit_regimes.html
/reference/hmm_init.html
/reference/index.html
/reference/invert_chains.html
/reference/is_converged.html
/reference/loo.html
/reference/plot_fitted.html
/reference/plot_loadings.html
/reference/plot_regime_model.html
/reference/plot_trends.html
/reference/predicted.html
/reference/rotate_trends.html
/reference/sim_dfa.html
/reference/trend_cor.html
--------------------------------------------------------------------------------
/inst/include/stan_meta_header.hpp:
--------------------------------------------------------------------------------
1 | // Insert all #include statements here
2 |
--------------------------------------------------------------------------------
/inst/stan/corr.stan:
--------------------------------------------------------------------------------
1 | data {
2 | int N;
3 | vector[N] y;
4 | vector[N] x;
5 | }
6 | parameters {
7 | real beta;
8 | real<lower=0> sigma;
9 | }
10 | model {
11 | // priors
12 | sigma ~ student_t(3, 0, 2);
13 | y ~ normal(beta * x, sigma);
14 | }
15 |
--------------------------------------------------------------------------------
/inst/stan/hmm_gaussian.stan:
--------------------------------------------------------------------------------
1 | // copied with minor modifications from https://github.com/luisdamiano/stancon18
2 | // CC-BY 4.0
3 |
4 | functions {
5 | vector normalize(vector x) {
6 | return x / sum(x);
7 | }
8 | }
9 |
10 | data {
11 | int T; // number of observations (length)
12 | int K; // number of hidden states
13 | array[T] real x_t; // observations
14 | int est_sigma; // flag, whether to estimate sigma (1) or use values passed in (0)
15 | array[T] real sigma_t; // estimated sigma for each observation
16 | }
17 |
18 | parameters {
19 | // Discrete state model
20 | simplex[K] p_1k; // initial state probabilities
21 | array[K] simplex[K] A_ij; // transition probabilities
22 | // A_ij[i][j] = p(z_t = j | z_{t-1} = i)
23 | // Continuous observation model
24 | ordered[K] mu_k; // observation means
25 | array[K] real<lower=0> sigma_k; // observation standard deviations, optionally estimated if est_sigma == 1. Can the quantity K * est_sigma be used to dimension sigma_k?
26 | }
27 |
28 | transformed parameters {
29 | array[T] vector[K] unalpha_tk;
30 |
31 | { // Forward algorithm log p(z_t = j | x_{1:t})
32 | array[K] real accumulator;
33 |
34 | for (j in 1:K) { // per-state local evidence; a vectorized normal_lpdf call would return a single summed value
35 | if(est_sigma == 1) {
36 | unalpha_tk[1, j] = log(p_1k[j]) + normal_lpdf(x_t[1] | mu_k[j], sigma_k[j]); // use estimated sigma values
37 | } else {
38 | unalpha_tk[1, j] = log(p_1k[j]) + normal_lpdf(x_t[1] | mu_k[j], sigma_t[1]); // otherwise use values passed in by user, fixed
39 | }
40 | }
41 |
42 | for (t in 2:T) {
43 | for (j in 1:K) { // j = current (t)
44 | for (i in 1:K) { // i = previous (t-1)
45 | // Murphy (2012) Eq. 17.48
46 | // belief state + transition prob + local evidence at t
47 | if(est_sigma == 1) {
48 | // use estimated sigma values
49 | accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + normal_lpdf(x_t[t] | mu_k[j], sigma_k[j]);
50 | } else {
51 | // otherwise use values passed in by user, fixed
52 | accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + normal_lpdf(x_t[t] | mu_k[j], sigma_t[t]);
53 | }
54 |
55 | }
56 | unalpha_tk[t, j] = log_sum_exp(accumulator);
57 | }
58 | }
59 | } // Forward
60 | }
61 |
62 | model {
63 | sigma_k ~ student_t(3, 0, 1);
64 | mu_k ~ student_t(3, 0, 3);
65 | target += log_sum_exp(unalpha_tk[T]); // Note: update based only on last unalpha_tk
66 | }
67 |
68 | generated quantities {
69 | array[T] vector[K] unbeta_tk;
70 | array[T] vector[K] ungamma_tk;
71 | array[T] vector[K] alpha_tk;
72 | array[T] vector[K] beta_tk;
73 | array[T] vector[K] gamma_tk;
74 | vector[T] log_lik; // added to store log-likelihood for calculation of LOOIC
75 | array[T] int zstar_t;
76 | real logp_zstar_t;
77 |
78 | { // Forward algorithm
79 | for (t in 1:T)
80 | alpha_tk[t] = softmax(unalpha_tk[t]);
81 | } // Forward
82 |
83 | { // Backward algorithm log p(x_{t+1:T} | z_t = j)
84 | array[K] real accumulator;
85 |
86 | for (j in 1:K)
87 | unbeta_tk[T, j] = 1;
88 |
89 | for (tforward in 0:(T-2)) {
90 | int t;
91 | t = T - tforward;
92 |
93 | for (j in 1:K) { // j = previous (t-1)
94 | for (i in 1:K) { // i = next (t)
95 | // Murphy (2012) Eq. 17.58
96 | // backwards t + transition prob + local evidence at t
97 | if(est_sigma == 1) {
98 | accumulator[i] = unbeta_tk[t, i] + log(A_ij[j, i]) + normal_lpdf(x_t[t] | mu_k[i], sigma_k[i]);
99 | } else {
100 | accumulator[i] = unbeta_tk[t, i] + log(A_ij[j, i]) + normal_lpdf(x_t[t] | mu_k[i], sigma_t[t]);
101 | }
102 |
103 | }
104 | unbeta_tk[t-1, j] = log_sum_exp(accumulator);
105 | }
106 | }
107 |
108 | for (t in 1:T)
109 | beta_tk[t] = softmax(unbeta_tk[t]);
110 | } // Backward
111 |
112 | { // Forwards-backwards algorithm log p(z_t = j | x_{1:T})
113 | for(t in 1:T) {
114 | ungamma_tk[t] = alpha_tk[t] .* beta_tk[t];
115 | gamma_tk[t] = normalize(ungamma_tk[t]);
116 | }
117 |
118 | for(t in 1:T) {
119 | // gamma_tk is vector of normalized probability of state given all data, p(z_t = j | x_{1:T})
120 |
121 | log_lik[t] = 0; // initialize
122 | // log_lik accumulator. need to sum to integrate over states,
123 | // p(x_t) = p(x_t | z_t = 1) * p(z_t = 1)...
124 | // gamma_tk is the smoothed p(z_t = k | x[1:T]); alpha_tk is the filtered p(z_t = k | x[1:t])
125 | //if(est_sigma == 1) {
126 | for (j in 1:K) {
127 | log_lik[t] = log_lik[t] + gamma_tk[t,j]*alpha_tk[t,j];
128 | }
129 | //} else {
130 | // for (j in 1:K)
131 | // log_lik[t] = log_lik[t] + gamma_tk[t,j]*alpha_tk[t,j];
132 | //}
133 | log_lik[t] = log(log_lik[t]);
134 | }
135 |
136 | } // Forwards-backwards
137 |
138 | { // Viterbi algorithm
139 | array[T, K] int a_tk; // backpointer to the most likely previous state on the most probable path
140 | array[T, K] real delta_tk; // max prob for the seq up to t
141 | // with final output from state k for time t
142 | if(est_sigma == 1) {
143 | for (j in 1:K)
144 | delta_tk[1, j] = normal_lpdf(x_t[1] | mu_k[j], sigma_k[j]); // index by state j, not K
145 | } else {
146 | for (j in 1:K)
147 | delta_tk[1, j] = normal_lpdf(x_t[1] | mu_k[j], sigma_t[1]); // index by state j, not K
148 | }
149 |
150 | for (t in 2:T) {
151 | for (j in 1:K) { // j = current (t)
152 | delta_tk[t, j] = negative_infinity();
153 | for (i in 1:K) { // i = previous (t-1)
154 | real logp;
155 | if(est_sigma == 1) {
156 | logp = delta_tk[t-1, i] + log(A_ij[i, j]) + normal_lpdf(x_t[t] | mu_k[j], sigma_k[j]);
157 | } else {
158 | logp = delta_tk[t-1, i] + log(A_ij[i, j]) + normal_lpdf(x_t[t] | mu_k[j], sigma_t[t]);
159 | }
160 | if (logp > delta_tk[t, j]) {
161 | a_tk[t, j] = i;
162 | delta_tk[t, j] = logp;
163 | }
164 | }
165 | }
166 | }
167 |
168 | logp_zstar_t = max(delta_tk[T]);
169 |
170 | for (j in 1:K)
171 | if (delta_tk[T, j] == logp_zstar_t)
172 | zstar_t[T] = j;
173 |
174 | for (t in 1:(T - 1)) {
175 | zstar_t[T - t] = a_tk[T - t + 1, zstar_t[T - t + 1]];
176 | }
177 | }
178 | }
179 |
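The forward and Viterbi recursions in the Stan file above can be hard to follow inside nested loops. The following is a minimal Python sketch of the same log-space recursions, not a translation of the package code. All function names (`log_normal_pdf`, `forward_log`, `viterbi`) are hypothetical, states are 0-indexed, and, unlike the Stan file, this sketch adds the initial-state log probability to the Viterbi initialization.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log(x, p1, A, mu, sigma):
    """Unnormalized log forward messages: one list of K values per time step."""
    K = len(p1)
    # Initialization uses a separate local-evidence term for every state j.
    alpha = [[math.log(p1[j]) + log_normal_pdf(x[0], mu[j], sigma[j])
              for j in range(K)]]
    for t in range(1, len(x)):
        prev = alpha[-1]
        alpha.append([
            log_sum_exp([prev[i] + math.log(A[i][j]) for i in range(K)])
            + log_normal_pdf(x[t], mu[j], sigma[j])
            for j in range(K)
        ])
    return alpha


def viterbi(x, p1, A, mu, sigma):
    """Most probable state path (0-indexed states), computed in log space."""
    K, T = len(p1), len(x)
    delta = [[math.log(p1[j]) + log_normal_pdf(x[0], mu[j], sigma[j])
              for j in range(K)]]
    back = [[0] * K]  # back[t][j]: best previous state when in state j at time t
    for t in range(1, T):
        row, ptr = [], []
        for j in range(K):
            scores = [delta[-1][i] + math.log(A[i][j]) for i in range(K)]
            best = max(range(K), key=lambda i: scores[i])
            row.append(scores[best] + log_normal_pdf(x[t], mu[j], sigma[j]))
            ptr.append(best)
        delta.append(row)
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(range(K), key=lambda j: delta[-1][j])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

The marginal log-likelihood is `log_sum_exp(forward_log(...)[-1])`, which corresponds to the `target +=` statement in the model block.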
--------------------------------------------------------------------------------
/inst/stan/include/license.stan:
--------------------------------------------------------------------------------
1 | /*
2 | path is free software: you can redistribute it and/or modify
3 | it under the terms of the GNU General Public License as published by
4 | the Free Software Foundation, either version 3 of the License, or
5 | (at your option) any later version.
6 |
7 | path is distributed in the hope that it will be useful,
8 | but WITHOUT ANY WARRANTY; without even the implied warranty of
9 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
10 | GNU General Public License for more details.
11 |
12 | You should have received a copy of the GNU General Public License
13 | along with path. If not, see <http://www.gnu.org/licenses/>.
14 | */
15 |
--------------------------------------------------------------------------------
/inst/stan/regime_1.stan:
--------------------------------------------------------------------------------
1 | data {
2 | int T; // number of observations (length)
3 | int K; // number of hidden states
4 | array[T] real x_t; // observations
5 | int est_sigma; // flag, whether to estimate sigma (1) or use values passed in (0)
6 | array[T] real sigma_t; // estimated sigma for each observation
7 | }
8 | parameters {
9 | real mu_k; // observation means
10 | real<lower=0> sigma_k; // observation standard deviations, optionally estimated if est_sigma == 1. Can the quantity K * est_sigma be used to dimension sigma_k?
11 | }
12 | transformed parameters {
13 | array[T] real sigmas;
14 | if(est_sigma == 1) {
15 | for(i in 1:T) sigmas[i] = sigma_k;
16 | } else {
17 | for(i in 1:T) sigmas[i] = sigma_t[i];
18 | }
19 | }
20 | model {
21 | mu_k ~ student_t(3, 0, 3);
22 | sigma_k ~ student_t(3, 0, 1);
23 |
24 | x_t ~ normal(mu_k, sigmas);
25 | }
26 | generated quantities {
27 | vector[T] log_lik;
28 | // regression example in loo() package
29 | for (n in 1:T) {
30 | log_lik[n] = normal_lpdf(x_t[n] | mu_k, sigmas[n]);
31 | }
32 | }
33 |
--------------------------------------------------------------------------------
/man/bayesdfa-package.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/bayesdfa-package.R
3 | \name{bayesdfa-package}
4 | \alias{bayesdfa-package}
5 | \alias{bayesdfa}
6 | \title{The 'bayesdfa' package.}
7 | \description{
8 | A DESCRIPTION OF THE PACKAGE
9 | }
10 | \references{
11 | Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org
12 | }
13 |
--------------------------------------------------------------------------------
/man/dfa_cv.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/dfa_cv.R
3 | \name{dfa_cv}
4 | \alias{dfa_cv}
5 | \title{Apply cross validation to DFA model}
6 | \usage{
7 | dfa_cv(
8 | stanfit,
9 | cv_method = c("loocv", "lfocv"),
10 | fold_ids = NULL,
11 | n_folds = 10,
12 | estimation = c("sampling", "optimizing", "vb"),
13 | iter = 2000,
14 | chains = 4,
15 | thin = 1,
16 | ...
17 | )
18 | }
19 | \arguments{
20 | \item{stanfit}{A stanfit object, to preserve the model structure from a call to fit_dfa()}
21 |
22 | \item{cv_method}{The method used for cross validation. With 'loocv', time is ignored and each data point is
23 | assigned randomly to a fold. The method 'ltocv' is leave-time-out cross validation, in which time slices are iteratively
24 | held out. Finally, the method 'lfocv' implements leave-future-out cross validation to do one-step-ahead predictions.}
25 |
26 | \item{fold_ids}{A vector whose length is the same as the number of total data points. Elements are the fold id of each data point. If not all data points are
27 | used (e.g. the lfocv or ltocv approach might only use 10 time steps), the value can be something other than a number,
28 | e.g. NA.}
29 |
30 | \item{n_folds}{Number of folds, defaults to 10}
31 |
32 | \item{estimation}{Character string. Should the model be sampled using \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}} ("sampling", default),
33 | optimized using \code{\link[rstan:stanmodel-method-optimizing]{rstan::optimizing()}} ("optimizing"), or fit by variational inference using \code{\link[rstan:stanmodel-method-vb]{rstan::vb()}} ("vb")?}
34 |
35 | \item{iter}{Number of iterations in Stan sampling, defaults to 2000.}
36 |
37 | \item{chains}{Number of chains in Stan sampling, defaults to 4.}
38 |
39 | \item{thin}{Thinning rate in Stan sampling, defaults to 1.}
40 |
41 | \item{...}{Any other arguments to pass to \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}}.}
42 | }
43 | \description{
44 | Apply cross validation to DFA model
45 | }
46 | \examples{
47 | \dontrun{
48 | set.seed(42)
49 | s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
50 | obs <- c(s$y_sim[1, ], s$y_sim[2, ], s$y_sim[3, ])
51 | long <- data.frame("obs" = obs, "ts" = sort(rep(1:3, 20)),
52 | "time" = rep(1:20, 3))
53 | m <- fit_dfa(y = long, data_shape = "long", estimation="none")
54 | # random folds
55 | fit_cv <- dfa_cv(m, cv_method = "loocv", n_folds = 5, iter = 50,
56 | chains = 1, estimation="sampling")
57 |
58 | # folds can also be passed in
59 | fold_ids <- sample(1:5, size = nrow(long), replace = TRUE)
60 | m <- fit_dfa(y = long, data_shape = "long", estimation="none")
61 | fit_cv <- dfa_cv(m, cv_method = "loocv", n_folds = 5, iter = 50, chains = 1,
62 | fold_ids = fold_ids, estimation="sampling")
63 |
64 | # do an example of leave-time-out cross validation where years are dropped
65 | fold_ids <- long$time
66 | m <- fit_dfa(y = long, data_shape = "long", estimation="none")
67 | fit_cv <- dfa_cv(m, cv_method = "loocv", iter = 100, chains = 1,
68 | fold_ids = fold_ids)
69 |
70 | # example with covariates and long format data
71 | obs_covar <- expand.grid("time" = 1:20, "timeseries" = 1:3,
72 | "covariate" = 1:2)
73 | obs_covar$value <- rnorm(nrow(obs_covar), 0, 0.1)
74 | obs <- c(s$y_sim[1, ], s$y_sim[2, ], s$y_sim[3, ])
75 | m <- fit_dfa(y = long, obs_covar = obs_covar,
76 | data_shape = "long", estimation="none")
77 | fit_cv <- dfa_cv(m, cv_method = "loocv", n_folds = 5,
78 | iter = 50, chains = 1, estimation="sampling")
79 | }
80 |
81 | }
82 |
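The three fold schemes described for `cv_method` differ only in how data points are assigned to held-out sets. The Python sketch below illustrates that difference under stated assumptions. The function names and the `min_train` parameter are hypothetical and are not part of the `dfa_cv()` interface.

```python
import random


def random_folds(n_points, n_folds, seed=None):
    """Assign each data point to one of n_folds folds at random, ignoring time."""
    rng = random.Random(seed)
    return [rng.randint(1, n_folds) for _ in range(n_points)]


def leave_time_out_folds(times):
    """Use the time index as the fold id, so each time slice is held out in turn."""
    return list(times)


def leave_future_out_splits(times, min_train):
    """Yield (train_idx, test_idx) pairs for one-step-ahead prediction.

    The first `min_train` distinct time steps are always used for training;
    each later time step is predicted from all strictly earlier ones.
    """
    steps = sorted(set(times))
    for k in range(min_train, len(steps)):
        target = steps[k]
        train = [i for i, t in enumerate(times) if t < target]
        test = [i for i, t in enumerate(times) if t == target]
        yield train, test
```

With three series of 20 time steps in long format, `leave_time_out_folds(long_time_column)` reproduces the `fold_ids <- long$time` pattern used in the example above.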
--------------------------------------------------------------------------------
/man/dfa_fitted.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/dfa_fitted.R
3 | \name{dfa_fitted}
4 | \alias{dfa_fitted}
5 | \title{Get the fitted values from a DFA as a data frame}
6 | \usage{
7 | dfa_fitted(modelfit, conf_level = 0.95, names = NULL)
8 | }
9 | \arguments{
10 | \item{modelfit}{Output from \code{\link{fit_dfa}}.}
11 |
12 | \item{conf_level}{Probability level for CI.}
13 |
14 | \item{names}{Optional vector of names for time series labels. Should be same length as the number of time series.}
15 | }
16 | \value{
17 | A data frame with the following columns: \code{ID} is an identifier for each time series, \code{time} is the time step, \code{y} is the observed values standardized to mean 0 and unit variance, \code{estimate} is the mean fitted value, \code{lower} is the lower CI, and \code{upper} is the upper CI.
18 | }
19 | \description{
20 | Get the fitted values from a DFA as a data frame
21 | }
22 | \examples{
23 | \donttest{
24 | y <- sim_dfa(num_trends = 2, num_years = 20, num_ts = 4)
25 | m <- fit_dfa(y = y$y_sim, num_trends = 2, iter = 50, chains = 1)
26 | fitted <- dfa_fitted(m)
27 | }
28 | }
29 | \seealso{
30 | predicted plot_fitted fit_dfa
31 | }
32 |
--------------------------------------------------------------------------------
/man/dfa_loadings.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/dfa_loadings.R
3 | \name{dfa_loadings}
4 | \alias{dfa_loadings}
5 | \title{Get the loadings from a DFA as a data frame}
6 | \usage{
7 | dfa_loadings(rotated_modelfit, names = NULL, summary = TRUE, conf_level = 0.95)
8 | }
9 | \arguments{
10 | \item{rotated_modelfit}{Output from \code{\link{rotate_trends}}.}
11 |
12 | \item{names}{An optional vector of names for plotting the loadings.}
13 |
14 | \item{summary}{Logical. Should summarized posterior estimates be returned (\code{TRUE}) instead of the full posterior draws (\code{FALSE})? Defaults to \code{TRUE}.}
15 |
16 | \item{conf_level}{Confidence level for credible intervals. Defaults to 0.95.}
17 | }
18 | \value{
19 | A data frame with the following columns:
20 | \code{name} is an identifier for each loading, \code{trend} is the trend for the
21 | loading, \code{median} is the posterior median loading, \code{lower} is the lower CI,
22 | \code{upper} is the upper CI, and \code{prob_diff0} is the probability the loading is
23 | different from 0. When \code{summary = FALSE}, there are no \code{lower} or \code{upper}
24 | columns; instead there are columns \code{chain} and \code{draw}.
25 | }
26 | \description{
27 | Get the loadings from a DFA as a data frame
28 | }
29 | \examples{
30 | set.seed(42)
31 | s <- sim_dfa(num_trends = 2, num_ts = 4, num_years = 10)
32 | # only 1 chain and 50 iterations used so example runs quickly:
33 | m <- fit_dfa(y = s$y_sim, num_trends = 2, iter = 50, chains = 1)
34 | r <- rotate_trends(m)
35 | loadings <- dfa_loadings(r, summary = TRUE)
36 | loadings <- dfa_loadings(r, summary = FALSE)
37 | }
38 | \seealso{
39 | plot_loadings fit_dfa rotate_trends
40 | }
41 |
--------------------------------------------------------------------------------
/man/dfa_trends.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/dfa_trends.R
3 | \name{dfa_trends}
4 | \alias{dfa_trends}
5 | \title{Get the trends from a DFA as a data frame}
6 | \usage{
7 | dfa_trends(rotated_modelfit, years = NULL)
8 | }
9 | \arguments{
10 | \item{rotated_modelfit}{Output from \code{\link{rotate_trends}}.}
11 |
12 | \item{years}{Optional numeric vector of years.}
13 | }
14 | \value{
15 | A data frame with the following columns: \code{time} is the time step, \code{trend_number} is an identifier for each trend, \code{estimate} is the trend mean, \code{lower} is the lower CI, and \code{upper} is the upper CI.
16 | }
17 | \description{
18 | Get the trends from a DFA as a data frame
19 | }
20 | \examples{
21 | set.seed(1)
22 | s <- sim_dfa(num_trends = 1)
23 | m <- fit_dfa(y = s$y_sim, num_trends = 1, iter = 50, chains = 1)
24 | r <- rotate_trends(m)
25 | trends <- dfa_trends(r)
26 | }
27 | \seealso{
28 | plot_trends fit_dfa rotate_trends
29 | }
30 |
--------------------------------------------------------------------------------
/man/find_dfa_trends.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/find_dfa_trends.R
3 | \name{find_dfa_trends}
4 | \alias{find_dfa_trends}
5 | \title{Find the best number of trends according to LOOIC}
6 | \usage{
7 | find_dfa_trends(
8 | y = y,
9 | kmin = 1,
10 | kmax = 5,
11 | iter = 2000,
12 | thin = 1,
13 | compare_normal = FALSE,
14 | convergence_threshold = 1.05,
15 | variance = c("equal", "unequal"),
16 | ...
17 | )
18 | }
19 | \arguments{
20 | \item{y}{A matrix of data to fit. Columns represent time steps.}
21 |
22 | \item{kmin}{Minimum number of trends, defaults to 1.}
23 |
24 | \item{kmax}{Maximum number of trends, defaults to 5.}
25 |
26 | \item{iter}{Iterations when sampling from each Stan model, defaults to 2000.}
27 |
28 | \item{thin}{Thinning rate when sampling from each Stan model, defaults to 1.}
29 |
30 | \item{compare_normal}{If \code{TRUE}, does model selection comparison of Normal vs.
31 | Student-t errors}
32 |
33 | \item{convergence_threshold}{The maximum allowed value of Rhat to determine
34 | convergence of parameters}
35 |
36 | \item{variance}{Vector of variance arguments for searching over large groups
37 | of models. Can be either or both of ("equal","unequal")}
38 |
39 | \item{...}{Other arguments to pass to \code{fit_dfa()}}
40 | }
41 | \description{
42 | Fit DFA models with different numbers of trends and return the leave-one-out (LOO)
43 | value as calculated by the \link[loo:loo-package]{loo} package.
44 | }
45 | \examples{
46 | \donttest{
47 | set.seed(42)
48 | s <- sim_dfa(num_trends = 2, num_years = 20, num_ts = 3)
49 | # only 1 chain and 50 iterations used so example runs quickly:
50 | m <- find_dfa_trends(
51 | y = s$y_sim, iter = 50,
52 | kmin = 1, kmax = 2, chains = 1, compare_normal = FALSE,
53 | variance = "equal", convergence_threshold = 1.1,
54 | control = list(adapt_delta = 0.95, max_treedepth = 20)
55 | )
56 | m$summary
57 | m$best_model
58 | }
59 | }
60 |
--------------------------------------------------------------------------------
/man/find_inverted_chains.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/invert_chains.R
3 | \name{find_inverted_chains}
4 | \alias{find_inverted_chains}
5 | \title{Find which chains to invert}
6 | \usage{
7 | find_inverted_chains(model, trend = 1, plot = FALSE)
8 | }
9 | \arguments{
10 | \item{model}{A Stan model, \code{rstanfit} object}
11 |
12 | \item{trend}{Which trend to check}
13 |
14 | \item{plot}{Logical: should a plot of the trend for each chain be made?
15 | Defaults to \code{FALSE}}
16 | }
17 | \description{
18 | Find which chains to invert by checking the sum of the squared
19 | deviations between the first chain and each other chain.
20 | }
21 | \examples{
22 | set.seed(2)
23 | s <- sim_dfa(num_trends = 2)
24 | set.seed(1)
25 | m <- fit_dfa(y = s$y_sim, num_trends = 1, iter = 30, chains = 2)
26 | # chains were already inverted, but we can redo that, as an example, with:
27 | find_inverted_chains(m$model, plot = TRUE)
28 | }
29 | \seealso{
30 | invert_chains
31 | }
32 |
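The description above (sum of squared deviations between the first chain and each other chain) suggests a simple sign-flip check. The Python sketch below shows one plausible reading of that criterion: a chain is flagged when its negated trend is closer to the first chain than the trend as estimated. This is an assumption for illustration, and the names `sum_sq_dev` and `find_flipped_chains` are hypothetical, not the package's implementation.

```python
def sum_sq_dev(a, b):
    """Sum of squared deviations between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_flipped_chains(chain_trends):
    """Return 0-based indices of chains that match chain 0 better after a sign flip.

    chain_trends: one list of trend estimates (e.g. posterior means over time)
    per chain; chain 0 is treated as the reference.
    """
    reference = chain_trends[0]
    flipped = []
    for idx, chain in enumerate(chain_trends[1:], start=1):
        as_is = sum_sq_dev(reference, chain)
        negated = sum_sq_dev(reference, [-v for v in chain])
        if negated < as_is:
            flipped.append(idx)
    return flipped
```

Because the first chain is the reference, this check only makes chains agree with one another; it says nothing about which sign is the "right" one.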
--------------------------------------------------------------------------------
/man/find_regimes.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/find_regimes.R
3 | \name{find_regimes}
4 | \alias{find_regimes}
5 | \title{Fit multiple models with differing numbers of regimes to trend data}
6 | \usage{
7 | find_regimes(
8 | y,
9 | sds = NULL,
10 | min_regimes = 1,
11 | max_regimes = 3,
12 | iter = 2000,
13 | thin = 1,
14 | chains = 1,
15 | ...
16 | )
17 | }
18 | \arguments{
19 | \item{y}{Data, time series or trend from fitted DFA model.}
20 |
21 | \item{sds}{Optional time series of standard deviations of estimates. If
22 | passed in, the residual variance is not estimated.}
23 |
24 | \item{min_regimes}{Smallest number of regimes to evaluate, defaults to 1.}
25 |
26 | \item{max_regimes}{Largest number of regimes to evaluate, defaults to 3.}
27 |
28 | \item{iter}{MCMC iterations, defaults to 2000.}
29 |
30 | \item{thin}{MCMC thinning rate, defaults to 1.}
31 |
32 | \item{chains}{MCMC chains; defaults to 1 (note that running multiple chains
33 | may result in a "label switching" problem where the regimes are identified
34 | with different IDs across chains).}
35 |
36 | \item{...}{Other parameters to pass to \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}}.}
37 | }
38 | \description{
39 | Fit multiple models with differing numbers of regimes to trend data
40 | }
41 | \examples{
42 | data(Nile)
43 | find_regimes(log(Nile), iter = 50, chains = 1, max_regimes = 2)
44 | }
45 |
--------------------------------------------------------------------------------
/man/find_swans.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/find_swans.R
3 | \name{find_swans}
4 | \alias{find_swans}
5 | \title{Find outlying "black swan" jumps in trends}
6 | \usage{
7 | find_swans(rotated_modelfit, threshold = 0.01, plot = FALSE)
8 | }
9 | \arguments{
10 | \item{rotated_modelfit}{Output from \code{\link[=rotate_trends]{rotate_trends()}}.}
11 |
12 | \item{threshold}{A probability threshold below which to flag trend events as
13 | extreme. Defaults to 0.01.}
14 |
15 | \item{plot}{Logical: should a plot be made?}
16 | }
17 | \value{
18 | Prints a ggplot2 plot if \code{plot = TRUE}; invisibly returns a data frame
19 | indicating the probability that any given point in time represents a
20 | "black swan" event.
21 | }
22 | \description{
23 | Find outlying "black swan" jumps in trends
24 | }
25 | \examples{
26 | set.seed(1)
27 | s <- sim_dfa(num_trends = 1, num_ts = 3, num_years = 30)
28 | s$y_sim[1, 15] <- s$y_sim[1, 15] - 6
29 | plot(s$y_sim[1, ], type = "o")
30 | abline(v = 15, col = "red")
31 | # only 1 chain and 50 iterations used so example runs quickly:
32 | m <- fit_dfa(y = s$y_sim, num_trends = 1, iter = 50, chains = 1, nu_fixed = 2)
33 | r <- rotate_trends(m)
34 | p <- plot_trends(r) #+ geom_vline(xintercept = 15, colour = "red")
35 | print(p)
36 | # a 1 in 1000 probability if it was from a normal distribution:
37 | find_swans(r, plot = TRUE, threshold = 0.001)
38 | }
39 | \references{
40 | Anderson, S.C., Branch, T.A., Cooper, A.B., and Dulvy, N.K. 2017.
41 | Black-swan events in animal populations. Proceedings of the National Academy
42 | of Sciences 114(12): 3252–3257. https://doi.org/10.1073/pnas.1611525114
43 | }
44 |
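The threshold argument of find_swans() is described above as a tail probability under a normal distribution. As a rough illustration of that idea only (this is not the package's implementation), the base-R sketch below standardizes the step-to-step changes of a simulated random walk and flags steps whose one-tailed normal probability falls below the threshold. The use of first differences, the one-tailed test, and the names `trend`, `z`, `p`, and `flagged` are assumptions made for this sketch.

```r
# Illustrative only: flag unusually large jumps in a random walk.
set.seed(1)
trend <- cumsum(rnorm(30))           # random-walk "trend"
trend[15:30] <- trend[15:30] - 6     # inject a large downward jump at time 15
d <- diff(trend)                     # step-to-step changes; d[14] is the step into time 15
z <- (d - mean(d)) / sd(d)           # standardize the changes
p <- pnorm(-abs(z))                  # one-tailed normal probability of each change
threshold <- 0.001
flagged <- which(p < threshold) + 1  # +1 maps a difference back to its time index
flagged
```

Lowering the threshold makes the check stricter, so fewer steps are flagged.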
--------------------------------------------------------------------------------
/man/fit_regimes.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/fit_regimes.R
3 | \name{fit_regimes}
4 | \alias{fit_regimes}
5 | \title{Fit models with differing numbers of regimes to trend data}
6 | \usage{
7 | fit_regimes(
8 | y,
9 | sds = NULL,
10 | n_regimes = 2,
11 | iter = 2000,
12 | thin = 1,
13 | chains = 1,
14 | ...
15 | )
16 | }
17 | \arguments{
18 | \item{y}{Data, time series or trend from fitted DFA model.}
19 |
20 | \item{sds}{Optional time series of standard deviations of estimates.
21 | If passed in, residual variance not estimated. Defaults to \code{NULL}.}
22 |
23 | \item{n_regimes}{Number of regimes to evaluate, defaults to 2.}
24 |
25 | \item{iter}{MCMC iterations, defaults to 2000.}
26 |
27 | \item{thin}{MCMC thinning rate, defaults to 1.}
28 |
29 | \item{chains}{MCMC chains, defaults to 1 (note that running multiple chains
30 | may result in a label switching problem where the regimes are identified
31 | with different IDs across chains).}
32 |
33 | \item{...}{Other parameters to pass to \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}}.}
34 | }
35 | \description{
36 | Fit models with differing numbers of regimes to trend data
37 | }
38 | \examples{
39 | data(Nile)
40 | fit_regimes(log(Nile), iter = 50, n_regimes = 1)
41 | }
42 |
--------------------------------------------------------------------------------
/man/hmm_init.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/hmm_init.R
3 | \name{hmm_init}
4 | \alias{hmm_init}
5 | \title{Create initial values for the HMM model.}
6 | \usage{
7 | hmm_init(K, x_t)
8 | }
9 | \arguments{
10 | \item{K}{The number of regimes or clusters to fit. Called by \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}}.}
11 |
12 | \item{x_t}{A matrix of values. Called by \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}}.}
13 | }
14 | \value{
15 | A list of initial values (\code{mu}, \code{sigma}).
16 | }
17 | \description{
18 | Create initial values for the HMM model.
19 | }
20 |
--------------------------------------------------------------------------------
/man/invert_chains.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/invert_chains.R
3 | \name{invert_chains}
4 | \alias{invert_chains}
5 | \title{Invert chains}
6 | \usage{
7 | invert_chains(model, trends = 1, print = FALSE, ...)
8 | }
9 | \arguments{
10 | \item{model}{A Stan model, rstanfit object}
11 |
12 | \item{trends}{The number of trends in the DFA, defaults to 1}
13 |
14 | \item{print}{Logical indicating whether the summary should be printed.
15 | Defaults to \code{FALSE}.}
16 |
17 | \item{...}{Other arguments to pass to \code{\link[=find_inverted_chains]{find_inverted_chains()}}.}
18 | }
19 | \description{
20 | Invert chains
21 | }
22 | \seealso{
23 | find_inverted_chains
24 | }
25 |
--------------------------------------------------------------------------------
/man/is_converged.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/converge_rhat.R
3 | \name{is_converged}
4 | \alias{is_converged}
5 | \title{Summarize Rhat convergence statistics across parameters}
6 | \usage{
7 | is_converged(fitted_model, threshold = 1.05, parameters = c("sigma", "x", "Z"))
8 | }
9 | \arguments{
10 | \item{fitted_model}{Samples extracted (with \code{permuted = FALSE}) from a Stan
11 | model. E.g. output from \code{\link[=invert_chains]{invert_chains()}}.}
12 |
13 | \item{threshold}{Threshold for maximum Rhat.}
14 |
15 | \item{parameters}{Vector of parameters to be included in convergence determination. Defaults to \code{c("sigma", "x", "Z")}. Other elements can be added, including \code{"pred"}, \code{"log_lik"}, or \code{"lp__"}.}
16 | }
17 | \description{
18 | Pass in \code{rstanfit} model object, and a threshold Rhat value for
19 | convergence. Returns boolean.
20 | }
21 |
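To make the Rhat threshold concrete, here is a minimal base-R sketch of the classic (non-split) Gelman-Rubin statistic for a single parameter. It is an illustration under stated assumptions, not what is_converged() or rstan compute internally (rstan's Rhat differs in detail). The function name `basic_rhat` and the simulated chains are hypothetical.

```r
# Classic (non-split) Gelman-Rubin Rhat for one parameter.
# `draws` is an iterations x chains matrix.
basic_rhat <- function(draws) {
  n <- nrow(draws)
  within <- mean(apply(draws, 2, var))   # mean within-chain variance
  between <- n * var(colMeans(draws))    # between-chain variance
  sqrt(((n - 1) / n * within + between / n) / within)
}

set.seed(42)
mixed <- matrix(rnorm(4000), ncol = 4)   # four chains sampling the same target
stuck <- mixed
stuck[, 1] <- stuck[, 1] + 3             # one chain centred elsewhere

threshold <- 1.05
basic_rhat(mixed) < threshold            # chains agree
basic_rhat(stuck) < threshold            # chains disagree
```

A convergence check across several parameters then amounts to asking whether the largest Rhat stays below the threshold.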
--------------------------------------------------------------------------------
/man/loo.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/loo.R
3 | \name{loo.bayesdfa}
4 | \alias{loo.bayesdfa}
5 | \alias{loo}
6 | \title{LOO information criteria}
7 | \usage{
8 | \method{loo}{bayesdfa}(x, ...)
9 | }
10 | \arguments{
11 | \item{x}{Output from \code{\link[=fit_dfa]{fit_dfa()}}.}
12 |
13 | \item{...}{Arguments for \code{\link[loo:relative_eff]{loo::relative_eff()}} and \code{\link[loo:loo]{loo::loo.array()}}.}
14 | }
15 | \description{
16 | Extract the LOOIC (leave-one-out information criterion) using
17 | \code{\link[loo:loo]{loo::loo()}}. Note that we've implemented slightly different variants
18 | of loo, based on whether the DFA observation model includes correlation
19 | between time series or not (default is no correlation). Importantly,
20 | these different versions are not directly comparable to evaluate data support
21 | for including correlation or not in a DFA. If time series are not correlated,
22 | the point-wise log-likelihood for each observation is calculated and used
23 | in the loo calculations. However if time series are correlated, then each
24 | time slice is assumed to be a joint observation of
25 | all variables, and the point-wise log-likelihood is calculated as the
26 | joint likelihood of all variables under the multivariate normal distribution.
27 | }
28 | \examples{
29 | \donttest{
30 | set.seed(1)
31 | s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
32 | m <- fit_dfa(y = s$y_sim, iter = 50, chains = 1, num_trends = 1)
33 | loo(m)
34 | }
35 | }
36 |
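The description above distinguishes a point-wise log-likelihood per observation from a joint log-likelihood per time slice. The base-R sketch below shows the difference for a single time step with three time series. All numbers are hypothetical, and `log_dmvnorm` is a helper written for this sketch rather than a package function.

```r
# Log-density of a multivariate normal, written out in base R.
log_dmvnorm <- function(y, mu, Sigma) {
  r <- y - mu
  log_det <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -0.5 * (length(y) * log(2 * pi) + log_det + sum(r * solve(Sigma, r)))
}

y_t <- c(0.3, -1.2, 0.8)    # hypothetical observations of 3 time series at one time step
mu_t <- c(0.1, -0.9, 0.5)   # hypothetical model predictions
sds <- c(0.5, 0.7, 0.6)     # hypothetical observation error SDs

# No correlation: one log-likelihood value per observation (3 values here).
pointwise <- dnorm(y_t, mu_t, sds, log = TRUE)

# Correlation: one joint log-likelihood value per time slice (1 value here).
rho <- 0.4
R <- matrix(rho, 3, 3)
diag(R) <- 1
Sigma <- diag(sds) %*% R %*% diag(sds)
joint <- log_dmvnorm(y_t, mu_t, Sigma)

length(pointwise)
length(joint)
```

Because the two variants leave out different units (single observations versus whole time slices), their LOOIC values are on different footings, which is why the description warns against comparing them directly.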
--------------------------------------------------------------------------------
/man/plot_fitted.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/plot_fitted.R
3 | \name{plot_fitted}
4 | \alias{plot_fitted}
5 | \title{Plot the fitted values from a DFA}
6 | \usage{
7 | plot_fitted(
8 | modelfit,
9 | conf_level = 0.95,
10 | names = NULL,
11 | spaghetti = FALSE,
12 | time_labels = NULL
13 | )
14 | }
15 | \arguments{
16 | \item{modelfit}{Output from \code{\link{fit_dfa}}, a rstanfit object}
17 |
18 | \item{conf_level}{Probability level for CI.}
19 |
20 | \item{names}{Optional vector of names for plotting labels. Should be the same length as the number of time series.}
21 |
22 | \item{spaghetti}{Defaults to FALSE, but if TRUE puts all raw time series (grey) and fitted values on a single plot}
23 |
24 | \item{time_labels}{Optional vector of time labels for plotting, same length as number of time steps}
25 | }
26 | \description{
27 | Plot the fitted values from a DFA
28 | }
29 | \examples{
30 | \donttest{
31 | y <- sim_dfa(num_trends = 2, num_years = 20, num_ts = 4)
32 | m <- fit_dfa(y = y$y_sim, num_trends = 2, iter = 50, chains = 1)
33 | p <- plot_fitted(m)
34 | print(p)
35 |
36 | p <- plot_fitted(m, spaghetti = TRUE)
37 | print(p)
38 | }
39 | }
40 | \seealso{
41 | plot_loadings fit_dfa rotate_trends dfa_fitted
42 | }
43 |
--------------------------------------------------------------------------------
/man/plot_loadings.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/plot_loadings.R
3 | \name{plot_loadings}
4 | \alias{plot_loadings}
5 | \title{Plot the loadings from a DFA}
6 | \usage{
7 | plot_loadings(
8 | rotated_modelfit,
9 | names = NULL,
10 | facet = TRUE,
11 | violin = TRUE,
12 | conf_level = 0.95,
13 | threshold = NULL
14 | )
15 | }
16 | \arguments{
17 | \item{rotated_modelfit}{Output from \code{\link[=rotate_trends]{rotate_trends()}}.}
18 |
19 | \item{names}{An optional vector of names for plotting the loadings.}
20 |
21 | \item{facet}{Logical. Should there be a separate facet for each trend?
22 | Defaults to \code{TRUE}.}
23 |
24 | \item{violin}{Logical. Should the full posterior densities be shown as a
25 | violin plot? Defaults to \code{TRUE}.}
26 |
27 | \item{conf_level}{Confidence level for credible intervals. Defaults to 0.95.}
28 |
29 | \item{threshold}{Numeric (0-1). Optional for plots, if included, only plot
30 | loadings that have Pr(<0) or Pr(>0) > threshold. For example \code{threshold = 0.8}
31 | would only display estimates where 80\% of posterior density was above/below
32 | zero. Defaults to \code{NULL} (not used).}
33 | }
34 | \description{
35 | Plot the loadings from a DFA
36 | }
37 | \examples{
38 | set.seed(42)
39 | s <- sim_dfa(num_trends = 2, num_ts = 4, num_years = 10)
40 | # only 1 chain and 50 iterations used so example runs quickly:
41 | m <- fit_dfa(y = s$y_sim, num_trends = 2, iter = 50, chains = 1)
42 | r <- rotate_trends(m)
43 | plot_loadings(r, violin = FALSE, facet = TRUE)
44 | plot_loadings(r, violin = FALSE, facet = FALSE)
45 | plot_loadings(r, violin = TRUE, facet = FALSE)
46 | plot_loadings(r, violin = TRUE, facet = TRUE)
47 | }
48 | \seealso{
49 | plot_trends fit_dfa rotate_trends
50 | }
51 |
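The threshold rule for plot_loadings() can be read as "keep a loading only if enough of its posterior mass lies on one side of zero". A minimal base-R sketch of that rule follows. The simulated draws and the helper `prob_one_side` are hypothetical, and the package may compute this differently.

```r
set.seed(1)
# Hypothetical posterior draws for two loadings.
strong <- rnorm(1000, mean = 1, sd = 0.5)  # mostly above zero
weak <- rnorm(1000, mean = 0.1, sd = 1)    # straddles zero

# Largest share of posterior mass on one side of zero.
prob_one_side <- function(draws) max(mean(draws > 0), mean(draws < 0))

threshold <- 0.8
prob_one_side(strong) > threshold  # would be displayed
prob_one_side(weak) > threshold    # would be hidden
```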
--------------------------------------------------------------------------------
/man/plot_regime_model.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/plot_regime_model.R
3 | \name{plot_regime_model}
4 | \alias{plot_regime_model}
5 | \title{Plot the state probabilities from \code{\link[=find_regimes]{find_regimes()}}}
6 | \usage{
7 | plot_regime_model(
8 | model,
9 | probs = c(0.05, 0.95),
10 | type = c("probability", "means"),
11 | regime_prob_threshold = 0.9,
12 | plot_prob_indices = NULL,
13 | flip_regimes = FALSE
14 | )
15 | }
16 | \arguments{
17 | \item{model}{A model returned by \code{\link[=find_regimes]{find_regimes()}}.}
18 |
19 | \item{probs}{A numeric vector of quantiles to plot the credible intervals at.
20 | Defaults to \code{c(0.05, 0.95)}.}
21 |
22 | \item{type}{Whether to plot the probabilities (default) or means.}
23 |
24 | \item{regime_prob_threshold}{The probability density that must be above 0.5
25 | before we classify a regime. Defaults to 0.9 (only affects \code{"means"} plot).}
26 |
27 | \item{plot_prob_indices}{Optional indices of probability plots to plot.
28 | Defaults to showing all.}
29 |
30 | \item{flip_regimes}{Optional. Whether to flip regimes in plots; defaults to \code{FALSE}.}
31 | }
32 | \description{
33 | Plot the state probabilities from \code{\link[=find_regimes]{find_regimes()}}
34 | }
35 | \details{
36 | Note that the original timeseries data (dots) are shown scaled
37 | between 0 and 1.
38 | }
39 | \examples{
40 | \donttest{
41 | data(Nile)
42 | m <- fit_regimes(log(Nile), n_regimes = 2, chains = 1, iter = 50)
43 | plot_regime_model(m)
44 | plot_regime_model(m, plot_prob_indices = c(2))
45 | plot_regime_model(m, type = "means")
46 | }
47 |
48 | }
49 |
--------------------------------------------------------------------------------
/man/plot_trends.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/plot_trends.R
3 | \name{plot_trends}
4 | \alias{plot_trends}
5 | \title{Plot the trends from a DFA}
6 | \usage{
7 | plot_trends(
8 | rotated_modelfit,
9 | years = NULL,
10 | highlight_outliers = FALSE,
11 | threshold = 0.01
12 | )
13 | }
14 | \arguments{
15 | \item{rotated_modelfit}{Output from \code{\link{rotate_trends}}}
16 |
17 | \item{years}{Optional numeric vector of years for the plot}
18 |
19 | \item{highlight_outliers}{Logical. Should trend events
20 | that exceed the probability of occurring with a normal distribution as
21 | defined by \code{threshold} be highlighted? Defaults to FALSE}
22 |
23 | \item{threshold}{A probability threshold below which to
24 | flag trend events as extreme. Defaults to 0.01}
25 | }
26 | \description{
27 | Plot the trends from a DFA
28 | }
29 | \examples{
30 | set.seed(1)
31 | s <- sim_dfa(num_trends = 1)
32 | m <- fit_dfa(y = s$y_sim, num_trends = 1, iter = 50, chains = 1)
33 | r <- rotate_trends(m)
34 | p <- plot_trends(r)
35 | print(p)
36 | }
37 | \seealso{
38 | dfa_trends plot_loadings fit_dfa rotate_trends
39 | }
40 |
--------------------------------------------------------------------------------
/man/predicted.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/predicted.R
3 | \name{predicted}
4 | \alias{predicted}
5 | \title{Calculate predicted value from DFA object}
6 | \usage{
7 | predicted(fitted_model)
8 | }
9 | \arguments{
10 | \item{fitted_model}{Samples extracted (with \code{permuted = FALSE}) from a Stan
11 | model. E.g. output from \code{\link[=invert_chains]{invert_chains()}}.}
12 | }
13 | \description{
14 | Pass in \code{rstanfit} model object. Returns array of predictions, dimensioned
15 | number of MCMC draws x number of MCMC chains x time series length x number of time series
16 | }
17 | \examples{
18 | \dontrun{
19 | set.seed(42)
20 | s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
21 | # 3 chains and 2000 iterations used, so this example may take a while to run:
22 | m <- fit_dfa(y = s$y_sim, iter = 2000, chains = 3, num_trends = 1)
23 | pred <- predicted(m)
24 | }
25 | }
26 |
--------------------------------------------------------------------------------
/man/rotate_trends.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/rotate_trends.R
3 | \name{rotate_trends}
4 | \alias{rotate_trends}
5 | \title{Rotate the trends from a DFA}
6 | \usage{
7 | rotate_trends(fitted_model, conf_level = 0.95, invert = FALSE)
8 | }
9 | \arguments{
10 | \item{fitted_model}{Output from \code{\link[=fit_dfa]{fit_dfa()}}.}
11 |
12 | \item{conf_level}{Probability level for CI.}
13 |
14 | \item{invert}{Whether to invert the trends and loadings for plotting purposes}
15 | }
16 | \description{
17 | Rotate the trends from a DFA
18 | }
19 | \examples{
20 | set.seed(42)
21 | s <- sim_dfa(num_trends = 1, num_years = 20, num_ts = 3)
22 | # only 1 chain and 50 iterations used so example runs quickly:
23 | m <- fit_dfa(y = s$y_sim, iter = 50, chains = 1)
24 | r <- rotate_trends(m)
25 | plot_trends(r)
26 | }
27 |
--------------------------------------------------------------------------------
/man/sim_dfa.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/sim.R
3 | \name{sim_dfa}
4 | \alias{sim_dfa}
5 | \title{Simulate from a DFA}
6 | \usage{
7 | sim_dfa(
8 | num_trends = 1,
9 | num_years = 20,
10 | num_ts = 4,
11 | loadings_matrix = matrix(nrow = num_ts, ncol = num_trends, rnorm(num_ts * num_trends,
12 | 0, 1)),
13 | sigma = rlnorm(1, meanlog = log(0.2), 0.1),
14 | varIndx = rep(1, num_ts),
15 | trend_model = c("rw", "bs"),
16 | spline_weights = matrix(ncol = 6, nrow = num_trends, data = rnorm(6 * num_trends)),
17 | extreme_value = NULL,
18 | extreme_loc = NULL,
19 | nu_fixed = 100,
20 | user_supplied_deviations = NULL
21 | )
22 | }
23 | \arguments{
24 | \item{num_trends}{The number of trends.}
25 |
26 | \item{num_years}{The number of years.}
27 |
28 | \item{num_ts}{The number of timeseries.}
29 |
30 | \item{loadings_matrix}{A loadings matrix. The number of rows should match the
31 | number of timeseries and the number of columns should match the number of
32 | trends. Note that this loadings matrix will be internally manipulated by
33 | setting some elements to 0 and constraining some elements to 1 so that the
34 | model can be fitted. See \code{\link[=fit_dfa]{fit_dfa()}}. See the output element \code{Z} in
35 | the returned list to see the manipulated loadings matrix. If not
36 | specified, a random matrix \code{~ N(0, 1)} is used.}
37 |
38 | \item{sigma}{A vector of standard deviations on the observation error. Should
39 | be of the same length as the number of trends. If not specified, random
40 | numbers are used \code{rlnorm(1, meanlog = log(0.2), 0.1)}.}
41 |
42 | \item{varIndx}{Indices of unique observation variances. Defaults to \code{c(1, 1, 1, 1)}. Unique observation error variances would be specified as \code{c(1, 2, 3, 4)} in the case of 4 time series.}
43 |
44 | \item{trend_model}{The type of trend model. Random walk (\code{"rw"}) or basis
45 | spline (\code{"bs"})}
46 |
47 | \item{spline_weights}{A matrix of basis function weights that is used
48 | if \code{trend_model = "bs"}. The number of columns should correspond to
49 | the number of knots and the number of rows should correspond to the
50 | number of trends.}
51 |
52 | \item{extreme_value}{Value added to the random walk in the extreme time step.
53 | Defaults to not included.}
54 |
55 | \item{extreme_loc}{Location of single extreme event in the process. The same
56 | for all processes, and defaults to \code{round(n_t/2)} where \code{n_t} is the time
57 | series length}
58 |
59 | \item{nu_fixed}{Nu is the degrees of freedom parameter for the
60 | t-distribution, defaults to 100, which is effectively normal.}
61 |
62 | \item{user_supplied_deviations}{An optional matrix of deviations for the trend
63 | random walks. Columns are for trends and rows are for each time step.}
64 | }
65 | \value{
66 | A list with the following elements: \code{y_sim} is the simulated data,
67 | \code{pred} is the true underlying data without observation error added, \code{x} is
68 | the underlying trends, \code{Z} is the manipulated loadings matrix that is fed
69 | to the model.
70 | }
71 | \description{
72 | Simulate from a DFA
73 | }
74 | \examples{
75 | x <- sim_dfa(num_trends = 2)
76 | names(x)
77 | matplot(t(x$y_sim), type = "l")
78 | matplot(t(x$x), type = "l")
79 |
80 | set.seed(42)
81 | x <- sim_dfa(extreme_value = -4, extreme_loc = 10)
82 | matplot(t(x$x), type = "l")
83 | abline(v = 10)
84 | matplot(t(x$pred), type = "l")
85 | abline(v = 10)
86 |
87 | set.seed(42)
88 | x <- sim_dfa()
89 | matplot(t(x$x), type = "l")
90 | abline(v = 10)
91 | matplot(t(x$pred), type = "l")
92 | abline(v = 10)
93 | }
94 |
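For readers who want to see the structure behind the returned elements (x, Z, pred, y_sim), the base-R sketch below simulates a DFA-like data set by hand with random-walk trends. It is a simplified illustration, not the code of sim_dfa(): it skips the constraints the package applies to the loadings matrix, uses a single fixed observation error SD, and all variable names are chosen for this sketch.

```r
set.seed(42)
num_trends <- 2
num_years <- 20
num_ts <- 4
sigma <- 0.2   # observation error SD (fixed here for simplicity)

# Trends: independent random walks, one row per trend.
steps <- matrix(rnorm(num_trends * num_years), nrow = num_trends)
x <- t(apply(steps, 1, cumsum))

# Loadings: one row per time series, one column per trend.
Z <- matrix(rnorm(num_ts * num_trends), nrow = num_ts, ncol = num_trends)

# Data without observation error, then with error added.
pred <- Z %*% x
y_sim <- pred + matrix(rnorm(num_ts * num_years, 0, sigma), nrow = num_ts)

dim(y_sim)
```

As in the examples above, rows index time series and columns index time steps, so `matplot(t(y_sim), type = "l")` draws one line per series.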
--------------------------------------------------------------------------------
/man/trend_cor.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/trend_cor.R
3 | \name{trend_cor}
4 | \alias{trend_cor}
5 | \title{Estimate the correlation between a DFA trend and some other timeseries}
6 | \usage{
7 | trend_cor(
8 | rotated_modelfit,
9 | y,
10 | trend = 1,
11 | time_window = seq_len(length(y)),
12 | trend_samples = 100,
13 | stan_iter = 300,
14 | stan_chains = 1,
15 | ...
16 | )
17 | }
18 | \arguments{
19 | \item{rotated_modelfit}{Output from \code{\link[=rotate_trends]{rotate_trends()}}.}
20 |
21 | \item{y}{A numeric vector to correlate with the DFA trend. Must be the same
22 | length as the DFA trend.}
23 |
24 | \item{trend}{A number corresponding to which trend to use, defaults to 1.}
25 |
26 | \item{time_window}{Indices indicating a time window slice to use in the
27 | correlation. Defaults to using the entire time window. Can be used to walk
28 | through the timeseries and test the cross correlations.}
29 |
30 | \item{trend_samples}{The number of samples from the trend posterior to use. A
31 | model will be run for each trend sample so this value shouldn't be too
32 | large. Defaults to 100.}
33 |
34 | \item{stan_iter}{The number of samples from the posterior with each Stan
35 | model run, defaults to 300.}
36 |
37 | \item{stan_chains}{The number of chains for each Stan model run, defaults to
38 | 1.}
39 |
40 | \item{...}{Other arguments to pass to \code{\link[rstan]{sampling}}}
41 | }
42 | \value{
43 | A numeric vector of samples from the correlation coefficient
44 | posterior.
45 | }
46 | \description{
47 | Fully incorporates the uncertainty from the posterior of the DFA trend
48 | }
49 | \details{
50 | Uses a \code{sigma ~ half_t(3, 0, 2)} prior on the residual standard
51 | deviation and a \code{uniform(-1, 1)} prior on the correlation coefficient.
52 | Fitted as a linear regression of \code{y ~ x}, where y represents the \code{y}
53 | argument to \code{\link[=trend_cor]{trend_cor()}} and \code{x} represents the DFA trend, and both \code{y}
54 | and \code{x} have been scaled by subtracting their means and dividing by their
55 | standard deviations. Samples are drawn from the posterior of the trend and
56 | repeatedly fed through the Stan regression to come up with a combined
57 | posterior of the correlation.
58 | }
59 | \examples{
60 | set.seed(1)
61 | s <- sim_dfa(num_trends = 1, num_years = 15)
62 | m <- fit_dfa(y = s$y_sim, num_trends = 1, iter = 50, chains = 1)
63 | r <- rotate_trends(m)
64 | n_years <- ncol(r$trends[, 1, ])
65 | fake_dat <- rnorm(n_years, 0, 1)
66 | correlation <- trend_cor(r, fake_dat, trend_samples = 25)
67 | hist(correlation)
68 | correlation <- trend_cor(r,
69 | y = fake_dat, time_window = 5:15,
70 | trend_samples = 25
71 | )
72 | hist(correlation)
73 | }
74 |
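The details above describe fitting a regression of scaled y on scaled x. The reason this estimates a correlation is that, once both series are centred and divided by their standard deviations, the regression slope equals the correlation coefficient. The base-R check below shows this with ordinary least squares. It is only an illustration of that identity: the package fits a Bayesian model in Stan with the priors listed above, and the simulated series here are hypothetical.

```r
set.seed(1)
x <- cumsum(rnorm(15))     # stand-in for one posterior draw of a trend
y <- 0.5 * x + rnorm(15)   # some other time series

# Scale both series: subtract the mean, divide by the standard deviation.
xs <- as.numeric(scale(x))
ys <- as.numeric(scale(y))

# Slope of the regression of scaled y on scaled x.
slope <- unname(coef(lm(ys ~ xs))[2])
all.equal(slope, cor(x, y))
```

Repeating such a fit for many posterior draws of the trend, as the details describe, propagates the trend uncertainty into the distribution of the correlation.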
--------------------------------------------------------------------------------
/pkgdown/extra.css:
--------------------------------------------------------------------------------
1 | @import url("https://nmfs-ost.github.io/nmfspalette/extra.css");
2 |
--------------------------------------------------------------------------------
/src/Makevars:
--------------------------------------------------------------------------------
1 | # Generated by rstantools. Do not edit by hand.
2 |
3 | STANHEADERS_SRC = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "message()" -e "cat(system.file('include', 'src', package = 'StanHeaders', mustWork = TRUE))" -e "message()" | grep "StanHeaders")
4 |
5 | STANC_FLAGS = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "cat(ifelse(utils::packageVersion('rstan') >= '2.26', '-DUSE_STANC3',''))")
6 | PKG_CPPFLAGS = -I"../inst/include" -I"$(STANHEADERS_SRC)" -DBOOST_DISABLE_ASSERTS -DEIGEN_NO_DEBUG -DBOOST_MATH_OVERFLOW_ERROR_POLICY=errno_on_error $(STANC_FLAGS) -D_HAS_AUTO_PTR_ETC=0
7 | PKG_CXXFLAGS = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "RcppParallel::CxxFlags()") $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "StanHeaders:::CxxFlags()")
8 | PKG_LIBS = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "RcppParallel::RcppParallelLibs()") $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "StanHeaders:::LdFlags()")
9 |
10 | CXX_STD = CXX17
11 |
--------------------------------------------------------------------------------
/src/Makevars.win:
--------------------------------------------------------------------------------
1 | # Generated by rstantools. Do not edit by hand.
2 |
3 | STANHEADERS_SRC = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "message()" -e "cat(system.file('include', 'src', package = 'StanHeaders', mustWork = TRUE))" -e "message()" | grep "StanHeaders")
4 |
5 | PKG_CPPFLAGS = -I"../inst/include" -I"$(STANHEADERS_SRC)" -DBOOST_DISABLE_ASSERTS -DEIGEN_NO_DEBUG -DRCPP_PARALLEL_USE_TBB=1
6 | PKG_CXXFLAGS = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "RcppParallel::CxxFlags()") $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "StanHeaders:::CxxFlags()")
7 | PKG_LIBS = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "RcppParallel::RcppParallelLibs()") $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "StanHeaders:::LdFlags()")
8 |
9 | CXX_STD = CXX14
10 |
--------------------------------------------------------------------------------
/src/RcppExports-507982db.o.tmp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/src/RcppExports-507982db.o.tmp
--------------------------------------------------------------------------------
/src/RcppExports.cpp:
--------------------------------------------------------------------------------
1 | // Generated by using Rcpp::compileAttributes() -> do not edit by hand
2 | // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
3 |
4 | #include <RcppEigen.h>
5 | #include <Rcpp.h>
6 |
7 | using namespace Rcpp;
8 |
9 | #ifdef RCPP_USE_GLOBAL_ROSTREAM
10 | Rcpp::Rostream& Rcpp::Rcout = Rcpp::Rcpp_cout_get();
11 | Rcpp::Rostream& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
12 | #endif
13 |
14 |
15 | RcppExport SEXP _rcpp_module_boot_stan_fit4corr_mod();
16 | RcppExport SEXP _rcpp_module_boot_stan_fit4dfa_mod();
17 | RcppExport SEXP _rcpp_module_boot_stan_fit4hmm_gaussian_mod();
18 | RcppExport SEXP _rcpp_module_boot_stan_fit4regime_1_mod();
19 |
20 | static const R_CallMethodDef CallEntries[] = {
21 | {"_rcpp_module_boot_stan_fit4corr_mod", (DL_FUNC) &_rcpp_module_boot_stan_fit4corr_mod, 0},
22 | {"_rcpp_module_boot_stan_fit4dfa_mod", (DL_FUNC) &_rcpp_module_boot_stan_fit4dfa_mod, 0},
23 | {"_rcpp_module_boot_stan_fit4hmm_gaussian_mod", (DL_FUNC) &_rcpp_module_boot_stan_fit4hmm_gaussian_mod, 0},
24 | {"_rcpp_module_boot_stan_fit4regime_1_mod", (DL_FUNC) &_rcpp_module_boot_stan_fit4regime_1_mod, 0},
25 | {NULL, NULL, 0}
26 | };
27 |
28 | RcppExport void R_init_bayesdfa(DllInfo *dll) {
29 | R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
30 | R_useDynamicSymbols(dll, FALSE);
31 | }
32 |
--------------------------------------------------------------------------------
/src/stanExports_corr.cc:
--------------------------------------------------------------------------------
1 | // Generated by rstantools. Do not edit by hand.
2 |
3 | #include <Rcpp.h>
4 | using namespace Rcpp ;
5 | #include "stanExports_corr.h"
6 |
7 | RCPP_MODULE(stan_fit4corr_mod) {
8 |
9 |
10 | class_ >("rstantools_model_corr")
11 |
12 | .constructor()
13 |
14 |
15 | .method("call_sampler", &rstan::stan_fit ::call_sampler)
16 | .method("param_names", &rstan::stan_fit ::param_names)
17 | .method("param_names_oi", &rstan::stan_fit ::param_names_oi)
18 | .method("param_fnames_oi", &rstan::stan_fit ::param_fnames_oi)
19 | .method("param_dims", &rstan::stan_fit ::param_dims)
20 | .method("param_dims_oi", &rstan::stan_fit ::param_dims_oi)
21 | .method("update_param_oi", &rstan::stan_fit ::update_param_oi)
22 | .method("param_oi_tidx", &rstan::stan_fit ::param_oi_tidx)
23 | .method("grad_log_prob", &rstan::stan_fit ::grad_log_prob)
24 | .method("log_prob", &rstan::stan_fit ::log_prob)
25 | .method("unconstrain_pars", &rstan::stan_fit ::unconstrain_pars)
26 | .method("constrain_pars", &rstan::stan_fit ::constrain_pars)
27 | .method("num_pars_unconstrained", &rstan::stan_fit ::num_pars_unconstrained)
28 | .method("unconstrained_param_names", &rstan::stan_fit ::unconstrained_param_names)
29 | .method("constrained_param_names", &rstan::stan_fit ::constrained_param_names)
30 | .method("standalone_gqs", &rstan::stan_fit ::standalone_gqs)
31 | ;
32 | }
33 |
--------------------------------------------------------------------------------
/src/stanExports_dfa-ec272765.o.tmp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fate-ewi/bayesdfa/f43eb7e57b00aca083abb690697449bc57a51982/src/stanExports_dfa-ec272765.o.tmp
--------------------------------------------------------------------------------
/src/stanExports_dfa.cc:
--------------------------------------------------------------------------------
1 | // Generated by rstantools. Do not edit by hand.
2 |
3 | #include <Rcpp.h>
4 | using namespace Rcpp ;
5 | #include "stanExports_dfa.h"
6 |
7 | RCPP_MODULE(stan_fit4dfa_mod) {
8 |
9 |
10 | class_ >("rstantools_model_dfa")
11 |
12 | .constructor()
13 |
14 |
15 | .method("call_sampler", &rstan::stan_fit ::call_sampler)
16 | .method("param_names", &rstan::stan_fit ::param_names)
17 | .method("param_names_oi", &rstan::stan_fit ::param_names_oi)
18 | .method("param_fnames_oi", &rstan::stan_fit ::param_fnames_oi)
19 | .method("param_dims", &rstan::stan_fit ::param_dims)
20 | .method("param_dims_oi", &rstan::stan_fit ::param_dims_oi)
21 | .method("update_param_oi", &rstan::stan_fit ::update_param_oi)
22 | .method("param_oi_tidx", &rstan::stan_fit ::param_oi_tidx)
23 | .method("grad_log_prob", &rstan::stan_fit ::grad_log_prob)
24 | .method("log_prob", &rstan::stan_fit ::log_prob)
25 | .method("unconstrain_pars", &rstan::stan_fit ::unconstrain_pars)
26 | .method("constrain_pars", &rstan::stan_fit ::constrain_pars)
27 | .method("num_pars_unconstrained", &rstan::stan_fit ::num_pars_unconstrained)
28 | .method("unconstrained_param_names", &rstan::stan_fit ::unconstrained_param_names)
29 | .method("constrained_param_names", &rstan::stan_fit ::constrained_param_names)
30 | .method("standalone_gqs", &rstan::stan_fit ::standalone_gqs)
31 | ;
32 | }
33 |
--------------------------------------------------------------------------------
/src/stanExports_hmm_gaussian.cc:
--------------------------------------------------------------------------------
1 | // Generated by rstantools. Do not edit by hand.
2 |
3 | #include <Rcpp.h>
4 | using namespace Rcpp ;
5 | #include "stanExports_hmm_gaussian.h"
6 |
7 | RCPP_MODULE(stan_fit4hmm_gaussian_mod) {
8 |
9 |
10 | class_ >("rstantools_model_hmm_gaussian")
11 |
12 | .constructor()
13 |
14 |
15 | .method("call_sampler", &rstan::stan_fit ::call_sampler)
16 | .method("param_names", &rstan::stan_fit ::param_names)
17 | .method("param_names_oi", &rstan::stan_fit ::param_names_oi)
18 | .method("param_fnames_oi", &rstan::stan_fit ::param_fnames_oi)
19 | .method("param_dims", &rstan::stan_fit ::param_dims)
20 | .method("param_dims_oi", &rstan::stan_fit ::param_dims_oi)
21 | .method("update_param_oi", &rstan::stan_fit ::update_param_oi)
22 | .method("param_oi_tidx", &rstan::stan_fit ::param_oi_tidx)
23 | .method("grad_log_prob", &rstan::stan_fit ::grad_log_prob)
24 | .method("log_prob", &rstan::stan_fit ::log_prob)
25 | .method("unconstrain_pars", &rstan::stan_fit ::unconstrain_pars)
26 | .method("constrain_pars", &rstan::stan_fit ::constrain_pars)
27 | .method("num_pars_unconstrained", &rstan::stan_fit ::num_pars_unconstrained)
28 | .method("unconstrained_param_names", &rstan::stan_fit ::unconstrained_param_names)
29 | .method("constrained_param_names", &rstan::stan_fit ::constrained_param_names)
30 | .method("standalone_gqs", &rstan::stan_fit ::standalone_gqs)
31 | ;
32 | }
33 |
--------------------------------------------------------------------------------
/src/stanExports_regime_1.cc:
--------------------------------------------------------------------------------
1 | // Generated by rstantools. Do not edit by hand.
2 |
3 | #include <Rcpp.h>
4 | using namespace Rcpp ;
5 | #include "stanExports_regime_1.h"
6 |
7 | RCPP_MODULE(stan_fit4regime_1_mod) {
8 |
9 |
10 | class_ >("rstantools_model_regime_1")
11 |
12 | .constructor()
13 |
14 |
15 | .method("call_sampler", &rstan::stan_fit ::call_sampler)
16 | .method("param_names", &rstan::stan_fit ::param_names)
17 | .method("param_names_oi", &rstan::stan_fit ::param_names_oi)
18 | .method("param_fnames_oi", &rstan::stan_fit ::param_fnames_oi)
19 | .method("param_dims", &rstan::stan_fit ::param_dims)
20 | .method("param_dims_oi", &rstan::stan_fit ::param_dims_oi)
21 | .method("update_param_oi", &rstan::stan_fit ::update_param_oi)
22 | .method("param_oi_tidx", &rstan::stan_fit ::param_oi_tidx)
23 | .method("grad_log_prob", &rstan::stan_fit ::grad_log_prob)
24 | .method("log_prob", &rstan::stan_fit ::log_prob)
25 | .method("unconstrain_pars", &rstan::stan_fit ::unconstrain_pars)
26 | .method("constrain_pars", &rstan::stan_fit ::constrain_pars)
27 | .method("num_pars_unconstrained", &rstan::stan_fit ::num_pars_unconstrained)
28 | .method("unconstrained_param_names", &rstan::stan_fit ::unconstrained_param_names)
29 | .method("constrained_param_names", &rstan::stan_fit ::constrained_param_names)
30 | .method("standalone_gqs", &rstan::stan_fit ::standalone_gqs)
31 | ;
32 | }
33 |
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(bayesdfa)
3 |
4 | test_check("bayesdfa")
5 |
--------------------------------------------------------------------------------
/tests/testthat/test-chain-flipping.R:
--------------------------------------------------------------------------------
1 | if (interactive()) options(mc.cores = parallel::detectCores())
2 |
3 | # set.seed(1)
4 | # num_trends <- 2
5 | # num_ts <- 3
6 | # num_years <- 30
7 | #
8 | #
9 | # dat <- sim_dfa(
10 | # num_trends = num_trends,
11 | # num_years = num_years,
12 | # num_ts = num_ts,
13 | # loadings_matrix = loadings_matrix,
14 | # sigma = 0.2, nu_fixed = 200)
15 | #
16 | # m2 <- fit_dfa(dat$y_sim, num_trends = num_trends, zscore = TRUE,
17 | # iter = 1000, chains = 4, seed = 1)
18 | #
19 | # x <- rotate_trends(m2)
20 | # plot_trends(x)
21 | # plot_loadings(x)
22 |
--------------------------------------------------------------------------------
/vignettes/a2_combining_data.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Combining data with bayesdfa"
3 | author: "Eric J. Ward, Sean C. Anderson, Mary E. Hunsicker, Mike A. Litzow, Luis A. Damiano, Mark D. Scheuerell, Elizabeth E. Holmes, Nick Tolimieri"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Combining data with bayesdfa}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | \usepackage[utf8]{inputenc}
10 | ---
11 |
12 | For some applications using DFA, datasets may need to be combined from several data sources, and they may differ in time series length (or precision). Here we'll use some simple examples with `fit_dfa()` to illustrate some cautionary points.
13 |
14 | ```{r set-knitr-options, cache=FALSE, echo=FALSE, warning=FALSE, message=FALSE}
15 | library("knitr")
16 | opts_chunk$set(message = FALSE, fig.width = 5.5)
17 | ```
18 |
19 | Let's load the necessary packages:
20 |
21 | ```{r, message=FALSE, warning=FALSE}
22 | library(bayesdfa)
23 | library(ggplot2)
24 | library(dplyr)
25 | library(rstan)
26 | chains = 1
27 | iter = 10
28 | #library(viridis)
29 | ```
30 |
31 | ## Example
32 |
33 | First, let's simulate some simple data with `sim_dfa()`, using just a 1-trend model. This example has 3 time series, simulated from 1 underlying trend. The first time series doesn't load heavily on the trend (it's mostly generated via white noise), but time series 2-3 have stronger loadings on the trend.
34 |
35 | ```{r}
36 | set.seed(123)
37 | loadings = matrix(0, 3, 1)
38 | loadings[1,1] = c(0.1)
39 | loadings[2:3,1] = runif(2, 0.4,1)
40 | round(loadings,3)
41 | sim = sim_dfa(num_trends = 1, num_years = 100,
42 | num_ts = 3, loadings_matrix = loadings,
43 | sigma=0.6)
44 | ```
45 |
46 | Here we can see that time series 1 is more variable because the random component is playing a relatively larger role.
47 |
48 | ```{r echo=FALSE, warning=FALSE, message=FALSE}
49 | #id variable for position in matrix
50 | Y = as.data.frame(sim$y_sim)
51 | #as.data.frame(t(scale(t(Y))))
52 | Y$ts <- as.factor(1:nrow(Y))
53 | plot_data <- reshape2::melt(Y,id.var="ts")
54 | plot_data$x = as.numeric(substr(plot_data$variable, 2, length(plot_data$variable)))
55 | g1 = ggplot(plot_data, aes(x=x,y=value,group=ts,colour=ts)) +
56 | geom_point()+
57 | geom_line() + xlab("Time") +
58 | theme_bw()#+ scale_color_viridis(end=0.8, discrete = TRUE)
59 |
60 |
61 | g1
62 | #grid.arrange(g1,g2,nrow=2)
63 | ```
64 |
65 | ```{r echo=FALSE, warning=FALSE, message=FALSE}
66 | Y = as.data.frame(t(scale(t(sim$y_sim))))
67 | Y$ts <- as.factor(1:nrow(Y))
68 | plot_data <- reshape2::melt(Y,id.var="ts")
69 | plot_data$x = as.numeric(substr(plot_data$variable, 2, length(plot_data$variable)))
70 | g2 = ggplot(plot_data, aes(x=x,y=value,group=ts,colour=ts)) +
71 | geom_point()+
72 | geom_line() + xlab("Time") +
73 | ylab("Standardized time series") +
74 | theme_bw() #+
75 | #scale_color_viridis(end=0.8, discrete = TRUE)
76 | g2
77 | ```
78 |
79 | Let's initially treat the first ~50 time points as a burn-in, and fit a DFA model to the latter half of the time series, using all data. We don't really have to rotate trends (because there's just 1). The loadings are estimated ok for time series 1, but underestimated for the others.
80 |
81 | ```{r results='hide'}
82 | fit_1 = fit_dfa(y = sim$y_sim[,51:100], num_trends = 1, chains=chains, iter=iter)
83 | r = rotate_trends(fit_1)
84 | ```
85 | ```{r}
86 | round(r$Z_rot_mean,3)
87 | ```
88 |
89 | Now, we'll pretend that in time steps 1:50 we have observations from time series 1 (but not the others). We'll fit several additional models, adding back data points in steps of 25 and going backwards in time. All these runs use time points 51:100 for time series 2 and 3, but they include time steps 51:100, then 26:100, then 1:100 for time series 1.
90 |
91 | *Note: for comparison purposes, we'll also standardize all time series once before passing them in as an argument. Time series 1 won't be re-scaled, but will be re-centered for each iteration. This is important because the time series are non-stationary.*
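
The row-wise standardization mentioned in the note can be sketched in base R. This is only an illustration on made-up data (`Y`, `Y_z`, and `Y_c` are hypothetical names); it shows the difference between centering and scaling each time series versus centering only, and does not reproduce what `fit_dfa()` does internally.

```r
set.seed(1)
# toy data: 3 time series (rows) by 10 time steps (columns), values made up
Y <- matrix(rnorm(30, mean = 5, sd = 2), nrow = 3)

# scale() operates on columns, so transpose, scale, and transpose back
Y_z <- t(scale(t(Y)))                  # each row centered and scaled
Y_c <- t(scale(t(Y), scale = FALSE))   # each row centered only

rowMeans(Y_z)       # approximately zero for every time series
apply(Y_z, 1, sd)   # approximately one for every time series
apply(Y_c, 1, sd)   # original spread is retained when only centering
```

Centering only keeps the relative variability of each series, which is why the two options can lead to different loadings.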
92 |
93 | ```{r results='hide', warning=FALSE, message=FALSE}
94 |
95 | output = expand.grid("ts_start"=c(0,25,50),
96 | "x"=1:100, "estimated_trend"=NA, "obs"=NA)
97 |
98 | l = matrix(0, 3, 3)
99 |
100 | for(i in 1:nrow(l)) {
101 | idx = c(1,26,51)[i] # first time step of time series 1 used in this run
102 | Y = sim$y_sim
103 | Y = t(scale(t(Y)))
104 | if (idx > 1) Y[1,1:(idx-1)] = NA # drop earlier observations of time series 1
105 | Y[2:3,1:50] = NA
106 | fit_2 = fit_dfa(y = Y, num_trends = 1, chains=1, iter=10, scale="center")
107 | r = rotate_trends(fit_2)
108 | l[i,] = c(r$Z_rot_mean)
109 | output$estimated_trend[which(output$ts_start==(idx-1))] = scale((r$Z_rot_mean %*% r$trends_mean)[2,])
110 | output$obs[which(output$ts_start==(idx-1))] = Y[2,51:100]
111 | }
112 | ```
113 |
114 | Now we can look at the effects of adding in the extra data from time series 1. Here are the predictions for time series 2 over time steps 51:100, adding more data in from time series 1. What this shows is that in general the trends are the same -- though there are nuanced differences between them.
115 |
116 | ```{r echo=FALSE}
117 | #output$estimated_trend[which(output$ts_start==41)] = -1 * output$estimated_trend[which(output$ts_start==41)]
118 | output$ts_start=as.factor(output$ts_start)
119 | Y = sim$y_sim
120 | Y = t(scale(t(Y)))
121 | ts2 = data.frame(x = 1:100, y = Y[2,])
122 | ggplot(output, aes(x,y=estimated_trend,group=ts_start,col=ts_start)) + geom_line(linewidth=2, alpha=0.7) +
123 | #scale_color_viridis(end=0.8,discrete = TRUE) + xlim(51,100) +
124 | xlab("Time") +
125 | ylab("Estimated time series (# 2)") +theme_bw()
126 | ```
127 |
128 | As a cautionary note, whenever time series of different lengths are combined using similar approaches, simulations should be done to estimate the influence of adding new data on shared trends or other quantities of interest.
129 |
130 | ```{r echo=FALSE, results='hide', include=FALSE}
131 | L = as.data.frame(t(l))
132 | L$trend = as.factor(seq(1:nrow(L)))
133 | plot_data <- reshape2::melt(L,id.var="trend")
134 | plot_data$x = as.numeric(substr(plot_data$variable, 2, length(plot_data$variable)))
135 | plot_data$x = seq(1,50,5)[11-plot_data$x]
136 | ggplot(plot_data, aes(x=x,y=value,group=trend,colour=trend)) +
137 | geom_point()+
138 | geom_line() + xlab("Time") #+ scale_color_viridis(end=0.8, discrete = TRUE)
139 | ```
140 |
141 |
--------------------------------------------------------------------------------
/vignettes/a3_covariates.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Examples of including covariates with bayesdfa"
3 | author: "Eric J. Ward, Sean C. Anderson, Mary E. Hunsicker, Mike A. Litzow, Luis A. Damiano, Mark D. Scheuerell, Elizabeth E. Holmes, Nick Tolimieri"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Examples of including covariates with bayesdfa}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | \usepackage[utf8]{inputenc}
10 | ---
11 |
12 | Here we will walk through how to use the bayesdfa package to fit dynamic factor analysis (DFA) models with covariates.
13 |
14 | ```{r set-knitr-options, cache=FALSE, echo=FALSE, warning=FALSE, message=FALSE}
15 | library("knitr")
16 | opts_chunk$set(message = FALSE, fig.width = 5.5)
17 | ```
18 |
19 | Let's load the necessary packages:
20 |
21 | ```{r, message=FALSE, warning=FALSE}
22 | library(bayesdfa)
23 | library(ggplot2)
24 | library(dplyr)
25 | library(rstan)
26 | chains = 1
27 | iter = 10
28 | ```
29 |
30 | ## Notation review for DFA models
31 |
32 | Covariates in dynamic factor analysis are generally included in the observation model, rather than the process model. Without covariates, the model can be expressed as
33 |
34 | $${x}_{t}={x}_{t-1}+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})$$
35 |
36 | where the matrix $\textbf{Z}$ is dimensioned as the number of time series by number of trends, and maps the observed data $y_{t}$ to the latent trends $x_{t}$.
37 |
38 | ### Observation covariates
39 |
40 | Observation covariates can be included as
41 | $${x}_{t}={x}_{t-1}+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+\textbf{D}{ d }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})$$
42 | where the matrix $\textbf{D}$ is dimensioned as the number of time series by number of covariates, and $d_{t}$ is the vector of covariate values at time $t$. For a single covariate, such as temperature, this would mean estimating $P$ parameters, where $P$ is the number of time series. For a model including 2 covariates, the number of estimated coefficients would be $2P$ and so forth.
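
To make the dimensions concrete, here is a small base-R sketch of the observation model with one covariate. All names and values (`D`, `d`, the loadings, the error standard deviation) are made up for illustration and are not taken from the package internals.

```r
set.seed(1)
n_t <- 20     # time steps
n_ts <- 4     # number of time series, P
n_cov <- 1    # number of covariates

x <- matrix(cumsum(rnorm(n_t)), nrow = 1)         # one random-walk trend
Z <- matrix(c(0.5, 0.8, 1.0, 1.2), nrow = n_ts)   # loadings (assumed values)
d <- matrix(rnorm(n_cov * n_t), nrow = n_cov)     # covariate values d_t
# one coefficient per time series and covariate
D <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = n_ts, ncol = n_cov)
v <- matrix(rnorm(n_ts * n_t, 0, 0.1), nrow = n_ts)

# y_t = Z x_t + D d_t + v_t
y <- Z %*% x + D %*% d + v
length(D)  # number of covariate coefficients: P times the number of covariates
```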
43 |
44 | ### Process covariates
45 |
46 | Process covariates on the trends are less common but can be written as
47 | $${x}_{t}={x}_{t-1}+\textbf{C}{ c }_{ t }+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})$$
48 | where the matrix $\textbf{C}$ is dimensioned as the number of trends by number of covariates, and $c_{t}$ is the vector of covariate values at time $t$. For a single covariate, this would mean estimating $K$ parameters, where $K$ is the number of trends. For a model including 2 covariates, the number of estimated coefficients would be $2K$ and so forth.
49 |
50 |
51 | ## Examples -- observation covariates
52 |
53 | We'll start by simulating some random trends using the `sim_dfa` function,
54 | ```{r simulate-data-obs}
55 | set.seed(1)
56 | sim_dat <- sim_dfa(
57 | num_trends = 1,
58 | num_years = 20,
59 | num_ts = 4
60 | )
61 | ```
62 |
63 | Next, we can add a covariate effect to the observations, `y`. For example,
64 |
65 | ```{r}
66 | cov = expand.grid("time"=1:20, "timeseries"=1:4, "covariate"=1)
67 | cov$value = rnorm(nrow(cov),0,0.1)
68 |
69 | for(i in 1:nrow(cov)) {
70 | sim_dat$y[cov$timeseries[i],cov$time[i]] = sim_dat$pred[cov$timeseries[i],cov$time[i]] +
71 | c(0.1,0.2,0.3,0.4)[cov$timeseries[i]]*cov$value[i]
72 | }
73 | ```
74 |
75 | And now fit the model with `fit_dfa`
76 |
77 | ```{r results='hide', warning=FALSE, message=FALSE}
78 | mod = fit_dfa(y = sim_dat$y, obs_covar = cov, num_trends = 1,
79 | chains=chains, iter=iter)
80 | ```
81 |
82 | We can then make plots of the true and estimated trend,
83 |
84 | ```{r}
85 | plot(c(sim_dat$x), xlab="Time", ylab="True trend")
86 | ```
87 |
88 | ```{r}
89 | plot_trends(rotate_trends(mod)) + ylab("Estimated trend") + theme_bw()
90 | ```
91 |
92 | This approach could be modified to have covariates not affecting some time series. For example, if we didn't want the covariate to affect the last time series, we could say
93 |
94 | ```{r}
95 | cov = cov[which(cov$timeseries!=4),]
96 | ```
97 |
98 | And then again fit the model
99 | ```{r eval=FALSE, results='hide', warning=FALSE, message=FALSE}
100 | mod = fit_dfa(y = sim_dat$y, obs_covar = cov, num_trends = 1,
101 | chains=chains)
102 | ```
103 |
104 | ## Examples -- process covariates
105 |
106 | As a cautionary note, there are some identifiability issues with including covariates in the process model. Covariates need to be standardized or centered prior to being included. Future versions of this vignette will include clearer examples and recommendations.
107 |
108 | We'll start by simulating some random trends using the `sim_dfa` function,
109 |
110 | ```{r simulate-data}
111 | set.seed(1)
112 | sim_dat <- sim_dfa(
113 | num_trends = 2,
114 | num_years = 20,
115 | num_ts = 3
116 | )
117 | ```
118 |
119 | Next, we can add a covariate effect to the trend estimate, `x`. For example,
120 |
121 | ```{r}
122 | cov = rnorm(20, 0, 1)
123 | b_pro = c(1,0.3)
124 | x = matrix(0,2,20)
125 |
126 | for(i in 1:2) {
127 | x[i,1] = cov[1]*b_pro[i]
128 | }
129 |
130 | for(i in 2:length(cov)) {
131 | x[1,i] = x[1,i-1] + cov[i]*b_pro[1] + rnorm(1,0,1)
132 | x[2,i] = x[2,i-1] + cov[i]*b_pro[2] + rnorm(1,0,1)
133 | }
134 |
135 | y = sim_dat$Z %*% x
136 | ```
137 |
138 | And now fit the model with `fit_dfa`
139 |
140 | ```{r, eval=FALSE, results='hide', warning=FALSE, message=FALSE}
141 | pro_cov = expand.grid("trend"=1:2, "time"=1:20, "covariate"=1)
142 | pro_cov$value = cov[pro_cov$time]
143 |
144 | mod = fit_dfa(y = y, pro_covar = pro_cov, num_trends = 2,
145 | chains=chains, iter=iter)
146 |
147 | ```
148 |
--------------------------------------------------------------------------------
/vignettes/a4_smooth.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Examples of fitting smooth trend DFA models"
3 | author: "Eric J. Ward, Sean C. Anderson, Mary E. Hunsicker, Mike A. Litzow, Luis A. Damiano, Mark D. Scheuerell, Elizabeth E. Holmes, Nick Tolimieri"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Examples of fitting smooth trend DFA models}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | \usepackage[utf8]{inputenc}
10 | ---
11 |
12 | In addition to fitting conventional DFA models with trends modeled as random walks (or ARMA processes), we can also construct models where underlying trends are treated as smooth trends (B-splines, P-splines, or Gaussian processes).
13 |
14 | ```{r set-knitr-options, cache=FALSE, echo=FALSE, warning=FALSE, message=FALSE}
15 | library("knitr")
16 | opts_chunk$set(message = FALSE, fig.width = 5.5)
17 | ```
18 |
19 | Let's load the necessary packages:
20 |
21 | ```{r, message=FALSE, warning=FALSE}
22 | library(bayesdfa)
23 | library(ggplot2)
24 | library(dplyr)
25 | library(rstan)
26 | chains = 1
27 | iter = 10
28 | ```
29 |
30 | ## Data simulation
31 |
32 | The `sim_dfa` function normally simulates loadings $\sim N(0,1)$, but here we will simulate time series that are more similar with loadings $\sim N(1,0.1)$
33 |
34 | ```{r}
35 | set.seed(1)
36 | s = sim_dfa(num_trends = 1, num_years = 1000, num_ts = 4,
37 | loadings_matrix = matrix(nrow = 4, ncol = 1, rnorm(4 * 1,
38 | 1, 0.1)), sigma=0.05)
39 | ```
40 |
41 | ```{r}
42 | matplot(t(s$y_sim), type="l")
43 | ```
44 |
45 | ## Estimating trends as B-splines
46 |
47 | As a first approach, we can fit models where trends are estimated as B-splines. To do this, we change the `trend_model` argument, and specify the number of knots. More knots translate to more flexible (wigglier) functions, and fewer knots to smoother ones. For example,
48 |
49 | ```{r eval = FALSE}
50 | set.seed(1)
51 | fit = fit_dfa(y = s$y_sim, num_trends = 1,
52 | trend_model = "bs", n_knots = 7)
53 | ```
54 |
55 | Or for a model with more knots,
56 |
57 | ```{r eval = FALSE}
58 | set.seed(1)
59 | fit = fit_dfa(y = s$y_sim, num_trends = 1,
60 | trend_model = "bs", n_knots = 14)
61 | ```
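
To get a feel for how the size of a spline basis controls flexibility, the sketch below builds two cubic B-spline bases with `splines::bs()` (the splines package ships with R). This is an assumption-laden illustration: `df` here counts basis functions in `bs()` and is not necessarily the same quantity as `n_knots` in `fit_dfa()`, and the weights are arbitrary random numbers rather than estimated coefficients.

```r
library(splines)

time <- 1:100
# cubic B-spline bases with 7 and 14 basis functions
B_small <- bs(time, df = 7)
B_large <- bs(time, df = 14)
dim(B_small)
dim(B_large)

# a smooth trend is a weighted sum of the basis functions
set.seed(1)
trend_small <- B_small %*% rnorm(ncol(B_small))
trend_large <- B_large %*% rnorm(ncol(B_large))
```

With more basis functions, the weighted sum has more freedom to change direction over the same time span.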
62 |
63 |
64 | ## Estimating trends as P-splines
65 |
66 | Trends from the B-spline model are sensitive to the number of knots. As an alternative, we also allow trends to be modeled as penalized regression splines ("P-splines"). These methods are less sensitive to the number of knots, and only require enough knots to adequately describe the wiggliness of the function.
67 |
68 | We can fit these kinds of models by changing the `trend_model` argument
69 | ```{r eval = FALSE}
70 | set.seed(1)
71 | fit = fit_dfa(y = s$y_sim, num_trends = 1,
72 | trend_model = "ps", n_knots = 7)
73 | ```
74 |
75 | ## Estimating trends as Gaussian processes
76 |
77 | Finally, another type of smoothing that can be done is treating the trends as Gaussian processes. Either full rank models (knots = time points) or predictive process models may be fit (fewer knots result in smoother functions). These types of models may be specified by again changing the `trend_model` argument,
78 |
79 | ```{r eval = FALSE}
80 | set.seed(1)
81 | fit = fit_dfa(y = s$y_sim, num_trends = 1,
82 | trend_model = "gp", n_knots = 7)
83 | ```
84 |
85 | ## Comparing approaches
86 |
87 | All of the smooth trend methods are flexible and able to capture the wiggliness of latent trends. Based on our experience, the B-spline and P-spline models will generally fit faster than the Gaussian predictive process models (because they omit a critical matrix inversion step). The full rank Gaussian process models tend to be faster than the predictive process models. All of these approaches can be compared using cross validation, or similar predictive performance criteria.
88 |
--------------------------------------------------------------------------------
/vignettes/a6_compositional.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Fitting compositional dynamic factor models with bayesdfa"
3 | author: "Eric J. Ward, Sean C. Anderson, Mary E. Hunsicker, Mike A. Litzow, Luis A. Damiano, Mark D. Scheuerell, Elizabeth E. Holmes, Nick Tolimieri"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Fitting compositional dynamic factor models with bayesdfa}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | \usepackage[utf8]{inputenc}
10 | ---
11 |
12 | An extension of conventional DFA is to change the constraints on the estimated loadings, $Z$. DFA models generally impose constraints for identifiability, where individual time series are allowed to map onto dynamic factors with loadings ranging from large negative to large positive ones. Instead, we can view a collection of multivariate time series as originating from a compositional model, such that the loadings $Z$ are forced to be positive and each of the time series arises as a mixture of the trends, weighted by the estimated loadings.
13 |
14 | ```{r set-knitr-options, cache=FALSE, echo=FALSE, warning=FALSE, message=FALSE}
15 | library("knitr")
16 | opts_chunk$set(message = FALSE, fig.width = 5.5)
17 | ```
18 |
19 | Let's load the necessary packages:
20 |
21 | ```{r, message=FALSE, warning=FALSE}
22 | library(bayesdfa)
23 | library(ggplot2)
24 | library(dplyr)
25 | library(rstan)
26 | chains = 1
27 | iter = 10
28 | ```
29 |
30 | ## 2 - trend model
31 |
32 | First, let's simulate some data. The compositional DFA model is not interesting with a single trend, so we'll initially start with 2 trends. We'll use the sim_dfa function to simulate the random walks (assuming 20 time steps, and 2 latent trends).
33 |
34 | ```{r}
35 | set.seed(1)
36 | s = sim_dfa(num_trends = 2, num_years = 20,
37 | num_ts = 5)
38 | ```
39 |
40 | Next, we can create some compositional loadings for the mixture. These could be random (e.g. using 'rdirichlet'), but we'll pass in known values.
41 |
42 | ```{r}
43 | m = matrix(0, nrow=5,ncol=2)
44 | m[1,] = c(0.8, 0.2) # time series # 1 is 80% trend 1
45 | m[2,] = c(0.9, 0.1) # time series # 2 is 90% trend 1
46 | m[3,] = c(0.3, 0.7) # time series # 3 is 30% trend 1
47 | m[4,] = c(0.35, 0.65) # time series # 4 is 35% trend 1
48 | m[5,] = c(0.7, 0.3) # time series # 5 is 70% trend 1
49 | ```
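
If random compositional loadings are preferred, they can be drawn without extra packages. The sketch below assumes only the standard construction of a Dirichlet draw as independent gamma draws divided by their sum; `alpha` and `m_random` are hypothetical names, and the concentration values are arbitrary.

```r
set.seed(1)
n_ts <- 5
n_trends <- 2
alpha <- rep(1, n_trends)  # concentration parameters (assumed; 1 = uniform on the simplex)

# gamma draws, with column j using shape alpha[j]
g <- matrix(rgamma(n_ts * n_trends, shape = rep(alpha, each = n_ts)),
            nrow = n_ts)
# dividing each row by its sum gives one Dirichlet draw per time series
m_random <- g / rowSums(g)
rowSums(m_random)  # every row sums to 1
```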
50 |
51 | Using the simulated trends and compositional data $m$, we can now create the simulated data,
52 |
53 | ```{r}
54 | pred = m%*%s$x
55 | y = pred + matrix(rnorm(nrow(pred)*ncol(pred),0,0.1), nrow=nrow(pred), ncol = ncol(pred))
56 | ```
57 |
58 | Next, we can pass this data to the fit_dfa() function and try to recover our known loadings. Note: the compositional model has a nasty label-switching issue that our 'chain flipping' function doesn't handle well. Multiple chains may be run, but need to be examined to evaluate whether they need to be flipped. To avoid that situation, we'll just use a single chain.
59 |
60 | ```{r, message=FALSE, warning=FALSE, results='hide'}
61 | fit <- fit_dfa(y = y, iter = iter, chains = chains, num_trends = 2, seed = 42,
62 | z_model = "proportion",scale="center")
63 | ```
64 |
65 | We can't rotate the Z matrix -- because the compositional constraint would be destroyed. Other than the loadings on time series 3 and 4, these loadings are close to those in the simulating model,
66 |
67 | ```{r}
68 | pars = rstan::extract(fit$model,permuted=TRUE)
69 | rounded_Z = round(apply(pars$Z,c(2,3),mean),2)
70 | print(rounded_Z[,c(2,1)])
71 | ```
72 |
73 | Combining the estimated trends and true trends in the simulation shows that the trends are offset by an intercept, but track the overall simulated values very well. In the plot, series 1 is the estimated trend trying to recover the true trend shown as series 3, and series 2 is the estimated trend trying to recover the true trend shown as series 4.
74 |
75 | ```{r}
76 | x = apply(pars$x, c(2,3), mean)[c(2,1),]
77 | matplot(t(rbind(x,s$x)))
78 | ```
79 |
80 |
81 | ## 3 - trend model
82 |
83 | We can extend this approach to also include a model with 3 trends,
84 |
85 | ```{r}
86 | set.seed(1)
87 | s = sim_dfa(num_trends = 3, num_years = 20,
88 | num_ts = 5)
89 | ```
90 |
91 | Next, we can create some compositional loadings for the mixture. These could be random (e.g. using 'rdirichlet'), but we'll pass in known values.
92 |
93 | ```{r}
94 | m = matrix(0, nrow=5,ncol=3)
95 | m[1,] = c(0.31, 0.48,0.21) # time series # 1
96 | m[2,] = c(0.25, 0.04, 0.71) # time series # 2
97 | m[3,] = c(0.21, 0.28, 0.51) # time series # 3
98 | m[4,] = c(0.6, 0.02, 0.38) # time series # 4
99 | m[5,] = c(0.15, 0.21, 0.64) # time series # 5
100 | ```
101 |
102 | Add observation error
103 | ```{r}
104 | pred = m%*%s$x
105 | y = pred + matrix(rnorm(nrow(pred)*ncol(pred),0,0.01), nrow=nrow(pred), ncol = ncol(pred))
106 | ```
107 |
108 | Next we can fit the compositional DFA model,
109 |
110 | ```{r message=FALSE, warning=FALSE, results='hide'}
111 | fit <- fit_dfa(y = y, iter = iter, chains = chains, num_trends = 3, seed = 42,
112 | z_model = "proportion",scale="center")
113 | ```
114 |
115 | We can look at the true values of the loadings (open triangles) and estimated values (solid circles) to see that for some of the loadings, the model estimates the parameters well.
116 |
117 | ```{r echo=FALSE}
118 | pars = rstan::extract(fit$model,permuted=TRUE)
119 | rounded_Z = round(apply(pars$Z,c(2,3),mean),2)
120 |
121 | df = data.frame("value"=c(rounded_Z), "id" = "estimated",
122 | "trend"=as.factor(sort(rep(1:3,5))), "ts" = as.factor(rep(1:5,3)))
123 |
124 | df2 = data.frame("value"=c(m[,c(3,2,1)]), "id" = "true",
125 | "trend"=as.factor(sort(rep(1:3,5))), "ts" = as.factor(rep(1:5,3)))
126 |
127 | ggplot(data=rbind(df,df2), aes(ts,value,group=trend,col=trend,
128 | fill=trend,shape=id)) +
129 | geom_point(size=4) +
130 | xlab("Time series") + ylab("Value")
131 | ```
132 |
133 |
134 |
135 |
136 |
--------------------------------------------------------------------------------
/vignettes/a7_bigdata.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Examples of fitting DFA models with lots of data"
3 | author: "Eric J. Ward, Sean C. Anderson, Mary E. Hunsicker, Mike A. Litzow, Luis A. Damiano, Mark D. Scheuerell, Elizabeth E. Holmes, Nick Tolimieri"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Examples of fitting DFA models with lots of data}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | \usepackage[utf8]{inputenc}
10 | ---
11 |
12 | For some applications, there may be a huge number of observations (e.g. daily stream flow measurements, bird counts) making estimation with MCMC slow. While estimation (and uncertainty) for final models should be done with MCMC, there are a few much faster alternatives that we can use for these cases. They may be generally useful for other DFA problems -- both in diagnosing convergence problems, and doing preliminary model selection.
13 |
14 | ```{r set-knitr-options, cache=FALSE, echo=FALSE, warning=FALSE, message=FALSE}
15 | library("knitr")
16 | opts_chunk$set(message = FALSE, fig.width = 5.5)
17 | ```
18 |
19 | Let's load the necessary packages:
20 |
21 | ```{r, message=FALSE, warning=FALSE}
22 | library(bayesdfa)
23 | library(ggplot2)
24 | library(dplyr)
25 | library(rstan)
26 | chains = 1
27 | iter = 10
28 | ```
29 |
30 | ## Data simulation
31 |
32 | The `sim_dfa` function normally simulates loadings $\sim N(0,1)$, but here we will simulate time series that are more similar with loadings $\sim N(1,0.1)$
33 |
34 | ```{r}
35 | set.seed(1)
36 | s = sim_dfa(num_trends = 1, num_years = 1000, num_ts = 4,
37 | loadings_matrix = matrix(nrow = 4, ncol = 1, rnorm(4 * 1,
38 | 1, 0.1)), sigma=0.05)
39 | ```
40 |
41 | ```{r}
42 | matplot(t(s$y_sim), type="l")
43 | ```
44 |
45 | ## Sampling argument
46 |
47 | In the examples below, we'll take advantage of the `estimation` argument. By default, this is set to MCMC ("sampling"), but it can take a few other options described below. If you want to construct a model object, but do no sampling, you can also set this to "none".
48 |
49 | ```{r eval = FALSE}
50 | fit <- fit_dfa(..., estimation = "sampling")
51 | ```
52 |
53 |
54 | ## Posterior optimization
55 |
56 | The fastest estimation approach is to optimize the posterior (this is similar to maximum likelihood but also involves the prior distribution). We can implement this by setting the `estimation` argument to "optimizing".
57 |
58 | Note -- because this model has a lot of parameters, estimation can be finicky, and can get stuck in local minima. You may have to start this from several seeds to get the model to converge successfully -- or if there is a mismatch between the model and data, it may not converge at all.
59 |
60 | For example, this model does not converge
61 |
62 | ```{r}
63 | set.seed(123)
64 | m <- fit_dfa(y = s$y_sim, estimation = "optimizing")
65 | ```
66 |
67 | The optimizing output is saved here (`value` = log posterior, `par` = estimated parameters)
68 |
69 | ```{r}
70 | names(m$model)
71 | ```
72 |
73 | And if convergence is successful, the optimizer code will be 0 (this model isn't converging)
74 |
75 | ```{r}
76 | m$model$return_code
77 | ```
78 |
79 | But if we change the seed, the model will converge ok:
80 |
81 | ```{r}
82 | set.seed(124)
83 | m <- fit_dfa(y = s$y_sim, estimation = "optimizing")
84 | ```
85 |
86 | ```{r}
87 | m$model$return_code
88 | ```
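
The idea of posterior optimization can be illustrated outside of Stan with a toy problem. The sketch below uses base R's `optim()` (not Stan's optimizer) to find the posterior mode of a normal mean; the data, the prior, and the function name `neg_log_post` are all assumptions made for this example. Like the return code above, `optim()` reports 0 on successful convergence.

```r
set.seed(123)
obs <- rnorm(50, mean = 2, sd = 1)  # toy data

# negative log posterior for the mean, with known sd = 1
# and a weak N(0, 10^2) prior on the mean
neg_log_post <- function(mu) {
  -(sum(dnorm(obs, mean = mu, sd = 1, log = TRUE)) +
      dnorm(mu, mean = 0, sd = 10, log = TRUE))
}

opt <- optim(par = 0, fn = neg_log_post, method = "BFGS")
opt$par          # posterior mode, close to mean(obs) because the prior is weak
opt$convergence  # 0 indicates successful convergence in optim()
```

For models with many parameters the surface is far less well behaved than in this one-parameter case, which is why several starting seeds may be needed.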
89 |
90 | ## Posterior approximation
91 |
92 | A second approach to quickly estimating parameters is to use Variational Bayes, which is also implemented in Stan. This is done by changing the `estimation` argument to "vb", as shown below. Note: this gives a helpful message that the maximum number of iterations has been reached, so these results should not be trusted.
93 |
94 | ```{r, message=FALSE, warning=FALSE, eval=FALSE}
95 | m <- fit_dfa(y = s$y_sim, estimation = "vb", seed=123)
96 | ```
97 |
98 | There are a number of other arguments that can be passed into `rstan::vb()`. These include `iter` (maximum iterations, defaults to 10000), `tol_rel_obj` (convergence tolerance, defaults to 0.01), and `output_samples` (posterior samples to save, defaults to 1000). To use these, a function call would be
99 |
100 | ```{r, message=FALSE, warning=FALSE, eval=FALSE}
101 | m <- fit_dfa(y = s$y_sim, estimation = "vb", seed=123, iter=20000,
102 | tol_rel_obj = 0.005, output_samples = 2000)
103 | ```
104 |
105 |
106 |
--------------------------------------------------------------------------------