Title: | Parallel Chain Tools for Bayesian Kernel Machine Regression |
---|---|
Description: | Bayesian kernel machine regression (from the 'bkmr' package) is a Bayesian semi-parametric generalized linear model approach under identity and probit links. There are a number of functions in this package that extend Bayesian kernel machine regression fits to allow multiple-chain inference and diagnostics, which leverage functions from the 'future', 'rstan', and 'coda' packages. Reference: Bobb, J. F., Henn, B. C., Valeri, L., & Coull, B. A. (2018). Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. ; <doi:10.1186/s12940-018-0413-y>. |
Authors: | Alexander Keil [aut, cre] |
Maintainer: | Alexander Keil <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.3 |
Built: | 2024-11-09 04:09:25 UTC |
Source: | https://github.com/alexpkeil1/bkmrhat |
Converts a kmrfit
(from the bkmr package) into
an mcmc
object from the coda
package. The
coda
package enables many different types of single chain MCMC
diagnostics, including geweke.diag
, traceplot
and
effectiveSize
. Posterior summarization is also available,
such as HPDinterval
and summary.mcmc
.
## S3 method for class 'bkmrfit' as.mcmc(x, iterstart = 1, thin = 1, ...)
## S3 method for class 'bkmrfit' as.mcmc(x, iterstart = 1, thin = 1, ...)
x |
object of type kmrfit (from bkmr package) |
iterstart |
first iteration to use (e.g. for implementing burnin) |
thin |
keep 1/thin % of the total iterations (at regular intervals) |
... |
unused |
An mcmc
object
# following example from https://jenfb.github.io/bkmr/overview.html set.seed(111) library(coda) library(bkmr) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 500, verbose = FALSE, varsel = FALSE) mcmcobj <- as.mcmc(fitkm, iterstart=251) summary(mcmcobj) # posterior summaries of model parameters # compare with default from bkmr package, which omits first 1/2 of chain summary(fitkm) # note this only works on multiple chains (see kmbayes_parallel) # gelman.diag(mcmcobj) # lots of functions in the coda package to use traceplot(mcmcobj) # will also fail with delta functions (when using variable selection) try(geweke.plot(mcmcobj))
# following example from https://jenfb.github.io/bkmr/overview.html set.seed(111) library(coda) library(bkmr) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 500, verbose = FALSE, varsel = FALSE) mcmcobj <- as.mcmc(fitkm, iterstart=251) summary(mcmcobj) # posterior summaries of model parameters # compare with default from bkmr package, which omits first 1/2 of chain summary(fitkm) # note this only works on multiple chains (see kmbayes_parallel) # gelman.diag(mcmcobj) # lots of functions in the coda package to use traceplot(mcmcobj) # will also fail with delta functions (when using variable selection) try(geweke.plot(mcmcobj))
Converts a kmrfit.list
(from the bkmrhat package) into
an mcmc.list
object from the coda
package.The
coda
package enables many different types of MCMC diagnostics,
including geweke.diag
, traceplot
and
effectiveSize
. Posterior summarization is also available,
such as HPDinterval
and summary.mcmc
.
Using multiple chains is necessary for certain MCMC diagnostics, such as
gelman.diag
, and gelman.plot
.
## S3 method for class 'list.bkmrfit.list' as.mcmc(x, ...)
## S3 method for class 'list.bkmrfit.list' as.mcmc(x, ...)
x |
object of type kmrfit.list (from bkmrhat package) |
... |
arguments to |
An mcmc.list
object
# following example from https://jenfb.github.io/bkmr/overview.html set.seed(111) library(coda) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession, workers=2) # run 2 parallel Markov chains (more usually better) fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 1000, verbose = FALSE, varsel = FALSE) mcmcobj = as.mcmc.list(fitkm.list) summary(mcmcobj) # Gelman/Rubin diagnostics won't work on certain objects, # like delta parameters (when using variable selection), # so the rstan version of this will work better (does not give errors) try(gelman.diag(mcmcobj)) # lots of functions in the coda package to use plot(mcmcobj) # both of these will also fail with delta functions (when using variable selection) try(gelman.plot(mcmcobj)) try(geweke.plot(mcmcobj)) closeAllConnections()
# following example from https://jenfb.github.io/bkmr/overview.html set.seed(111) library(coda) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession, workers=2) # run 2 parallel Markov chains (more usually better) fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 1000, verbose = FALSE, varsel = FALSE) mcmcobj = as.mcmc.list(fitkm.list) summary(mcmcobj) # Gelman/Rubin diagnostics won't work on certain objects, # like delta parameters (when using variable selection), # so the rstan version of this will work better (does not give errors) try(gelman.diag(mcmcobj)) # lots of functions in the coda package to use plot(mcmcobj) # both of these will also fail with delta functions (when using variable selection) try(gelman.plot(mcmcobj)) try(geweke.plot(mcmcobj)) closeAllConnections()
Posterior inclusion probabilities by chain
ExtractPIPs_parallel(x, ...)
ExtractPIPs_parallel(x, ...)
x |
bkmrfit.list object from |
... |
arguments to |
data.frame with all chains together
Combine multiple chains comprising BKMR fits at different starting values.
kmbayes_combine( fitkm.list, burnin = NULL, excludeburnin = FALSE, reorder = TRUE ) comb_bkmrfits(fitkm.list, burnin = NULL, excludeburnin = FALSE, reorder = TRUE)
kmbayes_combine( fitkm.list, burnin = NULL, excludeburnin = FALSE, reorder = TRUE ) comb_bkmrfits(fitkm.list, burnin = NULL, excludeburnin = FALSE, reorder = TRUE)
fitkm.list |
output from |
burnin |
(numeric, or default=NULL) add in custom burnin (number of burnin iterations per chain). If NULL, then default to half of the chain |
excludeburnin |
(logical, default=FALSE) should burnin iterations be excluded from the final chains? Note that all bkmr package functions automatically exclude burnin from calculations. |
reorder |
(logical, default=TRUE) ensures that the first half of the combined chain contains
only the first half of each individual chain - this allows unaltered use
of standard functions from bkmr package, which automatically trims the first
half of the iterations. This can be used for posterior summaries, but
certain diagnostics may not work well (autocorrelation, effective sample size)
so the diagnostics should be done on the individual chains
#' @param ... arguments to |
Chains are not combined fully sequentially
a bkmrplusfit
object, which inherits from bkmrfit
(from the kmbayes
function)
with multiple chains combined into a single object and additional parameters
given by chain
and iters
, which index the specific chains and
iterations for each posterior sample in the bkmrplusfit
object
# following example from https://jenfb.github.io/bkmr/overview.html set.seed(111) library(bkmr) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession, workers=2) # run 4 parallel Markov chains (low iterations used for illustration) fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 500, verbose = FALSE, varsel = TRUE) # use bkmr defaults for burnin, but keep them bigkm = kmbayes_combine(fitkm.list, excludeburnin=FALSE) ests = ExtractEsts(bigkm) # defaults to keeping second half of samples ExtractPIPs(bigkm) pred.resp.univar <- PredictorResponseUnivar(fit = bigkm) risks.overall <- OverallRiskSummaries(fit = bigkm, y = y, Z = Z, X = X, qs = seq(0.25, 0.75, by = 0.05), q.fixed = 0.5, method = "exact") # additional objects that are not in a standard bkmrfit object: summary(bigkm$iters) # note that this reflects how fits are re-ordered to reflect burnin table(bigkm$chain) closeAllConnections()
# following example from https://jenfb.github.io/bkmr/overview.html set.seed(111) library(bkmr) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession, workers=2) # run 4 parallel Markov chains (low iterations used for illustration) fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 500, verbose = FALSE, varsel = TRUE) # use bkmr defaults for burnin, but keep them bigkm = kmbayes_combine(fitkm.list, excludeburnin=FALSE) ests = ExtractEsts(bigkm) # defaults to keeping second half of samples ExtractPIPs(bigkm) pred.resp.univar <- PredictorResponseUnivar(fit = bigkm) risks.overall <- OverallRiskSummaries(fit = bigkm, y = y, Z = Z, X = X, qs = seq(0.25, 0.75, by = 0.05), q.fixed = 0.5, method = "exact") # additional objects that are not in a standard bkmrfit object: summary(bigkm$iters) # note that this reflects how fits are re-ordered to reflect burnin table(bigkm$chain) closeAllConnections()
Combine multiple chains comprising BKMR fits at different starting values. This function writes some results to disk, rather than trying to process fully within memory which, in some cases, will result in avoiding "out of memory" errors that can happen with kmbayes_combine.
kmbayes_combine_lowmem( fitkm.list, burnin = NULL, excludeburnin = FALSE, reorder = TRUE ) comb_bkmrfits_lowmem( fitkm.list, burnin = NULL, excludeburnin = FALSE, reorder = TRUE )
kmbayes_combine_lowmem( fitkm.list, burnin = NULL, excludeburnin = FALSE, reorder = TRUE ) comb_bkmrfits_lowmem( fitkm.list, burnin = NULL, excludeburnin = FALSE, reorder = TRUE )
fitkm.list |
output from |
burnin |
(numeric, or default=NULL) add in custom burnin (number of burnin iterations per chain). If NULL, then default to half of the chain |
excludeburnin |
(logical, default=FALSE) should burnin iterations be excluded from the final chains? Note that all bkmr package functions automatically exclude burnin from calculations. |
reorder |
(logical, default=TRUE) ensures that the first half of the combined chain contains
only the first half of each individual chain - this allows unaltered use
of standard functions from bkmr package, which automatically trims the first
half of the iterations. This can be used for posterior summaries, but
certain diagnostics may not work well (autocorrelation, effective sample size)
so the diagnostics should be done on the individual chains
#' @param ... arguments to |
Chains are not combined fully sequentially (see "reorder")
a bkmrplusfit
object, which inherits from bkmrfit
(from the kmbayes
function)
with multiple chains combined into a single object and additional parameters
given by chain
and iters
, which index the specific chains and
iterations for each posterior sample in the bkmrplusfit
object
# following example from https://jenfb.github.io/bkmr/overview.html set.seed(111) library(bkmr) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession, workers=2) # run 4 parallel Markov chains (low iterations used for illustration) fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 500, verbose = FALSE, varsel = TRUE) # use bkmr defaults for burnin, but keep them bigkm = kmbayes_combine_lowmem(fitkm.list, excludeburnin=FALSE) ests = ExtractEsts(bigkm) # defaults to keeping second half of samples ExtractPIPs(bigkm) pred.resp.univar <- PredictorResponseUnivar(fit = bigkm) risks.overall <- OverallRiskSummaries(fit = bigkm, y = y, Z = Z, X = X, qs = seq(0.25, 0.75, by = 0.05), q.fixed = 0.5, method = "exact") # additional objects that are not in a standard bkmrfit object: summary(bigkm$iters) # note that this reflects how fits are re-ordered to reflect burnin table(bigkm$chain) closeAllConnections()
# following example from https://jenfb.github.io/bkmr/overview.html set.seed(111) library(bkmr) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession, workers=2) # run 4 parallel Markov chains (low iterations used for illustration) fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 500, verbose = FALSE, varsel = TRUE) # use bkmr defaults for burnin, but keep them bigkm = kmbayes_combine_lowmem(fitkm.list, excludeburnin=FALSE) ests = ExtractEsts(bigkm) # defaults to keeping second half of samples ExtractPIPs(bigkm) pred.resp.univar <- PredictorResponseUnivar(fit = bigkm) risks.overall <- OverallRiskSummaries(fit = bigkm, y = y, Z = Z, X = X, qs = seq(0.25, 0.75, by = 0.05), q.fixed = 0.5, method = "exact") # additional objects that are not in a standard bkmrfit object: summary(bigkm$iters) # note that this reflects how fits are re-ordered to reflect burnin table(bigkm$chain) closeAllConnections()
Use this when you've used MCMC sampling with the kmbayes
function, but you did not take enough samples and do not want to start over.
kmbayes_continue(fit, ...)
kmbayes_continue(fit, ...)
fit |
output from |
... |
arguments to |
Note this does not fully start from the prior values of the MCMC chains. The kmbayes
function does not allow full specification of the kernel function parameters, so this will restart the chain at the last values of all fixed effect parameters, and start the kernel r
parmeters at the arithmetic mean of all r
parameters from the last step in the previous chain.
a bkmrfit.continued
object, which inherits from bkmrfit
objects similar to kmbayes
output, and which can be used to make inference using functions from the bkmr
package.
set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X ## Not run: fitty1 = bkmr::kmbayes(y=y,Z=Z,X=X, est.h=TRUE, iter=100) # do some diagnostics here to see if 100 iterations (default) is enough # add 100 additional iterations (for illustration - still will not be enough) fitty2 = kmbayes_continue(fitty1, iter=100) cobj = as.mcmc(fitty2) varnames(cobj) ## End(Not run)
set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X ## Not run: fitty1 = bkmr::kmbayes(y=y,Z=Z,X=X, est.h=TRUE, iter=100) # do some diagnostics here to see if 100 iterations (default) is enough # add 100 additional iterations (for illustration - still will not be enough) fitty2 = kmbayes_continue(fitty1, iter=100) cobj = as.mcmc(fitty2) varnames(cobj) ## End(Not run)
Give MCMC diagnostistics from the rstan
package
using the Rhat
, ess_bulk
,
and ess_tail
functions. Note that r-hat is only
reported for bkmrfit.list
objects from kmbayes_parallel
kmbayes_diagnose(kmobj, ...) kmbayes_diag(kmobj, ...)
kmbayes_diagnose(kmobj, ...) kmbayes_diag(kmobj, ...)
kmobj |
Either an object from |
... |
arguments to |
set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession) fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 1000, verbose = FALSE, varsel = TRUE) kmbayes_diag(fitkm.list) kmbayes_diag(fitkm.list[[1]]) # just the first chain closeAllConnections()
set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession) fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 1000, verbose = FALSE, varsel = TRUE) kmbayes_diag(fitkm.list) kmbayes_diag(fitkm.list[[1]]) # just the first chain closeAllConnections()
Fit parallel chains from the kmbayes
function.
These chains leverage parallel processing from the future
package, which
can speed fitting and enable diagnostics that rely on multiple Markov
chains from dispersed initial values.
kmbayes_parallel(nchains = 4, ...)
kmbayes_parallel(nchains = 4, ...)
nchains |
number of parallel chains |
... |
arguments to kmbayes |
a "bkmrfit.list" object, which is just an R list object in which each entry is a "bkmrfit" object kmbayes
set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession, workers=2) # only 50 iterations fit to save installation time fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 50, verbose = FALSE, varsel = TRUE) closeAllConnections()
set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) future::plan(strategy = future::multisession, workers=2) # only 50 iterations fit to save installation time fitkm.list <- kmbayes_parallel(nchains=2, y = y, Z = Z, X = X, iter = 50, verbose = FALSE, varsel = TRUE) closeAllConnections()
Use this when you've used MCMC sampling with the kmbayes_parallel
function, but you did not take enough samples and do not want to start over.
kmbayes_parallel_continue(fitkm.list, ...)
kmbayes_parallel_continue(fitkm.list, ...)
fitkm.list |
output from |
... |
arguments to |
a bkmrfit.list
object, which is just a list of bkmrfit
objects similar to kmbayes_parallel
set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X ## Not run: future::plan(strategy = future::multisession, workers=2) fitty1p = kmbayes_parallel(nchains=2, y=y,Z=Z,X=X) fitty2p = kmbayes_parallel_continue(fitty1p, iter=3000) cobj = as.mcmc.list(fitty2p) plot(cobj) ## End(Not run)
set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X ## Not run: future::plan(strategy = future::multisession, workers=2) fitty1p = kmbayes_parallel(nchains=2, y=y,Z=Z,X=X) fitty2p = kmbayes_parallel_continue(fitty1p, iter=3000) cobj = as.mcmc.list(fitty2p) plot(cobj) ## End(Not run)
Overall summary by chain
OverallRiskSummaries_parallel(x, ...)
OverallRiskSummaries_parallel(x, ...)
x |
bkmrfit.list object from |
... |
arguments to |
data.frame with all chains together
Provides observation level predictions based on the posterior mean, or, alternatively, yields the posterior standard deviations of predictions for an observation. This function is useful for interfacing with ensemble machine learning packages such as SuperLearner
, which utilize only point estimates.
## S3 method for class 'bkmrfit' predict(object, ptype = c("mean", "sd.fit"), ...)
## S3 method for class 'bkmrfit' predict(object, ptype = c("mean", "sd.fit"), ...)
object |
fitted object of class inheriting from "bkmrfit". |
ptype |
"mean" or "sd.fit", where "mean" yields posterior mean prediction for every observation in the data, and "sd.fit" yields the posterior standard deviation for every observation in the data. |
... |
arguments to |
vector of predictions the same length as the outcome in the bkmrfit object
# following example from https://jenfb.github.io/bkmr/overview.html library(bkmr) set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 200, verbose = FALSE, varsel = TRUE) postmean = predict(fitkm) postmean2 = predict(fitkm, Znew=Z/2) # mean difference in posterior means mean(postmean-postmean2)
# following example from https://jenfb.github.io/bkmr/overview.html library(bkmr) set.seed(111) dat <- bkmr::SimData(n = 50, M = 4) y <- dat$y Z <- dat$Z X <- dat$X set.seed(111) fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 200, verbose = FALSE, varsel = TRUE) postmean = predict(fitkm) postmean2 = predict(fitkm, Znew=Z/2) # mean difference in posterior means mean(postmean-postmean2)
Bivariate predictor response by chain
PredictorResponseBivar_parallel(x, ...)
PredictorResponseBivar_parallel(x, ...)
x |
bkmrfit.list object from |
... |
arguments to |
data.frame with all chains together
Univariate predictor response summary by chain
PredictorResponseUnivar_parallel(x, ...)
PredictorResponseUnivar_parallel(x, ...)
x |
bkmrfit.list object from |
... |
arguments to |
data.frame with all chains together
Posterior samples of E(Y|h(Z),X,beta) by chain
SamplePred_parallel(x, ...)
SamplePred_parallel(x, ...)
x |
bkmrfit.list object from |
... |
arguments to |
data.frame with all chains together
Single variable summary by chain
SingVarRiskSummaries_parallel(x, ...)
SingVarRiskSummaries_parallel(x, ...)
x |
bkmrfit.list object from |
... |
arguments to |
data.frame with all chains together