Title: Quantile G-Computation
Description: G-computation for a set of time-fixed exposures with quantile-based basis functions, possibly under linearity and homogeneity assumptions. This approach estimates a regression line corresponding to the expected change in the outcome (on the link scale) given a simultaneous increase in the quantile-based category for all exposures. Works with continuous, binary, and right-censored time-to-event outcomes. Reference: Alexander P. Keil, Jessie P. Buckley, Katie M. O'Brien, Kelly K. Ferguson, Shanshan Zhao, and Alexandra J. White (2019), A quantile-based g-computation approach to addressing the effects of exposure mixtures; <doi:10.1289/EHP5838>.
Authors: Alexander Keil [aut, cre]
Maintainer: Alexander Keil <[email protected]>
License: GPL (>= 2)
Version: 2.16.5
Built: 2025-01-22 15:26:04 UTC
Source: https://github.com/alexpkeil1/qgcomp
This is an internal function called by qgcomp, qgcomp.glm.boot, and qgcomp.glm.noboot, but is documented here for clarity. Generally, users will not need to call this function directly. This function tries to determine whether there are non-linear terms in the underlying model, which helps infer whether the appropriate function is called, and whether more explicit function calls are needed.
checknames(terms)
terms: model terms from attr(terms(modelfunction, data), "term.labels")
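A minimal sketch of how the terms argument is built (the data and model formula here are hypothetical placeholders):

dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
modelfunction <- y ~ x1 + x2 + I(x1^2)
tlabels <- attr(terms(modelfunction, data = dat), "term.labels")
tlabels # "x1" "x2" "I(x1^2)" - the I() term indicates a non-linear term
checknames(tlabels) # logical result; the non-linear term above should be detected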
This is an internal function called by qgcomp.cox.boot and qgcomp.cox.noboot, but is documented here for clarity. Generally, users will not need to call this function directly.
coxmsm_fit(
  f,
  qdata,
  intvals,
  expnms,
  main = TRUE,
  degree = 1,
  id = NULL,
  weights,
  cluster = NULL,
  MCsize = 10000,
  ...
)
f: an R formula representing the conditional model for the outcome, given all exposures and covariates. Interaction terms that include exposure variables should be represented via the AsIs function (I()).
qdata: a data frame with quantized exposures (as well as outcome and other covariates)
intvals: sequence, the sequence of integer values that the joint exposure is 'set' to for estimating the MSM. For quantile g-computation, this is just 0:(q-1), where q is the number of quantiles of exposure.
expnms: a character vector with the names of the columns in qdata that represent the exposures of interest (main terms only!)
main: logical, internal use: produce estimates of exposure effect (psi) and expected outcomes under g-computation and the MSM
degree: polynomial basis for the marginal model (e.g. degree = 2 allows the relationship between the whole exposure mixture and the outcome to be quadratic; default = 1)
id: (optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster)
weights: "case weights" - passed to the "weights" argument of coxph
cluster: not yet implemented
MCsize: integer: sample size for simulation to approximate marginal hazard ratios
...: arguments to coxph (e.g. ties)
This function first computes expected outcomes under hypothetical interventions to simultaneously set all exposures to a specific quantile. These predictions are based on g-computation, where the exposures are ‘quantized’, meaning that they take on ordered integer values according to their ranks, and the integer values are determined by the number of quantile cutpoints used. The function then takes these expected outcomes and fits an additional model (a marginal structural model) with the expected outcomes as the outcome and the intervention value of the exposures (the quantile integer) as the exposure. Under causal identification assumptions and correct model specification, the MSM yields a causal exposure-response representing the incremental change in the expected outcome given a joint intervention on all exposures.
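The same two-step logic can be sketched by hand in a simple linear-model setting (a hedged illustration of the procedure described above, not the Cox-model implementation; quantize() is from the qgcomp package):

library(qgcomp)
set.seed(1)
dat <- data.frame(y = rnorm(100), x1 = runif(100), x2 = runif(100))
qd <- quantize(dat, expnms = c("x1", "x2"), q = 4)$data
cond <- glm(y ~ x1 + x2, data = qd)   # step 1: conditional (quantized) model
preds <- lapply(0:3, function(s) {    # predictions with all exposures
  nd <- qd; nd$x1 <- s; nd$x2 <- s    # 'set' to quantile s
  data.frame(psi = s, yhat = predict(cond, newdata = nd))
})
msmdat <- do.call(rbind, preds)
msm <- glm(yhat ~ psi, data = msmdat) # step 2: MSM fit to the predictions
coef(msm)["psi"] # incremental change per joint quantile increase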
See also: qgcomp.cox.boot and qgcomp.cox.noboot
set.seed(50)
dat <- data.frame(time=(tmg <- pmin(.1,rweibull(50, 10, 0.1))),
                  d=1.0*(tmg<0.1), x1=runif(50), x2=runif(50), z=runif(50))
expnms=paste0("x", 1:2)
qdata = quantize(dat, expnms)$data
f = survival::Surv(time, d)~x1 + x2
fit <- survival::coxph(f, data = qdata, y=TRUE, x=TRUE)
r1 = qdata[1,,drop=FALSE]
times = survival::survfit(fit, newdata=r1, se.fit=FALSE)$time
(obj <- coxmsm_fit(f, qdata, intvals=c(0,1,2,3), expnms, main=TRUE,
                   degree=1, id=NULL, MCsize=100))
#dat2 <- data.frame(psi=seq(1,4, by=0.1))
#summary(predict(obj))
#summary(predict(obj, newdata=dat2))
Glance accepts a model object and returns a tibble::tibble() with exactly one row of model summaries. The summaries are typically goodness of fit measures, p-values for hypothesis tests on residuals, or model convergence information.
Glance never returns information from the original call to the modeling function. This includes the name of the modeling function or any arguments passed to the modeling function.
Glance does not calculate summary measures. Rather, it farms out these computations
to appropriate methods and gathers the results together. Sometimes a goodness of
fit measure will be undefined. In these cases the measure will be reported as NA.
(Description taken from the broom::glance help file.)
## S3 method for class 'qgcompfit'
glance(x, ...)
x: a qgcompfit object
...: not used
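A hedged usage sketch (assuming the glance generic is available on the search path, e.g. via qgcomp's re-export or the broom package):

library(qgcomp)
set.seed(2)
dat <- data.frame(y = runif(50), x1 = runif(50), x2 = runif(50))
fit <- qgcomp.glm.noboot(y ~ x1 + x2, expnms = c("x1", "x2"), data = dat, q = 4)
glance(fit) # one-row tibble of model fit summaries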
Hypothesis testing about the joint effect of exposures on a multinomial outcome
homogeneity_test(x, ...)
x: result from fit
...: unused
Tests the null hypothesis that the joint effect of the mixture components is identical across all referent outcome types (homogeneity test for linear effect of the mixture on a quantized basis)
## S3 method for class 'qgcompmultfit'
homogeneity_test(x, ...)
x: result from a qgcomp multinomial fit (qgcompmultfit object)
...: unused
qgcompmulttest object (list) with results of a chi-squared test
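A hedged usage sketch, assuming the qgcomp.multinomial.noboot fitting function (available in recent qgcomp versions) produces the qgcompmultfit object; the outcome construction mirrors the msm_multinomial_fit example elsewhere on this page:

data("metals") # from qgcomp package
metals$ycat = factor(quantize(metals, "y", q=4)$data$y,
                     levels=c("0","1","2","3"),
                     labels=c("cct","ccg","aat","aag"))
mixture = c("arsenic", "lead", "cadmium")
mfit <- qgcomp.multinomial.noboot(ycat ~ arsenic + lead + cadmium,
                                  expnms = mixture, data = metals, q = 4)
homogeneity_test(mfit) # chi-squared test that mixture effects are identical
                       # across referent outcome types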
This is an internal function called by qgcomp.hurdle.boot, but is documented here for clarity. Generally, users will not need to call this function directly.
hurdlemsm_fit(
  f,
  qdata,
  intvals,
  expnms,
  main = TRUE,
  degree = 1,
  id = NULL,
  weights,
  MCsize = 10000,
  containmix = list(count = TRUE, zero = TRUE),
  bayes = FALSE,
  x = FALSE,
  msmcontrol = hurdlemsm_fit.control(),
  ...
)
f: an R formula representing the conditional model for the outcome, given all exposures and covariates. Interaction terms that include exposure variables should be represented via the AsIs function (I()).
qdata: a data frame with quantized exposures (as well as outcome and other covariates)
intvals: sequence, the sequence of integer values that the joint exposure is 'set' to for estimating the MSM. For quantile g-computation, this is just 0:(q-1), where q is the number of quantiles of exposure.
expnms: a character vector with the names of the columns in qdata that represent the exposures of interest (main terms only!)
main: logical, internal use: produce estimates of exposure effect (psi) and expected outcomes under g-computation and the MSM
degree: polynomial basis for the marginal model (e.g. degree = 2 allows the relationship between the whole exposure mixture and the outcome to be quadratic; default = 1)
id: (optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster)
weights: not yet implemented
MCsize: integer: sample size for simulation to approximate marginal hazard ratios
containmix: named list of logical scalars with names "count" and "zero"
bayes: not used
x: keep design matrix? (logical)
msmcontrol: named list from hurdlemsm_fit.control()
...: arguments to hurdle (e.g. dist)
This function first computes expected outcomes under hypothetical interventions to simultaneously set all exposures to a specific quantile. These predictions are based on g-computation, where the exposures are ‘quantized’, meaning that they take on ordered integer values according to their ranks, and the integer values are determined by the number of quantile cutpoints used. The function then takes these expected outcomes and fits an additional model (a marginal structural model) with the expected outcomes as the outcome and the intervention value of the exposures (the quantile integer) as the exposure. Under causal identification assumptions and correct model specification, the MSM yields a causal exposure-response representing the incremental change in the expected outcome given a joint intervention on all exposures.
See also: qgcomp.cox.boot and qgcomp.cox.noboot
set.seed(50)
n=100
## Not run: 
dat <- data.frame(y=rbinom(n, 1, 0.5)*rpois(n, 1.2), x1=runif(n),
                  x2=runif(n), z=runif(n))
expnms = c("x1", "x2")
q = 4
qdata = quantize(dat, q=q, expnms=expnms)$data
f = y ~ x1 + x2 + z | 1
msmfit <- hurdlemsm_fit(f, qdata, intvals=(1:q)-1, expnms, main=TRUE,
                        degree=1, id=NULL, MCsize=10000,
                        containmix=list(count=TRUE, zero=FALSE), x=FALSE)
msmfit$msmfit
## End(Not run)
This is an internal function called by qgcomp.hurdle.boot, but is documented here for clarity. Generally, users will not need to call this function directly.
hurdlemsm_fit.control(predmethod = rev(c("components", "catprobs")))
predmethod: character in c("components", "catprobs"). "components" simulates from the model parameters directly, while "catprobs" simulates outcomes from the category-specific probabilities, which is output from predict.hurdle. The former is slightly more flexible and stable, but the latter is preferred in zero-inflated negative binomial models.
Provides fine control over zero inflated MSM fitting
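For example, to request the category-probability method recommended above for zero-inflated negative binomial models:

ctl <- hurdlemsm_fit.control(predmethod = "catprobs")
str(ctl) # a named list suitable for the msmcontrol argument of hurdlemsm_fit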
Hypothesis testing about the joint effect of exposures on a multinomial outcome
joint_test(x, ...)
x: result from fit
...: unused
Tests the null hypothesis that the joint effect of the mixture components is null across all referent outcome types (Test of global null effect of the mixture on a quantized basis)
## S3 method for class 'qgcompmultfit'
joint_test(x, ...)
x: result from a qgcomp multinomial fit (qgcompmultfit object)
...: unused
qgcompmulttest object (list) with results of a chi-squared test
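Building on the hedged multinomial sketch shown for homogeneity_test above (mfit is the assumed qgcompmultfit object), the global null test is called the same way:

joint_test(mfit) # chi-squared test that the mixture effect is null for all outcome types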
Simulated well water measurements in North Carolina: 16 metals, 6 water chemistry measures, and 2 health outcomes (y = continuous; disease_state = binary/time-to-event in combination with disease_time)
A dataset containing well water measurements and health outcomes for 253 individuals. All continuous variables are standardized to have mean 0, standard deviation 1.
data(metals)
A data frame with 253 rows and 24 variables:
y: continuous birth outcome
disease_state: binary outcome
disease_time: time to disease_state; survival outcome censored at approximately the median
(16 variables): metal concentration measurements (including arsenic, lead, and cadmium, used in examples on this page)
mage35: binary covariate, maternal age > 35
(6 variables): water chemistry measures
This function integrates with mice to impute values below the LOD using a left-censored log-normal distribution. Note that "tobit" is an alias that uses a familiar term for this model.
mice.impute.leftcenslognorm(y, ry, x, wy = NULL, lod = NULL, debug = FALSE, ...)

mice.impute.tobit(y, ry, x, wy = NULL, lod = NULL, debug = FALSE, ...)
y: vector to be imputed
ry: logical vector of length length(y) indicating the subset of elements in y to which the imputation model is fitted. ry generally distinguishes the observed (TRUE) and missing (FALSE) values in y.
x: numeric design matrix with length(y) rows containing predictors for y. The matrix x must not contain missing values.
wy: logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.
lod: numeric vector of limits of detection (must correspond to index in original data) OR list in which each element corresponds to observation-level limits of detection for each variable (list index must correspond to index in original data)
debug: logical, print extra info
...: arguments passed on to the underlying imputation model
While this function has utility far beyond qgcomp, it is included in the qgcomp package because it will be useful for a variety of settings in which qgcomp is useful. Note that in LOD problems where the LOD is small and the q parameter from qgcomp.glm.noboot or qgcomp.glm.boot is not large, the LOD may fall below the lowest quantile cutpoint, which will yield identical datasets from the MICE procedure in terms of quantized exposure data. If only exposures are missing, and they have low LODs, then there will be no benefit in qgcomp from using MICE rather than imputing some small value below the LOD.
Vector with imputed data, same type as y, and of length sum(wy)
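A minimal sketch of the note above, assuming only that quantize() behaves as documented elsewhere on this page: when the LOD sits below the lowest quantile cutpoint, any imputed value below the LOD produces identical quantized data.

library(qgcomp)
set.seed(3)
x <- runif(100)
lod <- 0.1 # low LOD relative to the q=2 cutpoint (the median)
dat1 <- data.frame(x = ifelse(x < lod, 0.01, x)) # impute 0.01 below LOD
dat2 <- data.frame(x = ifelse(x < lod, 0.09, x)) # impute 0.09 below LOD
q1 <- quantize(dat1, "x", q = 2)$data$x
q2 <- quantize(dat2, "x", q = 2)$data$x
all(q1 == q2) # TRUE: identical quantized data, so MICE adds no benefit here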
N = 100
set.seed(123)
dat <- data.frame(y=runif(N), x1=runif(N), x2=runif(N), z=runif(N))
true = qgcomp.glm.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'),
                         data=dat, q=2, family=gaussian())
mdat <- dat
mdat$x1 = ifelse(mdat$x1>0.5, mdat$x1, NA)
mdat$x2 = ifelse(mdat$x2>0.75, mdat$x2, NA)
cc <- qgcomp.glm.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'),
                        data=mdat[complete.cases(mdat),], q=2, family=gaussian())

## Not run: 
# note the following example imputes from the wrong parametric model and is
# expected to be biased as a result (but it demonstrates how to use qgcomp
# and mice together)
library("mice")
library("survival")
set.seed(1231)
impdat = mice(data = mdat,
              method = c("", "leftcenslognorm", "leftcenslognorm", ""),
              lod=c(NA, 0.5, 0.75, NA), debug=FALSE, m=10)
qc.fit.imp <- list(
  call = call("qgcomp.glm.noboot(y~., expnms = c('x1', 'x2'), family=gaussian())"),
  call1 = impdat$call,
  nmis = impdat$nmis,
  analyses = lapply(1:10, function(x)
    qgcomp.glm.noboot(y~., expnms = c("x1", "x2"),
                      data=complete(impdat, x), family=gaussian(), bayes=TRUE))
)
# alternative way to specify limits of detection
# (useful if not all observations have the same limit)
lodlist = list(rep(NA, N), rep(0.5, N), rep(0.75, N), rep(NA, N))
#lodlist = data.frame(rep(NA, N), rep(0.5, N), rep(0.75, N), rep(NA, N)) # also works
set.seed(1231)
impdat_alt = mice(data = mdat,
                  method = c("", "leftcenslognorm", "leftcenslognorm", ""),
                  lod=lodlist, debug=FALSE, m=10)
qc.fit.imp_alt <- list(
  call = call("qgcomp.glm.noboot(y~., expnms = c('x1', 'x2'), family=gaussian())"),
  call1 = impdat_alt$call,
  nmis = impdat_alt$nmis,
  analyses = lapply(1:10, function(x)
    qgcomp.glm.noboot(y~., expnms = c("x1", "x2"),
                      data=complete(impdat_alt, x), family=gaussian(), bayes=TRUE))
)
obj = pool(as.mira(qc.fit.imp))
obj_alt = pool(as.mira(qc.fit.imp_alt))
# true values
true
# complete case analysis
cc
# MI based analysis (identical answers for different ways to specify limits of detection)
summary(obj)
summary(obj_alt)

# summarizing weights (note that the weights should *not* be pooled
# because they mean different things depending on their direction)
expnms = c("x1", "x2")
wts = as.data.frame(t(sapply(qc.fit.imp$analyses,
                             function(x) c(-x$neg.weights, x$pos.weights)[expnms])))
eachwt = do.call(c, wts)
expwts = data.frame(Exposure = rep(expnms, each=nrow(wts)), Weight=eachwt)
library(ggplot2)
ggplot(data=expwts) + theme_classic() +
  geom_point(aes(x=Exposure, y=Weight)) +
  geom_hline(aes(yintercept=0))

# this function can be used to impute from an intercept-only model,
# but you need to "trick" mice to bypass checks for collinearity by including
# a variable that does not need to have values imputed (here, y).
# The internal collinearity checks by the mice package remove collinear variables
# and then throw an error if no predictor variables are retained. Here, the
# trick is to use the "predictorMatrix" parameter to "impute" the non-missing
# variable y using x1 (which does nothing), and remove all predictors from the
# model for x1. This function only imputes x1 from a log-normal distribution
# and thus is similar to many published examples of imputation from a Tobit model.
impdat2 = mice(data = mdat[,c("y","x1")],
               method = c("", "tobit"), remove.collinear=FALSE,
               lod=c(NA, 0.5), debug=FALSE, m=1,
               maxit=1, # maxit=1 because there is only 1 variable to impute
               predictorMatrix = as.matrix(rbind(c(0,1), c(0,0))))
plot(density(complete(impdat2, 1)$x1))

# now with survival data (very similar)
impdat = mice(data = mdat,
              method = c("", "tobit", "tobit", ""),
              lod=c(NA, 0.5, 0.75, NA), debug=FALSE)
qc.fit.imp <- list(
  call = call("qgcomp.cox.noboot(Surv(y)~., expnms = c('x1', 'x2'))"),
  call1 = impdat$call,
  nmis = impdat$nmis,
  analyses = lapply(1:5, function(x)
    qgcomp.cox.noboot(Surv(y)~., expnms = c("x1", "x2"),
                      data=complete(impdat, x)))
)
obj = pool(as.mira(qc.fit.imp))
# MI based analysis
summary(obj)
## End(Not run)
Calculates: expected outcome (on the link scale), and upper and lower confidence intervals (both pointwise and simultaneous)
modelbound.boot(x, alpha = 0.05, pwonly = FALSE)
x: "qgcompfit" object from qgcomp.glm.boot
alpha: alpha level for confidence intervals
pwonly: logical: return only pointwise estimates (suppress simultaneous estimates)
This method leverages the bootstrap distribution of qgcomp model coefficients to estimate pointwise regression line confidence bounds. These are defined as the bounds that, for each value of the independent variable X (here, X is the joint exposure quantiles), the 95% bounds (for example) for the model estimate of the regression line E(Y|X) are expected to include the true value of E(Y|X) in 95% of studies. The "simultaneous" bounds are also calculated, and the 95% simultaneous bounds contain the true value of E(Y|X) for all values of X in 95% of studies. The latter are more conservative and account for the multiple testing implied by the former. Pointwise bounds are calculated via the standard error for the estimates of E(Y|X), while the simultaneous bounds are estimated using the bootstrap method of Cheng (reference below). All bounds are large-sample bounds that assume normality and thus will be underconservative in small samples. These bounds may also include illogical values (e.g. values less than 0 for a dichotomous outcome) and should be interpreted cautiously in small samples.
Reference:
Cheng, Russell CH. "Bootstrapping simultaneous confidence bands." Proceedings of the Winter Simulation Conference, 2005. IEEE, 2005.
A data frame containing:
- the linear predictor from the marginal structural model
- the canonical measure (risk/odds/mean) for the marginal structural model link
- the standard error of the linear predictor
- confidence bounds for the effect measure, and bounds centered at the canonical measure (for plotting purposes)
The confidence bounds are either "pointwise" (pw) or "simultaneous" (simul) confidence intervals at each quantized value of all exposures.
set.seed(12)
## Not run: 
dat <- data.frame(x1=(x1 <- runif(50)), x2=runif(50), x3=runif(50),
                  z=runif(50), y=runif(50)+x1+x1^2)
ft <- qgcomp.glm.boot(y ~ z + x1 + x2 + x3, expnms=c('x1','x2','x3'),
                      data=dat, q=5)
modelbound.boot(ft, 0.05)
## End(Not run)
This is an internal function called by qgcomp, qgcomp.glm.boot, and qgcomp.glm.noboot, but is documented here for clarity. Generally, users will not need to call this function directly.
msm_fit(
  f,
  qdata,
  intvals,
  expnms,
  rr = TRUE,
  main = TRUE,
  degree = 1,
  id = NULL,
  weights,
  bayes = FALSE,
  MCsize = nrow(qdata),
  hasintercept = TRUE,
  ...
)
f: an R formula representing the conditional model for the outcome, given all exposures and covariates. Interaction terms that include exposure variables should be represented via the AsIs function (I()).
qdata: a data frame with quantized exposures
intvals: sequence, the sequence of integer values that the joint exposure is 'set' to for estimating the MSM. For quantile g-computation, this is just 0:(q-1), where q is the number of quantiles of exposure.
expnms: a character vector with the names of the columns in qdata that represent the exposures of interest (main terms only!)
rr: logical, estimate log(risk ratio) (family='binomial' only)
main: logical, internal use: produce estimates of exposure effect (psi) and expected outcomes under g-computation and the MSM
degree: polynomial basis for the marginal model (e.g. degree = 2 allows the relationship between the whole exposure mixture and the outcome to be quadratic; default = 1)
id: (optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster)
weights: "case weights" - passed to the "weights" argument of glm
bayes: use underlying Bayesian model (arm::bayesglm)
MCsize: integer: sample size for simulation to approximate marginal zero-inflated model parameters. This can be left small for testing, but should be as large as needed to reduce simulation error to an acceptable magnitude (compare psi coefficients for linear fits with qgcomp.zi.noboot to gain some intuition for the level of expected simulation error at a given value of MCsize).
hasintercept: (logical) does the model have an intercept?
...: arguments to glm (e.g. family)
This function first computes expected outcomes under hypothetical interventions to simultaneously set all exposures to a specific quantile. These predictions are based on g-computation, where the exposures are ‘quantized’, meaning that they take on ordered integer values according to their ranks, and the integer values are determined by the number of quantile cutpoints used. The function then takes these expected outcomes and fits an additional model (a marginal structural model) with the expected outcomes as the outcome and the intervention value of the exposures (the quantile integer) as the exposure. Under causal identification assumptions and correct model specification, the MSM yields a causal exposure-response representing the incremental change in the expected outcome given a joint intervention on all exposures.
See also: qgcomp.glm.boot and qgcomp
set.seed(50)
dat <- data.frame(y=runif(200), x1=runif(200), x2=runif(200), z=runif(200))
X <- c('x1', 'x2')
qdat <- quantize(dat, X, q=4)$data
mod <- msm_fit(f=y ~ z + x1 + x2 + I(x1*x2), expnms = c('x1', 'x2'),
               qdata=qdat, intvals=1:4, bayes=FALSE)
summary(mod$fit)    # outcome regression model
summary(mod$msmfit) # msm fit (variance not valid - must be obtained via bootstrap)
This is an internal function called by qgcomp.multinomial.boot, but is documented here for clarity. Generally, users will not need to call this function directly.
msm_multinomial_fit(
  f,
  qdata,
  intvals,
  expnms,
  main = TRUE,
  degree = 1,
  id = NULL,
  weights,
  bayes = FALSE,
  MCsize = nrow(qdata),
  hasintercept = TRUE,
  ...
)
f: an R formula representing the conditional model for the outcome, given all exposures and covariates. Interaction terms that include exposure variables should be represented via the AsIs function (I()).
qdata: a data frame with quantized exposures
intvals: sequence, the sequence of integer values that the joint exposure is 'set' to for estimating the MSM. For quantile g-computation, this is just 0:(q-1), where q is the number of quantiles of exposure.
expnms: a character vector with the names of the columns in qdata that represent the exposures of interest (main terms only!)
main: logical, internal use: produce estimates of exposure effect (psi) and expected outcomes under g-computation and the MSM
degree: polynomial basis for the marginal model (e.g. degree = 2 allows the relationship between the whole exposure mixture and the outcome to be quadratic; default = 1)
id: (optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster)
weights: "case weights" - passed to the "weights" argument of nnet::multinom
bayes: use underlying Bayesian model
MCsize: integer: sample size for simulation to approximate marginal zero-inflated model parameters. This can be left small for testing, but should be as large as needed to reduce simulation error to an acceptable magnitude (compare psi coefficients for linear fits with qgcomp.zi.noboot to gain some intuition for the level of expected simulation error at a given value of MCsize).
hasintercept: (logical) does the model have an intercept?
...: arguments to nnet::multinom
This function first computes expected outcomes under hypothetical interventions to simultaneously set all exposures to a specific quantile. These predictions are based on g-computation, where the exposures are ‘quantized’, meaning that they take on ordered integer values according to their ranks, and the integer values are determined by the number of quantile cutpoints used. The function then takes these expected outcomes and fits an additional model (a marginal structural model) with the expected outcomes as the outcome and the intervention value of the exposures (the quantile integer) as the exposure. Under causal identification assumptions and correct model specification, the MSM yields a causal exposure-response representing the incremental change in the expected outcome given a joint intervention on all exposures.
See also: qgcomp.glm.boot and qgcomp
data("metals") # from qgcomp package # create categorical outcome from the existing continuous outcome (usually, one will already exist) metals$ycat = factor(quantize(metals, "y",q=4)$data$y, levels=c("0", "1", "2", "3"), labels=c("cct", "ccg", "aat", "aag")) # restrict to smaller dataset for simplicity smallmetals = metals[,c("ycat", "arsenic", "lead", "cadmium", "mage35")] ### 1: Define mixture and underlying model #### mixture = c("arsenic", "lead", "cadmium") f0 = ycat ~ arsenic + lead + cadmium # the multinomial model # (be sure that factor variables are properly coded ahead of time in the dataset) qdat <- quantize(smallmetals, mixture, q=4)$data mod <- msm_multinomial_fit(f0, expnms = mixture, qdata=qdat, intvals=1:4, bayes=FALSE) summary(mod$fit) # outcome regression model summary(mod$msmfit) # msm fit (variance not valid - must be obtained via bootstrap)
data("metals") # from qgcomp package # create categorical outcome from the existing continuous outcome (usually, one will already exist) metals$ycat = factor(quantize(metals, "y",q=4)$data$y, levels=c("0", "1", "2", "3"), labels=c("cct", "ccg", "aat", "aag")) # restrict to smaller dataset for simplicity smallmetals = metals[,c("ycat", "arsenic", "lead", "cadmium", "mage35")] ### 1: Define mixture and underlying model #### mixture = c("arsenic", "lead", "cadmium") f0 = ycat ~ arsenic + lead + cadmium # the multinomial model # (be sure that factor variables are properly coded ahead of time in the dataset) qdat <- quantize(smallmetals, mixture, q=4)$data mod <- msm_multinomial_fit(f0, expnms = mixture, qdata=qdat, intvals=1:4, bayes=FALSE) summary(mod$fit) # outcome regression model summary(mod$msmfit) # msm fit (variance not valid - must be obtained via bootstrap)
This is an internal function called by qgcomp.glm.boot, but is documented here for clarity. Generally, users will not need to call this function directly. Get predicted values from a qgcompfit object from qgcomp.glm.boot.
msm.predict(object, newdata = NULL)
object: "qgcompfit" object from qgcomp.glm.boot
newdata: (optional) new set of data (data frame) with a variable called psi representing the joint exposure quantile
(Not usually called by user) Makes predictions from the MSM (rather than the conditional g-computation fit) from a "qgcompfit" object. Generally, this should not be used in favor of the default predict.qgcompfit function. This function can only be used following the qgcomp.glm.boot function. For the qgcomp.glm.noboot function, predict.qgcompfit gives identical inference to predicting from an MSM.
set.seed(50)
dat <- data.frame(y=runif(50), x1=runif(50), x2=runif(50), z=runif(50))
obj <- qgcomp.glm.boot(y ~ z + x1 + x2 + I(z*x1), expnms = c('x1', 'x2'),
                       data=dat, q=4, B=10, seed=125)
dat2 <- data.frame(psi=seq(1,4, by=0.1))
summary(msm.predict(obj))
summary(msm.predict(obj, newdata=dat2))
Plot a quantile g-computation object. For qgcomp.glm.noboot, this function will create a butterfly plot of weights. For qgcomp.glm.boot, this function will create a box plot with an overlaid smoothed line that represents a non-parametric fit of a model to the expected outcomes in the population at each quantile of the joint exposures (e.g. '1' represents 'at the first quantile for every exposure').
## S3 method for class 'qgcompfit'
plot(
  x,
  suppressprint = FALSE,
  pointwisebars = TRUE,
  modelfitline = TRUE,
  modelband = TRUE,
  flexfit = TRUE,
  pointwiseref = ceiling(x$q/2),
  ...
)

## S3 method for class 'qgcompmultfit'
plot(
  x,
  suppressprint = FALSE,
  pointwisebars = TRUE,
  modelfitline = TRUE,
  modelband = TRUE,
  flexfit = TRUE,
  pointwiseref = ceiling(x$q/2),
  ...
)
x: "qgcompfit" object from qgcomp.glm.noboot, qgcomp.glm.boot, or related fitting functions
suppressprint: if TRUE, suppresses the plot, rather than printing it by default (it can be saved as a ggplot2 object, or a list of ggplot2 objects if x is from a zero-inflated model, and used programmatically) (default = FALSE)
pointwisebars: (boot.gcomp only) if TRUE, adds 95% error bars for pointwise comparisons of E(Y|joint exposure) to the smooth regression line plot
modelfitline: (boot.gcomp only) if TRUE, adds the fitted (MSM) regression line of E(Y|joint exposure) to the smooth regression line plot
modelband: if TRUE, adds 95% prediction bands for E(Y|joint exposure) (the MSM fit)
flexfit: (boot.gcomp only) if TRUE, adds flexible interpolation of predictions from the underlying (conditional) model
pointwiseref: (boot.gcomp only) integer: which category of exposure (from 1 to q) should serve as the referent category for pointwise comparisons? (default = ceiling(x$q/2))
...: unused
plot(qgcompmultfit): plot method for qgcomp multinomial fits
See also: qgcomp.glm.noboot, qgcomp.glm.boot, and qgcomp
set.seed(12)
dat <- data.frame(x1=(x1 <- runif(100)), x2=runif(100), x3=runif(100),
                  z=runif(100), y=runif(100)+x1+x1^2)
ft <- qgcomp.glm.noboot(y ~ z + x1 + x2 + x3, expnms=c('x1','x2','x3'),
                        data=dat, q=4)
ft # display weights
plot(ft)
# examining fit
plot(ft$fit, which=1) # residual vs. fitted is not a straight line!

## Not run: 
# using non-linear outcome model
ft2 <- qgcomp.glm.boot(y ~ z + x1 + x2 + x3 + I(x1*x1),
                       expnms=c('x1','x2','x3'), data=dat, q=4, B=10)
ft2
plot(ft2$fit, which=1) # much better looking fit diagnostics suggest
# it is better to include an interaction term for x
plot(ft2) # the msm predictions don't match up with a smooth estimate
# of the expected outcome, so we should consider a non-linear MSM

# using non-linear marginal structural model
ft3 <- qgcomp.glm.boot(y ~ z + x1 + x2 + x3 + I(x1*x1),
                       expnms=c('x1','x2','x3'), data=dat, q=4, B=10, degree=2)
# plot(ft3$fit, which=1) - not run - this is identical to the ft2 fit
plot(ft3) # the MSM estimates look much closer to the smoothed estimates,
# suggesting the non-linear MSM fits the data better and should be used
# for inference about the effect of the exposure

# binary outcomes, logistic model with or without a log-binomial
# marginal structural model
dat <- data.frame(y=rbinom(100,1,0.5), x1=runif(100), x2=runif(100),
                  z=runif(100))
fit1 <- qgcomp.glm.boot(y ~ z + x1 + x2, family="binomial",
                        expnms = c('x1', 'x2'), data=dat, q=9, B=100, rr=FALSE)
fit2 <- qgcomp.glm.boot(y ~ z + x1 + x2, family="binomial",
                        expnms = c('x1', 'x2'), data=dat, q=9, B=100, rr=TRUE)
plot(fit1)
plot(fit2)

# using survival data
set.seed(50)
N=200
dat <- data.frame(time=(tmg <- pmin(.1,rweibull(N, 10, 0.1))),
                  d=1.0*(tmg<0.1), x1=runif(N), x2=runif(N), z=runif(N))
expnms=paste0("x", 1:2)
f = survival::Surv(time, d)~x1 + x2
(fit1 <- survival::coxph(f, data = dat))
# non-bootstrap method to get a plot of weights
(obj <- qgcomp.cox.noboot(f, expnms = expnms, data = dat))
plot(obj)

# bootstrap method to get a survival curve
# this plots the expected survival curve for the underlying (conditional) model
# as well as the expected survival curve for the MSM under the following scenarios:
#  1) highest joint exposure category
#  2) lowest joint exposure category
#  3) average across all exposure categories
# differences between the MSM and conditional fit suggest that the MSM is not
# flexible enough to accommodate non-linearities in the underlying fit (or they
# may simply signal that MCsize should be higher). Note that if linearity
# is assumed in the conditional model, the MSM will typically also appear linear
# and will certainly appear linear if no non-exposure covariates are included
# in the model
# not run (slow when using boot version to proper precision)
(obj2 <- qgcomp.cox.boot(f, expnms = expnms, data = dat, B=10, MCsize=2000))
plot(obj2)
## End(Not run)
Calculates: expected outcome (on the link scale), mean difference (link scale) and the standard error of the mean difference (link scale) for pointwise comparisons
pointwisebound.boot(x, pointwiseref = 1, alpha = 0.05)
x: "qgcompfit" object from qgcomp.glm.boot
pointwiseref: referent quantile (e.g. 1 uses the lowest joint-exposure category as the referent category for calculating all mean differences/standard deviations)
alpha: alpha level for confidence intervals
The comparison of interest following a qgcomp fit is often a comparison of model predictions at various values of the joint exposures (e.g. expected outcome at all exposures at the 1st quartile vs. the 3rd quartile). The expected outcome at a given joint exposure, marginalized over non-exposure covariates (W), is given as E(Y^s|S) = sum_w E_w(Y|S,W)Pr(W) = sum_i E(Y_i|S), where Pr(W) is the empirical distribution of W and S takes on integer values 0 to q-1. Thus, comparisons are of the type E_w(Y|S=s) - E_w(Y|S=s2), where s and s2 are two different values of the joint exposures (e.g. 0 and 2). This function yields E_w(Y|S) as well as E_w(Y|S=s) - E_w(Y|S=p), where s is any value of S and p is the value chosen via "pointwiseref" - e.g. for binomial variables this will equal the risk/prevalence difference at all values of S, with the referent category S=p-1. The standard error of E(Y|S=s) - E(Y|S=p) is calculated from the bootstrap covariance matrix of E_w(Y|S), such that the standard error for E_w(Y|S=s) - E_w(Y|S=p) is given by

sqrt(Var(E_w(Y|S=s)) + Var(E_w(Y|S=p)) - 2*Cov(E_w(Y|S=s), E_w(Y|S=p)))
This is used to create pointwise confidence intervals. Note that this differs slightly from the pointwisebound.noboot function, which estimates the variance of the conditional regression line given by E(Y|S,W=w), where w is a vector of medians of W (i.e. predictions are made at the median value of all covariates).
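A numeric sketch of this calculation with a hypothetical 2x2 bootstrap covariance matrix (values are illustrative only):

V <- matrix(c(0.040, 0.012,
              0.012, 0.050), nrow = 2) # hypothetical Cov of (E_w(Y|S=p), E_w(Y|S=s))
se_diff <- sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])
se_diff # pointwise 95% CI: difference +/- qnorm(0.975) * se_diff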
A data frame containing:
- the linear predictor from the marginal structural model
- the canonical effect measure (risk ratio/odds ratio/mean difference) for the marginal structural model link
- the standard error of the effect measure
- confidence bounds for the effect measure, and bounds centered at the linear predictor (for plotting purposes)
See also: qgcomp.glm.boot and pointwisebound.noboot
set.seed(12)
## Not run: 
n=100
# non-linear model for continuous outcome
dat <- data.frame(x1=(x1 <- runif(100)), x2=runif(100), x3=runif(100),
                  z=runif(100), y=runif(100)+x1+x1^2)
ft <- qgcomp.glm.boot(y ~ z + x1 + x2 + x3, expnms=c('x1','x2','x3'),
                      data=dat, q=10)
pointwisebound.boot(ft, alpha=0.05, pointwiseref=3)
## End(Not run)
Calculates: expected outcome (on the link scale), mean difference (link scale) and the standard error of the mean difference (link scale) for pointwise comparisons
pointwisebound.noboot(x, alpha = 0.05, pointwiseref = 1)
x: "qgcompfit" object from qgcomp.glm.noboot
alpha: alpha level for confidence intervals
pointwiseref: referent quantile (e.g. 1 uses the lowest joint-exposure category as the referent category for calculating all mean differences/standard deviations)
The comparison of interest following a qgcomp fit is often a comparison of model predictions at various values of the joint exposures (e.g. expected outcome at all exposures at the 1st quartile vs. the 3rd quartile). The expected outcome at a given joint exposure and at a given level of non-exposure covariates (W=w) is given as E(Y|S,W=w), where S takes on integer values 0 to q-1. Thus, comparisons are of the type E(Y|S=s,W=w) - E(Y|S=s2,W=w), where s and s2 are two different values of the joint exposures (e.g. 0 and 2). This function yields E(Y|S,W=w) as well as E(Y|S=s,W=w) - E(Y|S=p,W=w), where s is any value of S and p is the value chosen via "pointwiseref" - e.g. for binomial variables this will equal the risk/prevalence difference at all values of S, with the referent category S=p-1 - for the non-bootstrapped version of quantile g-computation (under a linear model). Note that w is taken to be the referent level of covariates, so if meaningful values of E(Y|S,W=w) and E(Y|S=s,W=w) - E(Y|S=p,W=w) are desired, it is advisable to set the referent levels of W to meaningful values. This can be done by, e.g., centering continuous age so that the predictions are made at the population mean age, rather than age 0.
Note that this function only works with standard "qgcompfit" objects from qgcomp.glm.noboot or qgcomp.glm.ee (so it doesn't work with zero-inflated, hurdle, or Cox models).
Variance for the overall effect estimate is given by the quadratic form t(G) %*% Cov(beta) %*% G, where the "gradient vector" G denotes the partial derivative/gradient of the pointwise comparison with respect to the model coefficients. The vector G takes on values that equal the difference in quantiles of S for each pointwise comparison (e.g. for a comparison of the 3rd vs. the 5th category, G is a vector of 2s). This variance is used to create pointwise confidence intervals via a normal approximation (e.g. upper 95% CI = estimate + qnorm(0.975)*sqrt(variance)).
A data frame containing:
- the "partial" linear predictor, or the effect of the mixture + intercept after conditioning out any confounders. This is similar to the h(x) function in bkmr. This is not a full prediction of the outcome, but only the partial outcome due to the intercept and the confounders.
- the canonical effect measure (risk ratio/odds ratio/mean difference) for the marginal structural model link
- the standard error of the effect measure
- confidence bounds for the effect measure
See also: qgcomp.glm.noboot and pointwisebound.boot
set.seed(12)
## Not run: 
n = 100
dat <- data.frame(x1=(x1 <- runif(n)), x2=(x2 <- runif(n)),
                  x3=(x3 <- runif(n)), z=(z <- runif(n)),
                  y=rnorm(n)+x1 + x2 - x3 +z)
# linear model for continuous outcome
ft <- qgcomp.glm.noboot(y ~ z + x1 + x2 + x3,
                        expnms=c('x1','x2','x3'), data=dat, q=10)
ft2 <- qgcomp.glm.boot(y ~ z + x1 + x2 + x3,
                       expnms=c('x1','x2','x3'), data=dat, q=10)
pointwisebound.noboot(ft, alpha=0.05, pointwiseref=3)
pointwisebound.boot(ft2, alpha=0.05, pointwiseref=3)

dat <- data.frame(x1=(x1 <- runif(n)), x2=(x2 <- runif(n)),
                  x3=(x3 <- runif(n)), z=(z <- runif(n)),
                  y=rbinom(n, 1, 1/(1+exp(-(x1 + x2 - x3 +z)))))
# glms for binary outcome, centering covariate to a potentially more meaningful value
dat$zcen = dat$z - mean(dat$z)
ft <- qgcomp.glm.noboot(y ~ zcen + x1 + x2 + x3,
                        expnms=c('x1','x2','x3'), data=dat, q=10, family=binomial())
ft2 <- qgcomp.glm.boot(y ~ zcen + x1 + x2 + x3,
                       expnms=c('x1','x2','x3'), data=dat, q=10, family=binomial())
pointwisebound.noboot(ft, alpha=0.05, pointwiseref=3)
pointwisebound.boot(ft2, alpha=0.05, pointwiseref=3)

dat$z = as.factor(sample(1:3, n, replace=TRUE))
ftf <- qgcomp.glm.noboot(y ~ zcen + x1 + x2 + x3,
                         expnms=c('x1','x2','x3'), data=dat, q=10, family=binomial())
pointwisebound.noboot(ftf, alpha=0.05, pointwiseref=3)
## End(Not run)
Get predicted values from a qgcompfit object, or make predictions in a new set of data based on the qgcompfit object. Note that when making predictions from an object from qgcomp.glm.boot, the predictions are made from the (conditional) g-computation model rather than the marginal structural model. Predictions from the marginal structural model can be obtained via msm.predict. Note that this function accepts non-quantized exposures in "newdata" and automatically quantizes them according to the quantile cutpoints in the original fit.
## S3 method for class 'qgcompfit'
predict(object, expnms = NULL, newdata = NULL, type = "response", ...)
object: "qgcompfit" object from qgcomp.glm.noboot or qgcomp.glm.boot
expnms: character vector of exposures of interest
newdata: (optional) new set of data with all predictors from the "qgcompfit" object
type: (from predict.glm) the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus, for a default binomial model, the default predictions are of log-odds (probabilities on the logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
...: arguments to predict.glm
set.seed(50)
dat <- data.frame(y=runif(50), x1=runif(50), x2=runif(50), z=runif(50))
obj1 <- qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2)
obj2 <- qgcomp.glm.boot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat,
                        q=2, B=10, seed=125)
set.seed(52)
dat2 <- data.frame(y=runif(50), x1=runif(50), x2=runif(50), z=runif(50))
summary(predict(obj1, expnms = c('x1', 'x2'), newdata=dat2))
summary(predict(obj2, expnms = c('x1', 'x2'), newdata=dat2))
Gives variable output depending on whether qgcomp.glm.noboot or qgcomp.glm.boot is called. For qgcomp.glm.noboot, this will output the final estimate of the joint exposure effect (similar to the 'index' effect in weighted quantile sums), as well as estimates of the 'weights' (standardized coefficients). For qgcomp.glm.boot, the marginal effect is given, but no weights are reported since this approach generally incorporates non-linear models with interaction terms among exposures, which preclude weights with any useful interpretation.
## S3 method for class 'qgcompfit' print(x, showweights = TRUE, ...)
x: "qgcompfit" object from qgcomp.glm.noboot or qgcomp.glm.boot
showweights: logical: should weights be printed, if estimated?
...: unused
See also: qgcomp.glm.noboot, qgcomp.glm.boot, and qgcomp
set.seed(50)
dat <- data.frame(y=runif(50), x1=runif(50), x2=runif(50), z=runif(50))
obj1 <- qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2)
obj2 <- qgcomp.glm.boot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat,
                        q=2, B=10, seed=125)
# does not need to be explicitly called, but included here for clarity
print(obj1)
print(obj2)
This function automatically selects between qgcomp.glm.noboot, qgcomp.glm.boot, qgcomp.cox.noboot, and qgcomp.cox.boot for the most efficient approach to estimate the average expected change in the (log) outcome per quantile increase in the joint exposure to all exposures in expnms, given the underlying model. For example, if the underlying model (specified by the formula f) is a linear model with all linear terms for exposure, then qgcomp.glm.noboot will be called to fit the model. Non-linear terms or requesting the risk ratio for binomial outcomes will result in the qgcomp.glm.boot function being called. For a given linear model, the boot and noboot versions will give identical inference, though when using survival outcomes, the 'boot' version uses simulation-based inference, which can vary from the 'noboot' version due to simulation error (which can be minimized by setting the MCsize parameter very large - see qgcomp.cox.boot for details).
qgcomp(f, data = data, family = gaussian(), rr = TRUE, ...)
f: R style formula (may include a survival outcome via survival::Surv)
data: data frame
family: a glm family object (e.g. gaussian(), binomial())
rr: logical: if using a binary outcome and rr=TRUE, qgcomp.glm.boot will estimate the risk ratio rather than the odds ratio. Note, to get population average effect estimates for a binary outcome, set rr=TRUE (ORs are generally not of interest as population average effects, so if rr=FALSE then a conditional OR will be estimated, which cannot be interpreted as a population average effect).
...: arguments to qgcomp.glm.noboot or qgcomp.glm.boot (e.g. q) or glm
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and possibly information on the marginal structural model (msmfit) used to estimate the final effect estimates (qgcomp.glm.boot, qgcomp.cox.boot only). If appropriate, weights are also reported, which represent the proportion of a directional (positive/negative) effect that is accounted for by each exposure.
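A hedged sketch of inspecting these components (the psi, var.psi, pos.weights, and neg.weights names follow the prose above and the weight-summarizing example elsewhere on this page):

library(qgcomp)
set.seed(4)
dat <- data.frame(y = runif(50), x1 = runif(50), x2 = runif(50), z = runif(50))
fit <- qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c("x1", "x2"), data = dat, q = 4)
fit$psi         # joint exposure effect estimate
fit$var.psi     # its variance
fit$pos.weights # proportion of the positive effect attributed to each exposure
fit$neg.weights # proportion of the negative effect attributed to each exposure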
qgcomp.glm.noboot, qgcomp.glm.boot, qgcomp.cox.noboot, qgcomp.cox.boot, qgcomp.zi.noboot, and qgcomp.zi.boot (qgcomp is just a wrapper for these functions)
set.seed(50)
dat <- data.frame(y=runif(50), x1=runif(50), x2=runif(50), z=runif(50))
qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2)
qgcomp.glm.boot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2, B=10, seed=125)
# automatically selects appropriate method
qgcomp(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2)
# note for binary outcome this will choose the risk ratio (and bootstrap methods) by default
dat <- data.frame(y=rbinom(100, 1, 0.5), x1=runif(100), x2=runif(100), z=runif(100))
## Not run: 
qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2, family=binomial())
set.seed(1231)
qgcomp.glm.boot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2, family=binomial())
set.seed(1231)
qgcomp(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2, family=binomial())
# automatically selects appropriate method when specifying rr or degree explicitly
qgcomp(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2, family=binomial(), rr=FALSE)
qgcomp(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2, family=binomial(), rr=TRUE)
qgcomp(y ~ z + factor(x1) + factor(x2), degree=2, expnms = c('x1', 'x2'), data=dat, q=4, family=binomial())
# survival objects
set.seed(50)
N=200
dat <- data.frame(time=(tmg <- pmin(.1,rweibull(N, 10, 0.1))), d=1.0*(tmg<0.1), x1=runif(N), x2=runif(N), z=runif(N))
expnms=paste0("x", 1:2)
f = survival::Surv(time, d)~x1 + x2
qgcomp(f, expnms = expnms, data = dat)
# note if B or MCsize are set but the model is linear, an error will result
try(qgcomp(f, expnms = expnms, data = dat, B=10, MCsize=100))
# note that in the survival models, MCsize should be set to a large number
# such that results are repeatable (within an error tolerance such as 2 significant digits)
# if you run them under different seed values
f = survival::Surv(time, d)~x1 + x2 + x1:x2
qgcomp(f, expnms = expnms, data = dat, B=10, MCsize=100)
## End(Not run)
This function performs quantile g-computation in a survival setting with case-cohort sampling. The approach estimates the covariate-conditional hazard ratio for a joint change of 1 quantile in each exposure variable specified in the expnms parameter.
qgcomp.cch.noboot( f, data, subcoh = NULL, id = NULL, cohort.size = NULL, expnms = NULL, q = 4, breaks = NULL, weights, cluster = NULL, alpha = 0.05, ... )
f |
R style survival formula, which includes |
data |
data frame |
subcoh |
(From |
id |
(From |
cohort.size |
(From |
expnms |
character vector of exposures of interest |
q |
NULL, or the number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
weights |
Not used here |
cluster |
not yet implemented |
alpha |
alpha level for confidence limit calculation |
... |
arguments to glm (e.g. family) |
For survival outcomes (as specified using methods from the survival package), this yields a conditional log hazard ratio representing the change in the expected conditional hazard (conditional on covariates) from increasing every exposure by 1 quantile. In general, this quantity is not equivalent to g-computation estimates. Hypothesis test statistics and 95% confidence intervals are based on the delta-method estimate of the variance of a linear combination of random variables.
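To make the variance calculation concrete, here is a minimal sketch of the delta-method arithmetic described above (an illustration only, not the package internals) for a Cox model in which psi is the sum of the exposure coefficients:
# Illustration only: delta-method variance of psi = sum of exposure
# coefficients, var(psi) = t(g) %*% V %*% g with gradient g = 1 for
# each exposure term and 0 otherwise.
set.seed(50)
N=200
dat <- data.frame(time=(tmg <- pmin(.1,rweibull(N, 10, 0.1))), d=1.0*(tmg<0.1), x1=runif(N), x2=runif(N))
fit <- survival::coxph(survival::Surv(time, d) ~ x1 + x2, data = dat)
expnms <- c("x1", "x2")
g <- as.numeric(names(coef(fit)) %in% expnms)
psi <- sum(coef(fit)[expnms])
se <- sqrt(drop(t(g) %*% vcov(fit) %*% g))
c(psi = psi, se = se, ll = psi - qnorm(.975)*se, ul = psi + qnorm(.975)*se)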
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the weights/standardized coefficients in the positive (pos.weights) and negative (neg.weights) directions.
Other qgcomp_methods: qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(50)
N=200
dat <- data.frame(time=(tmg <- pmin(.1,rweibull(N, 10, 0.1))), d=1.0*(tmg<0.1), x1=runif(N), x2=runif(N), z=runif(N))
expnms=paste0("x", 1:2)
f = survival::Surv(time, d)~x1 + x2
(fit1 <- survival::coxph(f, data = dat))
(obj <- qgcomp.cox.noboot(f, expnms = expnms, data = dat))
## Not run: 
# weighted analysis
dat$w = runif(N)
qdata = quantize(dat, expnms=expnms)
(obj2 <- qgcomp.cox.noboot(f, expnms = expnms, data = dat, weight=w))
obj2$fit
survival::coxph(f, data = qdata$data, weight=w)
# not run: bootstrapped version is much slower
(obj2 <- qgcomp.cox.boot(f, expnms = expnms, data = dat, B=200, MCsize=20000))
## End(Not run)
This function yields population average effect estimates for (possibly right-censored) time-to-event outcomes.
qgcomp.cox.boot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, cluster = NULL, alpha = 0.05, B = 200, MCsize = 10000, degree = 1, seed = NULL, parallel = FALSE, parplan = FALSE, ... )
f |
R style survival formula, which includes |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL, or the number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster). Note that qgcomp.cox.noboot will not produce cluster-appropriate standard errors. qgcomp.cox.boot can be used for this; it uses bootstrap sampling of clusters/individuals to estimate cluster-appropriate standard errors. |
weights |
"case weights" - passed to the "weight" argument of
|
cluster |
not yet implemented |
alpha |
alpha level for confidence limit calculation |
B |
integer: number of bootstrap iterations (this should typically be >=200, though it is set lower in examples to improve run-time). |
MCsize |
integer: sample size for simulation to approximate marginal hazards ratios (if < sample size, then set to sample size). Note that large values will slow down the fitting, but will result in higher accuracy - if you run the function multiple times you will see that results vary due to simulation error. Ideally, MCsize would be set such that simulation error is negligible at the precision reported (e.g. if you report results to 2 decimal places, then MCsize should be set high enough that you consistently get answers that are the same to 2 decimal places). |
degree |
polynomial bases for the marginal model (e.g. degree = 2 allows the relationship between the whole exposure mixture and the outcome to be quadratic; default = 1). |
seed |
integer or NULL: random number seed for replicable bootstrap results |
parallel |
logical (default FALSE): use future package to speed up bootstrapping |
parplan |
(logical, default=FALSE) automatically set future::plan to plan(multisession) (and set to existing plan, if any, after bootstrapping) |
... |
arguments to coxph |
qgcomp.cox.boot estimates the log(hazard ratio) per quantile increase in the joint exposure to all exposures in
expnms. This function uses g-computation to estimate the parameters of a
marginal structural model for the population average effect of increasing all
exposures in expnms by a single quantile. This approach involves specifying
an underlying conditional outcome model, given all exposures of interest (possibly
with non-linear basis function representations such as splines or product terms)
and confounders or covariates of interest. This model is fit first and is used
to generate expected outcomes at each quantile of all exposures, which are then
used in a second model to estimate a population average dose-response curve that
is linear or follows a simple polynomial function. See the section on MCsize below.
Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.
MCsize is crucial to getting accurate point estimates. In order to get marginal
estimates of the population hazard under different values of the joint exposure
at a given quantile for all exposures in expnms, qgcomp.cox.boot uses
Monte Carlo simulation to generate outcomes implied by the underlying conditional model
and then fits a separate (marginal structural) model to those outcomes. In order to get
accurate results that don't vary much from run to run of this approach, MCsize
must be set large enough that results are stable across runs to a pre-determined
precision (e.g. 2 significant digits).
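A practical way to choose MCsize is to refit under different seeds at a candidate value and confirm that the point estimates agree to the precision you intend to report; a sketch (assuming, per the Value sections in this manual, that the point estimate is available as the psi component of the fit):
# Sketch of a stability check for MCsize: refit with two seeds and
# compare psi at the reported precision; increase MCsize until stable.
# Assumes the point estimate is exposed as the psi component.
set.seed(50)
N=200
dat <- data.frame(time=(tmg <- pmin(.1,rweibull(N, 10, 0.1))), d=1.0*(tmg<0.1), x1=runif(N), x2=runif(N))
f = survival::Surv(time, d)~x1 + x2 + x1:x2
fitA <- qgcomp.cox.boot(f, expnms = c("x1", "x2"), data = dat, B=10, MCsize=5000, seed=1)
fitB <- qgcomp.cox.boot(f, expnms = c("x1", "x2"), data = dat, B=10, MCsize=5000, seed=2)
round(c(fitA$psi, fitB$psi), 2) # if these differ, increase MCsize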
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the marginal structural model (msmfit) used to estimate the final effect estimates.
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(50)
N=200
dat <- data.frame(time=(tmg <- pmin(.1,rweibull(N, 10, 0.1))), d=1.0*(tmg<0.1), x1=runif(N), x2=runif(N), z=runif(N))
expnms=paste0("x", 1:2)
f = survival::Surv(time, d)~x1 + x2
(fit1 <- survival::coxph(f, data = dat))
(obj <- qgcomp.cox.noboot(f, expnms = expnms, data = dat))
## Not run: 
# not run (slow when using boot version to proper precision)
(obj2 <- qgcomp.cox.boot(f, expnms = expnms, data = dat, B=10, MCsize=20000))
# using future package, marginalizing over confounder z
(obj3 <- qgcomp.cox.boot(survival::Surv(time, d)~x1 + x2 + z, expnms = expnms, data = dat,
  B=1000, MCsize=20000, parallel=TRUE, parplan=TRUE))
# non-constant hazard ratio, non-linear terms
(obj4 <- qgcomp.cox.boot(survival::Surv(time, d)~factor(x1) + splines::bs(x2) + z, expnms = expnms,
  data = dat, B=1000, MCsize=20000, parallel=FALSE, degree=1))
# weighted analysis
dat$w = runif(N)
(objw1 <- qgcomp.cox.noboot(f, expnms = expnms, data = dat, weights=w))
(objw2 <- qgcomp.cox.boot(f, expnms = expnms, data = dat, weights=w, B=5, MCsize=20000))
## End(Not run)
This function performs quantile g-computation in a survival setting. The approach estimates the covariate-conditional hazard ratio for a joint change of 1 quantile in each exposure variable specified in the expnms parameter.
qgcomp.cox.noboot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, cluster = NULL, alpha = 0.05, ... )
f |
R style survival formula, which includes |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL, or the number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster) |
weights |
"case weights" - passed to the "weight" argument of
|
cluster |
not yet implemented |
alpha |
alpha level for confidence limit calculation |
... |
arguments to coxph |
For survival outcomes (as specified using methods from the survival package), this yields a conditional log hazard ratio representing the change in the expected conditional hazard (conditional on covariates) from increasing every exposure by 1 quantile. In general, this quantity is not equivalent to g-computation estimates. Hypothesis test statistics and 95% confidence intervals are based on the delta-method estimate of the variance of a linear combination of random variables.
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the weights/standardized coefficients in the positive (pos.weights) and negative (neg.weights) directions.
qgcomp.cox.boot, qgcomp.glm.boot, and qgcomp
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(50)
N=200
dat <- data.frame(time=(tmg <- pmin(.1,rweibull(N, 10, 0.1))), d=1.0*(tmg<0.1), x1=runif(N), x2=runif(N), z=runif(N))
expnms=paste0("x", 1:2)
f = survival::Surv(time, d)~x1 + x2
(fit1 <- survival::coxph(f, data = dat))
(obj <- qgcomp.cox.noboot(f, expnms = expnms, data = dat))
## Not run: 
# weighted analysis
dat$w = runif(N)
qdata = quantize(dat, expnms=expnms)
(obj2 <- qgcomp.cox.noboot(f, expnms = expnms, data = dat, weight=w))
obj2$fit
survival::coxph(f, data = qdata$data, weight=w)
# not run: bootstrapped version is much slower
(obj2 <- qgcomp.cox.boot(f, expnms = expnms, data = dat, B=200, MCsize=20000))
## End(Not run)
This function estimates a dose-response parameter representing a one quantile increase in a set of exposures of interest. This model estimates the parameters of a marginal structural model (MSM) based on g-computation with quantized exposures. Note: this function allows non-linear and non-additive effects of individual components of the exposure, as well as non-linear joint effects of the mixture via polynomial basis functions, which increase the computational burden due to the need for non-parametric bootstrapping. qgcomp.boot is an equivalent function (slated for deprecation).
qgcomp.glm.boot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, B = 200, rr = TRUE, degree = 1, seed = NULL, bayes = FALSE, MCsize = nrow(data), parallel = FALSE, parplan = FALSE, ... )
qgcomp.boot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, B = 200, rr = TRUE, degree = 1, seed = NULL, bayes = FALSE, MCsize = nrow(data), parallel = FALSE, parplan = FALSE, ... )
f |
R style formula |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL, or the number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster). Note that qgcomp.glm.noboot will not produce cluster-appropriate standard errors. qgcomp.glm.boot can be used for this; it uses bootstrap sampling of clusters/individuals to estimate cluster-appropriate standard errors. |
weights |
"case weights" - passed to the "weight" argument of
|
alpha |
alpha level for confidence limit calculation |
B |
integer: number of bootstrap iterations (this should typically be >=200, though it is set lower in examples to improve run-time). |
rr |
logical: if using binary outcome and rr=TRUE, qgcomp.glm.boot will estimate risk ratio rather than odds ratio |
degree |
polynomial bases for the marginal model (e.g. degree = 2 allows the relationship between the whole exposure mixture and the outcome to be quadratic; default = 1). |
seed |
integer or NULL: random number seed for replicable bootstrap results |
bayes |
use underlying Bayesian model ( |
MCsize |
integer: sample size for simulation to approximate the marginal model parameters. This can be left small for testing, but should be as large as needed to reduce simulation error to an acceptable magnitude (compare psi coefficients for linear fits with qgcomp.glm.noboot to gain some intuition for the level of expected simulation error at a given value of MCsize). This likely won't matter much in linear models, but may be important with binary or count outcomes. |
parallel |
use (safe) parallel processing from the future and future.apply packages |
parplan |
(logical, default=FALSE) automatically set future::plan to plan(multisession) (and set to existing plan, if any, after bootstrapping) |
... |
arguments to glm (e.g. family) |
Estimates correspond to the average expected change in the (log) outcome per quantile increase in the joint exposure to all exposures in expnms. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.
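The interval construction itself is simple; a minimal sketch follows (an illustration only: psi.boot below stands in for B re-estimated values of psi, rather than real bootstrap re-fits):
# Sketch of the Wald-type interval described above (placeholder data).
alpha <- 0.05
psi.hat <- 0.25                    # point estimate from the original data
psi.boot <- rnorm(200, 0.25, 0.05) # placeholder for 200 bootstrap estimates
se.boot <- sd(psi.boot)            # bootstrap standard error
psi.hat + c(-1, 1) * qnorm(1 - alpha/2) * se.boot # Wald-type 95% CI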
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the marginal structural model (msmfit) used to estimate the final effect estimates.
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(30)
# continuous outcome
dat <- data.frame(y=rnorm(100), x1=runif(100), x2=runif(100), z=runif(100))
# Conditional linear slope
qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
# Marginal linear slope (population average slope, for a purely linear,
# additive model this will equal the conditional)
## Not run: 
qgcomp.glm.boot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4,
  family=gaussian(), B=200) # B should be at least 200 in actual examples
# no intercept model
qgcomp.glm.boot(f=y ~ -1+z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4,
  family=gaussian(), B=200) # B should be at least 200 in actual examples
# Note that these give different answers! In the first, the estimate is conditional on Z,
# but in the second, Z is marginalized over via standardization. The estimates
# can be made approximately the same by centering Z (for linear models), but
# the conditional estimate will typically have lower standard errors.
dat$z = dat$z - mean(dat$z)
# Conditional linear slope
qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
# Marginal linear slope (population average slope, for a purely linear,
# additive model this will equal the conditional)
qgcomp.glm.boot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4,
  family=gaussian(), B=200) # B should be at least 200 in actual examples
# Population average mixture slope which accounts for non-linearity and interactions
qgcomp.glm.boot(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
  expnms = c('x1', 'x2'), data=dat, q=4, B=200)
# generally non-linear/non-additive underlying models lead to non-linear mixture slopes
qgcomp.glm.boot(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
  expnms = c('x1', 'x2'), data=dat, q=4, B=200, deg=2)
# binary outcome
dat <- data.frame(y=rbinom(50,1,0.5), x1=runif(50), x2=runif(50), z=runif(50))
# Conditional mixture OR
qgcomp.glm.noboot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'), data=dat, q=2)
# Marginal mixture OR (population average OR - in general, this will not equal the
# conditional mixture OR due to non-collapsibility of the OR)
qgcomp.glm.boot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2, B=3, rr=FALSE)
# Population average mixture RR
qgcomp.glm.boot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2, rr=TRUE, B=3)
# Population average mixture RR, indicator variable representation of x2
# note that I(x==...) operates on the quantile-based category of x,
# rather than the raw value
res = qgcomp.glm.boot(y ~ z + x1 + I(x2==1) + I(x2==2) + I(x2==3), family="binomial",
  expnms = c('x1', 'x2'), data=dat, q=4, rr=TRUE, B=200)
res$fit
plot(res)
# now add in a non-linear MSM
res2 = qgcomp.glm.boot(y ~ z + x1 + I(x2==1) + I(x2==2) + I(x2==3), family="binomial",
  expnms = c('x1', 'x2'), data=dat, q=4, rr=TRUE, B=200, degree=2)
res2$fit
res2$msmfit # correct point estimates, incorrect standard errors
res2 # correct point estimates, correct standard errors
plot(res2)
# Log risk ratio per one IQR change in all exposures (not on quantile basis)
dat$x1iqr <- dat$x1/with(dat, diff(quantile(x1, c(.25, .75))))
dat$x2iqr <- dat$x2/with(dat, diff(quantile(x2, c(.25, .75))))
# note that I(x>...) now operates on the untransformed value of x,
# rather than the quantized value
res2 = qgcomp.glm.boot(y ~ z + x1iqr + I(x2iqr>0.1) + I(x2>0.4) + I(x2>0.9), family="binomial",
  expnms = c('x1iqr', 'x2iqr'), data=dat, q=NULL, rr=TRUE, B=200, degree=2)
res2
# using parallel processing
qgcomp.glm.boot(y ~ z + x1iqr + I(x2iqr>0.1) + I(x2>0.4) + I(x2>0.9), family="binomial",
  expnms = c('x1iqr', 'x2iqr'), data=dat, q=NULL, rr=TRUE, B=200, degree=2,
  parallel=TRUE, parplan=TRUE)
# weighted model
N=5000
dat4 <- data.frame(id=seq_len(N), x1=runif(N), x2=runif(N), z=runif(N))
dat4$y <- with(dat4, rnorm(N, x1*z + z, 1))
dat4$w=runif(N) + dat4$z*5
qdata = quantize(dat4, expnms = c("x1", "x2"), q=4)$data
# first equivalent models with no covariates
qgcomp.glm.noboot(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian())
qgcomp.glm.noboot(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian(), weights=w)
set.seed(13)
qgcomp.glm.boot(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian(), weights=w)
# using the correct model
set.seed(13)
qgcomp.glm.boot(f=y ~ x1*z + x2, expnms = c('x1', 'x2'), data=dat4, q=4,
  family=gaussian(), weights=w, id="id")
(qgcfit <- qgcomp.glm.boot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4,
  family=gaussian(), weights=w))
qgcfit$fit
summary(glm(y ~ z + x1 + x2, data = qdata, weights=w))
## End(Not run)
This function estimates a dose-response parameter representing a one quantile increase in a set of exposures of interest. This model estimates the parameters of a marginal structural model (MSM) based on g-computation with quantized exposures. Note: this function allows non-linear and non-additive effects of individual components of the exposure, as well as non-linear joint effects of the mixture via polynomial basis functions, which increase the computational burden due to the need for non-parametric bootstrapping. qgcomp.ee is an equivalent function (slated for deprecation).
qgcomp.glm.ee( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, offset = NULL, alpha = 0.05, B = 200, rr = TRUE, degree = 1, seed = NULL, bayes = FALSE, includeX = TRUE, verbose = TRUE, ... )
qgcomp.ee( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, offset = NULL, alpha = 0.05, B = 200, rr = TRUE, degree = 1, seed = NULL, bayes = FALSE, includeX = TRUE, verbose = TRUE, ... )
f |
R style formula |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL, or the number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster). Note that qgcomp.glm.noboot will not produce cluster-appropriate standard errors. qgcomp.glm.ee can be used for this; it uses bootstrap sampling of clusters/individuals to estimate cluster-appropriate standard errors. |
weights |
NULL or "case weights" - sampling weights representing a proportional representation in the analysis data |
offset |
|
alpha |
alpha level for confidence limit calculation |
B |
integer: number of bootstrap iterations (this should typically be >=200, though it is set lower in examples to improve run-time). |
rr |
logical: if using binary outcome and rr=TRUE, qgcomp.glm.ee will estimate risk ratio rather than odds ratio |
degree |
polynomial bases for the marginal model (e.g. degree = 2 allows the relationship between the whole exposure mixture and the outcome to be quadratic; default = 1). |
seed |
integer or NULL: random number seed for replicable bootstrap results |
bayes |
use underlying Bayesian model ( |
includeX |
(logical) should the design/predictor matrix be included in the output via the fit and msmfit parameters? |
verbose |
(logical) give extra messages about fits |
... |
arguments to glm (e.g. family) |
Estimates correspond to the average expected change in the (log) outcome per quantile increase in the joint exposure to all exposures in expnms. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.
Note that the "family" argument is optional but defaults to gaussian,
as in the glm function. Current options allow "gaussian" (or gaussian()) or "binomial". This
function defaults to canonical links (binomial -> logistic link, gaussian ->
identity link). However, setting family="binomial" and rr=TRUE uses a
marginal structural model with a log link (and an underlying conditional model with
a logistic link). This mimics the behavior of qgcomp.glm.boot,
which maximizes backwards compatibility but is also a useful default to avoid
convergence issues when using the log link for conditional models.
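A minimal sketch of that default behavior follows (simulated data; the point is only the family/rr combination, not the estimates):
# Sketch: with family=binomial(), rr=TRUE yields a log-link MSM
# (risk ratio) over a logistic conditional model; rr=FALSE yields an OR.
set.seed(1)
dat <- data.frame(y=rbinom(100, 1, 0.5), x1=runif(100), x2=runif(100))
qgcomp.glm.ee(y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=binomial(), rr=TRUE)  # marginal RR
qgcomp.glm.ee(y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=binomial(), rr=FALSE) # marginal OR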
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the marginal structural model (msmfit) used to estimate the final effect estimates.
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(30)
# continuous outcome
dat <- data.frame(y=rnorm(100), x1=runif(100), x2=runif(100), z=runif(100))
# Conditional linear slope
qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
# Marginal linear slope (population average slope, for a purely linear,
# additive model this will equal the conditional)
qgcomp.glm.ee(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
qgcomp.glm.ee(f=y ~ x1 + x2 + I(x1*x2) + z, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
# no intercept model
qgcomp.glm.ee(f=y ~ -1+z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
# Note that these give different answers! In the first, the estimate is conditional on Z,
# but in the second, Z is marginalized over via standardization. The estimates
# can be made approximately the same by centering Z (for linear models), but
# the conditional estimate will typically have lower standard errors.
dat$z = dat$z - mean(dat$z)
# Conditional linear slope
qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
# Marginal linear slope (population average slope, for a purely linear,
# additive model this will equal the conditional)
qgcomp.glm.ee(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
## Not run: 
# Population average mixture slope which accounts for non-linearity and interactions
# Note this is one use case for the estimating equations approach: replacing bootstrapping
qgcomp.glm.ee(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
  expnms = c('x1', 'x2'), data=dat, q=4)
qgcomp.glm.boot(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
  expnms = c('x1', 'x2'), data=dat, q=4, B=1000)
# generally non-linear/non-additive underlying models lead to non-linear mixture slopes
dat$y = dat$y + dat$x1*dat$x2
qgcomp.glm.ee(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
  expnms = c('x1', 'x2'), data=dat, q=4, deg=2)
qgcomp.glm.boot(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
  expnms = c('x1', 'x2'), data=dat, q=4, deg=2, B=1000)
# binary outcome
dat <- data.frame(y=rbinom(50,1,0.5), x1=runif(50), x2=runif(50), z=runif(50))
# Conditional mixture OR
qgcomp.glm.noboot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'), data=dat, q=2)
# Marginal mixture OR (population average OR - in general, this will not equal the
# conditional mixture OR due to non-collapsibility of the OR)
qgcomp.glm.boot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2, B=300, MCsize=5000, rr=FALSE)
qgcomp.glm.ee(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'), data=dat, q=2, rr=FALSE)
# Population average mixture RR
qgcomp.glm.boot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2, B=300, MCsize=5000, rr=TRUE)
qgcomp.glm.ee(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'), data=dat, q=2, rr=TRUE)
# getting the RR with the poisson trick
qgcomp.glm.ee(y ~ z + x1 + x2, family="poisson", expnms = c('x1', 'x2'), data=dat, q=2)
qgcomp.glm.boot(y ~ z + x1 + x2, family="poisson", expnms = c('x1', 'x2'),
  data=dat, q=2, B=300, MCsize=5000)
# Population average mixture RR, indicator variable representation of x2
# note that I(x==...) operates on the quantile-based category of x,
# rather than the raw value
res = qgcomp.glm.ee(y ~ z + x1 + I(x2==1) + I(x2==2) + I(x2==3), family="binomial",
  expnms = c('x1', 'x2'), data=dat, q=4, rr=TRUE)
res$fit
plot(res)
# now add in a non-linear MSM
res2a = qgcomp.glm.boot(y ~ z + x1 + I(x2==1) + I(x2==2) + I(x2==3), family="binomial",
  expnms = c('x1', 'x2'), data=dat, q=4, rr=TRUE, degree=2)
res2 = qgcomp.glm.ee(y ~ z + x1 + I(x2==1) + I(x2==2) + I(x2==3), family="binomial",
  expnms = c('x1', 'x2'), data=dat, q=4, rr=TRUE, degree=2)
# conditional model estimates (identical point estimates and similar std. errors)
summary(res2a$fit)$coefficients
res2$fit
# msm estimates (identical point estimates and different std. errors)
summary(res2a$msmfit)$coefficients # correct point estimates, incorrect standard errors
res2 # correct point estimates, correct standard errors
plot(res2)
# Log risk ratio per one IQR change in all exposures (not on quantile basis)
dat$x1iqr <- dat$x1/with(dat, diff(quantile(x1, c(.25, .75))))
dat$x2iqr <- dat$x2/with(dat, diff(quantile(x2, c(.25, .75))))
# note that I(x>...) now operates on the untransformed value of x,
# rather than the quantized value
res2 = qgcomp.glm.ee(y ~ z + x1iqr + I(x2iqr>0.1) + I(x2>0.4) + I(x2>0.9), family="binomial",
  expnms = c('x1iqr', 'x2iqr'), data=dat, q=NULL, rr=TRUE, degree=2)
res2
# weighted model
N=5000
dat4 <- data.frame(id=seq_len(N), x1=runif(N), x2=runif(N), z=runif(N))
dat4$y <- with(dat4, rnorm(N, x1*z + z, 1))
dat4$w=runif(N) + dat4$z*5
qdata = quantize(dat4, expnms = c("x1", "x2"), q=4)$data
# first equivalent models with no covariates
qgcomp.glm.noboot(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian())
qgcomp.glm.noboot(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian(), weights=w)
set.seed(13)
qgcomp.glm.ee(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian())
qgcomp.glm.ee(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian(), weights=w)
# using the correct model
set.seed(13)
qgcomp.glm.ee(f=y ~ x1*z + x2, expnms = c('x1', 'x2'), data=dat4, q=4,
  family=gaussian(), weights=w, id="id")
(qgcfit <- qgcomp.glm.ee(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4,
  family=gaussian(), weights=w))
qgcfit$fit
summary(glm(y ~ z + x1 + x2, data = qdata, weights=w))
## End(Not run)
This function estimates a linear dose-response parameter representing a one quantile increase in a set of exposures of interest. It estimates the parameters of a marginal structural model (MSM) based on g-computation with quantized exposures. Note: this function is valid only under linear and additive effects of individual components of the exposure, but when these hold, the model can be fit with very little computational burden. qgcomp.noboot is an equivalent function (slated for deprecation).
qgcomp.glm.noboot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, bayes = FALSE, ... )
qgcomp.noboot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, bayes = FALSE, ... )
f |
R style formula |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL, or the number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster). Note that qgcomp.glm.noboot will not produce cluster-appropriate standard errors (this parameter is essentially ignored in qgcomp.glm.noboot). qgcomp.glm.boot can be used for this, which will use bootstrap sampling of clusters/individuals to estimate cluster-appropriate standard errors via bootstrapping. |
weights |
"case weights" - passed to the "weight" argument of
|
alpha |
alpha level for confidence limit calculation |
bayes |
use underlying Bayesian model ( |
... |
arguments to glm (e.g. family) |
For continuous outcomes, under a linear model with no interaction terms, this is equivalent to g-computation of the effect of increasing every exposure by 1 quantile. For binary/count outcomes, this yields conditional log odds/rate ratio(s) representing the change in the expected conditional odds/rate (conditional on covariates) from increasing every exposure by 1 quantile. In general, the latter quantity is not equivalent to g-computation estimates. Hypothesis test statistics and confidence intervals are based on the delta-method estimate of the variance of a linear combination of random variables.
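As a sanity check of the linear-combination logic described above, a small sketch (assuming, per the Value section below, that the point estimate is exposed as the psi component of the fit):
# Sketch: under a linear, additive model, psi from qgcomp.glm.noboot
# equals the sum of the exposure coefficients from a GLM fit to the
# quantized data (psi component assumed per the Value section).
set.seed(10)
dat <- data.frame(y=rnorm(100), x1=runif(100), x2=runif(100))
qfit <- qgcomp.glm.noboot(y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
qdat <- quantize(dat, expnms = c('x1', 'x2'), q=4)$data
gfit <- glm(y ~ x1 + x2, data = qdat, family=gaussian())
sum(coef(gfit)[c('x1', 'x2')]) # should match qfit$psi
qfit$psi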
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the weights/standardized coefficients in the positive (pos.weights) and negative (neg.weights) directions.
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(50)
# linear model
dat <- data.frame(y=runif(50,-1,1), x1=runif(50), x2=runif(50), z=runif(50))
qgcomp.glm.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2, family=gaussian())
# no intercept model
qgcomp.glm.noboot(f=y ~ -1 + z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=2, family=gaussian())
# logistic model
dat2 <- data.frame(y=rbinom(50, 1, 0.5), x1=runif(50), x2=runif(50), z=runif(50))
qgcomp.glm.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat2, q=2, family=binomial())
# poisson model
dat3 <- data.frame(y=rpois(50, .5), x1=runif(50), x2=runif(50), z=runif(50))
qgcomp.glm.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat3, q=2, family=poisson())
# weighted model
N=5000
dat4 <- data.frame(y=runif(N), x1=runif(N), x2=runif(N), z=runif(N))
dat4$w=runif(N)*2
qdata = quantize(dat4, expnms = c("x1", "x2"))$data
(qgcfit <- qgcomp.glm.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4,
  family=gaussian(), weights=w))
qgcfit$fit
glm(y ~ z + x1 + x2, data = qdata, weights=w)
This function estimates a dose-response parameter representing a one quantile increase in a set of exposures of interest for hurdle count outcomes. This model estimates the parameters of a marginal structural hurdle count model (MSM) based on g-computation with quantized exposures. Note: this function allows non-linear and non-additive effects of individual components of the exposure, as well as non-linear joint effects of the mixture via polynomial basis functions, which increase the computational burden due to the need for non-parametric bootstrapping.
qgcomp.hurdle.boot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, B = 200, degree = 1, seed = NULL, bayes = FALSE, parallel = FALSE, MCsize = 10000, msmcontrol = hurdlemsm_fit.control(), parplan = FALSE, ... )
f |
R style formula |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL, or the number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster) |
weights |
"case weights" - passed to the "weight" argument of the underlying model-fitting function |
alpha |
alpha level for confidence limit calculation |
B |
integer: number of bootstrap iterations (this should typically be >=200, though it is set lower in examples to improve run-time). |
degree |
polynomial basis for the marginal model (e.g. degree = 2 allows that the relationship between the whole exposure mixture and the outcome is quadratic; default = 1) |
seed |
integer or NULL: random number seed for replicable bootstrap results |
bayes |
not currently implemented. |
parallel |
use (safe) parallel processing from the future and future.apply packages |
MCsize |
integer: sample size for simulation to approximate the marginal hurdle model parameters. This can be left small for testing, but should be as large as needed to reduce simulation error to an acceptable magnitude (compare psi coefficients for linear fits with qgcomp.hurdle.noboot to gain some intuition for the level of expected simulation error at a given value of MCsize) |
msmcontrol |
named list from hurdlemsm_fit.control() |
parplan |
(logical, default=FALSE) automatically set future::plan to plan(multisession) (and set to existing plan, if any, after bootstrapping) |
... |
arguments to hurdle (e.g. dist) |
Hurdle count models allow excess zeros in standard count outcomes (e.g. Poisson distributed outcomes). Such models have two components: 1) the probability of arising from a degenerate distribution at zero (versus arising from a count distribution) and 2) the rate parameter of a (possibly zero-truncated) count distribution. Thus, one has the option of allowing exposure and covariate effects on the zero distribution, the count distribution, or both. The zero distribution parameters correspond to log-odds ratios for the probability of arising from the zero distribution. Count distribution parameters correspond to log-rate-ratio parameters. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.
Of note, this function yields marginal estimates of the expected outcome under values of the joint exposure quantiles (e.g. the expected outcome if all exposures are below the 1st quartile). These outcomes can be used to derive estimates of the effect on the marginal expectation of the outcome, irrespective of zero/count portions of the statistical model.
Estimates correspond to the average expected change in the (log) outcome per quantile increase in the joint exposure to all exposures in 'expnms'. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.
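As a minimal sketch of the interval construction just described (hypothetical psihat and bootsamps objects standing in for the MSM point estimate and bootstrap draws; not the package internals):

psihat <- 0.3                     # stand-in point estimate of psi
bootsamps <- rnorm(200, 0.3, 0.1) # stand-in for B bootstrap estimates of psi
se_boot <- sd(bootsamps)          # bootstrap standard error
psihat + qnorm(c(0.025, 0.975)) * se_boot # Wald-type 95% confidence interval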
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the marginal structural model (msmfit) used to estimate the final effect estimates.
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(50)
n = 500
dat <- data.frame(y=rbinom(n, 1, 0.5)*rpois(n, 1.2), x1=runif(n), x2=runif(n), z=runif(n))
# poisson count model, mixture in both portions
## Not run:
# warning: the examples below can take a long time to run
res = qgcomp.hurdle.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson", B=1000, MCsize=10000, parallel=TRUE, parplan=TRUE)
qgcomp.hurdle.noboot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson")
res
# accuracy for small MCsize is suspect (compare coefficients between boot/noboot versions),
# so re-check with MCsize set to larger value (this takes a long time to run)
res2 = qgcomp.hurdle.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson", B=1000, MCsize=50000, parallel=TRUE, parplan=TRUE)
res2
plot(density(res2$bootsamps[4,]))
# negative binomial count model, mixture and covariate in both portions
qgcomp.hurdle.boot(f=y ~ z + x1 + x2 | z + x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="negbin", B=10, MCsize=10000)
# weighted analysis (NOTE THIS DOES NOT WORK WITH parallel=TRUE!)
dat$w = runif(n)*5
qgcomp.hurdle.noboot(f=y ~ z + x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson", weights=w)
# You may see this:
# Warning message:
# In eval(family$initialize) : non-integer #successes in a binomial glm!
qgcomp.hurdle.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson", B=5, MCsize=50000, parallel=FALSE, weights=w)
# Log rr per one IQR change in all exposures (not on quantile basis)
dat$x1iqr <- dat$x1/with(dat, diff(quantile(x1, c(.25, .75))))
dat$x2iqr <- dat$x2/with(dat, diff(quantile(x2, c(.25, .75))))
# note that I(x>...) now operates on the untransformed value of x,
# rather than the quantized value
res2 = qgcomp.hurdle.boot(f=y ~ z + x1iqr + x2iqr + I(x2iqr>0.1) + I(x2iqr>0.4) +
    I(x2iqr>0.9) | x1iqr + x2iqr,
    expnms = c('x1iqr', 'x2iqr'), data=dat, q=NULL, B=2, degree=2,
    MCsize=2000, dist="poisson")
res2
## End(Not run)
This function estimates a linear dose-response parameter representing a one quantile increase in a set of exposures of interest for hurdle count outcomes. This model estimates the parameters of a marginal structural hurdle model (MSM) based on g-computation with quantized exposures. Note: this function is valid only under linear and additive effects of individual components of the exposure, but when these hold the model can be fit with very little computational burden.
qgcomp.hurdle.noboot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, bayes = FALSE, ... )
f |
R style formula using syntax from 'pscl' package: depvar ~ indvars_count | indvars_zero |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster) |
weights |
"case weights" - passed to the "weight" argument of the underlying model-fitting function |
alpha |
alpha level for confidence limit calculation |
bayes |
not yet implemented |
... |
arguments to hurdle (e.g. dist) |
A hurdle version of quantile g-computation based on the implementation in the 'pscl' package. A hurdle distribution is a mixture distribution in which one of the distributions is a point mass at zero (with probability given by a logistic model), and the other distribution is a discrete or continuous distribution. This estimates the effect of a joint increase in all exposures on 1) the odds of belonging to the "zero" vs. "count" portions of the distribution and/or 2) the rate parameter for the "count" portion of the distribution.
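To make the two-part structure concrete, here is a hedged sketch fitting the underlying model directly with pscl and summing exposure coefficients; the simulated data and coefficient sums are illustrative assumptions, not the package's internal code:

library(pscl)
set.seed(1)
qd <- data.frame(x1 = sample(0:1, 300, replace=TRUE), x2 = sample(0:1, 300, replace=TRUE))
qd$y <- rbinom(300, 1, 0.5) * rpois(300, exp(0.1 * qd$x1)) # hurdle-like outcome
fit <- hurdle(y ~ x1 + x2 | x1 + x2, data = qd, dist = "poisson")
sum(coef(fit, model = "count")[c("x1", "x2")]) # joint effect, count portion (log rate ratio)
sum(coef(fit, model = "zero")[c("x1", "x2")])  # joint effect, zero portion (log odds ratio)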
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the weights/standardized coefficients in the positive (pos.weights) and negative (neg.weights) directions.
qgcomp.hurdle.boot, qgcomp.glm.noboot, qgcomp.cox.noboot, and hurdle
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(50)
n = 100
dat <- data.frame(y=rbinom(n, 1, 0.5)*rpois(n, 1.2), x1=runif(n), x2=runif(n), z=runif(n))
# poisson count model, mixture in both portions
qgcomp.hurdle.noboot(f=y ~ z + x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="poisson")
# negative binomial count model, mixture and covariate in both portions
qgcomp.hurdle.noboot(f=y ~ z + x1 + x2 | z + x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="negbin")
qgcomp.hurdle.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="negbin") # equivalent
# negative binomial count model, mixture only in the 'count' portion of the model
qgcomp.hurdle.noboot(f=y ~ z + x1 + x2 | z, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="negbin")
# weighted analysis
dat$w = runif(n)*5
qgcomp.hurdle.noboot(f=y ~ z + x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="poisson", weights=w)
# Expect this:
# Warning message:
# In eval(family$initialize) : non-integer #successes in a binomial glm!
This function estimates a dose-response parameter representing a one quantile increase in a set of exposures of interest. This model estimates the parameters of a marginal structural model (MSM) based on g-computation with quantized exposures. Note: this function allows linear and non-additive effects of individual components of the exposure, as well as non-linear joint effects of the mixture via polynomial basis functions, which increase the computational burden due to the need for non-parametric bootstrapping.
qgcomp.multinomial.boot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, B = 200, rr = TRUE, degree = 1, seed = NULL, bayes = FALSE, MCsize = nrow(data), parallel = FALSE, parplan = FALSE, ... )
f |
R style formula |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster). Note that qgcomp.multinomial.noboot will not produce cluster-appropriate standard errors. qgcomp.multinomial.boot can be used for this, as it uses bootstrap sampling of clusters/individuals to estimate cluster-appropriate standard errors. |
weights |
"case weights" - passed to the "weight" argument of the underlying model-fitting function |
alpha |
alpha level for confidence limit calculation |
B |
integer: number of bootstrap iterations (this should typically be >=200, though it is set lower in examples to improve run-time). |
rr |
(not used) |
degree |
polynomial basis for the marginal model (e.g. degree = 2 allows that the relationship between the whole exposure mixture and the outcome is quadratic; default = 1) |
seed |
integer or NULL: random number seed for replicable bootstrap results |
bayes |
use underlying Bayesian model |
MCsize |
integer: sample size for simulation to approximate the marginal multinomial model parameters. This can be left small for testing, but should be as large as needed to reduce simulation error to an acceptable magnitude (compare psi coefficients for linear fits with qgcomp.multinomial.noboot to gain some intuition for the level of expected simulation error at a given value of MCsize). This likely won't matter much in linear models, but may be important with binary or count outcomes. |
parallel |
use (safe) parallel processing from the future and future.apply packages |
parplan |
(logical, default=FALSE) automatically set future::plan to plan(multisession) (and set to existing plan, if any, after bootstrapping) |
... |
arguments to nnet::multinom |
Estimates correspond to the average expected change in the probability of an outcome type per quantile increase in the joint exposure to all exposures in 'expnms'. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.
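As a hedged illustration of the g-computation step described above, one can fit nnet::multinom directly and compare predicted outcome-type probabilities across joint exposure quantiles (simulated data; not the package internals):

library(nnet)
set.seed(1)
d <- data.frame(yc = factor(sample(c("a", "b", "c"), 300, replace=TRUE)),
                x1 = sample(0:3, 300, replace=TRUE), x2 = sample(0:3, 300, replace=TRUE))
fit <- multinom(yc ~ x1 + x2, data = d, trace = FALSE)
# predicted probabilities when all exposures are jointly set to quantile 0, 1, 2, 3
predict(fit, newdata = data.frame(x1 = 0:3, x2 = 0:3), type = "probs")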
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the marginal structural model (msmfit) used to estimate the final effect estimates.
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
data("metals") # from qgcomp package
# create categorical outcome from the existing continuous outcome
# (usually, one will already exist)
metals$ycat = factor(quantize(metals, "y", q=4)$data$y,
    levels=c("0", "1", "2", "3"), labels=c("cct", "ccg", "aat", "aag"))
# restrict to smaller dataset for simplicity
smallmetals = metals[,c("ycat", "arsenic", "lead", "cadmium", "mage35")]
### 1: Define mixture and underlying model ####
mixture = c("arsenic", "lead", "cadmium")
f0 = ycat ~ arsenic + lead + cadmium # the multinomial model
# (be sure that factor variables are properly coded ahead of time in the dataset)
rr = qgcomp.multinomial.boot(
    f0,
    expnms = mixture,
    q = 4,
    data = smallmetals,
    B = 5,       # set to higher values in real examples
    MCsize = 100 # set to higher values in small samples
)
rr2 = qgcomp.multinomial.noboot(
    f0,
    expnms = mixture,
    q = 4,
    data = smallmetals
)
### 5: Create summary qgcomp object for nice printing ####
summary(rr, tests=c("H")) # include homogeneity test
# 95% confidence intervals
#confint(rr, level=0.95)
#rr$breaks # quantile cutpoints for exposures
#homogeneity_test(rr)
#joint_test(rr)
qdat = simdata_quantized(
    outcometype="multinomial",
    n=10000,
    corr=c(-.9),
    coef=cbind(c(.2,-.2,0,0), c(.1,.1,.1,.1)),
    q = 4
)
rr_sim = qgcomp.multinomial.noboot(
    y~x1+x2+x3+x4,
    expnms = c("x1", "x2", "x3", "x4"),
    q = 4,
    data = qdat
)
rr_sim2 = qgcomp.multinomial.boot(
    y~x1+x2+x3+x4,
    expnms = c("x1", "x2", "x3", "x4"),
    q = 4,
    data = qdat,
    B = 1
)
Quantile g-computation for multinomial outcomes
qgcomp.multinomial.noboot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, bayes = FALSE, ... )
f |
R style formula |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster). Note that qgcomp.multinomial.noboot will not produce cluster-appropriate standard errors (this parameter is essentially ignored here). qgcomp.multinomial.boot can be used for this, as it uses bootstrap sampling of clusters/individuals to estimate cluster-appropriate standard errors. |
weights |
"case weights" - passed to the "weight" argument of the underlying model-fitting function |
alpha |
alpha level for confidence limit calculation |
bayes |
Logical; not yet implemented (gives an error if set to TRUE) |
... |
arguments to nnet::multinom |
a qgcompmultfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the weights/standardized coefficients in the positive and negative directions (weights).
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.partials(), qgcomp.zi.boot(), qgcomp.zi.noboot()
data("metals") # from qgcomp package
# create categorical outcome from the existing continuous outcome
# (usually, one will already exist)
metals$ycat = factor(quantize(metals, "y", q=4)$data$y,
    levels=c("0", "1", "2", "3"), labels=c("cct", "ccg", "aat", "aag"))
# restrict to smaller dataset for simplicity
smallmetals = metals[,c("ycat", "arsenic", "lead", "cadmium", "mage35")]
### 1: Define mixture and underlying model ####
mixture = c("arsenic", "lead", "cadmium")
f0 = ycat ~ arsenic + lead + cadmium # the multinomial model
# (be sure that factor variables are properly coded ahead of time in the dataset)
rr = qgcomp.multinomial.noboot(
    f0,
    expnms = mixture,
    q = 4,
    data = smallmetals
)
### 5: Create summary qgcomp object for nice printing ####
summary(rr, tests=c("H")) # include homogeneity test
# 95% confidence intervals
confint(rr, level=0.95)
rr$breaks # quantile cutpoints for exposures
# homogeneity_test(rr)
joint_test(rr)
Obtain effect estimates for "partial positive" and "partial negative" effects using quantile g-computation. This approach uses sample splitting: in the training data, all variables with estimated effects in the same direction are grouped together, and the joint effect of each group is then estimated in the validation data.
qgcomp.partials( fun = c("qgcomp.glm.noboot", "qgcomp.cox.noboot", "qgcomp.zi.noboot"), traindata = NULL, validdata = NULL, expnms = NULL, .fixbreaks = TRUE, .globalbreaks = FALSE, ... )
fun |
character variable in the set "qgcomp.glm.noboot" (binary, count, continuous outcomes), "qgcomp.cox.noboot" (survival outcomes), "qgcomp.zi.noboot" (zero inflated outcomes). This describes which qgcomp package function is used to fit the model. (default = "qgcomp.glm.noboot") |
traindata |
Data frame with training data |
validdata |
Data frame with validation data |
expnms |
Exposure mixture of interest |
.fixbreaks |
(logical, overridden by .globalbreaks) Use the same quantile cutpoints in the training and validation data (selected in the training data). As of version 2.8.11, the default is TRUE, whereas it was implicitly FALSE in prior versions. Setting to TRUE increases variance but greatly decreases bias in smaller samples. |
.globalbreaks |
(logical, if TRUE, overrides .fixbreaks; default = FALSE) Use the same quantile cutpoints in the training and validation data (selected in the combined training and validation data). Setting to TRUE increases variance but greatly decreases bias in smaller samples. |
... |
Arguments to the underlying qgcomp function named in 'fun' (e.g. qgcomp.glm.noboot) |
In the basic (non-bootstrapped) qgcomp functions, the positive and negative "sums of coefficients" or "partial effect sizes" are given, which equal the sum of the negative and positive coefficients in the underlying model. Unfortunately, these partial effects don't constitute variables for which we can derive confidence intervals or hypothesis tests, so they are mainly for exploratory purposes. By employing sample splitting, however, we can obtain better estimates of these partial effects.
Sample splitting proceeds by partitioning the data into two samples (a 40/60 training/validation split seems acceptable in many circumstances). The "overall mixture effect" is then estimated in the training data, and the mixture variables with positive and negative coefficients are split into separate groups. These two different groups are then used as "the mixture of interest" in two additional qgcomp fits, where the mixture of interest is adjusted for the other exposure variables. For example, if the "positive partial effect" is of interest, then this effect is equal to the sum of the coefficients in the qgcomp model fit to the validation data, with the mixture of interest selected by the original fit to the training data (note that some of these coefficients may be negative in the fit to the validation data - this is expected and necessary for valid hypothesis tests).
The positive/negative partial effects are necessarily exploratory, but sample splitting preserves the statistical properties at the expense of wider confidence intervals and larger variances. The two resulting mixture groups should be inspected.
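A hedged sketch of that workflow, assuming data frames traindata and validdata with exposures x1-x4 (qgcomp.partials automates these steps; pos.weights and neg.weights are named vectors in a qgcompfit object):

# 1) overall fit in the training data
ov <- qgcomp.glm.noboot(y ~ ., expnms = c("x1", "x2", "x3", "x4"), data = traindata, q = 4)
# 2) group exposures by the sign of their coefficients in the training fit
posmix <- names(ov$pos.weights)
negmix <- names(ov$neg.weights)
# 3) re-estimate the "positive partial effect" in the validation data,
#    adjusting for the remaining exposures
qgcomp.glm.noboot(y ~ ., expnms = posmix, data = validdata, q = 4)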
A 'qgcompmultifit' object (which inherits from list), containing:
posmix: character vector of variable names with positive coefficients in the qgcomp model fit to the training data
negmix: character vector of variable names with negative coefficients in the qgcomp model fit to the training data
pos.fit: a qgcompfit object fit to the validation data, in which the exposures of interest are contained in 'posmix'
neg.fit: a qgcompfit object fit to the validation data, in which the exposures of interest are contained in 'negmix'
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.zi.boot(), qgcomp.zi.noboot()
set.seed(123223)
dat = qgcomp::simdata_quantized(n=1000, outcometype="continuous",
    corr=c(.75, 0), b0=0, coef=c(0.25, -0.25, 0, 0), q=4)
cor(dat)
# overall fit (more or less null due to counteracting exposures)
(overall <- qgcomp.glm.noboot(f=y~., q=NULL, expnms=c("x1", "x2", "x3", "x4"), data=dat))
# partial effects using 40% training/60% validation split
trainidx <- sample(1:nrow(dat), round(nrow(dat)*0.4))
valididx <- setdiff(1:nrow(dat), trainidx)
traindata = dat[trainidx,]
validdata = dat[valididx,]
splitres <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., q=NULL,
    traindata=traindata, validdata=validdata, expnms=c("x1", "x2", "x3", "x4"))
splitres
## Not run:
# under the null, both should give null results
set.seed(123223)
dat = simdata_quantized(n=1000, outcometype="continuous",
    corr=c(.75, 0), b0=0, coef=c(0,0,0,0), q=4)
# 40% training/60% validation
trainidx2 <- sample(1:nrow(dat), round(nrow(dat)*0.4))
valididx2 <- setdiff(1:nrow(dat), trainidx2)
traindata2 <- dat[trainidx2,]
validdata2 <- dat[valididx2,]
splitres2 <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., q=NULL,
    traindata=traindata2, validdata=validdata2, expnms=c("x1", "x2", "x3", "x4"))
splitres2
# 60% training/40% validation
trainidx3 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx3 <- setdiff(1:nrow(dat), trainidx3)
traindata3 <- dat[trainidx3,]
validdata3 <- dat[valididx3,]
splitres3 <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., q=NULL,
    traindata=traindata3, validdata=validdata3, expnms=c("x1", "x2", "x3", "x4"))
splitres3
# survival outcome
set.seed(50)
dat = simdata_quantized(n=1000, outcometype="survival",
    corr=c(.75, 0, 0, 0, 1), b0=0, coef=c(1,0,0,0,0,1), q=4)
names(dat)[which(names(dat)=="x6")] = "z"
trainidx4 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx4 <- setdiff(1:nrow(dat), trainidx4)
traindata4 <- dat[trainidx4,]
validdata4 <- dat[valididx4,]
expnms = paste0("x", 1:5)
f = survival::Surv(time, d) ~ x1 + x2 + x3 + x4 + x5 + z
(fit1 <- survival::coxph(f, data = dat))
(overall <- qgcomp.cox.noboot(f, expnms = expnms, data = dat))
(splitres4 <- qgcomp.partials(fun="qgcomp.cox.noboot", f=f, q=4,
    traindata=traindata4, validdata=validdata4, expnms=expnms))
# zero inflated count outcome
set.seed(50)
n = 1000
dat <- data.frame(y = (yany <- rbinom(n, 1, 0.5)) * (ycnt <- rpois(n, 1.2)),
    x1 = runif(n) + ycnt*0.2, x2 = runif(n) - ycnt*0.2,
    x3 = runif(n), x4 = runif(n), z = runif(n))
# poisson count model, mixture in both portions, but note that the qgcomp.partials
# function defines the "positive" variables only by the count portion of the model
(overall5 <- qgcomp.zi.noboot(f=y ~ z + x1 + x2 + x3 + x4 | x1 + x2 + x3 + x4 + z,
    expnms = c("x1", "x2", "x3", "x4"), data=dat, q=4, dist="poisson"))
trainidx5 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx5 <- setdiff(1:nrow(dat), trainidx5)
traindata5 <- dat[trainidx5,]
validdata5 <- dat[valididx5,]
splitres5 <- qgcomp.partials(fun="qgcomp.zi.noboot",
    f=y ~ x1 + x2 + x3 + x4 + z | x1 + x2 + x3 + x4 + z, q=4,
    traindata=traindata5, validdata=validdata5, expnms=c("x1", "x2", "x3", "x4"))
splitres5
## End(Not run)
It is often of interest to examine survival curves from qgcomp.cox.boot models. They can be useful for checking assumptions about how well the marginal structural model conforms to the underlying conditional model (e.g. whether the overall fit adequately approximates the non-linearity in the underlying model). This function will yield survival curves, but no measures of uncertainty.
qgcomp.survcurve.boot(x, ...)
x |
a qgcomp.cox.boot fit object |
... |
not used |
a list of data.frames:
mdfpop: Average survival curve (survival, time) based on the marginal structural model, averaged over the population at every quantile of exposure
cdfpop: Population average survival curve (survival, time) based on the underlying conditional model
mdfq: Survival curves (survival, time) for each quantile, based on the marginal structural model
cdfq: Survival curves (survival, time) for each quantile, based on the underlying conditional model
set.seed(50)
N = 200
dat <- data.frame(time=(tmg <- pmin(.1, rweibull(N, 10, 0.1))),
    d=1.0*(tmg<0.1), x1=runif(N), x2=runif(N), z=runif(N))
expnms = paste0("x", 1:2)
f = survival::Surv(time, d) ~ x1 + x2
(fit1 <- survival::coxph(f, data = dat))
(obj <- qgcomp.cox.noboot(f, expnms = expnms, data = dat))
## Not run:
(obj2 <- qgcomp.cox.boot(f, expnms = expnms, data = dat, B=10, MCsize=20000))
curves = qgcomp.survcurve.boot(obj2)
rbind(head(curves$mdfq), tail(curves$mdfq))
## End(Not run)
This function estimates a linear dose-response parameter representing a one quantile increase in a set of exposures of interest for zero-inflated count outcomes. This model estimates the parameters of a marginal structural zero-inflated count model (MSM) based on g-computation with quantized exposures. Note: this function allows linear and non-additive effects of individual components of the exposure, as well as non-linear joint effects of the mixture via polynomial basis functions, which increase the computational burden due to the need for non-parametric bootstrapping.
qgcomp.zi.boot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, B = 200, degree = 1, seed = NULL, bayes = FALSE, parallel = FALSE, MCsize = 10000, msmcontrol = zimsm_fit.control(), parplan = FALSE, ... )
f |
R style formula |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster) |
weights |
"case weights" - passed to the "weight" argument of the underlying model-fitting function |
alpha |
alpha level for confidence limit calculation |
B |
integer: number of bootstrap iterations (this should typically be >=200, though it is set lower in examples to improve run-time). |
degree |
polynomial basis function for the marginal model (e.g. degree = 2 allows that the relationship between the whole exposure mixture and the outcome is quadratic) |
seed |
integer or NULL: random number seed for replicable bootstrap results |
bayes |
not currently implemented. |
parallel |
use (safe) parallel processing from the future and future.apply packages |
MCsize |
integer: sample size for simulation to approximate marginal zero inflated model parameters. This can be left small for testing, but should be as large as needed to reduce simulation error to an acceptable magnitude (can compare psi coefficients for linear fits with qgcomp.zi.noboot to gain some intuition for the level of expected simulation error at a given value of MCsize) |
msmcontrol |
named list from zimsm_fit.control() |
parplan |
(logical, default=FALSE) automatically set future::plan to plan(multisession) (and set to existing plan, if any, after bootstrapping) |
... |
arguments to zeroinfl (e.g. dist) |
Zero-inflated count models allow excess zeros in standard count outcomes (e.g. Poisson distributed outcomes). Such models have two components: 1) the probability of arising from a degenerate distribution at zero (versus arising from a count distribution) and 2) the rate parameter of a count distribution. Thus, one has the option of allowing exposure and covariate effects on the zero distribution, the count distribution, or both. The zero distribution parameters correspond to log-odds ratios for the probability of arising from the zero distribution. Count distribution parameters correspond to log-rate-ratio parameters. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.
Of note, this function yields marginal estimates of the expected outcome under values of the joint exposure quantiles (e.g. the expected outcome if all exposures are below the 1st quartile). These outcomes can be used to derive estimates of the effect on the marginal expectation of the outcome, irrespective of zero-inflated/count portions of the statistical model.
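For intuition, under a zero-inflated Poisson the marginal expectation combines the two portions as E[Y|X] = (1 - pi(X)) * mu(X). A small numeric sketch with assumed values (not package code):

pi_zero <- plogis(-0.5)  # assumed zero-model probability at a given joint exposure level
mu_count <- exp(0.2)     # assumed count-model mean at the same level
(1 - pi_zero) * mu_count # marginal expected outcome, irrespective of model portion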
Estimates correspond to the average expected change in the (log) outcome per quantile increase in the joint exposure to all exposures in 'expnms'. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the marginal structural model (msmfit) used to estimate the final effect estimates.
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.noboot()
set.seed(50)
n = 100
dat <- data.frame(y=rbinom(n, 1, 0.5)*rpois(n, 1.2), x1=runif(n), x2=runif(n), z=runif(n))
# poisson count model, mixture in both portions
## Not run:
# warning: the examples below can take a long time to run
res = qgcomp.zi.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson", B=1000, MCsize=10000, parallel=TRUE, parplan=TRUE)
qgcomp.zi.noboot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson")
res
# accuracy for small MCsize is suspect (compare coefficients between boot/noboot versions),
# so re-check with MCsize set to larger value (this takes a long time to run)
res2 = qgcomp.zi.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson", B=1000, MCsize=50000, parallel=TRUE, parplan=TRUE)
res2
plot(density(res2$bootsamps[4,]))
# negative binomial count model, mixture and covariate in both portions
qgcomp.zi.boot(f=y ~ z + x1 + x2 | z + x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="negbin", B=10, MCsize=10000)
# weighted analysis (NOTE THIS DOES NOT WORK WITH parallel=TRUE!)
dat$w = runif(n)*5
qgcomp.zi.noboot(f=y ~ z + x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson", weights=w)
# Expect this:
# Warning message:
# In eval(family$initialize) : non-integer #successes in a binomial glm!
qgcomp.zi.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=4, dist="poisson", B=5, MCsize=50000, parallel=FALSE, weights=w)
# Log rr per one IQR change in all exposures (not on quantile basis)
dat$x1iqr <- dat$x1/with(dat, diff(quantile(x1, c(.25, .75))))
dat$x2iqr <- dat$x2/with(dat, diff(quantile(x2, c(.25, .75))))
# note that I(x>...) now operates on the untransformed value of x,
# rather than the quantized value
res2 = qgcomp.zi.boot(f=y ~ z + x1iqr + x2iqr + I(x2iqr>0.1) + I(x2iqr>0.4) +
    I(x2iqr>0.9) | x1iqr + x2iqr,
    expnms = c('x1iqr', 'x2iqr'), data=dat, q=NULL, B=2, degree=2,
    MCsize=200, dist="poisson")
res2
## End(Not run)
This function estimates a linear dose-response parameter representing a one quantile increase in a set of exposures of interest for zero-inflated count outcomes. This model estimates the parameters of a marginal structural zero-inflated model (MSM) based on g-computation with quantized exposures. Note: this function is valid only under linear and additive effects of individual components of the exposure, but when these hold the model can be fit with very little computational burden.
qgcomp.zi.noboot( f, data, expnms = NULL, q = 4, breaks = NULL, id = NULL, weights, alpha = 0.05, bayes = FALSE, ... )
f |
R style formula using syntax from 'pscl' package: depvar ~ indvars_count | indvars_zero |
data |
data frame |
expnms |
character vector of exposures of interest |
q |
NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, g-computation proceeds with the untransformed versions of the exposures in the input data (useful if data are already transformed, or for performing standard g-computation) |
breaks |
(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
id |
(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster) |
weights |
"case weights" - passed to the "weight" argument of the underlying model-fitting function |
alpha |
alpha level for confidence limit calculation |
bayes |
not yet implemented |
... |
arguments to zeroinfl (e.g. dist) |
A zero-inflated version of quantile g-computation based on the implementation in the 'pscl' package. A zero-inflated distribution is a mixture distribution in which one of the distributions is a point mass at zero (with probability given by a logistic model), and the other distribution is a discrete or continuous distribution. This estimates the effect of a joint increase in all exposures on 1) the odds of belonging to the "zero" vs. "count" portions of the distribution and/or 2) the rate parameter for the "count" portion of the distribution.
a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the weights/standardized coefficients in the positive (pos.weights) and negative (neg.weights) directions.
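As a hedged illustration of how the reported weights relate to the underlying coefficients (illustrative numbers, not the package internals), each weight is a coefficient's share of the partial effect in its direction:

b <- c(x1 = 0.3, x2 = -0.1, x3 = 0.2) # hypothetical exposure coefficients
pos <- b[b > 0]
neg <- b[b < 0]
pos / sum(pos) # pos.weights: proportions of the positive partial effect
neg / sum(neg) # neg.weights: proportions of the negative partial effect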
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.ee(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.partials(), qgcomp.zi.boot()
set.seed(50)
n = 100
dat <- data.frame(y=rbinom(n, 1, 0.5)*rpois(n, 1.2), x1=runif(n), x2=runif(n), z=runif(n))
# poisson count model, mixture in both portions
qgcomp.zi.noboot(f=y ~ z + x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="poisson")
# negative binomial count model, mixture and covariate in both portions
qgcomp.zi.noboot(f=y ~ z + x1 + x2 | z + x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="negbin")
qgcomp.zi.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="negbin") # equivalent
# negative binomial count model, mixture only in the 'count' portion of the model
qgcomp.zi.noboot(f=y ~ z + x1 + x2 | z, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="negbin")
# weighted analysis
dat$w = runif(n)*5
qgcomp.zi.noboot(f=y ~ z + x1 + x2 | x1 + x2, expnms = c('x1', 'x2'),
    data=dat, q=2, dist="poisson", weights=w)
# Expect this:
# Warning message:
# In eval(family$initialize) : non-integer #successes in a binomial glm!
Create variables representing indicator functions with cutpoints defined by quantiles. Output a list that includes: 1) a dataset that is a copy of data, except that the variables whose names are included in expnms are transformed to their quantized versions and 2) an unnamed list of the quantile cutpoints that are used for each of the variables that were quantized.
quantize(data, expnms, q = 4, breaks = NULL)
data |
a data frame |
expnms |
a character vector with the names of the columns to be quantized |
q |
integer, number of quantiles used in creating quantized variables |
breaks |
(optional) list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints. |
This function creates categorical variables in place of the exposure variables named in 'expnms.' For example, a continuous exposure 'x1' will be replaced in the output data by another 'x1' that takes on values 0:(q-1), where, for example, the value 1 indicates that the original x1 value falls between the first and the second quantile.
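The underlying idea can be sketched in base R for a single exposure with q = 4 (illustrative only; quantize handles multiple variables and uses open-ended extreme breaks, as noted below):

x <- runif(100)
cutpts <- quantile(x, probs = seq(0, 1, length.out = 5)) # quartile cutpoints
xq <- as.integer(cut(x, breaks = cutpts, include.lowest = TRUE)) - 1 # values 0:3
table(xq)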
A list containing the following fields
data: a quantized version of the original dataframe
breaks: a list of the quantile cutpoints used to create the quantized variables, which includes a very small number for the minimum and a very large number for the maximum to avoid causing issues when using these breaks to quantize new data
set.seed(1232)
dat = data.frame(y=runif(100), x1=runif(100), x2=runif(100), z=runif(100))
qdata = quantize(data=dat, expnms=c("x1", "x2"), q=4)
table(qdata$data$x1)
table(qdata$data$x2)
summary(dat[c("y", "z")]); summary(qdata$data[c("y", "z")]) # not touched
dat = data.frame(y=runif(100), x1=runif(100), x2=runif(100), z=runif(100))
# using 'breaks' requires specifying min and max (the qth quantile)
# example with theoretical quartiles (could be other relevant values)
qdata2 = quantize(data=dat, expnms=c("x1", "x2"),
    breaks=list(c(-1e64, .25, .5, .75, 1e64),
                c(-1e64, .25, .5, .75, 1e64)))
table(qdata2$data$x1)
table(qdata2$data$x2)
This function uses the Delta method to calculate standard errors of linear functions of variables (similar to lincom in Stata). Generally, users will not need to call this function directly.
se_comb(expnms, covmat, grad = NULL)
expnms |
a character vector with the names of the columns of interest in the covariance matrix, for which a standard error will be calculated (e.g. same as expnms in a qgcomp fit) |
covmat |
covariance matrix for parameters, e.g. from a model or bootstrap procedure |
grad |
the "weight" vector for calculating the contribution of each variable in expnms to the final standard error. For a linear combination, this is equal to a vector of ones (and is set automatically). Alternatively, it can be calculated via the grad_poly procedure to obtain proper weights when the combination of expnms derives from a polynomial function (as in qgcomp.glm.boot with degree > 1). |
This function takes inputs of a set of exposure names (character vector) and a covariance matrix (with colnames/rownames that contain the full set of exposure names), as well as a possible grad parameter, to calculate the variance of a weighted combination of the exposures in expnms, where the weights are based on grad (which defaults to 1, so that this function yields the variance of a sum of all variables in expnms).
Here is a simple version of the delta method for a linear combination of three model coefficients:

f(beta) = beta1 + beta2 + beta3

given gradient vector G = [df/d(beta1), df/d(beta2), df/d(beta3)] = [1, 1, 1],

t(G) %*% cov(beta) %*% G = delta method variance, where t() is the transpose operator.
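A quick numeric check of the formula, with assumed values:

G <- c(1, 1, 1)             # gradient for a simple sum of three coefficients
V <- diag(c(1.0, 2.0, 0.5)) # assumed coefficient covariance matrix
drop(t(G) %*% V %*% G)      # delta method variance; 3.5 here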
vcov = rbind(c(1.2, .9), c(.9, 2.0))
colnames(vcov) <- rownames(vcov) <- expnms <- c("x1", "x2")
se_comb(expnms, vcov, c(1, 0))^2 # returns the given variance
se_comb(expnms, vcov, c(1, 1))   # default linear MSM fit: all exposures have equal weight
se_comb(expnms, vcov, c(.3, .1)) # used when one exposure contributes to the overall fit
                                 # more than others: grad = d(msmeffect)/dx
Simulate quantized exposures for testing methods
simdata_quantized( outcometype = c("continuous", "logistic", "survival", "multinomial"), n = 100, corr = NULL, b0 = 0, coef = c(1, 0, 0, 0), q = 4, yscale = 1, shape0 = 3, scale0 = 5, censtime = 4, ncheck = TRUE, ... )
outcometype | Character value, one of c("continuous", "logistic", "survival", "multinomial"). Selects the type of outcome to be simulated: continuous = normal, continuous outcome; logistic = binary outcome from a logistic model; survival = right-censored survival outcome from a Weibull model; multinomial = categorical outcome from a multinomial model. |
n | Sample size |
corr | NULL, or a vector of correlations between the first exposure and subsequent exposures (if length(corr) < (length(coef)-1), the remainder is backfilled with zeros) |
b0 | (continuous, binary outcomes) model intercept |
coef | Vector of coefficients for the outcome (i.e. model coefficients for exposures). Its length determines the number of exposures. |
q | Number of levels or "quanta" of each exposure |
yscale | (continuous outcomes) error scale (residual error) for normally distributed outcomes |
shape0 | (survival outcomes) baseline shape of the Weibull distribution (see rweibull) |
scale0 | (survival outcomes) baseline scale of the Weibull distribution (see rweibull) |
censtime | (survival outcomes) administrative censoring time |
ncheck | (logical, default=TRUE) adjust the sample size if needed so that exposures are exactly evenly distributed (so that qgcomp::quantize(exposure) = exposure) |
... | unused |
Simulate continuous (normally distributed errors), binary (logistic function), or event-time outcomes as a linear function of quantized exposures.
a data frame
See also: qgcomp.glm.boot and qgcomp.glm.noboot.
set.seed(50)
qdat = simdata_quantized(
  outcometype="continuous", n=10000, corr=c(.9,.3), coef=c(1,1,0,0), q = 8
)
cor(qdat)
qdat = simdata_quantized(
  outcometype="continuous", n=10000, corr=c(-.9,.3), coef=c(1,2,0,0), q = 4
)
cor(qdat)
table(qdat$x1)
qgcomp.glm.noboot(y~., data=qdat)
qdat = simdata_quantized(
  outcometype="multinomial", n=10000, corr=c(-.9),
  coef=cbind(c(1,-1,0,0), c(1,.2,0,0)), q = 4
)
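As a quick sanity check on the ncheck behavior, one can verify that the simulated exposures are already on the quantile scale, so that re-quantizing them is a no-op (a minimal sketch; the specific call and the choice of x1 are illustrative):

set.seed(50)
qdat <- simdata_quantized(outcometype = "continuous", n = 1000,
                          coef = c(1, 0, 0, 0), q = 4)
requant <- quantize(qdat, expnms = "x1", q = 4)$data
all(requant$x1 == qdat$x1)  # expected TRUE when n is a multiple of q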
This is a convenience function to split the input data into
two independent sets, possibly accounting for
single-level clustering. These two sets can be used with
qgcomp.partials
to get "partial" positive/negative effect estimates
from the original data, where sample splitting is necessary to get valid confidence intervals
and p-values. Sample splitting is also useful for any sort of exploratory model selection, where
the training data can be used to select the model and the validation data can then be used to
generate the final estimates (this process should not be iterative - e.g. no "checking" the
results in the validation data and then re-fitting, as this invalidates inference in the
validation set). For example, you could use the training data to select non-linear terms for the
model and then re-fit in the validation data to get unbiased estimates.
split_data(data, cluster = NULL, prop.train = 0.4)
data | A data.frame for use in qgcomp fitting |
cluster | NULL (default) or character value naming a cluster identifier in the data. This prevents observations from a single cluster appearing in both the training and validation data, which would reduce the effectiveness of sample splitting. |
prop.train | proportion of the original dataset (or proportion of the clusters identified via the 'cluster' parameter) that is used in the training data (default=0.4) |
A list of the following type: list(trainidx = trainidx, valididx = valididx, traindata = traindata, validdata = validdata).
E.g. if you call spl = split_data(dat), then spl$traindata will contain
a 40% sample from the original data, spl$validdata will contain the other 60%,
and spl$trainidx and spl$valididx will contain integer indexes that track the
row numbers (from the original data dat) of the training and validation
samples.
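For intuition, the unclustered case amounts to a simple random row split, along the lines of the following base R sketch (illustrative only, and not the package's exact implementation; split_data additionally handles cluster-level splitting):

split_sketch <- function(dat, prop.train = 0.4) {
  n <- nrow(dat)
  trainidx <- sort(sample(seq_len(n), size = floor(n * prop.train)))  # random 40%
  valididx <- setdiff(seq_len(n), trainidx)                           # remaining 60%
  list(trainidx = trainidx, valididx = valididx,
       traindata = dat[trainidx, , drop = FALSE],
       validdata = dat[valididx, , drop = FALSE])
}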
data(metals)
set.seed(1231124)
spl = split_data(metals)
Xnm <- c(
  'arsenic','barium','cadmium','calcium','chromium','copper',
  'iron','lead','magnesium','manganese','mercury','selenium','silver',
  'sodium','zinc'
)
dim(spl$traindata) # 181 observations = 40% of total
dim(spl$validdata) # 271 observations = 60% of total
splitres <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., q=4,
                            traindata=spl$traindata, validdata=spl$validdata,
                            expnms=Xnm)
splitres
# also used to compare linear vs. non-linear fits (useful if you have enough data)
set.seed(1231)
spl = split_data(metals, prop.train=.5)
lin = qgcomp.glm.boot(f=y~., q=4, expnms=Xnm, B=5, data=spl$traindata)
nlin1 = qgcomp.glm.boot(f=y~. + I(manganese^2) + I(calcium^2), expnms=Xnm,
                        deg=2, q=4, B=5, data=spl$traindata)
nlin2 = qgcomp.glm.boot(f=y~. + I(arsenic^2) + I(cadmium^2), expnms=Xnm,
                        deg=2, q=4, B=5, data=spl$traindata)
AIC(lin); AIC(nlin1); AIC(nlin2)
# linear has the lowest training AIC, so base the final fit on that (bootstrap not needed)
qgcomp.glm.noboot(f=y~., q=4, expnms=Xnm, data=spl$validdata)
Summary printing that includes coefficients, standard errors, hypothesis tests, and weights.
## S3 method for class 'qgcompmultfit'
summary(object, ..., tests = NULL)
object | Result from a qgcomp multinomial fit (a qgcompmultfit object). |
... | Unused |
tests | Character vector (e.g. c("global", "homogeneity")) that determines the types of hypothesis tests that are printed |
qgcompmulttest object (list) with results of a chi-squared test
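A brief usage sketch follows. Caveat: the multinomial fitter named here (qgcomp.multinomial.noboot) and its call signature are assumptions based on this package version's multinomial support and the simdata_quantized examples above, not taken from this help page:

set.seed(50)
qdat <- simdata_quantized(outcometype = "multinomial", n = 1000,
                          coef = cbind(c(1, -1, 0, 0), c(1, .2, 0, 0)), q = 4)
mfit <- qgcomp.multinomial.noboot(y ~ x1 + x2 + x3 + x4,   # assumed fitter name
                                  expnms = paste0("x", 1:4),
                                  data = qdat, q = 4)
summary(mfit, tests = c("global"))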
Tidy summarizes information about the components of a model. A model component
might be a single term in a regression, a single hypothesis, a cluster, or a class.
Exactly what tidy considers to be a model component varies across models but is usually
self-evident. If a model has several distinct types of components, you will need to
specify which components to return. (Description taken from the tidyr::tidy help file.)
## S3 method for class 'qgcompfit'
tidy(x, conf.level = 1 - x$alpha, exponentiate = FALSE, quick = FALSE, ...)
x | a qgcompfit object created by qgcomp(). |
conf.level | Real number between 0 and 1 corresponding to the nominal percentage/100 of the confidence limit (e.g. conf.level=0.95 means 95 percent confidence intervals). Defaults to the 1-alpha level of the qgcompfit. |
exponentiate | Logical indicating whether or not to exponentiate the coefficient estimates. This is typical for logistic and multinomial regressions, but a bad idea if there is no log or logit link. Defaults to FALSE. |
quick | Logical indicating whether only the term and estimate columns should be returned. Often useful to avoid time-consuming covariance and standard error calculations. Defaults to FALSE. |
... | Additional arguments. Not used. Needed to match generic signature only. Cautionary note: misspelled arguments will be absorbed in ..., where they will be ignored. If the misspelled argument has a default value, the default value will be used. For example, if you pass conf.lvel = 0.9, all computation will proceed using conf.level = 0.95. Additionally, if you pass newdata = my_tibble to an augment() method that does not accept a newdata argument, it will use the default value for the data argument. |
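A brief usage sketch (the metals data and the qgcomp.glm.noboot call mirror earlier examples; the particular exposures chosen are illustrative):

data(metals)
fit <- qgcomp.glm.noboot(y ~ arsenic + cadmium + lead,
                         expnms = c("arsenic", "cadmium", "lead"),
                         data = metals, q = 4)
tidy(fit, conf.level = 0.95)  # one row per coefficient, with estimates and confidence limits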
This function uses the Delta method to calculate a covariance matrix of linear functions of variables and is used internally in qgcomp. Generally, users will not need to call this function directly.
vc_comb(aname = "(Intercept)", expnms, covmat, grad = NULL)
aname | character scalar with the name of the first column of interest (e.g. variable A in the examples given in the details section) |
expnms | a character vector with the names of the columns of interest in the covariance matrix, for which a standard error will be calculated (e.g. same as expnms in a qgcomp fit) |
covmat | covariance matrix for parameters, e.g. from a model or bootstrap procedure |
grad | not yet used |
This function takes as inputs the name of a random variable (character), a
set of exposure names (character vector), and a covariance matrix (with colnames/rownames
that contain the independent variable and the full set
of exposure names). See se_comb for details on variances of sums
of random variables. Briefly, for variables A, B, and C with covariance matrix Cov(A,B,C),
we can calculate the covariance Cov(A,B+C) with the formula Cov(A,B) + Cov(A,C), and
Cov(A,B+C+D) = Cov(A,B) + Cov(A,C) + Cov(A,D), and so on.
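This identity is easy to verify numerically (a standalone base R sketch with simulated variables; the sample size and coefficients are arbitrary):

set.seed(1)
A <- rnorm(1e5)
B <- rnorm(1e5)
C <- 0.5 * A + rnorm(1e5)   # correlated with A
cov(A, B + C)               # direct covariance
cov(A, B) + cov(A, C)       # sum of pairwise covariances; agrees up to floating
                            # point, since the identity is exact by bilinearity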
A covariance matrix
vcov = rbind(c(0.010051348, -0.0039332248, -0.0036965571),
             c(-0.003933225, 0.0051807876, 0.0007706792),
             c(-0.003696557, 0.0007706792, 0.0050996587))
colnames(vcov) <- rownames(vcov) <- c("(Intercept)", "x1", "x2")
expnms <- rownames(vcov)[2:3]
aname = rownames(vcov)[1]
vc_comb(aname, expnms, vcov) # returns the given covariance matrix
This is an internal function called by qgcomp.zi.boot,
but is documented here for clarity. Generally, users will not need to call
this function directly.
zimsm_fit(
  f,
  qdata,
  intvals,
  expnms,
  main = TRUE,
  degree = 1,
  id = NULL,
  weights,
  MCsize = 10000,
  containmix = list(count = TRUE, zero = TRUE),
  bayes = FALSE,
  x = FALSE,
  msmcontrol = zimsm_fit.control(),
  ...
)
f | an R formula representing the conditional model for the outcome, given all exposures and covariates. Interaction terms that include exposure variables should be represented via the AsIs function |
qdata | a data frame with quantized exposures (as well as outcome and other covariates) |
intvals | sequence, the sequence of integer values that the joint exposure is 'set' to for estimating the MSM. For quantile g-computation, this is just 0:(q-1), where q is the number of quantiles of exposure. |
expnms | a character vector with the names of the columns in qdata that represent the exposures of interest (main terms only!) |
main | logical, internal use: produce estimates of exposure effect (psi) and expected outcomes under g-computation and the MSM |
degree | polynomial bases for the marginal model (e.g. degree = 2 allows that the relationship between the whole exposure mixture and the outcome is quadratic. Default = 1) |
id | (optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster) |
weights | not yet implemented |
MCsize | integer: sample size for the simulation used to approximate marginal effect estimates |
containmix | named list of logical scalars with names "count" and "zero" |
bayes | not used |
x | keep the design matrix? (logical) |
msmcontrol | named list from zimsm_fit.control |
... | arguments to zeroinfl (e.g. dist) |
This function first computes expected outcomes under hypothetical interventions to simultaneously set all exposures to a specific quantile. These predictions are based on g-computation, where the exposures are ‘quantized’, meaning that they take on ordered integer values according to their ranks, and the integer values are determined by the number of quantile cutpoints used. The function then takes these expected outcomes and fits an additional model (a marginal structural model) with the expected outcomes as the outcome and the intervention value of the exposures (the quantile integer) as the exposure. Under causal identification assumptions and correct model specification, the MSM yields a causal exposure-response representing the incremental change in the expected outcome given a joint intervention on all exposures.
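In outline, the procedure looks like the following sketch (an illustrative pseudo-implementation, not the package internals; an ordinary glm stands in for the zero-inflated model, and the function name is hypothetical):

# sketch, assuming qdata, expnms, and intvals as defined above
gcomp_msm_sketch <- function(f, qdata, expnms, intvals) {
  fit <- glm(f, data = qdata)          # 1) conditional model fit on quantized data
  msmdat <- do.call(rbind, lapply(intvals, function(j) {
    dj <- qdata
    dj[, expnms] <- j                  # 2) set all exposures to quantile level j
    data.frame(yhat = predict(fit, newdata = dj, type = "response"), psi = j)
  }))
  glm(yhat ~ psi, data = msmdat)       # 3) MSM: predicted outcome vs. joint exposure level
}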
See also: qgcomp.cox.boot and qgcomp.cox.noboot.
set.seed(50)
n = 100
## Not run:
dat <- data.frame(y=rbinom(n, 1, 0.5)*rpois(n, 1.2), x1=runif(n),
                  x2=runif(n), z=runif(n))
expnms = c("x1", "x2")
q = 4
qdata = quantize(dat, q=q, expnms=expnms)$data
f = y ~ x1 + x2 + z | 1
msmfit <- zimsm_fit(f, qdata, intvals=(1:q)-1, expnms, main=TRUE, degree=1,
                    id=NULL, MCsize=10000,
                    containmix=list(count=TRUE, zero=FALSE), x=FALSE)
msmfit$msmfit
## End(Not run)
This is an internal function called by qgcomp.zi.boot, but is documented here for clarity.
Generally, users will not need to call this function directly.
zimsm_fit.control(predmethod = c("components", "catprobs"))
predmethod | character in c("components", "catprobs"). "components" simulates from the model parameters directly, while "catprobs" simulates outcomes from the category-specific probabilities, which is output from predict.zeroinfl. The former is slightly more flexible and stable, but the latter is preferred in zero-inflated negative binomial models. |
Provides fine control over zero-inflated MSM fitting.
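For example, the resulting control list can be passed to zimsm_fit via its msmcontrol argument (a minimal usage sketch):

ctl <- zimsm_fit.control(predmethod = "catprobs")
# zimsm_fit(f, qdata, intvals, expnms, msmcontrol = ctl, ...)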