The qgcompint package: g-computation with statistical interaction

Introduction

Quantile g-computation (qgcomp) is a special case of g-computation used for estimating joint exposure-response curves for a set of continuous exposures. The base package qgcomp allows one to estimate conditional or marginal joint exposure-response curves. Because this approach was developed within the field of “exposure mixtures,” the set of exposures of interest is referred to here as “the mixture.” qgcompint builds on qgcomp by incorporating statistical interaction (product terms) between binary, categorical, or continuous covariates and the mixture.

The model

Say we have an outcome Y, a set of exposures 𝕏, a “modifier” (a covariate for which we wish to assess statistical interaction with 𝕏) denoted by M, and possibly some other covariates (e.g. potential confounders) denoted by Z.

The basic model of quantile g-computation is a joint marginal structural model given by

𝔼(Y^Xq | M, Z, ψ, η) = g(ψ0 + ψ1Sq + ψ2M + ψ3M × Sq + ηZ)

where g(⋅) is a link function in a generalized linear model (e.g. the inverse logit function in the case of a logistic model for the probability that Y = 1), ψ0 is the model intercept, η is a set of model coefficients for the covariates, and Sq is an “index” that represents a joint value of exposures. The joint exposure has a “main effect” at the referent value of M given by ψ1, ψ2 represents the association (or set of associations for categorical M) between the modifier and the outcome, and ψ3 is a product term (or set of product terms for categorical M) that represents the deviation of the exposure response for Sq from the main effect for each one-unit increase in M. The magnitude of ψ3 can be used to estimate the extent of statistical interaction on the model scale, sometimes referred to as effect measure modification.
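Put another way, at a fixed modifier value M = m, the slope of the exposure response on the scale of g(⋅) is ψ1 + ψ3 × m: ψ1 is the mixture slope at the referent level M = 0, and, for a binary modifier, ψ1 + ψ3 is the mixture slope at M = 1.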

Quantile g-computation (by default) transforms all exposures X into Xq, which are “scores” taking on discrete values 0, 1, 2, etc. representing a categorical “bin” of exposure. By default, there are four bins with evenly spaced quantile cutpoints for each exposure, so Xq = 0 means that X was below the observed 25th percentile for that exposure. The index Sq represents all exposures being set to the same value (again, by default, discrete values 0, 1, 2, 3). Thus, the parameter ψ1 quantifies the expected change in the outcome, given a one quantile increase in all exposures simultaneously, possibly adjusted for Z.
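To make the default quantization concrete, here is a minimal sketch (using base R on a hypothetical exposure vector x, not the package internals) of how a single exposure is converted to quartile scores:

# quartile cutpoints and 0-3 scores for a hypothetical exposure vector x
set.seed(1)
x <- rnorm(100)
cuts <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
xq <- cut(x, breaks = cuts, labels = FALSE, include.lowest = TRUE) - 1
table(xq)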

There are nuances to this particular model form, available in the qgcompint package, which will be explored below. One special case of quantile g-computation leads to fast fitting: linear/additive exposure effects. Here we simulate “pre-quantized” data where the exposures X1, X2, ..., X7 can only take on values of 0, 1, 2, 3 in equal proportions. The model underlying the outcomes is given by the linear regression:

𝔼(Y | X, M, β, ψ, η) = ψ0 + β1X1 + ... + β7X7 + ψ2M + η1X1 × M + ... + η7X7 × M

with the true values of the coefficients given by:

##        value
## psi_0    0.0
## beta_1   0.8
## beta_2   0.6
## beta_3   0.3
## beta_4  -0.3
## beta_5  -0.3
## beta_6  -0.3
## beta_7   0.0
## psi_2    0.0
## eta_1    1.0
## eta_2    0.0
## eta_3    0.0
## eta_4    0.0
## eta_5    0.2
## eta_6    0.2
## eta_7    0.2

In this example X1 is positively correlated with X2 − X4 (ρ = 0.8, 0.6, 0.3) and negatively correlated with X5 − X7 (ρ = −0.3, −0.3, −0.3). In this setting, the parameter ψ1 will equal the sum of the β1 − β7 coefficients (0.8), ψ2 is given directly by the data generation model (0.0), and ψ3 will equal the sum of the η coefficients (1.6). Data from this model can be simulated with the simdata_quantized_emm function in the qgcompint package. Here, we simulate data using a binary modifier and inspect the correlation matrix to confirm that the estimated correlations are approximately those of the data generation mechanism. These will converge in large sample sizes, but for a sample size of 300 the estimated coefficients will differ from those we simulate under due to random variation (set the sample size to 10000 in this example to confirm).

 library(qgcompint)
 set.seed(42)
 dat1 <- simdata_quantized_emm(
  outcometype="continuous",
# sample size
  n = 300,
# correlation between x1 and x2, x3, ...
  corr=c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3),
# model intercept
  b0=0,
# linear model coefficients for x1, x2, ... at referent level of interacting variable
  mainterms=c(0.3, -0.1, 0.1, 0.0, 0.3, 0.1, 0.1),
# linear model coefficients for product terms between x1, x2, ... and interacting variable  
  prodterms = c(1.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2),
# type of interacting variable
  ztype = "binary",
# number of levels of exposure
  q = 4,
# residual variance of y
  yscale = 2.0                            
)
names(dat1)[which(names(dat1)=="z")] = "M"

## data
head(dat1)
##   M x1 x2 x3 x4 x5 x6 x7        y
## 1 1  0  1  0  0  2  3  3 3.231033
## 2 1  0  3  0  0  3  2  1 1.685041
## 3 0  1  1  1  0  2  1  2 2.459104
## 4 1  2  2  0  2  3  1  1 4.750329
## 5 1  1  1  3  1  3  1  3 2.756988
## 6 1  1  0  1  1  3  0  2 4.711521
## modifier
table(dat1$M)
## 
##   0   1 
## 151 149
## outcome
summary(dat1$y)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -4.9302  0.7028  2.4380  2.5628  4.2134 10.0813
## exposure correlation
cor(dat1[, paste0("x", 1:7)])
##            x1         x2         x3          x4          x5          x6
## x1  1.0000000  0.8240000  0.6213333  0.33866667 -0.34666667 -0.28266667
## x2  0.8240000  1.0000000  0.4800000  0.33600000 -0.32533333 -0.23200000
## x3  0.6213333  0.4800000  1.0000000  0.26133333 -0.20800000 -0.23733333
## x4  0.3386667  0.3360000  0.2613333  1.00000000 -0.12800000 -0.03200000
## x5 -0.3466667 -0.3253333 -0.2080000 -0.12800000  1.00000000  0.09066667
## x6 -0.2826667 -0.2320000 -0.2373333 -0.03200000  0.09066667  1.00000000
## x7 -0.3653333 -0.3626667 -0.2346667 -0.04533333  0.16800000  0.12000000
##             x7
## x1 -0.36533333
## x2 -0.36266667
## x3 -0.23466667
## x4 -0.04533333
## x5  0.16800000
## x6  0.12000000
## x7  1.00000000
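As a quick arithmetic check, the true values of ψ1 (at M = 0) and ψ3 implied by the simulation inputs are just the sums of the mainterms and prodterms arguments passed to simdata_quantized_emm above:

# true psi1 at M = 0: sum of the main-effect coefficients (mainterms)
sum(c(0.3, -0.1, 0.1, 0.0, 0.3, 0.1, 0.1)) # = 0.8
# true psi3 (mixture-by-modifier interaction): sum of the product-term coefficients (prodterms)
sum(c(1.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2))  # = 1.6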

Basics: fitting a model with a modifier

Here we see that qgcomp (via the function qgcomp.emm.glm.noboot) estimates a ψ1 fairly close to 0.8 (estimate = 0.7) (again, as we increase sample size, the estimated value will be expected to become increasingly close to the true value). The product term ‘M:mixture’ is the ψ3 parameter noted above, which is also fairly close to the true value of 1.6 (estimate = 1.7).

For binary modifiers, qgcomp.emm.glm.noboot will also estimate the joint effect of the mixture in strata of the modifier. Here, the effect of the mixture at M = 0 is given by ψ1, whereas the effect of the mixture at M = 1 is estimated below the coefficient table (and is here given by ψ1 + ψ3 = 2.4).

qfit1 <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat1,
  expnms = paste0("x", 1:7),
  emmvar = "M",
  q = 4)
qfit1
## ## Qgcomp weights/partial effects at M = 0
## Scaled effect size (positive direction, sum of positive effects = 1.05)
##    x4    x1    x5    x7 
## 0.393 0.237 0.219 0.151 
## Scaled effect size (negative direction, sum of negative effects = -0.376)
##    x6    x3    x2 
## 0.449 0.435 0.116 
## 
## 
## ## Qgcomp weights/partial effects at M = 
## Scaled effect size (positive direction, sum of positive effects = 0)
## None
## Scaled effect size (negative direction, sum of negative effects = 0)
## None
## 
## 
## ## Mixture slope parameters (delta method CI):
## 
##             Estimate Std. Error  Lower CI Upper CI t value Pr(>|t|)
## (Intercept)  0.14559    0.61921 -1.068034   1.3592  0.2351  0.81428
## psi1         0.67542    0.39146 -0.091832   1.4427  1.7254  0.08555
## M            0.36067    0.87113 -1.346720   2.0681  0.4140  0.67917
## M:mixture    1.68390    0.56052  0.585307   2.7825  3.0042  0.00290
## 
## Estimate (CI), M=1: 
## 0.67542 (-0.091832, 1.4427)
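The joint effect of the mixture at M = 1 (i.e. ψ1 + ψ3, about 2.4 here) can also be extracted directly with the getstrateffects function, which is demonstrated in more detail for categorical and continuous modifiers below (output not shown):

getstrateffects(qfit1, emmval=1)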

Basics: getting bounds for pointwise comparisons

As in qgcomp you can estimate pointwise comparisons along the joint regression line. Here we estimate them at both values of M (via the emmval parameter).

pointwisebound(qfit1, emmval=0)
##   quantile quantile.midpoint        hx mean.diff   se.diff     ll.diff  ul.diff
## 1        0             0.125 0.1455920 0.0000000 0.0000000  0.00000000 0.000000
## 2        1             0.375 0.8210108 0.6754188 0.3914617 -0.09183203 1.442670
## 3        2             0.625 1.4964296 1.3508376 0.7829234 -0.18366407 2.885339
## 4        3             0.875 2.1718484 2.0262564 1.1743851 -0.27549610 4.328009
##   emm_level
## 1         0
## 2         0
## 3         0
## 4         0
pointwisebound(qfit1, emmval=1)
##   quantile quantile.midpoint        hx mean.diff   se.diff  ll.diff  ul.diff
## 1        0             0.125 0.5062631  0.000000 0.0000000 0.000000 0.000000
## 2        1             0.375 2.8655840  2.359321 0.4011708 1.573041 3.145601
## 3        2             0.625 5.2249050  4.718642 0.8023416 3.146081 6.291203
## 4        3             0.875 7.5842259  7.077963 1.2035125 4.719122 9.436804
##   emm_level
## 1         1
## 2         1
## 3         1
## 4         1

Basics: plotting weights (weights are at referent level of modifier)

For qgcomp.emm.glm.noboot fits, a set of “weights” is given that is interpreted as the proportion of a “partial” effect for each variable. That is, ψ1 represents the joint effect of multiple exposures, some of which have positive independent effects and some of which have negative independent effects. The “negative partial effect,” for example, is simply the sum of all of the negative independent effects (this is only given for a model in which all exposures are included via linear terms and no interactions among exposures occur). These weights are conditional on the fitted model, and so are not “estimates” per se and do not have associated confidence intervals. Nonetheless, the weights are useful for interpretation of the joint effect.

Notably, with product terms in the model for the joint effect, a different set of weights will be generated at every value of the modifier. Here, we can plot the weights at M=0 and M=1.

plot(qfit1, emmval=0)
Weights for M = 0

plot(qfit1, emmval=1)
Weights for M = 1
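The weights shown in these plots can also be printed rather than plotted using the getstratweights function, which is covered in more detail below; for example, at M = 1 (output not shown):

getstratweights(qfit1, emmval=1)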

Basics: bootstrapping

For non-linear qgcomp fits, or to get marginal estimates of the exposure-response curve (i.e. conditional only on the modifier), we can use the qgcomp.emm.boot function. Here we just repeat the original fit, which yields similar evidence of substantial statistical interaction on the additive scale: we would expect the joint exposure effect estimate to be greater for M=1 than for M=0, corroborating the fit above. The point estimates are identical to those of the non-bootstrapped fit, as expected in this case because there are no non-modifier covariates.

qfit1b <- qgcomp.emm.boot(y~x1+x2+x3+x4+x5+x6+x7,
  data=dat1,
  expnms = paste0("x", 1:7),
  emmvar = "M",
  q = 4)
qfit1b
## ## Mixture slope parameters (bootstrap CI):
## 
##             Estimate Std. Error  Lower CI Upper CI t value Pr(>|t|)
## (Intercept)  0.14559    0.60441 -1.039024   1.3302  0.2409 0.809819
## psi1         0.67542    0.38573 -0.080604   1.4314  1.7510 0.081029
## M            0.36067    0.81015 -1.227190   1.9485  0.4452 0.656522
## M:mixture    1.68390    0.52371  0.657445   2.7104  3.2153 0.001454

Plotting bootstrapped fits (plotting predictions)

plot(qfit1b, emmval=0)
Pointwise comparisons for M = 0

plot(qfit1b, emmval=1)
Pointwise comparisons for M = 1

Overlay plots of predictions at multiple modifier levels

Visually, it’s easier to compare two plots with the same y-axis. Here we can see that the joint regression curve is steeper at M=1 than it is at M=0. You will get some notes from ggplot2 about replacing scales, which can be ignored or suppressed using suppressMessages(print( <ggplot statements> )). The diverging slopes and non-overlapping confidence intervals give substantial visual evidence of interaction.

library(ggplot2) # need to explicitly call ggplot
pp0b = plot(qfit1b, emmval=0, suppressprint=TRUE, geom_only=TRUE, 
             pointwisebars=TRUE, modelfitline=FALSE, modelband=FALSE)
pp1b = plot(qfit1b, emmval=1, suppressprint=TRUE, geom_only=TRUE, 
             pointwisebars=TRUE, modelfitline=FALSE, modelband=FALSE)
# relabel the plots by directly modifying the ggplot2 geometries.
# "col" is a column that sets the color
# "point" and "errorbar" are names given to ggplot2::geom_point and 
# ggplot2::geom_error bar geometries
pp0b$point$data$col = "M=0"
pp1b$point$data$col = "M=1"
pp0b$errorbar$data$col = "M=0"
pp1b$errorbar$data$col = "M=1"
# jitter along the x axis so the points do not overlap
pp0b$point$data$x = pp0b$point$data$x - 0.01 
pp1b$point$data$x = pp1b$point$data$x + 0.01 
pp0b$errorbar$data$x = pp0b$errorbar$data$x - 0.01 
pp1b$errorbar$data$x = pp1b$errorbar$data$x + 0.01 



suppressMessages(print(
  ggplot() + pp0b + pp1b + scale_colour_viridis_d(name="") + 
    labs(title="Bootstrap approach")
))
Pointwise comparisons (same scale) for M = 0

Basics: Estimating equations

Another approach to non-linear qgcomp fits, or to obtaining marginal estimates of the exposure-response curve (i.e. conditional only on the modifier), is the qgcomp.emm.glm.ee function. This approach uses estimating equation methodology for fitting. Estimating equation methodology also underlies GEE (generalized estimating equation) methods, which are frequently employed in longitudinal data analysis because they can yield appropriate standard errors for clustered data. In qgcomp.emm.glm.ee, the standard errors are also cluster robust (as long as the id variable is specified), but it can also be used to obtain non-linear fits in the same way as the bootstrapped estimator above. Relative to the bootstrapped estimator, the estimating equations estimator is less computationally intensive and yields standard errors that are not subject to simulation error (and thus are more repeatable). In general, the estimating equation method should be preferable for many of the cases in which one might otherwise consider the bootstrapped estimator. Because standard errors are estimated differently, you will see differences in confidence intervals between qgcomp.emm.glm.ee and qgcomp.emm.glm.noboot even when the models are otherwise identical.
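As a sketch of the cluster-robust usage mentioned above, a call might look like the following, assuming the data contained a cluster identifier column (the column name clusterid here is purely hypothetical; not run):

qfit1ee_clust <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat1,
  expnms = paste0("x", 1:7),
  emmvar = "M",
  # hypothetical cluster identifier column; requests cluster-robust standard errors
  id = "clusterid",
  q = 4)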

Here we just repeat the analysis from the bootstrapped fit using the estimating equation approach, which yields similar evidence of substantial statistical interaction on the additive scale: we would expect the joint exposure effect estimate to be greater for M=1 than for M=0, corroborating the fits above. The point estimates are identical, as expected in this case because there are no non-modifier covariates.

qfit1ee <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data=dat1,
  expnms = paste0("x", 1:7),
  emmvar = "M",
  q = 4)
qfit1ee
## ## Mixture slope parameters (robust CI):
## 
##             Estimate Std. Error  Lower CI Upper CI t value Pr(>|t|)
## (Intercept)  0.14559    0.56468 -0.961162   1.2523  0.2578 0.796725
## psi1         0.67542    0.36103 -0.032196   1.3830  1.8708 0.062407
## M            0.36067    0.77844 -1.165035   1.8864  0.4633 0.643485
## mixture:M    1.68390    0.50114  0.701696   2.6661  3.3602 0.000886
## 
## Estimate (CI), M=1: 
## 0.67542 (-0.032196, 1.383)

Plotting predictions from estimating equation fits (plotting predictions)

Here we skip straight to overlaying the two plots.

pp0ee = plot(qfit1ee, emmval=0, suppressprint=TRUE, geom_only=TRUE, 
             pointwisebars=TRUE, modelfitline=FALSE, modelband=FALSE)
pp1ee = plot(qfit1ee, emmval=1, suppressprint=TRUE, geom_only=TRUE, 
             pointwisebars=TRUE, modelfitline=FALSE, modelband=FALSE)
# relabel the plots by directly modifying the ggplot2 geometries.
# "col" is a column that sets the color
# "point" and "errorbar" are names given to ggplot2::geom_point and 
# ggplot2::geom_error bar geometries
pp0ee$point$data$col = "M=0"
pp1ee$point$data$col = "M=1"
pp0ee$errorbar$data$col = "M=0"
pp1ee$errorbar$data$col = "M=1"
# jitter along the x axis so the points do not overlap
pp0ee$point$data$x = pp0ee$point$data$x - 0.01 
pp1ee$point$data$x = pp1ee$point$data$x + 0.01 
pp0ee$errorbar$data$x = pp0ee$errorbar$data$x - 0.01 
pp1ee$errorbar$data$x = pp1ee$errorbar$data$x + 0.01 



suppressMessages(print(
  ggplot() + pp0ee + pp1ee + scale_colour_viridis_d(name="") + 
    labs(title="Estimating equations approach")
))
Pointwise comparisons for M = 0

Basics: categorical modifier, binary outcome

Now we can simulate data under a categorical modifier, and the simdata_quantized_emm function will pick some convenient defaults. We will also use a binary outcome with N=300; note that using a categorical modifier with a rare binary outcome will be subject to low power, and bootstrapping can fail due to empty strata in some bootstrap samples.

Simulated data defaults

Here is a good place to note that simdata_quantized_emm is provided mainly as a learning tool. While it could potentially be used for simulations in a scientific publication, it has many defaults that are not user controllable but that one may wish to change. Some of the underlying code that may serve as inspiration for more comprehensive simulations can be explored via the “hidden” (non-exported) functions qgcompint:::.quantized_design_emm, qgcompint:::.dgm_quantized_linear_emm, qgcompint:::.dgm_quantized_logistic_emm, and qgcompint:::.dgm_quantized_survival_emm.
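For example, the body of one of these non-exported functions can be printed at the console to see exactly how the outcomes are generated:

# print the source of the non-exported linear data-generating function
print(qgcompint:::.dgm_quantized_linear_emm)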

 set.seed(23)
 dat2 <- simdata_quantized_emm(
  outcometype="logistic",
# sample size
  n = 300,
# correlation between x1 and x2, x3, ...
  corr=c(0.6, 0.5, 0.3, -0.3, -0.3, 0.0),
# model intercept
  b0=-2,
# linear model coefficients for x1, x2, ... at referent level of interacting variable
  mainterms=c(0.1, -0.1, 0.1, 0.0, 0.1, 0.1, 0.1),
# linear model coefficients for product terms between x1, x2, ... and interacting variable  
  prodterms = c(0.2, 0.0, 0.0, 0.0, 0.2, -0.2, 0.2),
# type of interacting variable
  ztype = "categorical",
# number of levels of exposure
  q = 4,
# residual variance of y
  yscale = 2.0                            
)

## data
head(dat2)
##   z x1 x2 x3 x4 x5 x6 x7 y
## 1 0  2  2  2  2  3  0  3 0
## 2 2  0  2  0  3  1  3  3 0
## 3 0  3  0  3  1  0  3  0 0
## 4 2  2  2  2  3  1  1  0 0
## 5 2  3  3  0  3  0  1  1 0
## 6 0  3  0  3  1  3  1  3 0
## modifier
table(dat2$z)
## 
##   0   1   2 
## 109  92  99
## outcome
table(dat2$y)
## 
##   0   1 
## 205  95
## exposure correlation
cor(dat2[, paste0("x", 1:7)])
##             x1          x2         x3           x4           x5           x6
## x1  1.00000000  0.53866667  0.5493333  0.210666667 -0.272000000 -0.285333333
## x2  0.53866667  1.00000000  0.2693333  0.186666667 -0.130666667 -0.205333333
## x3  0.54933333  0.26933333  1.0000000  0.109333333 -0.186666667 -0.184000000
## x4  0.21066667  0.18666667  0.1093333  1.000000000 -0.069333333 -0.005333333
## x5 -0.27200000 -0.13066667 -0.1866667 -0.069333333  1.000000000  0.005333333
## x6 -0.28533333 -0.20533333 -0.1840000 -0.005333333  0.005333333  1.000000000
## x7 -0.01066667 -0.05066667 -0.0480000  0.029333333  0.029333333  0.002666667
##              x7
## x1 -0.010666667
## x2 -0.050666667
## x3 -0.048000000
## x4  0.029333333
## x5  0.029333333
## x6  0.002666667
## x7  1.000000000

The wrong way to include categorical modifiers

Below is one way to fit qgcomp.emm.glm.noboot with a categorical modifier that exactly follows the previous code. This approach to categorical modifiers is incorrect in this case, due to the format of these data. Note that if you fit the model like this, where your categorical modifier is not the proper data type, qgcomp.emm.glm.noboot will assume you have a continuous modifier.

qfit.wrong <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "z",
  q = 4, family=binomial())
qfit.wrong
## ## Qgcomp weights/partial effects at z = 0
## Scaled effect size (positive direction, sum of positive effects = 1.1)
##     x5     x3     x1     x6 
## 0.3881 0.3025 0.2401 0.0694 
## Scaled effect size (negative direction, sum of negative effects = -0.214)
##    x2    x7    x4 
## 0.695 0.171 0.134 
## 
## 
## ## Mixture log(OR) (delta method CI):
## 
##             Estimate Std. Error Lower CI Upper CI Z value Pr(>|z|)
## (Intercept) -2.72336    0.92407 -4.53451 -0.91221 -2.9471 0.003207
## psi1         0.88157    0.56080 -0.21757  1.98071  1.5720 0.115949
## z           -0.43426    0.71890 -1.84329  0.97476 -0.6041 0.545799
## z:mixture    0.65881    0.44430 -0.21199  1.52961  1.4828 0.138123

The right way to include categorical modifiers (use as.factor())

Instead, you should convert each categorical modifier to a “factor” prior to fitting the model. Below is the output for the non-bootstrapped fit, the fit with bootstrapped confidence intervals, and the estimating equation fit (since there are no other covariates in the model, these approaches estimate the same marginal effect, and the parameter log-odds ratio estimates will be identical). Here we see that we get a unique interaction term and main effect for each level of the modifier z.

dat2$zfactor = as.factor(dat2$z)
# using asymptotic-based confidence intervals
qfit2 <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4, family=binomial())
# using bootstrap based confidence intervals (estimate b)
set.seed(12312)
qfit2b <- qgcomp.emm.boot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  B = 10, # set higher when using in real data
  q = 4, family=binomial(), rr = FALSE)
# using estimating equation based confidence intervals (estimate ee)
set.seed(12312)
qfit2ee <- qgcomp.emm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4, family=binomial(), rr = FALSE)
qfit2
## ## Qgcomp weights/partial effects at zfactor = 0
## Scaled effect size (positive direction, sum of positive effects = 1.15)
##     x5     x3     x1     x6 
## 0.3914 0.3857 0.2014 0.0215 
## Scaled effect size (negative direction, sum of negative effects = -0.382)
##    x7    x4    x2 
## 0.427 0.351 0.222 
## 
## 
## ## Mixture log(OR) (delta method CI):
## 
##                  Estimate Std. Error Lower CI Upper CI Z value Pr(>|z|)
## (Intercept)      -2.80404    1.08650 -4.93355 -0.67454 -2.5808 0.009857
## psi1              0.76337    0.65463 -0.51969  2.04642  1.1661 0.243576
## zfactor1         -0.89742    1.55815 -3.95134  2.15651 -0.5759 0.564650
## zfactor1:mixture  1.46319    0.97182 -0.44155  3.36793  1.5056 0.132166
## zfactor2         -0.52448    1.48432 -3.43369  2.38474 -0.3533 0.723830
## zfactor2:mixture  1.09584    0.90958 -0.68691  2.87859  1.2048 0.228290
qfit2b
## ## Mixture log(OR) (bootstrap CI):
## 
##                  Estimate Std. Error Lower CI Upper CI Z value Pr(>|z|)
## (Intercept)      -2.80404    1.26019 -5.27396 -0.33413 -2.2251  0.02607
## psi1              0.76337    0.66973 -0.54927  2.07600  1.1398  0.25436
## zfactor1         -0.89742    1.94264 -4.70493  2.91009 -0.4620  0.64411
## zfactor2         -0.52448    1.86269 -4.17529  3.12634 -0.2816  0.77827
## zfactor1:mixture  1.46319    1.12314 -0.73812  3.66450  1.3028  0.19265
## zfactor2:mixture  1.09584    1.09279 -1.04599  3.23768  1.0028  0.31596
qfit2ee
## ## Mixture log(OR) (robust CI):
## 
##                  Estimate Std. Error Lower CI Upper CI Z value Pr(>|z|)
## (Intercept)      -2.80404    1.04140 -4.84515 -0.76293 -2.6926  0.00709
## psi1              0.76337    0.62774 -0.46699  1.99372  1.2160  0.22397
## zfactor1         -0.89742    1.46829 -3.77522  1.98039 -0.6112  0.54107
## zfactor2         -0.52448    1.50004 -3.46451  2.41555 -0.3496  0.72661
## mixture:zfactor1  1.46319    0.90814 -0.31674  3.24312  1.6112  0.10714
## mixture:zfactor2  1.09584    0.91844 -0.70428  2.89596  1.1932  0.23281

Bounds for pointwise comparisons with categorical modifiers

Here are some miscellaneous functions for getting point estimates and bounds for various comparisons at specific values of the modifier.

## output the weights at Z=0
getstratweights(qfit2, emmval=0)
## ## Qgcomp weights/partial effects at zfactor = 0
## Scaled effect size (positive direction, sum of positive effects = 1.15)
##     x5     x3     x1     x6 
## 0.3914 0.3857 0.2014 0.0215 
## Scaled effect size (negative direction, sum of negative effects = -0.382)
##    x7    x4    x2 
## 0.427 0.351 0.222
## output pointwise comparisons at Z=0
pointwisebound(qfit2, emmval=0)
##   quantile quantile.midpoint         hx       or   se.lnor     ll.or      ul.or
## 1        0             0.125 -2.8040444 1.000000 0.0000000 1.0000000   1.000000
## 2        1             0.375 -2.0406788 2.145485 0.6546337 0.5947033   7.740173
## 3        2             0.625 -1.2773131 4.603106 1.3092673 0.3536720  59.910280
## 4        3             0.875 -0.5139474 9.875896 1.9639010 0.2103299 463.715942
##   emm_level
## 1         0
## 2         0
## 3         0
## 4         0
## plot weights at Z=0
plot(qfit2, emmval=0)

## output stratum specific joint effect estimate for the mixture at Z=2
print(getstrateffects(qfit2, emmval=2))
## Joint effect at zfactor=2
##         Estimate Std. Error Lower CI Upper CI z value Pr(>|z|)
## Mixture   1.8592     0.6315  0.62148   3.0969  2.9441 0.003239
## output the weights at Z=2
print(getstratweights(qfit2, emmval=2))
## ## Qgcomp weights/partial effects at zfactor = 2
## Scaled effect size (positive direction, sum of positive effects = 1.87)
##      x5      x1      x2      x3      x7      x6 
## 0.41602 0.38949 0.09062 0.07037 0.02519 0.00831 
## Scaled effect size (negative direction, sum of negative effects = -0.00714)
## x4 
##  1
## output pointwise comparisons at Z=2
pointwisebound(qfit2, emmval=2)
##   quantile quantile.midpoint         hx         or  se.lnor    ll.or      ul.or
## 1        0             0.125 -3.3285212   1.000000 0.000000 1.000000     1.0000
## 2        1             0.375 -1.4693122   6.418658 0.631504 1.861688    22.1300
## 3        2             0.625  0.3898968  41.199165 1.263008 3.465884   489.7369
## 4        3             0.875  2.2491058 264.443333 1.894512 6.452396 10837.8769
##   emm_level
## 1         2
## 2         2
## 3         2
## 4         2
plot(qfit2, emmval=2)

## output stratum specific joint effect estimate for the mixture at Z=2 from bootstrapped fit
print(getstrateffects(qfit2b, emmval=2))
## Joint effect at zfactor=2
##         Estimate Std. Error Lower CI Upper CI z value Pr(>|z|)
## Mixture   1.8592     0.6315  0.62148   3.0969  2.9441 0.003239
## output pointwise comparisons at Z=2 from bootstrapped fit
print(pointwisebound(qfit2b, emmval=2))
##   quantile quantile.midpoint         hx         or   se.lnor    ll.or
## 1        0             0.125 -3.3285212   1.000000 0.0000000 1.000000
## 2        1             0.375 -1.4693122   6.418658 0.8057764 1.323019
## 3        2             0.625  0.3898968  41.199165 1.6115527 1.750380
## 4        3             0.875  2.2491058 264.443333 2.4173291 2.315786
##         ul.or ll.linpred ul.linpred emm_level
## 1     1.00000  -3.328521 -3.3285212         2
## 2    31.14026  -3.048605  0.1099805         2
## 3   969.71606  -2.768689  3.5484821         2
## 4 30197.21448  -2.488772  6.9869838         2
## output modelwise confidence bounds at Z=2 from bootstrapped fit
print(modelbound(qfit2b, emmval=2))
##   quantile quantile.midpoint    linpred          o     se.pw      ll.pw
## 1        0             0.125 -3.3285212 0.03584608 1.3936703 0.00233425
## 2        1             0.375 -1.4693122 0.23008368 0.6366945 0.06605877
## 3        2             0.625  0.3898968 1.47682837 0.4086489 0.66296013
## 4        3             0.875  2.2491058 9.47925560 1.1077721 1.08102783
##        ul.pw    ll.simul   ul.simul emm_level
## 1  0.5504729 0.003315715  0.2510315         2
## 2  0.8013849 0.083425466  0.4932845         2
## 3  3.2898238 0.871064034  2.9511398         2
## 4 83.1211592 1.904740357 52.8130752         2
## Plot pointwise comparisons at Z=2 from bootstrapped fit
plot(qfit2b, emmval=2)

Overlay plots of predictions at multiple modifier levels

Visually, it’s easier to compare predictions across all levels of the EMM variable on the same plot. This plot looks at bounds on the marginal structural regression model line.

library(ggplot2) # need to explicitly call ggplot
pp0ee = plot(qfit2ee, emmval=0, suppressprint=TRUE, geom_only=TRUE, 
             modelband=TRUE, pointwisebars=FALSE, modelfitline=TRUE)
pp1ee = plot(qfit2ee, emmval=1, suppressprint=TRUE, geom_only=TRUE, 
             modelband=TRUE, pointwisebars=FALSE, modelfitline=TRUE)
pp2ee = plot(qfit2ee, emmval=2, suppressprint=TRUE, geom_only=TRUE, 
             modelband=TRUE, pointwisebars=FALSE, modelfitline=TRUE)
# relabel the plots by directly modifying the ggplot2 geometries
pp0ee$line$data$col = "Z=0"
pp1ee$line$data$col = "Z=1"
pp2ee$line$data$col = "Z=2"
pp0ee$ribbon$data$col = "Z=0"
pp1ee$ribbon$data$col = "Z=1"
pp2ee$ribbon$data$col = "Z=2"
# jitter along the x axis so the points do not overlap
pp0ee$line$data$x = pp0ee$line$data$x - 0.01 
pp2ee$line$data$x = pp2ee$line$data$x + 0.01 
pp0ee$ribbon$data$x = pp0ee$ribbon$data$x - 0.01 
pp2ee$ribbon$data$x = pp2ee$ribbon$data$x + 0.01 

ggplot() + pp0ee + pp1ee + pp2ee + scale_color_viridis_d(name="") + scale_fill_viridis_d(name="", alpha=0.25)
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
Pointwise comparisons (same scale) for M = 0

Global tests of interaction for categorical modifiers

It is possible to obtain a global test of (no) interaction for a multi-level modifier using the qgcomp package and the anova function. You need to fit a model without interaction (via qgcomp) and then a model with interaction (via qgcompint). The test is done on the “underlying” conditional model corresponding to the original data, rather than the marginal structural model. The methods for qgcomp.glm.[no]boot and qgcomp.emm.glm.[no]boot utilize the anova.glm function from the stats package (note the slightly different specification, which operates on the $fit component of each object), while the methods for qgcomp.glm.ee and qgcomp.emm.glm.ee use the anova.geeglm method from the geepack R package. Both tests return similar (but not identical) results.

library("qgcomp")
dat2$zfactor = as.factor(dat2$z)
catfitoverall <- qgcomp.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  q = 4, family=binomial())
catfitemm <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4, family=binomial())

catfitoverallee <- qgcomp.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  q = 4, family=binomial())
catfitemmee <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4, family=binomial(), rr=FALSE)


anova(catfitoverall$fit, catfitemm$fit, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7
## Model 2: y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x1 * zfactor1 + x2 * zfactor1 + 
##     x3 * zfactor1 + x4 * zfactor1 + x5 * zfactor1 + x6 * zfactor1 + 
##     x7 * zfactor1 + x1 * zfactor2 + x2 * zfactor2 + x3 * zfactor2 + 
##     x4 * zfactor2 + x5 * zfactor2 + x6 * zfactor2 + x7 * zfactor2
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1       292     338.93                        
## 2       276     306.69 16   32.243 0.009295 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(catfitemmee, catfitoverallee)
## Analysis of 'Wald statistic' Table
## 
## Model 1 y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, expnms: x1, x2, x3, x4, x5, x6, x7, EMM: zfactor 
## Model 2 y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, expnms: x1, x2, x3, x4, x5, x6, x7
##   Df     X2 P(>|Chi|)  
## 1 16 29.764   0.01927 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Basics: Non-numerical factor modifiers

As above, non-numerical modifiers must be coded as factors. These are straightforward, and the example below follows without further commentary.

 set.seed(23)
 dat3 <- simdata_quantized_emm(
  outcometype="continuous",
  n = 300,
  corr=c(0.6, 0.5, 0.3, -0.3, -0.3, 0.0),
  b0=-2,
  mainterms=c(0.1, -0.1, 0.1, 0.0, 0.1, 0.1, 0.1),
  prodterms = c(0.2, 0.0, 0.0, 0.0, 0.2, -0.2, 0.2),
  ztype = "categorical",
  q = 4,
  yscale = 2.0                            
)

dat3$zletter = as.factor(with(dat3, ifelse(z==0, "a", ifelse(z==1, "b", "c"))))
with(dat3, table(z, zletter))
##    zletter
## z     a   b   c
##   0 109   0   0
##   1   0  92   0
##   2   0   0  99
qfit_letter <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "zletter",
  q = 4, family=gaussian())

qfit_letteree <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "zletter",
  q = 4, family=gaussian())

print(qfit_letter)
## ## Qgcomp weights/partial effects at zletter = a
## Scaled effect size (positive direction, sum of positive effects = 1.09)
##     x3     x7     x2     x4     x6 
## 0.4099 0.2452 0.1330 0.1122 0.0997 
## Scaled effect size (negative direction, sum of negative effects = -0.823)
##    x1    x5 
## 0.713 0.287 
## 
## 
## ## Mixture slope parameters (delta method CI):
## 
##                  Estimate Std. Error Lower CI Upper CI t value Pr(>|t|)
## (Intercept)      -2.19374    0.73751 -3.63924 -0.74824 -2.9745 0.003193
## psi1              0.26804    0.46378 -0.64095  1.17704  0.5780 0.563763
## zletterb         -0.48345    1.09771 -2.63492  1.66801 -0.4404 0.659977
## zletterb:mixture  1.08966    0.70934 -0.30062  2.47993  1.5362 0.125639
## zletterc          0.76945    1.02824 -1.24587  2.78477  0.7483 0.454905
## zletterc:mixture  0.59209    0.65586 -0.69337  1.87755  0.9028 0.367432
print(qfit_letteree)
## ## Mixture slope parameters (robust CI):
## 
##                  Estimate Std. Error  Lower CI Upper CI t value Pr(>|t|)
## (Intercept)      -2.19374    0.70559 -3.576665 -0.81082 -3.1091 0.002074
## psi1              0.26804    0.43462 -0.583790  1.11988  0.6167 0.537918
## zletterb         -0.48345    0.92428 -2.295010  1.32811 -0.5231 0.601355
## zletterc          0.76945    0.94934 -1.091227  2.63013  0.8105 0.418349
## mixture:zletterb  1.08966    0.60262 -0.091446  2.27076  1.8082 0.071665
## mixture:zletterc  0.59209    0.60277 -0.589316  1.77349  0.9823 0.326824
## output stratum specific joint effect estimate for the mixture at Zletter="a"
print(getstrateffects(qfit_letter, emmval="a"))
## Joint effect at zletter=a
##         Estimate Std. Error Lower CI Upper CI z value Pr(>|z|)
## Mixture  0.26804    0.46378 -0.64095    1.177   0.578   0.5633
print(getstrateffects(qfit_letteree, emmval="a"))
## Joint effect at zletter=a
##         Estimate Std. Error Lower CI Upper CI z value Pr(>|z|)
## Mixture  0.26804    0.43462 -0.58379   1.1199  0.6167   0.5374
## output the weights at Z=2
print(getstratweights(qfit_letter, emmval="a"))
## ## Qgcomp weights/partial effects at zletter = a
## Scaled effect size (positive direction, sum of positive effects = 1.09)
##     x3     x7     x2     x4     x6 
## 0.4099 0.2452 0.1330 0.1122 0.0997 
## Scaled effect size (negative direction, sum of negative effects = -0.823)
##    x1    x5 
## 0.713 0.287
## output predictions at Zletter="b"
print(pointwisebound(qfit_letter, emmval="b", pointwiseref = 1, alpha=0.05))
##   quantile quantile.midpoint         hx mean.diff   se.diff   ll.diff  ul.diff
## 1        0             0.125 -2.6771928  0.000000 0.0000000 0.0000000 0.000000
## 2        1             0.375 -1.3194888  1.357704 0.5367183 0.3057553 2.409653
## 3        2             0.625  0.0382151  2.715408 1.0734367 0.6115107 4.819305
## 4        3             0.875  1.3959191  4.073112 1.6101550 0.9172660 7.228958
##   emm_level
## 1         b
## 2         b
## 3         b
## 4         b
print(pointwisebound(qfit_letteree, emmval="b", pointwiseref = 1, alpha=0.05))
##   quantile quantile.midpoint         hx mean.diff   se.diff   ll.diff  ul.diff
## 1        0             0.125 -2.6771928  0.000000 0.0000000 0.0000000 0.000000
## 2        1             0.375 -1.3194888  1.357704 0.4174364 0.5395436 2.175864
## 3        2             0.625  0.0382151  2.715408 0.8348728 1.0790873 4.351729
## 4        3             0.875  1.3959191  4.073112 1.2523092 1.6186309 6.527593
##   ll.linpred ul.linpred emm_level
## 1  -2.677193 -2.6771928         b
## 2  -2.137649 -0.5013285         b
## 3  -1.598106  1.6745357         b
## 4  -1.058562  3.8504000         b

Basics: Continuous modifiers

Here we simulate some data, similar to prior datasets, where we use a continuous modifier.

 set.seed(23)
 dat3 <- simdata_quantized_emm(
  outcometype="continuous",
# sample size
  n = 100,
# correlation between x1 and x2, x3, ...
  corr=c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3),
# model intercept
  b0=-2,
# linear model coefficients for x1, x2, ... at referent level of interacting variable
  mainterms=c(0.3, -0.1, 0.1, 0.0, 0.3, 0.1, 0.1),
# linear model coefficients for product terms between x1, x2, ... and interacting variable  
  prodterms = c(1.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2),
# type of interacting variable
  ztype = "continuous",
# number of levels of exposure
  q = 4,
# residual variance of y
  yscale = 2.0                            
)
names(dat3)[which(names(dat3)=="z")] = "CoM"

head(dat3)
##          CoM x1 x2 x3 x4 x5 x6 x7          y
## 1  0.1932123  3  3  3  3  0  0  3 -1.0597465
## 2 -0.4346821  2  2  3  2  0  1  1 -4.9330438
## 3  0.9132671  1  1  0  1  2  2  2 -1.8409107
## 4  1.7933881  1  1  2  1  3  2  0  1.6776239
## 5  0.9966051  2  2  0  2  0  1  1 -1.0562066
## 6  1.1074905  1  1  3  0  2  1  0 -0.9831373
summary(dat3$CoM)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.41826 -0.53582  0.04267  0.07475  0.84181  1.97606
summary(dat3$y)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -8.6878 -2.7252 -0.8718 -0.7253  1.2805  8.0743
cor(dat3[, paste0("x", 1:7)])
##        x1     x2     x3     x4     x5     x6     x7
## x1  1.000  0.920  0.632  0.320 -0.400 -0.520 -0.200
## x2  0.920  1.000  0.616  0.320 -0.400 -0.472 -0.136
## x3  0.632  0.616  1.000  0.232 -0.360 -0.288 -0.240
## x4  0.320  0.320  0.232  1.000 -0.040 -0.280  0.032
## x5 -0.400 -0.400 -0.360 -0.040  1.000  0.192  0.072
## x6 -0.520 -0.472 -0.288 -0.280  0.192  1.000  0.024
## x7 -0.200 -0.136 -0.240  0.032  0.072  0.024  1.000

Fits are done as in the previous example (here they are shown without any further commentary).

qfit3 <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  q = 4)
qfit3
## ## Qgcomp weights/partial effects at CoM = 0
## Scaled effect size (positive direction, sum of positive effects = 0.93)
##      x3      x1      x7      x6      x4      x5 
## 0.29883 0.24796 0.20602 0.20568 0.03698 0.00453 
## Scaled effect size (negative direction, sum of negative effects = -0.482)
## x2 
##  1 
## 
## 
## ## Mixture slope parameters (delta method CI):
## 
##             Estimate Std. Error Lower CI Upper CI t value Pr(>|t|)
## (Intercept) -1.40872    0.82889 -3.03332  0.21587 -1.6995  0.09292
## psi1         0.44806    0.54005 -0.61041  1.50654  0.8297  0.40907
## CoM         -1.70779    0.95928 -3.58795  0.17236 -1.7803  0.07864
## CoM:mixture  2.78425    0.62382  1.56160  4.00691  4.4633 2.49e-05
qfit3b <- qgcomp.emm.boot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  B=10,
  q = 4)
qfit3b
## ## Mixture slope parameters (bootstrap CI):
## 
##             Estimate Std. Error Lower CI Upper CI t value  Pr(>|t|)
## (Intercept) -1.40872    0.80304 -2.98265  0.16521 -1.7542   0.08308
## psi1         0.44806    0.50720 -0.54603  1.44215  0.8834   0.37957
## CoM         -1.70779    0.75007 -3.17790 -0.23769 -2.2768   0.02537
## CoM:mixture  2.78425    0.52286  1.75946  3.80905  5.3250 8.461e-07
qfit3ee <- qgcomp.emm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  q = 4)
qfit3ee
## ## Mixture slope parameters (robust CI):
## 
##             Estimate Std. Error Lower CI  Upper CI t value  Pr(>|t|)
## (Intercept) -1.40872    0.78107 -2.93959  0.122146 -1.8036   0.07493
## psi1         0.44806    0.49066 -0.51361  1.409735  0.9132   0.36379
## CoM         -1.70779    0.83509 -3.34454 -0.071046 -2.0450   0.04402
## mixture:CoM  2.78425    0.54885  1.70853  3.859975  5.0729 2.351e-06

Stratified effects, weights, and pointwise effect comparisons at specific values of a continuous modifier

  • Weights can be obtained for any valid value of the modifier and are available for non-bootstrapped fits.
  • Stratified effects can be obtained for any valid value of the modifier and are available for any fit.
  • Point-wise comparisons are also available for any fit. These estimate the predicted outcome as well as confidence intervals for a contrast with a specified level of the joint exposure (here the second-lowest quartile group).
## output/plot the weights at CoM=0
getstratweights(qfit3, emmval=0)
## ## Qgcomp weights/partial effects at CoM = 0
## Scaled effect size (positive direction, sum of positive effects = 0.93)
##      x3      x1      x7      x6      x4      x5 
## 0.29883 0.24796 0.20602 0.20568 0.03698 0.00453 
## Scaled effect size (negative direction, sum of negative effects = -0.482)
## x2 
##  1
plot(qfit3, emmval=0)

## output stratum specific joint effect estimate for the mixture at CoM=0
print(getstrateffects(qfit3, emmval=0))
## Joint effect at CoM=0
##         Estimate Std. Error Lower CI Upper CI z value Pr(>|z|)
## Mixture  0.44806    0.54005 -0.61041   1.5065  0.8297   0.4067
## output pointwise comparisons at CoM=0
print(pointwisebound(qfit3, emmval=0))
##   quantile quantile.midpoint          hx mean.diff   se.diff    ll.diff
## 1        0             0.125 -1.40872323 0.0000000 0.0000000  0.0000000
## 2        1             0.375 -0.96065965 0.4480636 0.5400468 -0.6104086
## 3        2             0.625 -0.51259608 0.8961271 1.0800935 -1.2208173
## 4        3             0.875 -0.06453251 1.3441907 1.6201403 -1.8312259
##    ul.diff emm_level
## 1 0.000000         0
## 2 1.506536         0
## 3 3.013072         0
## 4 4.519607         0
## output/plot the weights at the 80%ile of CoM
getstratweights(qfit3, emmval=quantile(dat3$CoM, .8))
## ## Qgcomp weights/partial effects at CoM = 0.975896138278307
## Scaled effect size (positive direction, sum of positive effects = 3.18)
##     x1     x7     x6     x5     x2     x3 
## 0.2402 0.2321 0.2212 0.1308 0.0934 0.0822 
## Scaled effect size (negative direction, sum of negative effects = -0.0196)
## x4 
##  1
plot(qfit3, emmval=quantile(dat3$CoM, .8))

## output stratum specific joint effect estimate for the mixture at the 80%ile of CoM
print(getstrateffects(qfit3, emmval=quantile(dat3$CoM, .8)))
## Joint effect at CoM=0.975896138278307
##         Estimate Std. Error Lower CI Upper CI z value  Pr(>|z|)
## Mixture  3.16521    0.80046   1.5963   4.7341  3.9542 7.678e-05
## output pointwise comparisons at the 80%ile of CoM
print(pointwisebound(qfit3, emmval=quantile(dat3$CoM, .8), pointwiseref = 2))
##   quantile quantile.midpoint         hx mean.diff   se.diff   ll.diff   ul.diff
## 1        0             0.125 -3.0753528 -3.165207 0.8004573 -4.734075 -1.596340
## 2        1             0.375  0.0898543  0.000000 0.0000000  0.000000  0.000000
## 3        2             0.625  3.2550614  3.165207 0.8004573  1.596340  4.734075
## 4        3             0.875  6.4202685  6.330414 1.6009147  3.192679  9.468149
##   emm_level
## 1 0.9758961
## 2 0.9758961
## 3 0.9758961
## 4 0.9758961

Stratified effects and pointwise comparisons can also be made with bootstrapped/estimating equation fits.

## output stratum specific joint effect estimate for the mixture at CoM=0
#print(getstrateffects(qfit3b, emmval=0)) # commented out to reduce output
print(getstrateffects(qfit3ee, emmval=0))
## Joint effect at CoM=0
##         Estimate Std. Error Lower CI Upper CI z value Pr(>|z|)
## Mixture  0.44806    0.49066 -0.51361   1.4097  0.9132   0.3611
## output pointwise comparisons at CoM=0
#print(pointwisebound(qfit3b, emmval=0)) # commented out to reduce output
print(pointwisebound(qfit3ee, emmval=0))
##   quantile quantile.midpoint          hx mean.diff   se.diff   ll.diff  ul.diff
## 1        0             0.125 -1.40872323 0.0000000 0.0000000  0.000000 0.000000
## 2        1             0.375 -0.96065965 0.4480636 0.4906578 -0.513608 1.409735
## 3        2             0.625 -0.51259608 0.8961271 0.9813156 -1.027216 2.819470
## 4        3             0.875 -0.06453251 1.3441907 1.4719734 -1.540824 4.229206
##   ll.linpred   ul.linpred emm_level
## 1  -1.408723 -1.408723226         0
## 2  -1.922331  0.001011944         0
## 3  -2.435939  1.410747113         0
## 4  -2.949547  2.820482283         0
## output stratum specific joint effect estimate for the mixture at the 80%ile of CoM
#print(getstrateffects(qfit3b, emmval=quantile(dat3$CoM, .8))) # commented out to reduce output
print(getstrateffects(qfit3ee, emmval=quantile(dat3$CoM, .8)))
## Joint effect at CoM=0.975896138278307
##         Estimate Std. Error Lower CI Upper CI z value  Pr(>|z|)
## Mixture  3.16521    0.58465   2.0193   4.3111  5.4138 6.169e-08
## output pointwise comparisons at the 80%ile of CoM
#print(pointwisebound(qfit3b, emmval=quantile(dat3$CoM, .8))) # commented out to reduce output
print(pointwisebound(qfit3ee, emmval=quantile(dat3$CoM, .8)))
##   quantile quantile.midpoint         hx mean.diff  se.diff  ll.diff   ul.diff
## 1        0             0.125 -3.0753528  0.000000 0.000000 0.000000  0.000000
## 2        1             0.375  0.0898543  3.165207 0.584652 2.019310  4.311104
## 3        2             0.625  3.2550614  6.330414 1.169304 4.038620  8.622208
## 4        3             0.875  6.4202685  9.495621 1.753956 6.057931 12.933312
##   ll.linpred ul.linpred emm_level
## 1 -3.0753528  -3.075353 0.9758961
## 2 -1.0560426   1.235751 0.9758961
## 3  0.9632676   5.546855 0.9758961
## 4  2.9825777   9.857959 0.9758961
## plot the pointwise effects at CoM=0 (compare bootstrap with estimating equations approach)
## note that the bootstrapped version is not valid because of only using 10 bootstrap iterations
ppb <- plot(qfit3b, emmval=0, suppressprint = TRUE, geom_only = TRUE)
ppee <- plot(qfit3ee, emmval=0, suppressprint = TRUE, geom_only = TRUE)

ppb$point$data$col = "Bootstrap"
ppb$errorbar$data$col = "Bootstrap"
ppee$point$data$col = "EE"
ppee$errorbar$data$col = "EE"

ppee$point$data$x = ppee$point$data$x + 0.01
ppee$errorbar$data$x = ppee$errorbar$data$x + 0.01


ggplot() + ppb + ppee + scale_color_grey(name="Pointwise estimates, 95% CI")
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Non-linear fits

Non-linear joint effects of a mixture will tend to occur when independent effects of individual exposures are non-linear or when there is interaction on the model scale between exposures. Here is a toy example of allowing a quadratic overall joint effect with an underlying set of interaction terms between x1 and all other exposures. q is set to 8 for this example for reasons described in the qgcomp package vignette. This approach adds a product term with the modifier for both the “main effect” of the mixture and the squared term of the mixture. The coefficients can be interpreted as any other quadratic/interaction terms (which is to say, with some difficulty). Informally, the CoM:mixture^2 term can be interpreted as the magnitude of the change in the non-linearity of the slope due to a one-unit increase in the continuous modifier. Both the bootstrapped and estimating equations formulations can be used to fit non-linear models, with the estimating equations approach being preferred for computational efficiency.
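Schematically (using θ here only as placeholder names for the modifier-related coefficients), the degree = 2 marginal structural model being fit is of the form

𝔼(Y^Sq | CoM, ψ, θ) = ψ0 + ψ1Sq + ψ2Sq^2 + θ1CoM + θ2(CoM × Sq) + θ3(CoM × Sq^2)

where θ2 and θ3 correspond to the CoM:mixture and CoM:mixture^2 rows (labeled mixture:CoM and I(mixture^2):CoM in the estimating equation output) below.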

qfit3bnl <- qgcomp.emm.glm.boot(y~x1+x2+x3+x4+x5+x6+x7 + x1*(x2 + x3 + x4 + x5 + x6 +x7),
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  B = 10, # set higher in practice
  q = 8, degree= 2)

qfit3bnlee <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7 + x1*(x2 + x3 + x4 + x5 + x6 +x7),
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  q = 8, degree= 2)

qfit3bnl
## ## Mixture slope parameters (bootstrap CI):
## 
##                  Estimate  Std. Error    Lower CI    Upper CI t value  Pr(>|t|)
## (Intercept)   -1.8359e+00  9.2572e-01 -3.6503e+00 -2.1515e-02 -1.9832   0.05096
## psi1           4.5330e-01  7.8571e-01 -1.0867e+00  1.9933e+00  0.5769   0.56569
## psi2          -6.6350e-02  1.9034e-01 -4.3940e-01  3.0670e-01 -0.3486   0.72836
## CoM           -1.1514e+00  6.2803e-01 -2.3823e+00  7.9540e-02 -1.8333   0.07067
## CoM:mixture    1.6314e+00  3.2228e-01  9.9971e-01  2.2630e+00  5.0620 2.813e-06
## CoM:mixture^2 -3.5123e-17  3.8621e-17 -1.1082e-16  4.0572e-17 -0.9094   0.36599
qfit3bnlee
## ## Mixture slope parameters (robust CI):
## 
##                     Estimate  Std. Error    Lower CI    Upper CI t value
## (Intercept)      -1.8359e+00  8.3109e-01 -3.4648e+00 -2.0699e-01 -2.2090
## psi1              4.5330e-01  6.5266e-01 -8.2589e-01  1.7325e+00  0.6945
## I(mixture^2)     -6.6350e-02  1.4202e-01 -3.4469e-01  2.1199e-01 -0.4672
## CoM              -1.1514e+00  6.1548e-01 -2.3577e+00  5.4937e-02 -1.8707
## mixture:CoM       1.6314e+00  2.6526e-01  1.1115e+00  2.1513e+00  6.1502
## I(mixture^2):CoM  1.5458e-15  2.8724e-12 -5.6282e-12  5.6313e-12  0.0005
##                   Pr(>|t|)
## (Intercept)        0.03019
## psi1               0.48946
## I(mixture^2)       0.64169
## CoM                0.06524
## mixture:CoM      3.332e-08
## I(mixture^2):CoM   0.99957

Not all effect estimation tools are enabled for non-linear fits, and some will produce an error. However, as with previous fits, we can estimate pointwise differences along the quantiles and plot the marginal structural model regression line at various values of the joint quantized exposures. Here you can see that the regression line is expected to be steeper at higher levels of the modifier, as in previous fits. We can explicitly ask for model confidence bands (which are given by the modelbound function), which pull in confidence limits for the regression line based on the bootstrap distribution of estimates.
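For instance, the bands used in these plots could be extracted directly from the bootstrapped fit via modelbound (with B set much higher than the 10 iterations used above so that the bands are valid; output not shown):

modelbound(qfit3bnl, emmval=1)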

print(pointwisebound(qfit3bnlee, emmval=-1))
##   quantile quantile.midpoint          hx  mean.diff   se.diff    ll.diff
## 1        0            0.0625  -0.6845197   0.000000 0.0000000   0.000000
## 2        1            0.1875  -1.9289318  -1.244412 0.6167308  -2.453182
## 3        2            0.3125  -3.3060446  -2.621525 1.0442651  -4.668247
## 4        3            0.4375  -4.8158581  -4.131338 1.3590510  -6.795029
## 5        4            0.5625  -6.4583724  -5.773853 1.6883742  -9.083005
## 6        5            0.6875  -8.2335875  -7.549068 2.1876819 -11.836845
## 7        6            0.8125 -10.1415032  -9.456983 2.9701500 -15.278370
## 8        7            0.9375 -12.1821197 -11.497600 4.0756472 -19.485722
##       ul.diff  ll.linpred ul.linpred emm_level
## 1  0.00000000  -0.6845197 -0.6845197        -1
## 2 -0.03564181  -3.1377020 -0.7201616        -1
## 3 -0.57480287  -5.3527666 -1.2593226        -1
## 4 -1.46764731  -7.4795492 -2.1521671        -1
## 5 -2.46469996  -9.7675251 -3.1492197        -1
## 6 -3.26128994 -12.5213652 -3.9458097        -1
## 7 -3.63559648 -15.9628902 -4.3201162        -1
## 8 -3.50947829 -20.1702414 -4.1939980        -1
print(pointwisebound(qfit3bnlee, emmval=1))
##   quantile quantile.midpoint         hx mean.diff   se.diff   ll.diff   ul.diff
## 1        0            0.0625 -2.9872820  0.000000 0.0000000 0.0000000  0.000000
## 2        1            0.1875 -0.9689631  2.018319 0.5689187 0.9032588  3.133379
## 3        2            0.3125  0.9166552  3.903937 0.9454155 2.0509569  5.756918
## 4        3            0.4375  2.6695727  5.656855 1.2133258 3.2787798  8.034930
## 5        4            0.5625  4.2897894  7.277071 1.5180540 4.3017402 10.252403
## 6        5            0.6875  5.7773054  8.764587 2.0299278 4.7860021 12.743173
## 7        6            0.8125  7.1321206 10.119403 2.8520808 4.5294270 15.709378
## 8        7            0.9375  8.3542352 11.341517 4.0054526 3.4909742 19.192060
##   ll.linpred ul.linpred emm_level
## 1 -2.9872820 -2.9872820         1
## 2 -2.0840232  0.1460971         1
## 3 -0.9363252  2.7696355         1
## 4  0.2914977  5.0476476         1
## 5  1.3144582  7.2651206         1
## 6  1.7987201  9.7558907         1
## 7  1.5421450 12.7220963         1
## 8  0.5036922 16.2047781         1
plot(qfit3bnlee, emmval=-1, modelband=TRUE, pointwiseref=4)

plot(qfit3bnlee, emmval=1, modelband=TRUE, pointwiseref=4)

We can look at this fit another way, too, by plotting predictions from the marginal structural model at all observed values of the modifier, which gives a more complete picture of the model than the plots at single values of the modifier. We can create smoothed scatter-plot lines (LOESS) at binned values of the modifier to informally look at how the regression line might change over values of the modifier. We can approximate the effect (point estimate only) at a specific value of the modifier by plotting a smooth fit limited to a narrow range of the modifier (< -1 or > 1). Here we see that the joint effect of exposures does appear to differ across values of the modifier, but there is little suggestion of non-linearity at either low or high values of the modifier. This informal assessment agrees with intuition based on the estimated coefficients and the standard plots from the qgcompint package.

library(ggplot2)
plotdata = data.frame(q=qfit3bnlee$index, ey=qfit3bnlee$y.expected, modifier=qfit3bnlee$emmvar.msm)
ggplot() + 
         geom_point(aes(x=q, y=ey, color=modifier), data=plotdata) + 
         geom_point(aes(x=q, y=ey), color="purple", data=plotdata[plotdata$modifier>1, ], pch=1, cex=3) + 
         geom_smooth(aes(x=q, y=ey), se=FALSE, color="purple", data=plotdata[plotdata$modifier>1, ], method = 'loess', formula='y ~ x') + 
         geom_smooth(aes(x=q, y=ey), se=FALSE, color="red", data=plotdata[plotdata$modifier < -1, ], method = 'loess', formula='y ~ x') + 
         geom_point(aes(x=q, y=ey), color="red", data=plotdata[plotdata$modifier < -1, ], pch=1, cex=3) + 
  theme_classic() + 
  labs(y="Expected outcome", x="Quantile score value (0 to q-1)") + 
  scale_color_continuous(name="Value\nof\nmodifier")

Survival analysis

As with standard qgcomp, the qgcompint package allows assessment of effect measure modification for a Cox proportional hazards model. The simdata_quantized_emm function allows simulation of right censored survival data.

 set.seed(23)
 dat4 <- simdata_quantized_emm(
  outcometype="survival",
# sample size
  n = 200,
# correlation between x1 and x2, x3, ...
  corr=c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3),
# model intercept
  b0=-2,
# linear model coefficients for x1, x2, ... at referent level of interacting variable
  mainterms=c(0.0, -0.1, 0.1, 0.0, 0.3, 0.1, 0.1),
# linear model coefficients for product terms between x1, x2, ... and interacting variable  
  prodterms = c(0.1, 0.0, 0.0, 0.0, -0.2, -0.2, -0.2),
# type of interacting variable
  ztype = "categorical",
# number of levels of exposure
  q = 4,
# residual variance of y
  yscale = 2.0                            
)
dat4$zfactor = as.factor(dat4$z)
head(dat4)
##   z x1 x2 x3 x4 x5 x6 x7     time d zfactor
## 1 0  3  3  3  2  0  0  0 2.544399 1       0
## 2 2  3  3  3  3  3  2  0 3.419209 1       2
## 3 0  2  2  1  2  3  1  1 1.775714 1       0
## 4 2  3  3  3  3  3  3  3 4.000000 0       2
## 5 2  0  0  0  0  3  1  3 3.329749 1       2
## 6 0  0  0  0  0  1  2  3 1.557125 1       0
summary(dat4$zfactor)
##  0  1  2 
## 64 63 73
summary(dat4$time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4161  1.9309  2.6441  2.6728  3.7167  4.0000
table(dat4$d) # 40 censored
## 
##   0   1 
##  40 160
cor(dat4[, paste0("x", 1:7)])
##        x1     x2     x3     x4     x5     x6     x7
## x1  1.000  0.776  0.616  0.268 -0.296 -0.328 -0.268
## x2  0.776  1.000  0.468  0.192 -0.200 -0.316 -0.284
## x3  0.616  0.468  1.000  0.228 -0.272 -0.244 -0.284
## x4  0.268  0.192  0.228  1.000 -0.064  0.056 -0.176
## x5 -0.296 -0.200 -0.272 -0.064  1.000  0.032  0.220
## x6 -0.328 -0.316 -0.244  0.056  0.032  1.000  0.004
## x7 -0.268 -0.284 -0.284 -0.176  0.220  0.004  1.000

Fitting a Cox model with a fully linear/additive specification is very similar to other qgcomp “noboot” models, and the same plotting, weight estimation, and effect estimation functions work on these objects. For now, bootstrapped versions of this model are not available.

qfit4 <- qgcomp.emm.cox.noboot(survival::Surv(time, d)~x1+x2+x3+x4+x5+x6+x7,
  data = dat4,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4)
## Adding main term for zfactor to the model
qfit4
## ## Qgcomp weights/partial effects at zfactor = 0
## Scaled effect size (positive direction, sum of positive effects = 0.773)
##     x7     x6     x5     x4     x3 
## 0.3457 0.3062 0.1686 0.1063 0.0732 
## Scaled effect size (negative direction, sum of negative effects = -0.142)
##     x2     x1 
## 0.9686 0.0314 
## 
## 
## Mixture log(hazard ratio) (delta method CI):
## 
##                   Estimate Std. Error  Lower CI Upper CI Z value Pr(>|z|)
## psi1              0.630742   0.351180 -0.057559  1.31904  1.7961  0.07248
## zfactor1         -0.465127   0.809987 -2.052673  1.12242 -0.5742  0.56580
## zfactor1:mixture -0.060184   0.519398 -1.078185  0.95782 -0.1159  0.90775
## zfactor2          0.746395   0.797326 -0.816334  2.30913  0.9361  0.34921
## zfactor2:mixture -1.325227   0.525392 -2.354976 -0.29548 -2.5224  0.01166
plot(qfit4, emmval=0)

getstratweights(qfit4, emmval=2)
## ## Qgcomp weights/partial effects at zfactor = 2
## Scaled effect size (positive direction, sum of positive effects = 0.421)
##   x3   x2 
## 0.83 0.17 
## Scaled effect size (negative direction, sum of negative effects = -1.12)
##     x6     x1     x7     x5     x4 
## 0.3468 0.2427 0.2276 0.1250 0.0578
getstrateffects(qfit4, emmval=2)
## Joint effect at zfactor=2
##         Estimate Std. Error Lower CI Upper CI z value Pr(>|z|)
## Mixture -0.69448    0.38722  -1.4534 0.064445 -1.7935  0.07289
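
Because the Cox fit reports effects on the log hazard ratio scale, the stratum-specific joint effect above can be translated to the hazard ratio scale by exponentiating the estimate and its confidence limits (a quick arithmetic check of the output above, not a package function):

exp(-0.69448)              # hazard ratio at zfactor = 2: about 0.50
exp(c(-1.45340, 0.064445)) # 95% confidence limits: about 0.23 to 1.07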
pointwisebound(qfit4, emmval=1)
##   quantile quantile.midpoint         hx       hr   se.lnhr     ll.hr    ul.hr
## 1        0             0.125 -0.4651275 1.000000 0.0000000 1.0000000  1.00000
## 2        1             0.375  0.1054305 1.769254 0.3891288 0.8252075  3.79330
## 3        2             0.625  0.6759885 3.130260 0.7782575 0.6809675 14.38912
## 4        3             0.875  1.2465464 5.538224 1.1673863 0.5619395 54.58226
##   emm_level
## 1         1
## 2         1
## 3         1
## 4         1

Frequently asked questions

How does qgcomp/qgcompint address collinearity?

In general, it does so by asking a question (the joint effect of increasing all exposures together) that is typically not adversely impacted by high correlation among exposures, or even by near or perfect collinearity. When possible, some methods allow the option “bayes=TRUE,” which performs some very light regularization and can overcome perfect collinearity in a way that allows estimation of the overall effect of exposure, but the independent effects (scaled effects, or weights) will still be subject to issues like variance inflation.
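
As a minimal sketch of this option (assuming, as in the base qgcomp package, that qgcomp.emm.noboot accepts a lowercase bayes argument; the simulated dataset and names below are illustrative only and mirror the simulation calls shown elsewhere in this vignette):

set.seed(1232)
# illustrative data with a binary modifier
datb <- simdata_quantized_emm(
  outcometype = "continuous",
  n = 200,
  corr = c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3),
  mainterms = c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3, 0.0),
  prodterms = c(1.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2),
  ztype = "binary",
  q = 4)
# bayes = TRUE applies light ridge-type regularization to the underlying fit
qfitb <- qgcomp.emm.noboot(y ~ z + x1 + x2 + x3 + x4 + x5 + x6 + x7,
  data = datb, expnms = paste0("x", 1:7), emmvar = "z", q = 4,
  bayes = TRUE)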

Is there an upper limit in terms of the number of variables I can include?

It depends. Qgcomp/qgcompint is not generally suited to problems in which the number of predictors exceeds the sample size. If you try the method, it works, and you obtain reasonably narrow confidence intervals, then the method has been able to handle that number of exposures. If you are working with a problem where the number of predictors exceeds the sample size and are wondering about this package, the package authors would advise seeking out a statistician with experience in such problems who can point out the challenges of those analyses.

Why are estimating equation methods and bootstrapping both used? Is one preferable?

Anything in this package that is available via bootstrapping is also available via the (newer) estimating equation approaches. If your goal is simply to estimate parameters, the estimating equation approaches may be preferable: they are less computationally burdensome and less prone to issues such as simulation error, which may be difficult to diagnose without specific expertise.

Why isn’t a particular feature available in this package even though it is in the qgcomp package?

Developer time. If you see a method in the qgcomp package that you would like to see in this package and can make a good use case for it, please contact the package developer/maintainer.

How do I use multiple modifiers?

There is currently no simple way to implement multiple, simultaneous modifiers in the qgcompint package. For binary or categorical modifiers, however, it is straightforward to create a single modifier with a distinct value for every unique combination of the original modifiers, as sketched below.
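
For example, base R’s interaction() function can build such a combined modifier (a minimal sketch; the data frame dat, the modifiers z1 and z2, and the outcome/exposure names in the commented call are hypothetical placeholders):

# hypothetical: two categorical modifiers z1 and z2 in a data frame `dat`
dat$zboth <- interaction(dat$z1, dat$z2, drop = TRUE)
# the combined factor can then be supplied as the single modifier, e.g.:
# qgcomp.emm.noboot(y ~ zboth + x1 + x2, expnms = c("x1", "x2"),
#                   emmvar = "zboth", data = dat, q = 4)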

See also

The FAQ in the qgcomp package vignettes

References

Original quantile g-computation paper

Keil, A.P., Buckley, J.P., O’Brien, K.M., Ferguson, K.K., Zhao, S. and White, A.J., 2020. A quantile-based g-computation approach to addressing the effects of exposure mixtures. Environmental Health Perspectives, 128(4), p.047004. https://doi.org/10.1289/EHP5838

First paper to use qgcompint

Stevens, D.R., Bommarito, P.A., Keil, A.P., McElrath, T.F., Trasande, L., Barrett, E.S., Bush, N.R., Nguyen, R.H., Sathyanarayana, S., Swan, S. and Ferguson, K.K., 2022. Urinary phthalate metabolite mixtures in pregnancy and fetal growth: findings from the infant development and the environment study. Environment international, 163, p.107235.

Acknowledgments

The development of this package was supported by NIH Grant RO1ES02953101 and the NIH intramural research program. Invaluable code testing has been performed by Danielle Stephens and Juwel Rana, among others.