Comparing linear model coefficients to one another

There are some situations where you want to compare one regression model coefficient to another but can't straightforwardly do that by cleverly setting up the right interactions, contrast coding, etc.

For example, say you have lexical decision reaction times (RT) for nouns and verbs, and you have also recorded those words' lengths and frequencies. If you want to know whether frequency has a bigger impact on RTs for nouns than it does for verbs, that's easy: just regress RTs on frequency, word category, and their interaction (RT ~ Frequency*Category) and the interaction will answer your question. But what if you want to know whether frequency has a bigger impact on RT than length does? That's trickier. You can do RT ~ Frequency + Length, pull out standardized coefficients, and see that the effect of Frequency on RT is "this big" whereas the effect of Length on RT is "that big". But how can you statistically compare these in order to tell if one has a significantly bigger effect than the other?
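Just to make those two scenarios concrete, here's a rough sketch of what the models might look like. The data frame d and its columns are made up for illustration; nothing below depends on a real dataset.

# Hypothetical data frame `d` with columns RT, Frequency, Length, and Category (noun vs. verb)

# Easy case: does frequency matter more for nouns than for verbs?
# The Frequency:Category interaction answers this directly.
m1 <- lm( RT ~ Frequency * Category, data=d )

# Trickier case: does frequency matter more than length?
# Standardizing the variables (e.g. with scale()) puts the two coefficients
# on a comparable scale, but the model itself doesn't test whether
# one coefficient is significantly bigger than the other.
m2 <- lm( scale(RT) ~ scale(Frequency) + scale(Length), data=d )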

The code below is an attempt to solve that problem. The general approach (treating it as a t-test: take the difference between the two coefficients and divide it by the standard error of that difference) was suggested to me by someone at my campus's stats help center back in the day, and the rest of the details here are just tweaks I've added on over the years. For a long time I've just had this code floating around in an e-mail from like 15 years ago that I dig up again whenever someone asks me about it. But with each passing year it gets harder and harder for me to remember how to find it, so I decided to finally write this down so I [hopefully?] won't lose it. And if it ends up being useful for someone else in the world, too, then that's an added bonus. But use it at your own risk since it's essentially just nonsense I made up when I was a grad student.

The code

The following function will calculate the t-statistic for a comparison between two coefficients in a linear model (such as the kind you get from the lm() function):

compare_coefs <- function( model, coef1, coef2 ){
  # Difference between the two coefficient estimates
  diff <- coef(model)[coef1] - coef(model)[coef2]
  # Standard error of that difference (the covariance term is there because the two
  # estimates come from the same model and thus aren't independent of each other)
  pooledvar <- sqrt( vcov(model)[coef1,coef1] + vcov(model)[coef2,coef2] - 2*vcov(model)[coef1,coef2] )
  # t-statistic for the difference
  t <- diff / pooledvar
  names(t) <- NULL # this is just for cosmetic purposes, it doesn't actually matter
  return(t)
}

To use it, just give it your model, plus the names or indices of the coefficients you want to compare. For example:

m <- lm( Temp ~ Wind + Ozone, airquality )
compare_coefs( m, 2, 3) # Compare the 2nd and 3rd coefficient
compare_coefs( m, "Wind", "Ozone" ) # Compare the "Wind" and "Ozone" coefficients; should get the same value

This will give you a t-statistic for the difference between those two coefficients. From there, you can just calculate the p-value of this statistic based on the degrees of freedom (which you can normally steal from the model summary):

t <- compare_coefs( m, "Wind", "Ozone" )
df <- m$df.residual # summary(m) shows that this is 113
pt( -abs(t), df ) * 2 # I use -abs() to force t to be negative, and *2 to get a two-tailed p-value

Mixed-effects models

If you want to do this to compare coefficients in a mixed-effects model, you'll have to make a few adjustments. In all the following, I am assuming you are using the {lme4} package to make your models. Use this at your own risk; it will give you numbers, and I think those numbers are accurate and actually mean something, but that's really a question for a statistician.

First, in the code, replace the coef() function with fixef(). This is necessary because for lme4 models, the fixed-effect coefficients we want [in the format that I have lazily hard-coded the function to expect] are accessed with fixef() rather than coef() (which, for these models, returns something else: the per-group coefficients).

compare_coefs <- function( model, coef1, coef2 ){
  # fixef() instead of coef(): for lme4 models this is how you get the fixed-effect estimates
  diff <- fixef(model)[coef1] - fixef(model)[coef2]
  pooledvar <- sqrt( vcov(model)[coef1,coef1] + vcov(model)[coef2,coef2] - 2*vcov(model)[coef1,coef2] )
  t <- diff / pooledvar
  names(t) <- NULL # this is just for cosmetic purposes, it doesn't actually matter
  return(t)
}

# Example:
library(languageR) # The dataset we'll run a model on comes from this package
library(lme4)
m <- lmer( RT ~ Frequency + FamilySize + (1|Subject) + (1|Word), lexdec ) # using random intercepts only for this demo so the code runs fast; plz don't settle for intercept-only random effects in a real analysis
compare_coefs( m, "Frequency", "FamilySize" )

Secondly, if you want to turn the t-statistic into a p-value, things are a bit more complicated. summary(model) or model$df.residual won't tell you degrees of freedom for an lme4 model like they did for a straight-up lm model. How to calculate degrees of freedom for a mixed-effects model is actually a controversial, hot-button issue in the exciting and glamorous world of statistics. The good news for us, however, is that Luke (2017) gives us permission to just use degree-of-freedom estimates from lmerTest and be done with it. (Apologies to Luke for my massively oversimplified summary of a very detailed paper.) So that's what I'd recommend doing. Load up {lmerTest} before you fit your model, so that summary() will give you dfs, and just shamelessly steal that number to plug into the bit that calculates the p-value, as sketched below.
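Here's a rough sketch of what that might look like, reusing the lexdec model from above. Note that lmerTest gives a separate df estimate for each coefficient; borrowing the estimate for one of the two coefficients being compared is a judgment call I'm making up here, not something I've seen a formal recommendation for.

library(lmerTest) # masks lme4::lmer, so refitting the model now gets you Satterthwaite dfs in summary()
m <- lmer( RT ~ Frequency + FamilySize + (1|Subject) + (1|Word), lexdec )

t <- compare_coefs( m, "Frequency", "FamilySize" )
df <- coef(summary(m))["Frequency","df"] # shamelessly stealing the df estimate for one of the two coefficients
pt( -abs(t), df ) * 2 # two-tailed p-value, same as before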

Other considerations

Comparing magnitudes regardless of sign. Depending on the research question you are trying to address when doing this analysis, you might actually want to compare the absolute values of two coefficients, rather than the coefficients themselves. For example, imagine that you've got some data where predictor A has an association of "5" with the outcome variable, and predictor B has an association of "-5". These might turn out to be significantly different from one another (5 is ten whole units bigger than -5!). But the magnitudes of these associations are the same. If that's what you care about, you can easily adapt the above code by just putting abs() around each coefficient:

compare_coefs <- function( model, coef1, coef2 ){
  # abs() around each coefficient: compare magnitudes and ignore sign
  diff <- abs(coef(model)[coef1]) - abs(coef(model)[coef2])
  pooledvar <- sqrt( vcov(model)[coef1,coef1] + vcov(model)[coef2,coef2] - 2*vcov(model)[coef1,coef2] )
  t <- diff / pooledvar
  names(t) <- NULL # this is just for cosmetic purposes, it doesn't actually matter
  return(t)
}
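Usage is the same as before; for example, with the airquality model from earlier:

m <- lm( Temp ~ Wind + Ozone, airquality )
compare_coefs( m, "Wind", "Ozone" ) # compares the absolute sizes of the two coefficients, ignoring sign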

Comparing variance instead of magnitude. This is a tricky one. Imagine that the coefficient for predictor A is 5, with a standard error of 1; and the coefficient for predictor B is also 5, but with a standard error of 10. Predictor A clearly has a "stronger" association with the outcome. But the code I've provided here won't tell you that predictor A is significantly better than predictor B; to the contrary, you're going to get a t-statistic of 0. I don't know what test you'd need to run to compare the "strengths" of association in that way. It should be doable (rather than directly comparing the coefficients, I'm guessing you'd want to compare the coefficients' t-statistics; and meta-analysis people have ways of doing that kind of thing), but the code on this page doesn't do it.

dfs in generalized linear models. My suggestions above for calculating p-values are based on using degrees of freedom. If you use a generalized model (like one from glm or glmer, with a binomial or Poisson family), the coefficient table won't show you degrees of freedom. Since coefficients in those models are tested against a z distribution, I just ignore degrees of freedom and calculate p based on a z distribution rather than a t distribution, as sketched below. Am I allowed to do that? Not sure! That's another question for a statistician!
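In practice that just means swapping pt() out for pnorm() and not bothering with df. A minimal sketch, assuming m is some glm or glmer model you've already fit (the coefficient names here are just placeholders):

t <- compare_coefs( m, "Frequency", "FamilySize" ) # the "t" here is really more like a z
pnorm( -abs(t) ) * 2 # two-tailed p-value from the standard normal (z) distribution, no dfs needed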


by Stephen Politzer-Ahles. Last modified on 2025-02-13.