Regression analysis with fixest

fixestopen in new window (by Laurent Bergé) is a package designed from the ground up in C++ to make running regressions fast and incredibly easy. It provides in-built support for a variety of linear and non-linear models, as well as regression tables and plotting methods.

Installation

Before continuing, make sure that you have installed fixest. You only have to do this once (or as often as you want to update the package).

# Install from CRAN
install.packages("fixest")

# Alternatively, you can install the latest development version
# install.packages("fixest", repos = "https://fastverse.r-universe.dev")
1
2
3
4
5

Once fixest is installed, don't forget to load it whenever you want to use it. Unlike Stata, you have to re-load a package every time you start a new R session.

# Load fixest into our current R session
library(fixest)
1
2

All of the examples in this section will use a modified dataset from the CPS with some added variables for demonstration purposes. To load the data run the following:

import delimited using ///
    "https://raw.githubusercontent.com/stata2r/stata2r.github.io/main/data/cps_long.csv", clear
1
2
# Base R reads CSVs too, but we'll use data.table here
dat = data.table::fread('https://raw.githubusercontent.com/stata2r/stata2r.github.io/main/data/cps_long.csv')
1
2

Introduction

The fixestopen in new window package contains a highly flexible set of tools that allow you to estimate a fairly large set of standard regression models. While the package certainly doesn't cover every model that exists, there is a non-negligible subset of Stata users for whom every model they've ever needed to run is covered by fixest.

This includes regular ol' linear regression in the feols() function, which builds off of the Base R standard regression function lm(), but also includes things like instrumental variables via 2SLS, and of course support for as many fixed effects as you'd like. fixest isn't limited to linear regression either, covering IV and fixed-effects support for a wide range of GLM models like logit, probit, Poisson, negative binomial, and so on in feglm() and fepois().

fixest covers all of this while being very fast. If you felt a speed boost going from Stata's xtreg to reghdfe, get ready for another significant improvement when moving to fixest.

You also get a fair amount of convenience. Adjusting your standard errors to be heteroskedasticity-robust or clustered can be a pain in other R regression functions, but it is easy in fixest with the vcov option. Regression tables, coefficient and interaction-margin plots, selecting long lists of controls without having to type them all in, lagged variables, retrieving estimated fixed effects, Wald tests, and the choice of reference for categorical variables are all made easy. You even get some stuff that's rather tricky in Stata, like automatically iterating over a bunch of model specifications, basic and staggered difference-in-difference support, or Conley standard errors.

Using fixest for regression starts with writing a formula. While there are plenty of bells and whistles to add, at its core regression formulas take the form y ~ x1 + x2 | fe1 + fe2, where y is the outcome, x1 and x2 are predictors, and fe1 and fe2 are your sets of fixed effects.

Models

Unlike Stata, which only ever has one active dataset in memory, remember that having multiple datasets in your global environment is the norm in R. We highlight this difference to head off a very common error for new Stata R users: you need to specify which dataset you're using in your model calls, e.g. feols(..., data = dat). We'll see lots of examples below. At the same time, note that fixest allows you to set various global optionsopen in new window, including which dataset you want to use for all of your regressions. Again, we'll see examples below.

Simple model

reg wage educ 
reg wage educ age
1
2
feols(wage ~ educ, data = dat) 
feols(wage ~ educ + age, data = dat)

# Aside 1: `data = ...` is always the first argument 
# after the model formula. So many R users would just 
# write: 
feols(wage ~ educ, dat) 

# Aside 2: You can also set your dataset globally so 
# that you don't have to reference it each time. 
setFixest_estimation(data = dat) 
feols(wage ~ educ) 
feols(wage ~ educ + age) 
# etc.
1
2
3
4
5
6
7
8
9
10
11
12
13
14

Categorical variables

reg wage educ i.treat 

* Specifying a baseline:
reg wage educ ib1.treat
1
2
3
4
feols(wage ~ educ + i(treat), dat) 

# Specifying a baseline:
feols(wage ~ educ + i(treat, ref = 1), dat)
1
2
3
4

Fixed effects

reghdfe wage educ, absorb(countyfips) cluster(countyfips) 





reghdfe wage educ, absorb(countyfips)  

* Add more fixed effects... 
reghdfe wage educ, absorb(countyfips year) ///
                   vce(cluster countyfips year) 
reghdfe wage educ, absorb(countyfips#year) /// 
                   vce(cluster countyfips#year)
1
2
3
4
5
6
7
8
9
10
11
12
13
feols(wage ~ educ | countyfips, dat) 

# Aside: fixest automatically clusters SEs by the first 
# fixed effect (if there are any). We'll get to SEs 
# later, but if you just want iid errors for a fixed 
# effect model: 
feols(wage ~ educ | countyfips, dat, vcov = 'iid') 

# Add more fixed effects... 
feols(wage ~ educ | countyfips + year, 
      dat, vcov = ~countyfips + year) 
feols(wage ~ educ | countyfips^year, 
      dat) # defaults to vcov = ~countyfips^year
1
2
3
4
5
6
7
8
9
10
11
12
13

Instrumental variables

ivreg 2sls wage (educ = age) 
ivreg 2sls wage marr (educ = age) 

* With fixed effects 
ivreghdfe 2sls wage marr (educ = age), absorb(countyfips)
1
2
3
4
5
feols(wage ~ 1 | educ ~ age, dat)  
feols(wage ~ marr | educ ~ age, dat) 

# With fixed effects (IV 1st stage always comes last) 
feols(wage ~ marr | countyfips | educ ~ age, dat)
1
2
3
4
5

Nonlinear models

While we don't really show it here, note that (almost) all of the functionality that this page demonstrates w.r.t. feols() carries over to fixest's non-linear estimation functions too (feglm(), fepois(), etc.). This includes SE adjustment, and so forth.

xtset statefips
logit marr age black hisp

* Note: Attempting to replicate the feglm() model with fixed
* effects at right using xtlogit or xtprobit leads to 
* numerical overflow or matsize issues


ppmlhdfe educ age black hisp, absorb(statefips year) ///
	                      vce(cluster statefips)
1
2
3
4
5
6
7
8
9
10
feglm(marr ~ age + black + hisp, 
      dat, family = 'logit')

# Add fixed effects (probit this time)
feglm(marr ~ age + black + hisp | statefips + year, 
      dat, family = 'probit')

# fepois() is there for Poisson regression
fepois(educ ~ age + black + hisp | statefips + year, dat)
1
2
3
4
5
6
7
8
9

Macros, wildcards and shortcuts

local ctrls age black hisp marr 
reg wage educ `ctrls' 

reg wage educ x* 
reg wage educ *sp  
reg wage educ *ac*
1
2
3
4
5
6
ctrls = c("age", "black", "hisp", "marr") 
feols(wage ~ educ + .[ctrls], dat) 

feols(wage ~ educ + ..('^x'), dat) # ^ = starts with 
feols(wage ~ educ + ..('sp#39;), dat) # $ = ends with 
feols(wage ~ educ + ..('ac'), dat) 

# Many more macro options. See `?setFixest_fml` and
# `?setFixest_estimation`. Example (reminder) where 
# you set your dataset globally, so you don't have to 
# retype `data = ...` anymore. 
setFixest_estimation(data = dat) 
feols(wage ~ educ) 
feols(wage ~ educ + .[ctrls] | statefips) 
# Etc.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

Multi-model estimations (advanced)

fixest supports a variety of multi-modelopen in new window capabilities. Not only are these efficient from a coding perspective (you can get away with much less typing), but they are also highly optimized. For example, if you run a multi-model estimation with the same group of fixed-effects then fixest will only compute those fixed-effects once for all models. The next group of examples are meant to highlight some specific examples of this functionality. They don't necessarily have direct Stata equivalents that we are aware of. Moreover, while we don't show it here, please note that all of these options can be combined (e.g. split sample with stepwise regression). Multi-model objects can also be sent directly to presentation functions like etable() and coefplot().

Split sample

## Separate regressions for hispanic and non-hispanic
feols(wage ~ educ | countyfips, dat, split = ~hisp)

# As above, but now includes the full sample as a 3rd reg
feols(wage ~ educ | countyfips, dat, fsplit = ~hisp)
1
2
3
4
5

Multiple dependent variables

# Regress wage & educ separately on the same indep. vars & FEs
feols(c(wage, educ) ~ age + marr | countyfips + year, dat) 
1
2

Stepwise regression

# Stepwise. First reg doesn't include "marr", second reg
# doesn't include "age"
feols(wage ~ educ + sw(age, marr), dat) 

# Cumulative stepwise. As above, except now the second reg
# includes "age".
feols(wage ~ educ + csw(age, marr), dat)

# Stepwise operators work in the FE slot too
feols(wage ~ educ | csw(year, statefips), dat)

# Aside: You can also use "sw0()" and "csw0()", in which 
# case you'll get an extra regression at the start that 
# doesn't include the stepwise components. 
1
2
3
4
5
6
7
8
9
10
11
12
13
14

Interactions

Interact continuous variables

reg wage c.educ#c.age 
reg wage c.educ##c.age 

* Polynomials 
reg wage c.age#c.age 
reg wage c.age##c.age 
1
2
3
4
5
6
feols(wage ~ educ:age, dat) 
feols(wage ~ educ*age, dat) 

# Polynomials 
feols(wage ~ I(age^2), dat) 
feols(wage ~ poly(age, 2, raw = TRUE))
1
2
3
4
5
6

Interact categorical variables

reg wage i.treat#i.hisp 




reg wage i.treat i.treat#i.hisp
reg wage i.treat##i.hisp
1
2
3
4
5
6
7
feols(wage ~ i(treat, i.hisp), dat) 

# Aside: i() is a fixest-specific shortcut that also 
# has synergies with some other fixest functions. But 
# base R interaction operators all still work, e.g. 
feols(wage ~ factor(treat)/factor(hisp), dat) 
feols(wage ~ factor(treat)*factor(hisp), dat)
1
2
3
4
5
6
7

Interact categorical with continuous variables

reg wage i.treat#c.age 




reg wage i.treat#c.age 
reg wage i.treat i.treat#c.age 
reg wage i.treat##c.age
1
2
3
4
5
6
7
8
feols(wage ~ i(treat, age), dat) 

# Aside: i() is a fixest-specific shortcut that also 
# has synergies with some other fixest functions. But 
# base R interaction operators all still work, e.g. 
feols(wage ~ factor(treat):age, dat) 
feols(wage ~ factor(treat)/age, dat) 
feols(wage ~ factor(treat)*age, dat)
1
2
3
4
5
6
7
8

Difference-in-differences

In addition to the ability to estimate a difference-in-differences design using two-way fixed effects (if the design is appropriate for that; no staggered treatment, for instance), fixest offers several other DID-specific tools. The below examples use generic data sets, since the CPS data used in the rest of this page is not appropriate for DID.

* No immediate Stata equivalent to did_means that we know of,
* although you could replicate much of it by hand with an 
* elaborate call to table

* Sun and Abraham can be estimated using the 
* eventstudyinteract package on ssc
1
2
3
4
5
6
# did_means() provides tables of means, SEs, and treatment/
# control and pre/post differences for 2x2 DID
did_means(outcome + control ~ treat | post)

# sunab() produces interactions of the type that allow you to
# estimate the Sun & Abraham model for staggered treatment 
# timing, and automatically get average treatment effects for
# each relative period
feols(y ~ control + sunab(year_treated, year))
1
2
3
4
5
6
7
8
9

Interact fixed effects

* Combine fixed effects 
reghdfe wage educ, absorb(statefips#year) 

* Varying slopes (e.g. time trend for each state) 
reghdfe wage educ, absorb(statefips#c.year) ///
	           vce(cluster statefips#c.year)
1
2
3
4
5
6
# Combine fixed effects 
feols(wage ~ educ | statefips^year, dat) 

# Varying slopes (e.g. time trend for each state) 
feols(wage ~ educ | statefips[year], dat)
1
2
3
4
5

Standard errors

While you can specify standard errors inside the original fixest model call (just like Stata), a unique feature of R is that you can adjust errors for an existing model on-the-fly. This has several benefitsopen in new window, including being much more efficient since you don't have to re-estimate your whole model. We'll try to highlight examples of both approaches below.

HC

reg wage educ, vce(robust) 
reg wage educ, vce(hc3)
1
2
feols(wage ~ educ, dat, vcov = 'hc1') 
feols(wage ~ educ, dat, vcov = sandwich::vcovHC) 

# Note: You can also adjust the SEs of an existing model 
m = feols(wage ~ educ, dat) # iid
summary(m, vcov = 'hc1')    # switch to HC1
1
2
3
4
5
6

HAC

xtset id year
ivreghdfe wage educ, bw(auto) vce(robust)
1
2
feols(y ~ x, dat, vcov = 'NW', panel.id = ~unit + time)
# feols(y ~ x, dat, vcov = 'NW') # if panel id already set (see below)
1
2

Clustered

reghdfe wage educ, absorb(countyfips) /// 
                   vce(cluster countyfips) 

* Twoway clustering etc. 
reghdfe wage educ, absorb(countyfips year) ///
                   vce(cluster countyfips year) 



reghdfe wage educ, absorb(countyfips#year) ///
                   vce(cluster countyfips#year)
1
2
3
4
5
6
7
8
9
10
11
feols(wage ~ educ | countyfips, dat) # Auto clusters by FE 
# feols(wage ~ educ | countyfips, dat, vcov = ~countyfips) # ofc can be explicit too 

# Twoway clustering etc. 
feols(wage ~ educ | countyfips + year, 
      dat, vcov = ~countyfips + year) 
# feols(wage ~ educ | countyfips + year, 
#      dat, vcov = 'twoway') ## same as above

feols(wage ~ educ | countyfips^year, 
      dat, vcov = ~countyfips^year) 
1
2
3
4
5
6
7
8
9
10
11

Conley standard errors

* See: http://www.trfetzer.com/conley-spatial-hac-errors-with-fixed-effects/
1
feols(wage ~ educ, dat, vcov = conley("25 mi"))
1

On-the-fly SE adjustment

We're belabouring the point now, but one last reminder that you can adjust the standard errors for existing models "on the fly" by passing the vcov = ... argument. There's no performance penalty, since the adjustment is done instantaneously and it therefore has the virtue of separating the mechanical computation stage of model estimation from the inference stage. As we'll see below, on-the-fly SE adjustment works for a variety of other fixest functions, e.g. etable(). But here is a quick example using summary():

m = feols(wage ~ educ | countyfips + year, dat) 
m                                    # Clustered by countyfips (default)
summary(m, vcov = 'iid')             # Switch to iid errors
summary(m, vcov = 'twoway')          # Cluster by countyfips and year 
summary(m, vcov = ~countyfips^year)  # Cluster by countyfips*year interaction
1
2
3
4
5

Presentation

Regression table

reg wage educ age 
eststo est1 
esttab est1

* Add second regression
reg wage educ age black hisp
eststo est2
esttab est1 est2

* Export to TeX
esttab using "regtable.tex", replace 
1
2
3
4
5
6
7
8
9
10
11
est1 = feols(wage ~ educ + age, dat) 
etable(est1)


# Add second regression
est2 = feols(wage ~ educ + age + black + hisp, dat) 
etable(est1, est2)


# Export to Tex
etable(est1, est2, file = "regtable.tex")
1
2
3
4
5
6
7
8
9
10
11

Note: The etable() function is extremely flexible and includes support for many things that we won't show you here. See the relevant vignettes for more (1open in new window, 2open in new window). Below we highlight a few unique features that don't have direct Stata equivalents. (You could potentially mimic with a loop, but that will require more code and be slower, since your whole model has to be re-estimated each time.)

# SEs for existing models can be adjusted on-the-fly 
etable(est1, est2, vcov = 'hc1') 

# Report multiple SEs for the same model 
etable(est1, vcov = list('iid', 'hc1', ~id, ~countyfips)) 

# Multi-model example: Two dep. vars, stepwise coefs, 
# varying slopes, split sample, etc. (18 models in total!)
est_mult = feols(c(wage, age) ~ educ + csw0(hisp, black) | 
                     statefips[year], 
                 dat, fsplit = ~marr) 
etable(est_mult, vcov = ~statefips^year)
1
2
3
4
5
6
7
8
9
10
11
12

Joint test of coefficients

* Rename so we can use the wildcard later
rename (black hisp) (raceeth_black raceeth_hisp)
regress wage educ age raceeth_* marr 
testparm raceeth_black raceeth_hisp
testparm raceeth_*
1
2
3
4
5
# Rename so we can use a regular expression later
data.table::setnames(dat, c('black','hisp'), c('raceeth_black','raceeth_hisp'))
est1 = feols(wage ~ educ + age + ..('raceeth_') + marr, dat)
wald(est1, c('raceeth_black','raceeth_hisp'))
wald(est1, 'raceeth_')
1
2
3
4
5

Coefficient plot

* Assume we have est1 and est2 from above 
coefplot est1 
coefplot est1 est2
1
2
3
# Assume we have est1 and est2 from above 
coefplot(est1) 
coefplot(list(est1, est2))
1
2
3

Interaction Plot

regress wage hisp##c.age

* Show how effect differs by group
margins hisp, dydx(age)
marginsplot

# Show predictive margins with an interaction
regress wage hisp##c.age
margins hisp, at(age = (16(1)55))
* The recast here gives a line and error area instead of points and lines
marginsplot, recast(line) recastci(rarea)
1
2
3
4
5
6
7
8
9
10
11
est1 = feols(wage ~ i(hisp, age), dat) 

# Show how effect differs by group
iplot(est1)


# Show predictive margins with an interaction
# This requires plot_cap from the marginaleffects package
library(marginaleffects)
plot_cap(est1, condition = c('age','hisp'))
1
2
3
4
5
6
7
8
9
10

Panel

Note: You don't need to specify your panel variables globally and this functionality is mostly for convenience features associated with time-series operations like leads and lags. You can also use panel(dat, ~ id + var) to do so on-the-fly in your regression call. But Laurent, the fixest author, recommends setting the panel ID globally when applicable, so that's what we do below.

Lag variables

xtset id year 
reg wage educ l1.wage
1
2
setFixest_estimation(panel.id = ~id+year)
feols(wage ~ educ + l(wage, 1), dat)
# feols(wage ~ educ + l(wage, 1), dat, panel.id = ~id+year) # if not set
1
2
3

Lead variables

xtset id year 
reg wage educ f1.wage
1
2
# setFixest_estimation(panel.id = ~id+year) # already set
feols(wage ~ educ + f(wage, 1), dat)
1
2

First difference

xtset id year 
reg wage educ D.x
1
2
# setFixest_estimation(panel.id = ~id+year) # already set
feols(wage ~ educ + d(wage), dat)
1
2