fixest (by Laurent Bergé) is a package
designed from the ground up in C++ to make running regressions fast and
incredibly easy. It provides built-in support for a variety of linear and
nonlinear models, as well as regression tables and plotting methods.
Installation
Before continuing, make sure that you have installed fixest. You only
have to do this once (or as often as you want to update the package).
Once fixest is installed, don’t forget to load it whenever you want to
use it. Unlike Stata, you have to re-load a package every time you start a new R
session.
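A minimal sketch of both steps. The install line is commented out so that the snippet can be re-run safely; uncomment it the first time (or whenever you want to update).

```r
# Install once (or as often as you want to update the package)
# install.packages("fixest")

# Load at the start of every new R session
library(fixest)
```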
All of the examples in this section will use a modified dataset from the CPS
with some added variables for demonstration purposes. To load the data run the
following:
Introduction
The fixest package contains a highly flexible set of functions that allow you to estimate a large set of regression models. While the package obviously doesn’t cover every model out there, there is a non-negligible subset of Stata users for whom every model they’ve ever needed to estimate is covered by fixest.
This includes regular linear regression via the feols() function, which extends the base R lm() function by supporting (high-dimensional) fixed effects. But fixest isn’t just limited to linear regression. The package supports efficient instrumental variables (IV) estimation, as well as a wide range of GLMs like logit, probit, Poisson, and negative binomial.
You also get a lot of convenience features with fixest. Switching to heteroskedasticity-robust or clustered standard errors is easily done via the vcov option. The package provides built-in methods for exporting regression tables and coefficient plots. You can select long lists of control variables without having to type them all in, retrieve estimated fixed effects, conduct Wald tests, adjust the reference levels for categorical variables and interactions on the fly, run simulation studies efficiently, and more. You even get some stuff that’s rather tricky in Stata, like automatically iterating over a bunch of model specifications, basic and staggered difference-in-differences support, or Conley standard errors.
fixest offers all of this in addition to being very fast. If you felt a speed boost going from Stata’s xtreg to reghdfe, get ready for another significant improvement when moving to fixest.
Using fixest for regression starts with writing a formula. While there are plenty of bells and whistles to add, at their core regression formulas take the form y ~ x1 + x2 | fe1 + fe2, where y is the outcome, x1 and x2 are predictors, and fe1 and fe2 are your sets of fixed effects.
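To make the formula layout concrete, here is a hedged sketch. The CPS data used on this page is not reproduced here, so as an assumption the example uses the `trade` dataset that ships with fixest; the variable names come from that dataset, not from the CPS file.

```r
library(fixest)

# Assumption: the `trade` data bundled with fixest stands in for the CPS data
data(trade, package = "fixest")

# y ~ x | fe1 + fe2: log trade value on log distance,
# with origin and destination fixed effects
est = feols(log(Euros) ~ log(dist_km) | Origin + Destination, data = trade)

est         # print the estimation results
coef(est)   # slope coefficients only; the fixed effects are absorbed
```

Everything to the left of the `|` is an ordinary regressor; everything to the right is treated as a fixed effect and is not reported in the coefficient table.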
Models
Unlike Stata, which only ever has one active dataset in memory, remember that
having multiple datasets in your global environment is the norm in R. We
highlight this difference to head off a very common error for new R users coming
from Stata: you need to specify which dataset you’re using in your model calls,
e.g. feols(..., data = dat). We’ll see lots of examples below. At the same time,
note that fixest allows you to set various global options, including which
dataset you want to use for all of your regressions. Again, we’ll see examples
below.
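The two approaches can be sketched as follows. This assumes a reasonably recent fixest version in which setFixest_estimation() accepts a data argument, and again uses the bundled `trade` data as a stand-in for `dat`.

```r
library(fixest)
data(trade, package = "fixest")   # assumption: stand-in for the CPS data

# Option 1: name the dataset explicitly in every call
est1 = feols(log(Euros) ~ log(dist_km), data = trade)

# Option 2: set a default dataset for all subsequent fixest estimations
setFixest_estimation(data = trade)
est2 = feols(log(Euros) ~ log(dist_km))

# Restore the default estimation settings when done
setFixest_estimation(reset = TRUE)
```

Resetting at the end avoids silently reusing the wrong dataset later in the session.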
Simple model
Categorical variables
Fixed effects
Weights
Instrumental variables
Nonlinear models
While we don’t really show it here, note that (almost) all of the functionality that
this page demonstrates w.r.t. feols() carries over to fixest’s non-linear
estimation functions too (feglm(), fepois(), etc.). This includes SE
adjustment, and so forth.
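As a hedged illustration of that carry-over, the sketch below fits the same Poisson model through fepois() and through the more general feglm(), reusing the formula syntax and the vcov argument from feols(). The `trade` data is an assumption standing in for the CPS data.

```r
library(fixest)
data(trade, package = "fixest")   # assumption: stand-in dataset

# Poisson regression with fixed effects and clustered standard errors
est_pois = fepois(Euros ~ log(dist_km) | Origin + Destination,
                  data = trade, vcov = ~Origin)

# The same model through the general GLM interface
est_glm = feglm(Euros ~ log(dist_km) | Origin + Destination,
                data = trade, family = poisson(), vcov = ~Origin)

etable(est_pois, est_glm)
```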
Macros, wildcards and shortcuts
Multi-model estimations (advanced)
fixest supports a variety of
multi-model
capabilities. Not only are these efficient from a coding perspective (you can get
away with much less typing), but they are also highly optimized. For example,
if you run a multi-model estimation with the same group of fixed-effects then
fixest will only compute those fixed-effects once for all models. The next
group of examples highlights some specific uses of this
functionality. They don’t necessarily have direct Stata equivalents that we are
aware of. Moreover, while we don’t show it here, please note that all of these
options can be combined (e.g. split sample with stepwise regression).
Multi-model objects can also be sent directly to presentation
functions like etable() and coefplot().
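A compact sketch of the three multi-model features covered in this section. As an assumption it uses base R’s `iris` data rather than the CPS data, purely because it has no missing values and needs no download.

```r
library(fixest)

# Split sample: one regression per species
est_split = feols(Sepal.Length ~ Petal.Length, data = iris, split = ~Species)

# Multiple dependent variables: wrap them in c()
est_lhs = feols(c(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)

# Stepwise regression: csw0() starts with no extra control,
# then adds the listed terms cumulatively (three models in total)
est_sw = feols(Sepal.Length ~ Petal.Length + csw0(Petal.Width, Sepal.Width),
               data = iris)

# Multi-model objects go straight into etable()
etable(est_sw)

# Individual models can be pulled out with [[ ]]
est_split[[1]]
```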
Split sample
Multiple dependent variables
Stepwise regression
Interactions
Interact continuous variables
Interact categorical variables
Interact categorical with continuous variables
Difference-in-differences
In addition to the ability to estimate a difference-in-differences design using
two-way fixed effects (if the design is appropriate for that; no staggered
treatment, for instance), fixest offers several other DID-specific tools.
The examples below use generic datasets, since the CPS data used in the rest of
this page are not appropriate for DID.
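A hedged sketch of two such tools, using the `base_did` and `base_stagg` example datasets that ship with fixest (an assumption about which generic datasets are meant). The first model is an event-study style two-way fixed effects regression; the second uses sunab(), which implements the Sun and Abraham approach for staggered treatment.

```r
library(fixest)

# Simple design: all treated units are treated at the same time
data(base_did, package = "fixest")

# Interact the treatment indicator with period, using period 5 as reference
est_did = feols(y ~ x1 + i(period, treat, ref = 5) | id + period,
                data = base_did)
iplot(est_did)   # plot the period-by-period treatment effects

# Staggered adoption: units are treated in different years
data(base_stagg, package = "fixest")
est_sa = feols(y ~ x1 + sunab(year_treated, year) | id + year,
               data = base_stagg)
summary(est_sa, agg = "att")   # aggregate to a single ATT
```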
Interact fixed effects
Standard errors
While you can specify standard errors inside the original fixest model call
(just like Stata), a unique feature of R is that you can adjust errors for an
existing model on-the-fly. This has several
benefits, including being
much more efficient since you don’t have to re-estimate your whole model. We’ll
try to highlight examples of both approaches below.
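As a first sketch, here is the “inside the model call” approach, with the standard errors chosen via vcov at estimation time. The `trade` data bundled with fixest is an assumption standing in for the CPS data.

```r
library(fixest)
data(trade, package = "fixest")   # assumption: stand-in dataset

fml = log(Euros) ~ log(dist_km) | Origin + Destination

est_iid  = feols(fml, data = trade, vcov = "iid")      # classical SEs
est_hc   = feols(fml, data = trade, vcov = "hetero")   # heteroskedasticity-robust
est_cl   = feols(fml, data = trade, vcov = ~Origin)    # clustered by Origin
est_2way = feols(fml, data = trade, vcov = ~Origin + Destination)  # two-way clustered

# Same coefficients in every column, different standard errors
etable(est_iid, est_hc, est_cl, est_2way)
```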
HC
HAC
Clustered
Conley standard errors
On-the-fly SE adjustment
We’re belabouring the point now, but one last reminder that you can adjust the
standard errors for existing models “on the fly” by passing the vcov = ...
argument. There’s no performance penalty, since the adjustment is close to
instantaneous. It also has the virtue of separating the mechanical
computation stage of model estimation from the inference stage. As we’ll see
below, on-the-fly SE adjustment works for a variety of other fixest
functions, e.g. etable(). But here is a quick example using summary():
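A minimal sketch, assuming the `trade` data bundled with fixest in place of the CPS data. The model is estimated once and the standard errors are then swapped without re-estimation.

```r
library(fixest)
data(trade, package = "fixest")   # assumption: stand-in dataset

# Estimate the model once
est = feols(log(Euros) ~ log(dist_km) | Origin + Destination, data = trade)

# Re-compute the standard errors on the fly; the coefficients are untouched
summary(est, vcov = "hetero")
summary(est, vcov = ~Origin + Destination)

# se() accepts the same vcov argument if you only want the standard errors
se(est, vcov = "iid")
```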
Presentation
Regression table
Note: The etable() function is extremely flexible and includes support for
many things that we won’t show you here. See the relevant vignettes for more.
Below we highlight a few unique features that don’t have direct Stata
equivalents. (You could potentially mimic with a loop, but that will require
more code and be slower, since your whole model has to be re-estimated each
time.)
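One such feature, sketched under the assumption that the bundled `trade` data stands in for the CPS data: etable() can apply a different standard error to already-estimated models when building the table, so nothing is re-estimated.

```r
library(fixest)
data(trade, package = "fixest")   # assumption: stand-in dataset

est_ols  = feols(log(Euros) ~ log(dist_km) | Origin + Destination, data = trade)
est_pois = fepois(Euros ~ log(dist_km) | Origin + Destination, data = trade)

# Side-by-side table, with robust SEs applied to both models at display time
tab = etable(est_ols, est_pois, vcov = "hetero")
tab
```

With the default tex = FALSE the result is a data.frame, so it can be stored and post-processed like any other table.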
Joint test of coefficients
Coefficient plot
Interaction Plot
Panel
Note: You don’t need to specify your panel variables globally; doing so is mostly a convenience for time-series operations like leads and lags. You can instead use panel(dat, ~ id + var) on the fly in your regression call. But Laurent, the fixest author, recommends setting the panel ID globally when applicable, so that’s what we do below.
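Both routes can be sketched as follows, using the `base_did` example panel that ships with fixest (an assumption; its identifiers are `id` and `period`). The l() function creates lags inside the formula, with f() and d() as the analogous lead and difference operators.

```r
library(fixest)
data(base_did, package = "fixest")   # assumption: stand-in panel dataset

# Route 1: declare the panel identifiers (unit, then time) globally
setFixest_estimation(panel.id = ~id + period)
est_global = feols(y ~ l(x1, 0:1), data = base_did)   # x1 and its first lag

# Route 2: reset the global setting and declare the panel on the fly
setFixest_estimation(reset = TRUE)
pdat = panel(base_did, ~id + period)
est_fly = feols(y ~ l(x1, 0:1), data = pdat)

etable(est_global, est_fly)
```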