# Regression analysis with fixest

**fixest**open in new window (by Laurent BergĂ©) is a package designed from the ground up in C++ to make running regressions fast and incredibly easy. It provides built-in support for a variety of linear and nonlinear models, as well as regression tables and plotting methods.

## Installation

Before continuing, make sure that you have installed **fixest**. You only have to do this once (or as often as you want to update the package).

```
# Install from CRAN
install.packages("fixest")
# Alternatively, you can install the latest development version
# install.packages("fixest", repos = "https://fastverse.r-universe.dev")
```

2

3

4

5

Once **fixest** is installed, don't forget to load it whenever you want to use it. Unlike Stata, you have to re-load a package every time you start a new R session.

```
# Load fixest into our current R session
library(fixest)
```

2

All of the examples in this section will use a modified dataset from the CPS with some added variables for demonstration purposes. To load the data run the following:

```
import delimited using ///
"https://raw.githubusercontent.com/stata2r/stata2r.github.io/main/data/cps_long.csv", clear
```

2

```
# Base R reads CSVs too, but we'll use data.table here
dat = data.table::fread('https://raw.githubusercontent.com/stata2r/stata2r.github.io/main/data/cps_long.csv')
```

2

## Introduction

The **fixest**open in new window package contains a highly flexible set of functions that allow you to estimate a large set of regression models. While the package obviously doesn't cover *every* model out there, there is a non-negligible subset of Stata users for whom every model they've ever needed to estimate is covered by **fixest**.

This includes regular linear regression via the `feols()`

function, which extends the Base R `lm()`

function by supporting (high-dimenional) fixed effects. But **fixest** isn't just limited to linear regression. The package supports efficient instrumental variables (IV) estimation, as well as a wide range of GLM models like logit, probit, Poisson, and negative binomial.

You also get a lot of convenience features with **fixest**. Adjusting for heteroskedasticity-robust or clustered standard errors is easily done via the `vcov`

option. The package provides built-in methods for exporting regression tables and coefficient plots. You can select long lists of control variables without having to type them all in, retrieve estimated fixed effects, conduct Wald tests, adjust the reference levels for categorical variables and interactions on the fly, efficiently estimate simulation studies, etc. etc. You even get some stuff that's rather tricky in Stata, like automatically iterating over a bunch of model specifications, basic and staggered difference-in-difference support, or Conley standard errors.

**fixest** offers all of this in addition to being *very* fastopen in new window. If you felt a speed boost going from Stata's `xtreg`

to `reghdfe,`

get ready for another significant improvement when moving to **fixest**.

Using **fixest** for regression starts with writing a formula. While there are plenty of bells and whistles to add, at its core regression formulas take the form ** y ~ x1 + x2 | fe1 + fe2**, where

`y`

is the outcome, `x1`

and `x2`

are predictors, and `fe1`

and `fe2`

are your sets of fixed effects.## Models

Unlike Stata, which only ever has one active dataset in memory, remember that having multiple datasets in your global environment is the norm in R. We highlight this difference to head off a very common error for new Stata R users: *you need to specify which dataset you're using in your model calls*, e.g. `feols(..., data = dat)`

. We'll see lots of examples below. At the same time, note that **fixest** allows you to set various global optionsopen in new window, including which dataset you want to use for all of your regressions. Again, we'll see examples below.

### Simple model

```
reg wage educ
reg wage educ age
```

2

```
feols(wage ~ educ, data = dat)
feols(wage ~ educ + age, data = dat)
# Aside 1: `data = ...` is the first argument after
# the model formula. So many R users would just write:
feols(wage ~ educ, dat)
# Aside 2: You can also set your dataset globally so
# that you don't have to reference it each time.
setFixest_estimation(data = dat)
feols(wage ~ educ)
feols(wage ~ educ + age)
# etc.
```

2

3

4

5

6

7

8

9

10

11

12

13

### Categorical variables

```
reg wage educ i.treat
* Specifying a baseline:
reg wage educ ib1.treat
```

2

3

4

```
feols(wage ~ educ + i(treat), dat)
# Specifying a baseline:
feols(wage ~ educ + i(treat, ref = 1), dat)
```

2

3

4

### Fixed effects

```
reghdfe wage educ, absorb(countyfips) cluster(countyfips)
reghdfe wage educ, absorb(countyfips)
* Add more fixed effects...
reghdfe wage educ, absorb(countyfips year) ///
vce(cluster countyfips year)
reghdfe wage educ, absorb(countyfips#year) ///
vce(cluster countyfips#year)
```

2

3

4

5

6

7

8

9

10

11

12

13

14

```
feols(wage ~ educ | countyfips, dat, vcov = ~countyfips)
# feols(wage ~ educ | countyfips, dat) # same (see below)
# Note: fixest automatically clusters SEs by the first
# fixed effect (if there are any). We'll get to SEs
# later, but if you just want iid errors for a fixed
# effect model:
feols(wage ~ educ | countyfips, dat, vcov = 'iid')
# Add more fixed effects...
feols(wage ~ educ | countyfips + year,
dat, vcov = ~countyfips + year)
feols(wage ~ educ | countyfips^year,
dat) # defaults to vcov = ~countyfips^year
```

2

3

4

5

6

7

8

9

10

11

12

13

14

### Instrumental variables

```
ivreg 2sls wage (educ = age)
ivreg 2sls wage marr (educ = age)
* With fixed effects
ivreghdfe 2sls wage marr (educ = age), absorb(countyfips)
```

2

3

4

5

```
feols(wage ~ 1 | educ ~ age, dat)
feols(wage ~ marr | educ ~ age, dat)
# With fixed effects (IV 1st stage always comes last)
feols(wage ~ marr | countyfips | educ ~ age, dat)
```

2

3

4

5

### Nonlinear models

While we don't really show it here, note that (almost) all of the functionality that this page demonstrates w.r.t. `feols()`

carries over to **fixest's** non-linear estimation functions too (`feglm()`

, `fepois()`

, etc.). This includes SE adjustment, and so forth.

```
xtset statefips
logit marr age black hisp
* Note: Attempting to replicate the feglm() model with fixed
* effects at right using xtlogit or xtprobit leads to
* numerical overflow or matsize issues
* https://github.com/sergiocorreia/ppmlhdfe
ppmlhdfe educ age black hisp, absorb(statefips year) ///
vce(cluster statefips)
```

2

3

4

5

6

7

8

9

10

```
feglm(marr ~ age + black + hisp,
dat, family = 'logit')
# Add fixed effects (probit this time)
feglm(marr ~ age + black + hisp | statefips + year,
dat, family = 'probit')
# fepois() is there for Poisson regression
fepois(educ ~ age + black + hisp | statefips + year, dat)
```

2

3

4

5

6

7

8

9

### Macros, wildcards and shortcuts

```
local ctrls age black hisp marr
reg wage educ `ctrls'
reg wage educ x*
reg wage educ *sp
reg wage educ *ac*
```

2

3

4

5

6

```
ctrls = c("age", "black", "hisp", "marr")
feols(wage ~ educ + .[ctrls], dat)
feols(wage ~ educ + ..('^x'), dat) # ^ = starts with
feols(wage ~ educ + ..('sp#39;), dat) # $ = ends with
feols(wage ~ educ + ..('ac'), dat)
# Many more macro options. See `?setFixest_fml` and
# `?setFixest_estimation`. Example (reminder) where
# you set your dataset globally, so you don't have to
# retype `data = ...` anymore.
setFixest_estimation(data = dat)
feols(wage ~ educ)
feols(wage ~ educ + .[ctrls] | statefips)
# Etc.
```

2

3

4

5

6

7

8

9

10

11

12

13

14

15

### Multi-model estimations (advanced)

**fixest** supports a variety of multi-modelopen in new window capabilities. Not only are these efficient from a coding perspective (you can get away with much less typing), but they are also highly optimized. For example, if you run a multi-model estimation with the same group of fixed-effects then **fixest** will only compute those fixed-effects *once* for all models. The next group of examples are meant to highlight some specific examples of this functionality. They don't necessarily have direct Stata equivalents that we are aware of. Moreover, while we don't show it here, please note that all of these options can be combined (e.g. split sample with stepwise regression). Multi-model objects can also be sent directly to presentation functions like `etable()`

and `coefplot()`

.

#### Split sample

```
## Separate regressions for hispanic and non-hispanic
feols(wage ~ educ | countyfips, dat, split = ~hisp)
# As above, but now includes the full sample as a 3rd reg
feols(wage ~ educ | countyfips, dat, fsplit = ~hisp)
```

2

3

4

5

#### Multiple dependent variables

```
# Regress wage & educ separately on the same indep. vars & FEs
feols(c(wage, educ) ~ age + marr | countyfips + year, dat)
```

2

#### Stepwise regression

```
# Stepwise. First reg doesn't include "marr", second reg
# doesn't include "age"
feols(wage ~ educ + sw(age, marr), dat)
# Cumulative stepwise. As above, except now the second reg
# includes "age".
feols(wage ~ educ + csw(age, marr), dat)
# Stepwise operators work in the FE slot too
feols(wage ~ educ | csw(year, statefips), dat)
# Aside: You can also use "sw0()" and "csw0()", in which
# case you'll get an extra regression at the start that
# doesn't include the stepwise components.
```

2

3

4

5

6

7

8

9

10

11

12

13

14

## Interactions

### Interact continuous variables

```
reg wage c.educ#c.age
reg wage c.educ##c.age
* Polynomials
reg wage c.age#c.age
reg wage c.age##c.age
```

2

3

4

5

6

```
feols(wage ~ educ:age, dat)
feols(wage ~ educ*age, dat)
# Polynomials
feols(wage ~ I(age^2), dat)
feols(wage ~ poly(age, 2, raw = TRUE))
```

2

3

4

5

6

### Interact categorical variables

```
reg wage i.treat#i.hisp
reg wage i.treat i.treat#i.hisp
reg wage i.treat##i.hisp
```

2

3

4

5

6

7

```
feols(wage ~ i(treat, i.hisp), dat)
# Aside: i() is a fixest-specific shortcut that also
# has synergies with some other fixest functions. But
# base R interaction operators all still work, e.g.
feols(wage ~ factor(treat)/factor(hisp), dat)
feols(wage ~ factor(treat)*factor(hisp), dat)
```

2

3

4

5

6

7

### Interact categorical with continuous variables

```
reg wage i.treat#c.age
reg wage i.treat#c.age
reg wage i.treat i.treat#c.age
reg wage i.treat##c.age
```

2

3

4

5

6

7

8

```
feols(wage ~ i(treat, age), dat)
# Aside: i() is a fixest-specific shortcut that also
# has synergies with some other fixest functions. But
# base R interaction operators all still work, e.g.
feols(wage ~ factor(treat):age, dat)
feols(wage ~ factor(treat)/age, dat)
feols(wage ~ factor(treat)*age, dat)
```

2

3

4

5

6

7

8

### Difference-in-differences

In addition to the ability to estimate a difference-in-differences design using two-way fixed effects (if the design is appropriate for that; no staggered treatment, for instance), **fixest** offers several other DID-specific tools. The below examples use generic data sets, since the CPS data used in the rest of this page is not appropriate for DID.

```
* No immediate Stata equivalent to did_means that we know of,
* although you could replicate much of it by hand with an
* elaborate call to table
* Sun and Abraham can be estimated using the
* eventstudyinteract package on ssc
```

2

3

4

5

6

```
# did_means() provides tables of means, SEs, and treatment/
# control and pre/post differences for 2x2 DID
did_means(outcome + control ~ treat | post)
# sunab() produces interactions of the type that allow you to
# estimate the Sun & Abraham model for staggered treatment
# timing, and automatically get average treatment effects for
# each relative period
feols(y ~ control + sunab(year_treated, year))
```

2

3

4

5

6

7

8

9

### Interact fixed effects

```
* Combine fixed effects
reghdfe wage educ, absorb(statefips#year)
* Varying slopes (e.g. time trend for each state)
reghdfe wage educ, absorb(statefips#c.year) ///
vce(cluster statefips#c.year)
```

2

3

4

5

6

```
# Combine fixed effects
feols(wage ~ educ | statefips^year, dat)
# Varying slopes (e.g. time trend for each state)
feols(wage ~ educ | statefips[year], dat)
```

2

3

4

5

## Standard errors

While you can specify standard errors inside the original **fixest** model call (just like Stata), a unique feature of R is that you can adjust errors for an existing model *on-the-fly*. This has several benefitsopen in new window, including being much more efficient since you don't have to re-estimate your whole model. We'll try to highlight examples of both approaches below.

### HC

```
reg wage educ, vce(robust)
reg wage educ, vce(hc3)
```

2

```
feols(wage ~ educ, dat, vcov = 'hc1')
feols(wage ~ educ, dat, vcov = sandwich::vcovHC)
# Note: You can also adjust the SEs of an existing model
m = feols(wage ~ educ, dat) # iid
summary(m, vcov = 'hc1') # switch to HC1
```

2

3

4

5

6

### HAC

```
xtset id year
ivreghdfe wage educ, bw(auto) vce(robust)
```

2

```
feols(y ~ x, dat, vcov = 'NW', panel.id = ~unit + time)
# feols(y ~ x, dat, vcov = 'NW') # if panel id already set (see below)
```

2

### Clustered

```
reghdfe wage educ, absorb(countyfips) ///
vce(cluster countyfips)
* Twoway clustering etc.
reghdfe wage educ, absorb(countyfips year) ///
vce(cluster countyfips year)
reghdfe wage educ, absorb(countyfips#year) ///
vce(cluster countyfips#year)
```

2

3

4

5

6

7

8

9

10

11

```
feols(wage ~ educ | countyfips, dat) # Auto clusters by FE
# feols(wage ~ educ | countyfips, dat, vcov = ~countyfips) # ofc can be explicit too
# Twoway clustering etc.
feols(wage ~ educ | countyfips + year,
dat, vcov = ~countyfips + year)
# feols(wage ~ educ | countyfips + year,
# dat, vcov = 'twoway') ## same as above
feols(wage ~ educ | countyfips^year,
dat, vcov = ~countyfips^year)
```

2

3

4

5

6

7

8

9

10

11

### Conley standard errors

```
* See: http://www.trfetzer.com/conley-spatial-hac-errors-with-fixed-effects/
```

```
feols(wage ~ educ, dat, vcov = conley("25 mi"))
```

### On-the-fly SE adjustment

We're belabouring the point now, but one last reminder that you can adjust the standard errors for existing models "on the fly" by passing the `vcov = ...`

argument. There's no performance penalty, since the adjustment is done instantaneously and it therefore has the virtue of separating the mechanical *computation* stage of model estimation from the *inference* stage. As we'll see below, on-the-fly SE adjustment works for a variety of other **fixest** functions, e.g. `etable()`

. But here is a quick example using `summary()`

:

```
m = feols(wage ~ educ | countyfips + year, dat)
m # Clustered by countyfips (default)
summary(m, vcov = 'iid') # Switch to iid errors
summary(m, vcov = 'twoway') # Cluster by countyfips and year
summary(m, vcov = ~countyfips^year) # Cluster by countyfips*year interaction
```

2

3

4

5

## Presentation

### Regression table

```
reg wage educ age
eststo est1
esttab est1
* Add second regression
reg wage educ age black hisp
eststo est2
esttab est1 est2
* Export to TeX
esttab using "regtable.tex", replace
```

2

3

4

5

6

7

8

9

10

11

```
est1 = feols(wage ~ educ + age, dat)
etable(est1)
# Add second regression
est2 = feols(wage ~ educ + age + black + hisp, dat)
etable(est1, est2)
# Export to Tex
etable(est1, est2, file = "regtable.tex")
```

2

3

4

5

6

7

8

9

10

11

**Note:** The `etable()`

function is extremely flexible and includes support for many things that we won't show you here. See the relevant vignettes for more (1open in new window, 2open in new window). Below we highlight a few unique features that don't have direct Stata equivalents. (You could potentially mimic with a loop, but that will require more code and be slower, since your whole model has to be re-estimated each time.)

```
# SEs for existing models can be adjusted on-the-fly
etable(est1, est2, vcov = 'hc1')
# Report multiple SEs for the same model
etable(est1, vcov = list('iid', 'hc1', ~id, ~countyfips))
# Multi-model example: Two dep. vars, stepwise coefs,
# varying slopes, split sample, etc. (18 models in total!)
est_mult = feols(c(wage, age) ~ educ + csw0(hisp, black) |
statefips[year],
dat, fsplit = ~marr)
etable(est_mult, vcov = ~statefips^year)
```

2

3

4

5

6

7

8

9

10

11

12

### Joint test of coefficients

```
* Rename so we can use the wildcard later
rename (black hisp) (raceeth_black raceeth_hisp)
regress wage educ age raceeth_* marr
testparm raceeth_black raceeth_hisp
testparm raceeth_*
```

2

3

4

5

```
# Rename so we can use a regular expression later
data.table::setnames(dat, c('black','hisp'), c('raceeth_black','raceeth_hisp'))
est1 = feols(wage ~ educ + age + ..('raceeth_') + marr, dat)
wald(est1, c('raceeth_black','raceeth_hisp'))
wald(est1, 'raceeth_')
```

2

3

4

5

### Coefficient plot

```
* Assume we have est1 and est2 from above
coefplot est1
coefplot est1 est2
```

2

3

```
# Assume we have est1 and est2 from above
coefplot(est1)
coefplot(list(est1, est2))
```

2

3

### Interaction Plot

```
regress wage hisp##c.age
* Show how effect differs by group
margins hisp, dydx(age)
marginsplot
# Show predictive margins with an interaction
regress wage hisp##c.age
margins hisp, at(age = (16(1)55))
* The recast here gives a line and error area instead of points and lines
marginsplot, recast(line) recastci(rarea)
```

2

3

4

5

6

7

8

9

10

11

```
est1 = feols(wage ~ i(hisp, age), dat)
# Show how effect differs by group
iplot(est1)
# Show predictive margins with an interaction
# This requires plot_cap from the marginaleffects package
library(marginaleffects)
plot_cap(est1, condition = c('age','hisp'))
```

2

3

4

5

6

7

8

9

10

## Panel

**Note:** You don't need to specify your panel variables globally and this functionality is mostly for convenience features associated with time-series operations like leads and lags. You can also use `panel(dat, ~ id + var)`

to do so on-the-fly in your regression call. But Laurent, the **fixest** author, recommends setting the panel ID globally when applicable, so that's what we do below.

### Lag variables

```
xtset id year
reg wage educ l1.wage
```

2

```
setFixest_estimation(panel.id = ~id+year)
feols(wage ~ educ + l(wage, 1), dat)
# feols(wage ~ educ + l(wage, 1), dat, panel.id = ~id+year) # if not set
```

2

3

### Lead variables

```
xtset id year
reg wage educ f1.wage
```

2

```
# setFixest_estimation(panel.id = ~id+year) # already set
feols(wage ~ educ + f(wage, 1), dat)
```

2

### First difference

```
xtset id year
reg wage educ D.x
```

2

```
# setFixest_estimation(panel.id = ~id+year) # already set
feols(wage ~ educ + d(wage), dat)
```

2