# Regression analysis with fixest

**fixest** (by Laurent Bergé) is an R package
designed from the ground up in C++ to make running regressions fast and
incredibly easy. It provides built-in support for a variety of linear and
nonlinear models, as well as regression tables and plotting methods.

## Installation

Before continuing, make sure that you have installed **fixest**. You only
have to do this once (or as often as you want to update the package).

Once **fixest** is installed, don’t forget to load it whenever you want to
use it. Unlike Stata, you have to re-load a package every time you start a new R
session.
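
A minimal sketch of the install-and-load step, assuming you are installing from CRAN. The `requireNamespace()` guard is our own addition so that the snippet skips installation when the package is already available:

```r
# Install once (skipped here if fixest is already available)
if (!requireNamespace("fixest", quietly = TRUE)) {
  install.packages("fixest")
}

# Load at the start of every new R session
library(fixest)
```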

All of the examples in this section will use a modified dataset from the CPS with some added variables for demonstration purposes. To load the data run the following:

## Introduction

The **fixest** package contains a highly flexible set of functions that allow you to estimate a large set of regression models. While the package obviously doesn’t cover *every* model out there, there is a non-negligible subset of Stata users for whom every model they’ve ever needed to estimate is covered by **fixest**.

This includes regular linear regression via the `feols()` function, which extends the base R `lm()` function by supporting (high-dimensional) fixed effects. But **fixest** isn’t limited to linear regression. The package supports efficient instrumental variables (IV) estimation, as well as a wide range of GLM models like logit, probit, Poisson, and negative binomial.

You also get a lot of convenience features with **fixest**. Adjusting for heteroskedasticity-robust or clustered standard errors is easily done via the `vcov` option. The package provides built-in methods for exporting regression tables and coefficient plots. You can select long lists of control variables without having to type them all in, retrieve estimated fixed effects, conduct Wald tests, adjust the reference levels for categorical variables and interactions on the fly, efficiently run simulation studies, etc. You even get some stuff that’s rather tricky in Stata, like automatically iterating over a bunch of model specifications, basic and staggered difference-in-differences support, or Conley standard errors.

**fixest** offers all of this in addition to being *very* fast. If you felt a speed boost going from Stata’s `xtreg` to `reghdfe`, get ready for another significant improvement when moving to **fixest**.

Using **fixest** for regression starts with writing a formula. While there are plenty of bells and whistles to add, at their core regression formulas take the form **y ~ x1 + x2 | fe1 + fe2**, where `y` is the outcome, `x1` and `x2` are predictors, and `fe1` and `fe2` are your sets of fixed effects.

## Models

Unlike Stata, which only ever has one active dataset in memory, remember that
having multiple datasets in your global environment is the norm in R. We
highlight this difference to head off a very common error for new R users
coming from Stata: *you need to specify which dataset you’re using in your
model calls*, e.g. `feols(..., data = dat)`. We’ll see lots of examples below.
At the same time, note that **fixest** allows you to set various global
options, including which dataset you want to use for all of your regressions.
Again, we’ll see examples below.
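
As a rough illustration of both points (the formula layout and the `data` argument), here is a sketch that uses the `trade` dataset bundled with **fixest** as a stand-in, since the CPS extract is not reproduced here. The variable choices are ours and purely illustrative:

```r
library(fixest)

# Stand-in data: the `trade` dataset that ships with fixest
data(trade, package = "fixest")

# outcome ~ predictor | fixed effects, with the dataset named explicitly
est = feols(log(Euros) ~ log(dist_km) | Origin + Destination, data = trade)
est

# Alternatively, set a default dataset for all subsequent estimations ...
setFixest_estimation(data = trade)
est_global = feols(log(Euros) ~ log(dist_km) | Origin + Destination)

# ... and reset the global options once you are done
setFixest_estimation(reset = TRUE)
```

Both calls estimate the same model; the second simply picks up the dataset from the global options.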

### Simple model

### Categorical variables

### Fixed effects

### Weights

### Instrumental variables

### Nonlinear models

While we don’t really show it here, note that (almost) all of the functionality that
this page demonstrates w.r.t. `feols()` carries over to **fixest’s** non-linear
estimation functions too (`feglm()`, `fepois()`, etc.). This includes SE
adjustment, and so forth.
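
To make this concrete, the sketch below fits a Poisson model with fixed effects in two equivalent ways and then adjusts its standard errors. It again assumes the bundled `trade` dataset as stand-in data; the model itself is only illustrative:

```r
library(fixest)
data(trade, package = "fixest")

# Poisson regression with fixed effects
est_pois = fepois(Euros ~ log(dist_km) | Origin + Destination, data = trade)

# The same model through the general GLM interface
est_glm = feglm(Euros ~ log(dist_km) | Origin + Destination,
                data = trade, family = "poisson")

# SE adjustment works just as it does for feols() models
summary(est_pois, vcov = ~Origin)
```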

### Macros, wildcards and shortcuts

### Multi-model estimations (advanced)

**fixest** supports a variety of multi-model capabilities. Not only are these
efficient from a coding perspective (you can get away with much less typing),
but they are also highly optimized. For example, if you run a multi-model
estimation with the same group of fixed effects, then **fixest** will only
compute those fixed effects *once* for all models. The next group of examples
highlights some specific instances of this functionality. They don’t
necessarily have direct Stata equivalents that we are aware of. Moreover, while
we don’t show it here, please note that all of these options can be combined
(e.g. split sample with stepwise regression). Multi-model objects can also be
sent directly to presentation functions like `etable()` and `coefplot()`.
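
A compact sketch of the three multi-model styles covered in the subsections that follow, using stand-in data (`trade` from **fixest** and base R's `airquality`) rather than the CPS extract. The specifications are hypothetical and chosen only to show the syntax:

```r
library(fixest)
data(trade, package = "fixest")

# Stepwise: csw0() adds the fixed effects cumulatively, starting from none
est_sw = feols(log(Euros) ~ log(dist_km) | csw0(Origin, Destination, Year),
               data = trade)

# Split sample: one estimation per value of Product
est_split = feols(log(Euros) ~ log(dist_km) | Origin + Destination,
                  data = trade, split = ~Product)

# Multiple dependent variables: wrap them in c()
est_mult = feols(c(Ozone, Solar.R) ~ Wind + Temp, data = airquality)

# Multi-model objects can be passed straight to etable()
etable(est_sw)
```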

#### Split sample

#### Multiple dependent variables

#### Stepwise regression

## Interactions

### Interact continuous variables

### Interact categorical variables

### Interact categorical with continuous variables

### Difference-in-differences

In addition to the ability to estimate a difference-in-differences design using
two-way fixed effects (if the design is appropriate for that; no staggered
treatment, for instance), **fixest** offers several other DID-specific tools.
The examples below use generic data sets, since the CPS data used in the rest of
this page is not appropriate for DID.
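
As one possible illustration, the sketch below uses the `base_did` and `base_stagg` example datasets that ship with **fixest**. Taking period 5 as the reference period is an assumption about the `base_did` data (treatment is taken to start in period 6):

```r
library(fixest)

# Simple (non-staggered) design: interact treatment with period,
# using period 5 as the reference
data(base_did, package = "fixest")
est_did = feols(y ~ x1 + i(period, treat, ref = 5) | id + period,
                data = base_did)
est_did

# Staggered treatment: sunab() interacts cohort with relative period
data(base_stagg, package = "fixest")
est_sa = feols(y ~ x1 + sunab(year_treated, year) | id + year,
               data = base_stagg)

# Aggregate the cohort-period coefficients into a single ATT
summary(est_sa, agg = "att")
```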

### Interact fixed effects

## Standard errors

While you can specify standard errors inside the original **fixest** model call
(just like Stata), a unique feature of R is that you can adjust errors for an
existing model *on-the-fly*. This has several benefits, including being
much more efficient since you don’t have to re-estimate your whole model. We’ll
try to highlight examples of both approaches below.
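
The two approaches can be sketched side by side as follows, assuming the bundled `trade` dataset as stand-in data and clustering by `Origin` purely for illustration:

```r
library(fixest)
data(trade, package = "fixest")

fml = log(Euros) ~ log(dist_km) | Origin + Destination

# Approach 1: choose the standard errors inside the model call
est_cl_call = feols(fml, data = trade, vcov = ~Origin)

# Approach 2: estimate once, then adjust the existing model afterwards
est = feols(fml, data = trade)
est_cl_post = summary(est, vcov = ~Origin)

# Both routes give the same clustered standard errors
se(est_cl_call)
se(est_cl_post)
```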

### HC

### HAC

### Clustered

### Conley standard errors

### On-the-fly SE adjustment

We’re belabouring the point now, but one last reminder that you can adjust the
standard errors for existing models “on the fly” by passing the `vcov = ...`
argument. There’s no performance penalty, since the adjustment is done almost
instantaneously, and it has the virtue of separating the mechanical
*computation* stage of model estimation from the *inference* stage. As we’ll see
below, on-the-fly SE adjustment works for a variety of other **fixest**
functions, e.g. `etable()`. But here is a quick example using `summary()`:
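
A minimal sketch of such a call, assuming the bundled `trade` dataset as stand-in data; the model and the clustering variables are illustrative only:

```r
library(fixest)
data(trade, package = "fixest")

# Estimate the model once
est = feols(log(Euros) ~ log(dist_km) | Origin + Destination, data = trade)

# Then request different standard errors without re-estimating
s_hc = summary(est, vcov = "hetero")               # heteroskedasticity-robust
s_tw = summary(est, vcov = ~Origin + Destination)  # two-way clustered

s_hc
s_tw
```

The point estimates are identical across the two summaries; only the standard errors (and hence the test statistics) change.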

## Presentation

### Regression table

**Note:** The `etable()` function is extremely flexible and includes support for
many things that we won’t show you here. See the relevant vignettes for more.
Below we highlight a few unique features that don’t have direct Stata
equivalents. (You could potentially mimic them with a loop, but that would
require more code and be slower, since your whole model has to be re-estimated
each time.)
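
For instance, a table of several models with standard errors adjusted at display time might look like the sketch below. It assumes the bundled `trade` dataset as stand-in data, and the two specifications are hypothetical:

```r
library(fixest)
data(trade, package = "fixest")

est1 = feols(log(Euros) ~ log(dist_km) | Origin, data = trade)
est2 = feols(log(Euros) ~ log(dist_km) | Origin + Destination, data = trade)

# Side-by-side table; the SEs of both models are adjusted on the fly,
# without re-estimating either model
tab = etable(est1, est2, vcov = "hetero")
tab
```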

### Joint test of coefficients

### Coefficient plot

### Interaction Plot

## Panel

**Note:** You don’t need to specify your panel variables globally; this functionality mostly exists for convenience features associated with time-series operations like leads and lags. You can also use `panel(dat, ~ id + var)` to do so on the fly in your regression call. But Laurent, the **fixest** author, recommends setting the panel ID globally when applicable, so that’s what we do below.
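
The three ways of declaring the panel structure can be sketched as follows. This assumes the `base_did` example dataset from **fixest** (with `id` and `period` as the panel identifiers) in place of the CPS extract, and uses a one-period lag of `x1` only to show the `l()` syntax:

```r
library(fixest)
data(base_did, package = "fixest")

# Option 1: declare the panel identifiers in the estimation call
est_call = feols(y ~ l(x1, 0:1) | id + period, data = base_did,
                 panel.id = ~id + period)

# Option 2: declare them on the fly with panel()
est_panel = feols(y ~ l(x1, 0:1) | id + period,
                  data = panel(base_did, ~id + period))

# Option 3: set them globally, then estimate as usual
setFixest_estimation(panel.id = ~id + period)
est_global = feols(y ~ l(x1, 0:1) | id + period, data = base_did)

# Reset the global options afterwards
setFixest_estimation(reset = TRUE)
```

All three estimations should produce the same coefficients; the options differ only in where the panel structure is declared.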