# Other Packages

While we think you can get really far in R with just **data.table** and **fixest**, of course these two packages don't cover everything.

This page covers a small list of packages you may find especially useful when getting started with R. We won't try to cover everything under the sun here. Just a few places to get going. For the rest, well, that's what StackOverflow or your favourite search engine is for.

All of the packages below have far more applications than are shown here; we'll just provide one or two examples of how each can be used. Finally, don't forget to install them with `install.packages('PKGNAME')` and load them with `library(PKGNAME)`. You only have to run the former once per package (or as often as you want to update it), and the latter whenever you want to use the package in a new R session.

## base

*Where it all begins*

Like many programming languages, one of R's great strengths is its package ecosystem. But *none* of that would be possible without the scaffolding provided by **base** R. The "base" part here represents a set of core libraries and routines that get installed and loaded automatically whenever you start an R session. And you really get a lot out of the gate, because base R is incredibly versatile and function-rich. Many of the operations that we have shown you on the preceding pages could equally have been implemented using off-the-shelf base R equivalents. We won't attempt to persuade you of that here, but there are lots of good tutorials available if you're interested (here, for example). Below we'll just highlight a few simple examples to give you an idea.

#### Plotting (simple histogram)

```
set obs 100
gen x = rnormal()
histogram x
```

```
x = rnorm(100)
hist(x)
```

#### Linear regression

```
reg y x1 x2
```

```
lm(y ~ x1 + x2, dat)
```

#### Iteration (loops)

```
foreach i of numlist 1/10 {
display `i' + 100
}
```

```
for (i in 1:10) {
print(i + 100)
}
# Aside 1: A single line works too here.
for (i in 1:10) print(i + 100)
# Aside 2: R provides "functional programming" equivalents
# to for-loops via the *apply family of functions. These
# have various advantages, which we won't get into here.
# Still, the most important member is arguably lapply(), which
# we've already seen a couple of times and which returns a list
# (great for programming). Here's the lapply() equivalent of
# the previous for-loop.
lapply(1:10, function(i) print(i + 100))
```

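To give a flavour of the claim that base R can do much of this on its own, here is a minimal sketch of the subset, create-by-group and collapse operations from the **dplyr** examples further down, using only base functions (`subset()`, `ave()` and `aggregate()`). The toy `dat` below is hypothetical; its column names just mirror the placeholders used on this page.

```r
# Hypothetical toy data (column names mirror the placeholders used on this page)
dat = data.frame(
  group1 = c("a", "a", "b", "b", "b"),
  var1   = c(1, 3, 2, 4, 6),
  var2   = c(10, 20, 30, 40, 50)
)

# Subset rows, then columns (cf. Stata: keep if ... / keep ...)
sub = subset(dat, group1 == "b", select = c(group1, var1))

# New variable by group (cf. Stata: bysort group1: egen mean_var1 = mean(var1))
# ave() returns a vector as long as var1, with each group's mean repeated
dat$mean_var1 = ave(dat$var1, dat$group1, FUN = mean)

# Collapse by group (cf. Stata: collapse (mean) var1, by(group1))
coll = aggregate(var1 ~ group1, data = dat, FUN = mean)
```

In practice **data.table** (or **dplyr**) will be faster and terser for big data, but it's reassuring that the building blocks are always there.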

## ggplot2

*Beautiful and customizable plots*

**ggplot2** is widely considered one of the preeminent plotting libraries available in any language. It provides an intuitive syntax that applies in the same way across many, many different kinds of visualizations, with a deep level of customization. Plus, there are endless add-on packages to do whatever you want, including easy interactivity, animation, maps, etc. We thought about giving **ggplot2** its own dedicated page like **data.table** and **fixest**. Instead, we'll point you to the Figures section of the *Library of Statistical Techniques*, which already shows how to do many different graphing tasks in both Stata and **ggplot2**. For a more in-depth overview you can always consult the excellent package documentation, or Kieran Healy's wonderful *Data Visualization* book.

#### Basic scatterplot(s)

```
twoway scatter yvar xvar
twoway (scatter yvar xvar if group == 1, mc(blue)) ///
(scatter yvar xvar if group == 2, mc(red))
```

```
ggplot(dat, aes(x = xvar, y = yvar)) + geom_point()
ggplot(dat, aes(x = xvar, y = yvar, color = group)) +
geom_point()
```

## tidyverse

*A family of data science tools*

The **tidyverse** provides an extremely popular framework for data science tasks in R. This meta-package is actually a collection of smaller packages that are all designed to work together, based on a shared philosophy and syntax. We've already covered **ggplot2** above, but there are plenty more. These include **dplyr** and **tidyr**, which offer an alternative syntax and approach to data wrangling tasks. While we personally recommend **data.table**, these **tidyverse** packages have many ardent fans too. You may find that you prefer their modular design and verbal syntax. But don't feel bound either way: it's totally fine to combine them. Some other **tidyverse** packages worth knowing about include **purrr**, which contains a suite of functions for automating and looping your work, **lubridate**, which makes working with date-based data easy, and **stringr**, which offers functions with straightforward syntax for working with string variables. In the examples that follow, note that `%>%` is a pipe operator.
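For what it's worth, base R (version 4.1.0 and later) also ships with a native pipe, `|>`. For simple chains like the ones on this page it works much like `%>%`, passing the left-hand side in as the first argument of the next function call. A minimal sketch using only base functions:

```r
x = c(1, 4, 9)

# Nested call, read inside-out
nested = sum(sqrt(x))

# The same computation as a pipe chain, read left to right.
# The native pipe |> needs R >= 4.1.0; with magrittr/dplyr loaded,
# x %>% sqrt() %>% sum() gives the same answer.
piped = x |> sqrt() |> sum()
```

Both give `6` here (1 + 2 + 3). The two pipes differ in some edge cases (e.g. placeholder syntax), so check the documentation before relying on anything fancier.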

#### Data wrangling with dplyr

*Note: dplyr doesn't modify data in place, so you'll need to (re)assign if you want to keep your changes, e.g. `dat = dat %>% group_by...`*

Subset by rows and then columns.

```
keep if var1=="value"
keep var1 var2 var3
```

```
dat %>%
filter(var1=="value") %>%
select(var1, var2, var3)
```

Create a new variable by group.

```
bysort group1: egen mean_var1 = mean(var1)
```

```
dat %>%
group_by(group1) %>%
mutate(mean_var1 = mean(var1))
```

Collapse by group.

```
collapse (mean) mean_var1 = var1, by(group1)
```

```
dat %>%
group_by(group1) %>%
summarise(mean_var1 = mean(var1))
```

#### Manipulating dates with lubridate

```
* Shift a date forward one month (not 30 days, one month)
* ???
```

```
# Shift a date forward one month (not 30 days, one month)
shifted_date = date + months(1)
```

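One subtlety worth knowing: adding a `months()` period to a date whose day doesn't exist in the target month (e.g. 31 January plus one month) returns `NA` in **lubridate**, rather than silently spilling over into March. The `%m+%` operator instead rolls back to the last valid day of the month. A small sketch, assuming **lubridate** is installed:

```r
library(lubridate)

d = ymd("2023-01-31")

# Plain period addition: 2023-02-31 doesn't exist, so the result is NA
plain = d + months(1)

# %m+% rolls back to the last day of February instead
rolled = d %m+% months(1)
```

Which behaviour you want depends on the application, but it's better to choose deliberately than to discover stray `NA`s later.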

#### Iterating with purrr

Read in many files and append them together.

```
local filelist: dir "data/" files "*.csv"
tempfile mytmpfile
save `mytmpfile', replace empty
foreach x of local filelist {
qui: import delimited "data/`x'", case(preserve) clear
append using `mytmpfile'
save `mytmpfile', replace
}
```

```
filelist = dir("data/", pattern = "\\.csv$", full.names = TRUE)
dat = map_df(filelist, data.table::fread)
# Note: map_*df* means map (iterate) then coerce the
# result to a data frame. (In purrr >= 1.0, map_df() is
# superseded by map() followed by list_rbind().)
```

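If you'd rather not load **purrr**, the same read-and-append can be done with base `lapply()`, stacking the pieces with `do.call(rbind, ...)` (with **data.table**, `rbindlist(lapply(filelist, fread))` is the faster equivalent). The sketch below first writes two small, hypothetical CSV files to a temporary directory so that it runs on its own.

```r
# Write two small (hypothetical) CSV files to a temporary directory
data_dir = file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, y = c(0.5, 1.5)),
          file.path(data_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, y = c(2.5, 3.5)),
          file.path(data_dir, "part2.csv"), row.names = FALSE)

# List the CSV files (regex: names ending in ".csv") with their full paths
filelist = list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into a list of data frames, then stack them row-wise
dat = do.call(rbind, lapply(filelist, read.csv))
```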

Iterate over variables.

```
ds, has(type long)
collapse (mean) `r(varlist)'
```

```
# Note: map() is a drop-in replacement for lapply(); dat is a data.table here
dat[, map(.SD, mean), .SDcols=is.numeric]
```

#### String operations with stringr

```
subinstr("Hello world", "world", "universe", .)
substr("Hello world", 1, 3)
regexm("Hello world", "ello")
```

```
str_replace_all("Hello world", "world", "universe")
str_sub("Hello world", 1, 3)
str_detect("Hello world", "ello")
# Note all the stringr functions accept regex input
```

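For reference, base R has close equivalents too: `gsub()`, `substr()` and `grepl()`. Like their **stringr** counterparts, `gsub()` and `grepl()` treat the pattern as a regular expression by default (add `fixed = TRUE` for literal matching). Note that base R puts the pattern first and the string last, whereas **stringr** consistently takes the string first. One more gotcha: Stata's `substr(s, n1, n2)` reads `n2` as a *length*, whereas R's `substr()` and `str_sub()` take start and end *positions*; the two happen to agree in this example only because the substring starts at position 1.

```r
s = "Hello world"

# Replace all matches (cf. str_replace_all / Stata subinstr)
replaced = gsub("world", "universe", s)

# Characters 1 through 3 (base substr takes start and stop positions)
first3 = substr(s, 1, 3)

# Does the pattern occur? (cf. str_detect / Stata regexm)
found = grepl("ello", s)
```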

## collapse

*Extra convenience functions and super fast aggregations*

Sure, we've gone on and on about how fast **data.table** is compared to just about everything else. But there is another R package that can boast even faster computation times for certain grouped calculations and transformations, and that's **collapse**. The **collapse** package doesn't try to do everything that **data.table** does. But the two play very well together, and **collapse** offers some convenience functions like `descr` and `collap`, which essentially mimic the equivalent functions in Stata and might be particularly appealing to readers of this guide. (P.S. If you'd like to load **data.table** and **collapse** at the same time, plus some other high-performance packages, check out the **fastverse**.)

#### Quick Summaries

```
summarize
describe
```

```
qsu(dat)
descr(dat)
```

#### Multiple grouped aggregations

```
collapse (mean) var1, by(group1)
collapse (min) min_var1=var1 min_var2=var2 (max) max_var1=var1 max_var2=var2, by(group1 group2)
```

```
collap(dat, var1 ~ group1, fmean) # 'fmean' => fast mean
collap(dat, var1 + var2 ~ group1 + group2, FUN = list(fmin, fmax))
```

## sandwich

*More standard error adjustments*

The **fixest** package comes with plenty of shortcuts for accessing standard error adjustments like HC1 heteroskedasticity-robust standard errors, Newey-West, Driscoll-Kraay, clustered standard errors, etc. But of course, there are many more options beyond those. A host of additional adjustments are covered by the **sandwich** package, which comes with a long list of functions like `vcovBS()` for bootstrapped standard errors, or `vcovHC()` for the HC0 through HC5 family (among other variants). **sandwich** supports nearly every model class in R, so it shouldn't come as a surprise that these can slot right into **fixest** estimates, too. You shouldn't be using those `, robust` errors for smaller samples anyway... but you knew that, right?

#### Linear Model Adjustments

```
* ", robust" uses hc1 which isn't great for small samples
regress Y X Z, vce(hc3)
```

2

```
# sandwich's vcovHC uses HC3 by default
feols(Y ~ X + Z, dat, vcov = sandwich::vcovHC)
# Aside: Remember that you can also adjust the SEs
# for existing models on the fly
m = feols(Y ~ X + Z, dat)
summary(m, vcov = sandwich::vcovHC)
```

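To see why HC3 tends to be preferred in small samples, here is a base R sketch that computes the HC0, HC1 and HC3 sandwich estimators by hand for an illustrative `lm()` fit on R's built-in `mtcars` data. HC1 (what Stata's `, robust` uses) just rescales HC0 by n/(n-k), while HC3 divides each squared residual by (1 - h_i)^2, where h_i is observation i's leverage, so influential observations get more weight. This is only meant to show the mechanics; for real work, use `sandwich::vcovHC()`.

```r
# Illustrative model on R's built-in mtcars data
m = lm(mpg ~ wt + hp, data = mtcars)

X = model.matrix(m)   # n x k design matrix (includes the intercept)
e = residuals(m)      # OLS residuals
h = hatvalues(m)      # leverages h_i (diagonal of the hat matrix)
n = nrow(X)
k = ncol(X)

bread = solve(crossprod(X))   # (X'X)^-1

# Sandwich form: (X'X)^-1 X' diag(w) X (X'X)^-1 for observation weights w
sandwich_vcov = function(w) bread %*% crossprod(X * w, X) %*% bread

vc_hc0 = sandwich_vcov(e^2)                # White's original estimator
vc_hc1 = vc_hc0 * n / (n - k)              # degrees-of-freedom rescaling
vc_hc3 = sandwich_vcov(e^2 / (1 - h)^2)    # leverage-adjusted weights

se = cbind(
  classical = sqrt(diag(vcov(m))),
  HC1 = sqrt(diag(vc_hc1)),
  HC3 = sqrt(diag(vc_hc3))
)
print(se)
```

Because 1/(1 - h_i)^2 is never smaller than 1, the HC3 variances can never be smaller than HC0's; how they compare with HC1 depends on how the leverage is spread across observations.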

## modelsummary

*Summary tables, regression tables, and more*

The **fixest** package already has the `etable()` function for generating regression tables. However, it is only really intended to work with models from the same package. So we also recommend checking out the fantastic **modelsummary** package. It works with all sorts of model objects, including those not from **fixest**, is incredibly customizable, and outputs to a bunch of different formats (PDF, HTML, DOCX, etc.). Similarly, **modelsummary** has a wealth of options for producing publication-ready summary tables. Oh, and it produces coefficient plots too. Check out the package website for more.

#### Summary tables

```
* Summary stats table
estpost summarize
esttab, cells("count mean sd min max") nomtitle nonumber
* Balance table
by treat_var: eststo: estpost summarize
esttab, cells("mean sd") label nodepvar
```

```
# Summary stats table
datasummary_skim(dat)
# Balance table
datasummary_balance(~treat_var, dat)
```

#### Regression tables

**Aside:** Here we'll use the base R `lm()` (linear model) function, rather than `feols()`, to emphasize that **modelsummary** works with many different model classes.

```
reg Y X Z
eststo est1
esttab est1
reg Y X Z, vce(hc3)
eststo est1b
esttab est1b
esttab est1 est1b
reg Y X Z A, vce(hc3)
eststo est2
esttab est1 est1b est2
```

```
est1 = lm(Y ~ X + Z, dat)
msummary(est1) # msummary() = alias for modelsummary()
# Like fixest::etable(), SEs for existing models can
# be adjusted on-the-fly
msummary(est1, vcov='hc3')
# Multiple SEs for the same model
msummary(est1, vcov=list('iid', 'hc3'))
est3 = lm(Y ~ X + Z + A, dat)
msummary(list(est1, est1, est3),
vcov = list('iid', 'hc3', 'hc3'))
```

## lme4

*Random effects and mixed models*

**fixest** can do a lot, but it can't do everything. This site isn't going to attempt to show how to translate every single model into R, but we'll quickly highlight random-effects and mixed models. The **lme4** package and its `lmer()` function cover not just random-intercept models but also hierarchical models where slope coefficients follow random distributions. (**Aside:** If you prefer Bayesian models for this kind of thing, check out **brms**.)

#### Random effects and mixed models

```
xtset group time
xtreg Y X, re
mixed Y X || group: X, covariance(unstructured)
```

```
# No need for an xtset equivalent
m = lmer(Y~(1|group) + X, data = dat)
nm = lmer(Y~(1+X|group) + X, data = dat)
```

## marginaleffects

*Marginal effects, contrasts, etc.*

The Stata `margins` command is great. To replicate it in R, we highly recommend the **marginaleffects** package. It handles individual or average marginal effects for nonlinear models, models with interactions or transformations, and more. It's also very fast.

#### Basic logit marginal effects

```
* A logit:
logit Y X Z
margins, dydx(*)
```

```
# This example uses the fixest function feglm()
m = feglm(Y ~ X + Z, family = binomial, data = dat)
# Average marginal effects, like Stata's margins, dydx(*)
# (older marginaleffects versions: summary(marginaleffects(m)))
avg_slopes(m)
```

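Under the hood, an average marginal effect for a continuous regressor in a logit is just the derivative of the predicted probability with respect to that regressor, averaged over the sample: the logistic density evaluated at each observation's linear predictor, times the coefficient. The base R sketch below computes this by hand for an illustrative `glm()` logit on R's built-in `mtcars` data and checks it against a numerical derivative. In practice, let **marginaleffects** do this (and the standard errors) for you.

```r
# Illustrative logit on R's built-in mtcars data
m = glm(am ~ wt + hp, family = binomial, data = mtcars)

# Analytic AME of wt: for a logit, dP/dwt = dlogis(x'b) * b_wt,
# where dlogis() is the logistic density; average it over the sample
eta = predict(m, type = "link")
ame_wt = mean(dlogis(eta)) * coef(m)[["wt"]]

# Numerical check: nudge wt up and down by a tiny step and average
# the resulting slope of the predicted probabilities
step = 1e-5
p_up = predict(m, newdata = transform(mtcars, wt = wt + step), type = "response")
p_down = predict(m, newdata = transform(mtcars, wt = wt - step), type = "response")
ame_wt_num = mean((p_up - p_down) / (2 * step))

c(analytic = ame_wt, numerical = ame_wt_num)
```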

## multcomp / nlWaldTest

*Joint coefficient tests*

Stata provides a number of built-in commands for (potentially complex) postestimation coefficient tests. We've already seen the `testparm` command equivalent with `fixest::wald()`. But what about combinations of coefficients *a la* Stata's `lincom` and `nlcom` commands? The **multcomp** package handles a variety of linear tests and combinations, while **nlWaldTest** has you covered for nonlinear combinations.

#### Test other null hypotheses and coefficient combinations

```
regress y x z
* One-sided test
test _b[x]=0
local sign_wgt = sign(_b[x])
display "H0: coef <= 0 p-value = " ttail(r(df_r),`sign_wgt'*sqrt(r(F)))
* Test linear combination of coefficients
lincom x + z
* Test nonlinear combination of coefficients
nlcom _b[x]/_b[z]
```

```
m = feols(y ~ x + z, dat)
# One-sided test (H0: coef on x <= 0)
m2 = multcomp::glht(m, 'x <= 0')
summary(m2)
# Test linear combination of coefficients
m3 = multcomp::glht(m, 'x + z = 0')
summary(m3) # or confint(m3)
# Test nonlinear combination of coefficients (b[2] = x, b[3] = z)
nlWaldTest::nlWaldtest(m, 'b[2]/b[3]') # or nlWaldTest::nlConfint()
```

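The one-sided test can also be done by hand, mirroring the Stata logic above: compute the t statistic for the coefficient and take the upper-tail probability from a t distribution with the model's residual degrees of freedom. Here is a base R sketch using an illustrative `lm()` fit on R's built-in `mtcars` data, testing H0: coefficient on `wt` <= 0.

```r
# Illustrative model on R's built-in mtcars data
m = lm(mpg ~ wt + hp, data = mtcars)

est = coef(summary(m))["wt", ]
t_stat = est[["Estimate"]] / est[["Std. Error"]]

# H0: coef <= 0 vs H1: coef > 0, so we want the upper-tail p-value
p_upper = pt(t_stat, df = df.residual(m), lower.tail = FALSE)

# For comparison: the two-sided p-value reported by summary()
p_two = est[["Pr(>|t|)"]]
```

Since the estimated `wt` coefficient is negative, the upper-tail p-value here is close to 1 (we can't reject H0), and it equals 1 - p_two/2.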

## sf

*Geospatial operations*

R has outstanding support for geospatial computation and mapping. There are a variety of packages to choose from here, depending on what you want (e.g. vector vs raster data, interactive maps, high-dimensional data cubes, etc.). But the workhorse geospatial tool for most R users is the incredibly versatile **sf** package. We'll only provide a simple mapping example below. The **sf** website has several in-depth tutorials, and we also recommend the *Geocomputation with R* book by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow.

#### Simple Map

```
* Mapping in Stata requires the spmap and shp2dta
* commands, and also that you convert your (say)
* shapefile to .dta format first. We won't go through
* all that here, but see:
* https://www.stata.com/support/faqs/graphics/spmap-and-maps/
```

```
# This example uses the North Carolina shapefile that is
# bundled with the sf package.
nc = st_read(system.file("shape/nc.shp", package = "sf"))
plot(nc[, 'BIR74'])
# Or, if you have ggplot2 loaded:
ggplot(nc, aes(fill=BIR74)) + geom_sf()
```
