2 minute read

I’m always hunting for ways to make my work more efficient, error free, and reproducible. I’m particularly obsessed with pipelines that eliminate inconsistencies between the data, code, and tables.

Pasting results into papers seems like a waste of time and an unnecessary opportunity for errors to enter the paper, and my system1 of piping of results from Stata or Python directly into Overleaf eliminates much of the copy-paste malebolge. However, I’ve always wondered if there was a straightforward way to pass things like sample counts and other data out of Stata (where I produce most of my results) into the Overleaf draft and reference them there.

The most obvious place for this is the sample selection discussion in a manuscript. This discussion is unlikely to change from draft to draft, but the actual sample size may change as datasets are expanded, controls added, sample restrictions introduced, etc. Stata makes this quite easy (thanks, as usual, to Ben Jann).

First we can define and reference numeric objects in LaTeX like this:

\documentclass{article}
% this import allows us to specify formatting:
\usepackage[group-separator={,},group-minimum-digits={3}]{siunitx}

\title{Quick example}

\begin{document}

% declare a count:
\newcount\example
% assign a value to it:
\example=599884422

\the\example \ is the number without formatting.

\num{\the\example} is the number with formatting.

\end{document}

Which gives us this:

Quick example

Honestly, reading the siunitx documentation is the most fiddly part of this. Then we can use Ben Jann’s devilishly simple yet powerful texdoc (ssc install texdoc) to export a simple file with some numbers in it from Stata:

// load some data
sysuse auto
// set aside some counts in local macros
count if foreign == 1
local foreign = r(N)
count if foreign == 0
local domestic = r(N)

// fire up texdoc
texdoc init samples.tex, replace force // note the options here are required to use texdoc this way
// write lines declaring counts with local macros:
texdoc write \newcount\forsample
texdoc write \forsample=`foreign'
texdoc write \newcount\domsample
texdoc write \domsample=`foreign'

Now you have a file samples.tex that reads:

\newcount\forsample
\forsample=22
\newcount\domsample
\domsample=52

Which you can load into your LaTeX document with an \input{} statement:

\documentclass{article}

\usepackage[group-separator={,},group-minimum-digits={3}]{siunitx}
\title{Counting cars}

\begin{document}

\input{samples.tex}


There are \the\forsample \ foreign cars in the dataset, and \the\domsample \ domestic cars in the sample. There are \the\numexpr\domsample - \the\forsample\relax \ more domestic cars than foreign cars in the sample.  

\end{document}

Which compiles to this:

Counting cars

So far I’ve been using this in the sample size discussions in papers without causing my coauthors to riot.

  1. I’m certainly over claiming here. What I’m calling “my system” seems like the main thing B-School empiricists would use LaTeX for.