2024-07-13
Easy subsample comparisons with esttab.
It’s often the case that we want to display summary statistics for two samples
along with a t-test comparing the two samples. The ttest
command with
esttab
allows this, but limits the statistics that you can display.
Here is a minimal example of how to do this while retaining all of the statistics generated by summarize:
First load some example data:
// load example data
sysuse auto, clear
drop make // this is not numeric, dropping simplifies the example
I’m dropping make
so that I can use *
to reference all the variables in the dataset.
Next calculate standard summary stats for both samples:
We’ll use estpost su
with d
for detail
and q
for quantile
and store each set as a named estimate.
// sum saving quantiles
estimates clear
estpost su * if foreign == 1 , d q
estimates store FOR
estpost su * if foreign == 0 , d q
estimates store DOM
Third, we’ll run the T-test:
We pass foreign
to the by()
option so that ttest
computes the sample
split that we did above. The welch
option relaxes the default assumption that
the two samples have the same variance, this is almost never what we want when
using observational data.
// run the ttest (note welch relaxes the default assumption of equal variances)
estpost ttest * , by(foreign) welch
estimates store ttests
Finally, tabulate!
Here things get a little exciting!
esttab
has several features where you use a series of 1s and 0s to tell it
where to put things. I’ll just note that even thought these look similar, they
are not always doing the same thing. So pay attention and read the excellent
docs. Just a note, not a complaint.
Here is a minimal example of using esttab
to tabulate counts, means, medians,
and the results of a t-test comparing the two.
// switch delimiter so that the command is easier to read
#delimit ;
esttab FOR DOM ttests ,
// the cells option allows us to tell esttab what stats we want from each estimate
cells((
// 'pattern' turns the cells on an off for each estimate (i.e. FOR, DOM, and ttest)
count(pattern(1 1 0 ) fmt(%12.0fc))
mean(pattern(1 1 0 ) fmt(%12.2fc))
sd(pattern(1 1 0 ) fmt(%12.3fc))
p50(pattern(1 1 0 ) fmt(%12.2fc))
b(star pattern(0 0 1) fmt(%12.2fc))
))
;
#delimit cr
Okay, lets take this line by line.
#delimit ;
lets us break this long command across several lines so that it’s easier to read.esttab TAB FOR ttests ,
these are the three sets of estimates that we are passing toesttab
cells()
is the option that allows us to specify which statistics to report. Inside this we are going to pass a list of statistics to report.count()
,mean()
,sd()
,p50()
are all stats generated byestpost su, d q
which we have inFOR
andDOM
.ttests
doesn’t have any of these, but does haveb()
so we’ll pass all five of them along with arguments that tellesttab
which estimates have which statistics.- the first four stats are in the first two estimates so for each of them we
use
pattern(1 1 0)
to indicate that they should be displayed forFOR
andDOM
but not forttests
. - the final statistic
b()
is only estimated inttests
and we indicate this withpattern(0 0 1)
we also includestar
so thatesttab
will report the difference in means (b()
) along with stars to indicate statistical significance. - In all of these cases we are also specifying the format of the numbers with
the
fmt()
option. Display formats in Stata are their own special delight… but well beyond the scope of this post.