2024-07-13
Easy subsample comparisons with esttab.
It’s often the case that we want to display summary statistics for two samples
along with a t-test comparing the two samples. The ttest command with
esttab allows this, but limits the statistics that you can display.
Here is a minimal example of how to do this while retaining all of the statistics generated by summarize:
First load some example data:
// load example data
sysuse auto, clear
drop make // this is not numeric, dropping simplifies the example
I’m dropping make so that I can use * to reference all the variables in the dataset.
Next calculate standard summary stats for both samples:
We’ll use estpost su with d for detail and q for quantile and store each set as a named estimate.
// sum saving quantiles
estimates clear
estpost su * if foreign == 1 , d q
estimates store FOR
estpost su * if foreign == 0 , d q
estimates store DOM
Third, we’ll run the T-test:
We pass foreign to the by() option so that ttest computes the sample
split that we did above. The welch option relaxes the default assumption that
the two samples have the same variance, this is almost never what we want when
using observational data.
// run the ttest (note welch relaxes the default assumption of equal variances)
estpost ttest * , by(foreign) welch
estimates store ttests
Finally, tabulate!
Here things get a little exciting!
esttab has several features where you use a series of 1s and 0s to tell it
where to put things. I’ll just note that even thought these look similar, they
are not always doing the same thing. So pay attention and read the excellent
docs. Just a note, not a complaint.
Here is a minimal example of using esttab to tabulate counts, means, medians,
and the results of a t-test comparing the two.
// switch delimiter so that the command is easier to read
#delimit ;
esttab FOR DOM ttests ,
// the cells option allows us to tell esttab what stats we want from each estimate
cells((
// 'pattern' turns the cells on an off for each estimate (i.e. FOR, DOM, and ttest)
count(pattern(1 1 0 ) fmt(%12.0fc))
mean(pattern(1 1 0 ) fmt(%12.2fc))
sd(pattern(1 1 0 ) fmt(%12.3fc))
p50(pattern(1 1 0 ) fmt(%12.2fc))
b(star pattern(0 0 1) fmt(%12.2fc))
))
;
#delimit cr
Okay, lets take this line by line.
#delimit ;lets us break this long command across several lines so that it’s easier to read.esttab TAB FOR ttests ,these are the three sets of estimates that we are passing toesttabcells()is the option that allows us to specify which statistics to report. Inside this we are going to pass a list of statistics to report.count(),mean(),sd(),p50()are all stats generated byestpost su, d qwhich we have inFORandDOM.ttestsdoesn’t have any of these, but does haveb()so we’ll pass all five of them along with arguments that tellesttabwhich estimates have which statistics.- the first four stats are in the first two estimates so for each of them we
use
pattern(1 1 0)to indicate that they should be displayed forFORandDOMbut not forttests. - the final statistic
b()is only estimated inttestsand we indicate this withpattern(0 0 1)we also includestarso thatesttabwill report the difference in means (b()) along with stars to indicate statistical significance. - In all of these cases we are also specifying the format of the numbers with
the
fmt()option. Display formats in Stata are their own special delight… but well beyond the scope of this post.