2025-01-06
uv for managing Python packages
My workflow is basically: Python -> Stata -> LaTeX/Markdown
I use Stata for estimation and Python for everything before that. Mostly pandas… until recently when polars became mature enough to invest time in. But I’ll have to write about that separately. If you’re using Python you’ll need a package manager and a way to manage virtual environments.
You need these for two reasons:
1. Package managers are how you download and keep track of Python packages like pandas and polars. macOS and Linux usually come with Python already installed (often referred to as “system Python”), but it does not include the packages you want for data analysis, and you don’t want to dump a bunch of packages into that environment.
2. Virtual environments keep track of what you are and are not using for a particular project. For example, if you wrote a paper in 2015 with pandas, the commands you used may not work with the pandas you would get if you downloaded it today. You’d like to keep the old version around, and potentially be able to share it with others if they want to replicate your work.
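If you are ever unsure which Python you are actually running, a quick check from inside Python helps. This is a minimal sketch using only the standard library; the function names `in_venv` and `installed_version` are my own, not part of any tool. Note that it detects venv-style environments (the kind uv creates), and a conda environment will generally not show up this way.

```python
import sys
from importlib import metadata
from typing import Optional


def in_venv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    # Assumption: conda environments are separate installs, so the two
    # values are usually equal there and this returns False.
    return sys.prefix != sys.base_prefix


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("venv-style environment:", in_venv())
    for name in ("pandas", "polars"):
        print(name, installed_version(name) or "not installed")
```

Run it once with system Python and once inside a project environment and the difference between the two becomes concrete.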
For a long time I’ve been using conda. conda is great at 1, but not so great at 2. Let’s say that I want to create a virtual environment with Python 3.13 and pandas. I would:
mkdir newproj
cd newproj
brew install miniconda
conda init
conda create --name newproj python=3.13 pandas
conda activate newproj
Now I’m using this version of Python and pandas. I can come back to it whenever I need to, and even if I’ve updated Python elsewhere, the version in this environment will not have changed. However, if I want to share this with someone else, it’s a bit fiddly to make sure that I’ve actually tracked the precise versions my code depends on. In particular, notice that the information about this virtual environment is not, by default, associated with the project I’m using it in. This is part of point 2, and uv solves this. All of the information about the packages you are using is saved in your project by default.
Let’s say that you are going to start a project called newproj and you want to use the same tools as I used in the conda example. Here is what you would do with uv:
mkdir newproj
cd newproj
brew install uv
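uv init --python 3.13 # creates pyproject.toml and pins the project to Python 3.13
uv add pandas # this is how you 'add' packages; it also creates .venv and uv.lock
source .venv/bin/activate
uv sync # makes sure .venv matches what is recorded in the project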
Now all of the information about the packages you installed (including the version of pandas that you got but did not specify) is recorded in the newproj folder, in pyproject.toml and uv.lock. So you can share this folder, and someone using it would only need to:
cd newproj
uv sync # creates .venv if it is missing and installs the recorded versions
source .venv/bin/activate
and they are using the exact same versions that you are. Also, if they are already using uv and have the same versions in another project, then they will not have to download them again!
There are a few drawbacks; in these cases I fall back to conda:
- wrds requires scipy, which has non-Python dependencies, and uv doesn’t handle that well… as in: fails to handle it. So I use conda to create an environment that has the wrds module.
- That’s it. There is no second one.