Covariate Selection for Generalizing Experimental Results: Application to a Large-Scale Development Program in Uganda


Scientists are often interested in generalizing causal effects estimated in an experiment to a target population. However, analysts are often constrained by the covariate information available, which limits the applicability of existing approaches that assume rich covariate data from both the experimental and population samples. As a concrete context, we focus on a large-scale development program in Uganda, the Youth Opportunities Program (YOP). Although more than 40 pre-treatment covariates are available in the experiment, only 8 of them were also measured in the target population. To tackle this common data constraint, we propose a data-driven method to estimate a separating set – a set of variables that affect both the sampling mechanism and treatment effect heterogeneity – and show that the population average treatment effect (PATE) can be identified by adjusting for estimated separating sets. Our approach has two advantages. First, our algorithm requires a rich set of covariates only in the experimental data, not in the target population. Second, the proposed algorithm can select separating sets under researcher-specified constraints on the population data. Using the YOP experiment, we find that the proposed algorithm makes it possible to estimate the PATE in situations where conventional methods fail due to data requirements.
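To make the idea of "adjusting for a separating set" concrete, the sketch below shows one standard adjustment, post-stratification, on a toy example. It is an illustration under stated assumptions, not the estimator or the selection algorithm from the paper: it assumes a separating set has already been chosen, that it is a single discrete variable measured in both samples, and that treatment was randomized within the experiment. The function name `post_stratified_pate` and all data values are hypothetical.

```python
from collections import defaultdict


def post_stratified_pate(experiment, population_w):
    """Post-stratification estimate of the PATE (illustrative sketch only).

    experiment:   list of (w, treated, outcome) tuples from the experimental
                  sample, where w is the value of an assumed discrete
                  separating set and treated is 1 or 0.
    population_w: list of w values observed in the target population.
    """
    # Accumulate [sum of outcomes, count] per stratum and treatment arm.
    cells = defaultdict(lambda: {1: [0.0, 0], 0: [0.0, 0]})
    for w, treated, outcome in experiment:
        cell = cells[w][treated]
        cell[0] += outcome
        cell[1] += 1

    # Difference in means within each stratum of the separating set.
    stratum_effect = {}
    for w, arms in cells.items():
        if arms[1][1] == 0 or arms[0][1] == 0:
            raise ValueError(f"stratum {w!r} lacks treated or control units")
        stratum_effect[w] = arms[1][0] / arms[1][1] - arms[0][0] / arms[0][1]

    # Population shares of each stratum serve as the weights.
    counts = defaultdict(int)
    for w in population_w:
        counts[w] += 1
    missing = set(counts) - set(stratum_effect)
    if missing:
        raise ValueError(f"no experimental units in strata {sorted(missing)}")

    n = len(population_w)
    return sum(stratum_effect[w] * counts[w] / n for w in counts)


# Hypothetical toy data: the effect is 3 in "urban" and 1 in "rural".
experiment = [
    ("urban", 1, 3.0), ("urban", 1, 5.0), ("urban", 0, 1.0), ("urban", 0, 1.0),
    ("rural", 1, 2.0), ("rural", 1, 2.0), ("rural", 0, 1.0), ("rural", 0, 1.0),
]
population_w = ["urban", "rural", "rural", "rural"]  # population is mostly rural

print(post_stratified_pate(experiment, population_w))
```

In this toy example the experimental sample is half urban, so the unadjusted difference in means is 2, while reweighting the stratum-specific effects to the mostly rural population gives 1.5. The gap between the two is what adjustment for a separating set is meant to correct.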

In Journal of the Royal Statistical Society, Series A
Erin Hartman
Assistant Professor of Political Science