Target Selection as Variable Selection: Using the Lasso to Select Auxiliary Vectors for the Construction of Survey Weights


Survey nonresponse is a ubiquitous problem in modern survey research. As individuals have become less likely to respond to surveys, there has been a simultaneous rise in highly granular data sources that can be used to help ameliorate the nonresponse problem. While much research has been done on post-hoc weighting methods, which provide a flexible and general solution for unit nonresponse, it remains an open question how to select the optimal auxiliary vector to include in the weighting method. We formulate this as a methodological question of variable and interaction selection where the goal is, assuming an individual-level stochastic response probability, to construct an optimal set of weights for each individual respondent to account for an observed pattern of nonresponse. We draw on recent literature on hierarchical group-lasso regularization to determine the best auxiliary vector for weighting. We show the advantages of this method in simulations derived from real survey data sampled off of an individual-level voter file in recent elections. We also apply the method to historic quota-sampled survey data from the 1930s and 1940s to show the advantages of this method even where the sampling design is unknown.

Erin Hartman
Assistant Professor of Political Science