Survey Design and Analysis

Another line of my research focuses on advanced methods for dealing with non-representative surveys. As technology has changed the way people live, it has become harder for pollsters to reach them using traditional methods: live interviews to a landline phone now typically result in a completed interview less than 10% of the time. Moreover, systematic differences in the response rates of key demographic groups lead to biased and unrepresentative measures of public opinion. As Internet access has spread among the general public, polling firms have been able to move from telephone interviews to self-completed forms administered online. These polls have proven more cost-effective, but they suffer from systematic biases in who is included in the survey, and therefore whose opinions are represented in our measures of American public opinion. The precipitous drop in response rates to probability-sampled surveys, combined with an increasing reliance on convenience samples from internet surveys, has led to an acute need for methods that can leverage increasingly common large, covariate-rich datasets to address the non-representative nature of modern surveys.

In my project A kernel balancing approach for reducing specification assumptions in survey weighting, with Chad Hazlett and Ciara Sterbenz, we develop a machine learning technique, Kernel Population Weighting (KPop), that allows researchers to leverage rich covariate data without having to decide which variables, or functions thereof, to include when constructing survey weights.
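To give a flavor of the intuition behind kernel-based weighting, the sketch below implements simple kernel mean matching: it chooses nonnegative weights, summing to one, so that the kernel-implied feature means of the survey sample match those of the target population. This is a minimal illustration of the general idea, not the KPop estimator itself; the simulated data, Gaussian kernel, bandwidth, and solver are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated (hypothetical) covariates: X_s = survey sample, X_p = target population
rng = np.random.default_rng(0)
X_s = rng.normal(0.0, 1.0, size=(100, 3))
X_p = rng.normal(0.3, 1.0, size=(1000, 3))

def gaussian_kernel(A, B, b=3.0):
    # k(x, z) = exp(-||x - z||^2 / b); b is an illustrative bandwidth choice
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / b)

K_ss = gaussian_kernel(X_s, X_s)   # similarities among sample units
K_sp = gaussian_kernel(X_s, X_p)   # similarities between sample and population units

# Population mean of the kernel features, evaluated at the sample points
target = K_sp.mean(axis=1)

# Kernel mean matching: minimize || K_ss w - target ||^2 over the simplex
n = X_s.shape[0]
loss = lambda w: ((K_ss @ w - target) ** 2).sum()
cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
res = minimize(loss, np.full(n, 1.0 / n), bounds=[(0.0, None)] * n, constraints=cons)
weights = res.x  # survey weights balancing smooth functions of the covariates
```

Because the kernel implicitly encodes a rich family of smooth functions of the covariates, balancing the kernel means balances many functions of the covariates at once, which is what removes the need to hand-pick variables or transformations.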

In Multilevel calibration weighting for survey data, with Eli Ben-Michael and Avi Feller, we address the challenge of finding calibration weights when covariates are high-dimensional and, especially, when interactions between variables are important. We propose multilevel calibration weighting, which enforces tight balance constraints for marginal balance and looser constraints for higher-order interactions. This incorporates some of the benefits of post-stratification while retaining the guarantees of raking. We then correct for the bias due to the relaxed constraints via a flexible outcome model; we call this approach Double Regression with Post-stratification (DRP).
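A stylized version of the weighting step might look like the sketch below: marginal means are balanced exactly through hard constraints, while pairwise interactions are balanced only approximately through a penalty. This is a simplified illustration under assumed toy data, a single penalty level, and a generic solver; the paper's actual formulation and optimization differ.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical binary covariates for the survey sample and the target population
rng = np.random.default_rng(1)
X_s = rng.integers(0, 2, size=(200, 3)).astype(float)
X_p = rng.integers(0, 2, size=(5000, 3)).astype(float)

def pairwise_interactions(X):
    cols = [X[:, i] * X[:, j]
            for i in range(X.shape[1]) for j in range(i + 1, X.shape[1])]
    return np.column_stack(cols)

I_s, I_p = pairwise_interactions(X_s), pairwise_interactions(X_p)
m_target, i_target = X_p.mean(axis=0), I_p.mean(axis=0)

n = X_s.shape[0]
lam = 10.0  # penalty strength: looser, approximate balance on interactions

def objective(w):
    # keep weights close to uniform, penalize interaction imbalance
    return (w ** 2).sum() + lam * ((I_s.T @ w - i_target) ** 2).sum()

cons = [
    {"type": "eq", "fun": lambda w: w.sum() - 1.0},          # weights sum to one
    {"type": "eq", "fun": lambda w: X_s.T @ w - m_target},   # exact marginal balance
]
res = minimize(objective, np.full(n, 1.0 / n),
               bounds=[(0.0, None)] * n, constraints=cons)
weights = res.x
```

In the DRP step, a flexible outcome model would then be used to correct for the bias left by the only approximately balanced interactions.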

In Sensitivity Analysis for Survey Weights, with Melody Huang, we propose two sensitivity analyses for the exclusion of important covariates from the construction of survey weights: (1) a sensitivity analysis for partially observed confounders (i.e., variables measured in the survey sample but not in the target population), and (2) a sensitivity analysis for fully unobserved confounders (i.e., variables measured in neither the survey nor the target population). We provide graphical and numerical summaries of the potential bias that arises from such confounders, and introduce a benchmarking approach that allows researchers to quantitatively reason about the sensitivity of their results.
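As a back-of-the-envelope illustration of how one can reason numerically about such bias (using a standard identity rather than the paper's specific parameterization): if w are the estimated weights, w* the ideal weights, and eps = w* - w their error, then because both weight sets sum to one, the bias of the weighted mean has magnitude n * |cor(eps, Y)| * sd(eps) * sd(Y). The sketch below tabulates that quantity over a grid of hypothetical error correlations and scales; all data are simulated.

```python
import numpy as np

# Simulated survey: Y is a binary outcome (e.g., vote intention), w the estimated weights
rng = np.random.default_rng(2)
n = 500
Y = rng.binomial(1, 0.52, size=n).astype(float)
w = rng.gamma(2.0, 1.0, size=n)
w /= w.sum()                       # normalize so the weights sum to one

estimate = w @ Y                   # weighted point estimate

# With eps_i = w*_i - w_i and sum(eps) = 0, the weighted mean's bias satisfies
#   |bias| = |sum_i eps_i * Y_i| = n * |cor(eps, Y)| * sd(eps) * sd(Y).
# Tabulate the implied |bias| over hypothetical correlations and error scales.
sd_Y = Y.std()
for rho in (0.1, 0.3, 0.5):
    for scale in (0.5, 1.0):
        sd_eps = scale * w.std()   # error scale relative to the sd of the weights
        print(f"cor(eps, Y) = {rho:.1f}, sd(eps) = {sd_eps:.5f} "
              f"-> |bias| = {n * rho * sd_eps * sd_Y:.3f}")
```

In the spirit of the benchmarking approach described above, one could then replace the hypothetical (correlation, scale) pairs with values implied by dropping an observed covariate from the weights.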

In addition to developing statistical methods for survey weighting, I have worked to improve best practices for applied researchers. In Target Estimation and Adjustment Weighting for Survey Nonresponse and Sampling Bias, with a number of co-authors, we provide an in-depth introduction to survey weighting. We discuss the methodological framework and practical considerations, and work through two examples: a modern survey concerning the 2016 presidential election and a historical example using 1940s and 1950s public opinion data. In Accounting for Complex Survey Designs: Strategies for Post-stratification and Weighting of Internet Surveys (with Ines Levin), we provide an intuitive, minimally technical introduction to survey weighting.

See the full list of relevant manuscripts and publications below!
