Introduction
Most population-based surveys aim to achieve an accurate representation of the population being surveyed. However, certain groups, such as Hispanic, Black/African American, lower-income, and young adults, have traditionally been underrepresented in survey research (Groves 2006; M. W. Link and Burks 2013). Numerous studies have shown that these individuals often exhibit different behaviors, attitudes, and opinions (Blumberg and Luke 2009; Delnevo, Gundersen, and Hagman 2008; Keeter et al. 2007; Krisberg 2009; M. Link et al. 2006). Consequently, failing to adequately represent them in population-based surveys can lead to biased estimates.
In recent years, address-based sampling (ABS), often utilizing push-to-web designs, has emerged in the United States as an effective sampling strategy, offering near full coverage of the housed population. ABS typically relies on an address frame generated from the U.S. Postal Service’s Computerized Delivery Sequence File (CDSF). ABS also facilitates various oversampling techniques, including oversampling based on vendor-appended sample flags, predictive modeling of household characteristics, and geographic sub-sampling, all of which can improve the representation of traditionally underrepresented groups.
Nevertheless, studies that rely only on ABS still tend to obtain lower proportions of Hispanic and Black/African American respondents, as well as respondents with lower education and lower incomes. This paper demonstrates that adding a supplemental sample of prepaid cell phone numbers (PPD) to an ABS study can help reach more of these harder-to-survey groups.
Prepaid (PPD) phones, also known as “pay-as-you-go” phones, are associated with plans that don’t require long-term contracts or a credit check. Acquiring a PPD phone is fast, straightforward, and often cost-effective. This may make them appealing to individuals with low income and/or limited or poor credit histories.
While research on the utility of PPD phones as a sample source remains somewhat limited, existing studies (Berzofsky et al. 2015, 2019; Dutwin 2017; Goyle and Sherr 2022; McGeeney 2015) suggest prepaid phone users are more likely to be non-white, have lower incomes, rent their homes, have less education, and reside in urban areas, compared to other cell phone users. Furthermore, this research suggests that having a PPD phone may be significantly correlated with key health access and status indicators, including reduced access to healthcare, unmet healthcare needs, and a higher likelihood of having a chronic health condition. This correlation makes sense since other research (Brown et al. 2000) has established that demographic groups such as low-income Hispanics and African Americans are more likely to be uninsured and have reduced access to health care. If those groups are also more likely to use a PPD phone, then including a PPD phone frame in sampling may help better represent them. Consequently, this may have implications for surveys that seek to report on the health and health-related metrics of a population.
Current Research
The main goal of the current research was to assess whether respondents from the PPD sample differ in demographic composition and key health status, access, and affordability indicators when compared with ABS respondents. While previous research has indicated such differences between regular Random Digit Dialed (RDD) cell samples and PPD samples (McGeeney 2015), there are, to our knowledge, few if any published comparisons between ABS and PPD samples. Additionally, the existing research on the inclusion of PPD samples in survey research is now almost ten years old. As cell phone adoption patterns evolve, the demographic composition of the PPD market may have shifted. For instance, previous research indicates that PPD respondents were more likely to reside in urban areas (McGeeney 2015), but this trend may no longer hold true. To ensure the continued relevance and accuracy of findings in this area, it is essential for research on PPD samples to remain current.
The research presented here attempts to fill this gap by analyzing data from four surveys that employed a multi-frame sample design, utilizing both ABS and PPD samples.
Methods
The studies used in our analysis were conducted between March 2021 and August 2023. All employed a mixed mode approach, integrating push-to-web and Computer-Assisted Telephone Interviewing (CATI) techniques. The population of interest for three of the included studies was the U.S. residential population of adults. One study was focused specifically on U.S. immigrant adults in the 50 states and D.C. Crucially, all four studies blended sample selected from both ABS and PPD frames.
The samples for the studies were sourced from the Marketing Systems Group (MSG). The ABS samples were generated from the U.S. Postal Service’s Computerized Delivery Sequence File (CDSF), which includes all delivery point addresses serviced by the USPS. Only records flagged as residential or mostly residential, as well as P.O. boxes defined as the only way a household can get mail (OWGM, that is, the homeowner has requested no mail delivery to the actual household, just the P.O. Box), were eligible for sampling. The PPD sample was generated from the MSG PPD cell sample universe.
Table 1 below shows the study target populations and responding sample sizes for the ABS and PPD samples across the four studies.
To compare respondents from the ABS and PPD samples, we first conducted two-sample column proportion z-tests examining key demographic characteristics and health status, access, and affordability indicators across the four distinct studies. While differences in outcomes at the bivariate stage are informative, they may simply reflect demographic differences that can later be adjusted through weighting. To more directly assess the impact of the sample frame on key outcomes, we conducted logistic regressions, which controlled for any variations in outcomes attributable to measured demographics. If we observe a significant effect of the sample frame on key outcomes even after controlling for demographics, it implies that incorporating PPD sample with ABS may help to mitigate biases in the ABS that weighting alone might not sufficiently correct.
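The bivariate step described above can be illustrated with a minimal sketch of a pooled two-sample z-test for proportions. This is an unweighted simplification: the analyses in this paper were run on base-weighted data with the SPSS Complex Samples Module, which this sketch does not reproduce. The function name and the counts are hypothetical and chosen only for illustration.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for a difference in proportions.

    Unweighted illustration only; it ignores base weights and
    complex-sample design effects.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 300 PPD respondents and 150 of 1,500 ABS
# respondents report a given characteristic (not data from the studies).
z, p = two_proportion_z_test(x1=60, n1=300, x2=150, n2=1500)
print(f"z = {z:.2f}, p = {p:.6f}")
```

A positive z here would mean the first group (PPD in this hypothetical) has the higher proportion.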
All data were base-weighted to account for differences in sample designs across the studies, and the analysis used the SPSS Complex Samples Module to account for the study designs.
Results
Demographic Comparisons
Table 2a presents the incidence of key demographics by sample frame and study. Compared with ABS respondents, PPD respondents are significantly more likely to be Hispanic or African American and to have completed the survey in a language other than English. They tend to have lower levels of education, typically having completed a high school diploma or less. PPD respondents also exhibit lower income levels and are more likely to have been born outside the U.S. and to be young adults in the 18-24 age group. In contrast, they are less likely to be white, Asian, or homeowners.
Health Status, Access, and Affordability Comparisons
Table 2b presents the incidence of key health status, access, and affordability indicators by sample frame and study. Compared with ABS respondents, PPD respondents are significantly more likely to be uninsured or, if insured, to have Medicaid coverage. They are also significantly more likely to self-report fair or poor health status and to rate the quality of the health care they received in the past 12 months as fair or poor. PPD respondents tend to indicate greater difficulty paying living expenses or medical bills. In two of the three studies where this question was asked, PPD respondents were significantly less likely to have a regular doctor or health care practitioner/place. In Study 3, the results trended in the same direction but were not significant, possibly due to small sample sizes.
Regressions
The bivariate analyses discussed above indicate that PPD respondents differ in demographic composition compared to ABS respondents. Moreover, they indicate potential differences in key health status, access, and affordability indicators between the two sample frames. However, the bivariate differences in the health-related outcomes may be a result of demographic differences that could be adjusted through weighting. To thoroughly evaluate the influence of the sample frame on key outcomes, we conducted multiple logistic regressions for each study while controlling for key demographics. The independent variables included sample frame, as well as race, ethnicity, interview language, education, age, household tenure, income, urbanicity, and nativity, which are commonly used demographic variables for weighting. The dependent variables were the health status, access, and affordability indicators listed in Table 3.
The results from the logistic regression analysis are presented in Table 3, where the key independent variable is a binary indicator of whether a respondent was from the prepaid phone (PPD) frame (coded as 1) versus the address-based sample (ABS) frame (coded as 0). The results predict the difference in outcome incidence between respondents from the PPD and ABS frames controlling for the demographic variables noted above. A positive coefficient (B) indicates that the variable is associated with a higher incidence for respondents in the PPD sample, while a negative coefficient indicates a lower incidence in respondents from the PPD sample.
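To make the sign and scale of the coefficient concrete, the sketch below uses the fact that, in a logistic regression containing only a binary frame indicator, the fitted coefficient B equals the log of the odds ratio from the corresponding 2x2 table. This is an illustration only: the models in Table 3 also include demographic controls, so their coefficients are adjusted and would not generally match an unadjusted calculation like this one. The function name and counts are hypothetical.

```python
import math


def unadjusted_frame_coefficient(ppd_yes, ppd_no, abs_yes, abs_no):
    """Log odds ratio for PPD (coded 1) versus ABS (coded 0).

    Equals the coefficient B from a logistic regression whose only
    predictor is the binary frame indicator (no covariates).
    """
    odds_ppd = ppd_yes / ppd_no
    odds_abs = abs_yes / abs_no
    return math.log(odds_ppd / odds_abs)


# Hypothetical counts of respondents with and without an outcome
# (for example, being uninsured); not data from the studies.
b = unadjusted_frame_coefficient(ppd_yes=60, ppd_no=240,
                                 abs_yes=150, abs_no=1350)
odds_ratio = math.exp(b)
print(f"B = {b:.3f}, odds ratio = {odds_ratio:.2f}")
```

In this hypothetical, B is positive, so the odds of the outcome are higher in the PPD frame; exponentiating B gives the odds ratio.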
These results indicate that, even after controlling for key demographics, in a majority of the studies respondents from the PPD frame were significantly more likely than those from the ABS frame to:
- Be uninsured.
- Lack employer-sponsored insurance.
- Report fair or poor health status.
- Report receiving fair or poor quality of health care in the past 12 months.
- Report not having a regular doctor or usual source of care.
Although bivariate analyses found that PPD respondents were more likely to be enrolled in Medicaid, be worried or stressed about healthcare costs, and accumulate medical debt, the regression analyses showed less consistent results. Specifically, after controlling for demographic differences between the responding samples, only two of the four studies demonstrated a significant relationship between sample frame and Medicaid coverage. For stress and medical debt outcomes, no consistent significant associations were observed across studies. This suggests that for those outcomes, weighting may help correct for bias caused by the sample frame.
Benefits and Barriers to Including PPD Sample
Across four national studies, our results reveal that compared with ABS respondents, PPD respondents are more likely to have lower income and lower educational attainment, to rent their homes, to be Hispanic or Black/African American, and to be young adults aged 18 to 24. Additionally, PPD respondents are more likely to be uninsured and to report fair or poor health status and fair or poor quality of health care in the past 12 months. If insured, they are less likely to have employer-sponsored insurance. They may also be less likely to have access to health care via a regular doctor or healthcare provider and more likely to struggle with covering living expenses. Most of these differences between ABS and PPD respondents held true even after controlling for key demographics typically used in weighting, suggesting that blending PPD samples with ABS may help to mitigate biases that weighting alone may not fully address.
Including the PPD sample enhances the representation of populations facing unique healthcare challenges, especially the uninsured. This is often an important analytical focus for surveys that seek to report on the health and health-related metrics of a population and inform health policy.
Since the demographics associated with the PPD frame are also linked to important societal outcomes such as educational access and achievement (Nitardy et al. 2015; Sirin 2005) and social and political engagement (Brown-Iannuzzi, Lundberg, and McKee 2017), it is likely that possessing a PPD phone correlates with these outcomes. Future research should explore these potential relationships more explicitly.
Prior research has indicated a higher proportion of prepaid cell phone users residing in urban areas compared to more rural areas. However, the results from our analysis challenge this assumption. We did not detect any significant difference in urbanicity between PPD and ABS respondents, suggesting a departure from the previously held belief.
Combining PPD sample with ABS is not without challenges. The primary challenge in incorporating a prepaid sample into a study lies in the associated costs and additional complexity. Interviewing prepaid respondents can be resource-intensive and time-consuming.
Many sample providers can flag whether a phone number is associated with a prepaid plan and has been active using information from carriers. However, due to the non-contractual nature of PPD phones, there is a notable rate of non-working numbers even after applying activity flags. For instance, across the four studies referenced in this paper the non-working rate for the PPD phones ranged from 44.1% to 57.6%. Incorporating a PPD sample into a predominantly ABS design may also introduce complexity on the front end of the fielding process, requiring additional programming efforts to provide a CATI interface for interviewing prepaid respondents.
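The practical effect of these non-working rates on sample release can be sketched with simple arithmetic. The helper names below are hypothetical, and the calculation covers only non-working numbers; it ignores eligibility and cooperation rates, which would reduce the yield further.

```python
import math


def expected_working(numbers_released, non_working_rate):
    """Expected count of working numbers among those released."""
    return round(numbers_released * (1 - non_working_rate))


def numbers_to_release(target_working, non_working_rate):
    """Numbers to release to expect a target count of working numbers."""
    return math.ceil(target_working / (1 - non_working_rate))


# Using the lowest and highest non-working rates reported above.
for rate in (0.441, 0.576):
    print(rate, expected_working(1000, rate))
```

At the reported rates, roughly 42% to 56% of released PPD numbers would be expected to be working, before any other source of attrition.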
On the back end, merging ABS and PPD samples requires careful compositing during the weighting process. First, the base weights are assigned differently depending on whether the responding case was from the ABS or PPD frame. Then, since the ABS and PPD samples are drawn from separate but overlapping frames, they need to be combined with a composite adjustment that downweights cases that overlap. One mechanism for handling this is to ask the ABS respondents about their prepaid phone usage during the survey. This, however, risks introducing measurement error into the weighting if respondents don’t understand or misreport their PPD phone use. Future research opportunities include exploring ABS respondents’ understanding and accuracy in responding to PPD phone usage questions and determining optimal methods to account for respondents with multiple prepaid phones. These challenges underscore the importance of careful planning and execution when incorporating prepaid samples into research designs and weighing the challenges against the potential benefit of improved representation of hard-to-reach populations.
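One common way to implement a composite adjustment of this kind is a Hartley-style dual-frame factor, sketched below. This is an assumption for illustration, not a description of the procedure used in these studies: the function name, the frame labels, and the mixing factor `lam` are hypothetical, and in practice the factor is often chosen from effective sample sizes rather than fixed at 0.5.

```python
def composite_weight(base_weight, frame, in_overlap, lam=0.5):
    """Apply a Hartley-style composite factor to a base weight.

    Cases reachable through only one frame keep their base weight.
    Overlap cases (reachable through both frames) are downweighted so
    the two frames together count them once: ABS overlap cases get
    lam, PPD overlap cases get (1 - lam).
    """
    if not 0 <= lam <= 1:
        raise ValueError("lam must be between 0 and 1")
    if not in_overlap:
        return base_weight
    if frame == "ABS":
        return base_weight * lam
    if frame == "PPD":
        return base_weight * (1 - lam)
    raise ValueError(f"unknown frame: {frame}")


# Hypothetical cases: an ABS respondent without a prepaid phone, an ABS
# respondent who reports using one, and a PPD respondent.
print(composite_weight(100.0, "ABS", in_overlap=False))
print(composite_weight(100.0, "ABS", in_overlap=True))
print(composite_weight(80.0, "PPD", in_overlap=True, lam=0.25))
```

In this sketch, the `in_overlap` flag for ABS cases would come from the self-reported prepaid phone usage question, which is where the measurement error discussed above could enter the weights.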
Despite these challenges, our results strongly emphasize the concrete benefits of incorporating a prepaid sample into a study’s design. This becomes particularly vital when the research aims to report on traditionally underrepresented subpopulations. Incorporating respondents from the PPD sample frame can enhance a survey’s accuracy and ensure a more comprehensive representation of diverse demographic and health-related characteristics.
Limitations
One limitation of this research is that the analysis is restricted to data from survey respondents. Since we lack demographic and outcome variable information for the entire sample frame, it becomes difficult to distinguish between the bias inherent in the sample frame itself and the bias introduced by non-responding segments of the population. This makes it challenging to fully account for potential sources of bias in our analysis. So, while this study hypothesizes that the composition of the PPD (prepaid) sample frame may differ meaningfully from the ABS (address-based) frame, our conclusions are limited to comparisons between respondents from each frame.
Additionally, our ability to evaluate the proportion of sample that should be allocated to the prepaid phone (PPD) frame is limited due to the absence of strong external benchmarks. As a result, decisions regarding optimal sample distribution and compositing frames may not fully reflect the true composition of the target population.
Finally, the analyses presented in this paper focus on health status, access and affordability indicators. While these are central to the objectives of the surveys examined, our findings may not generalize to other domains of interest.
Acknowledgements
The authors would like to thank Robyn Rapoport, Mickey Jackson and Eran Ben-Porath for their important contributions and input to this research.
Corresponding author contact information
Arina Goyle, SSRS, 1 Braxton Way, Suite 125, Glen Mills, PA 19342
Email: agoyle@ssrs.com