List-assisted random digit dialing (RDD) is the sampling procedure that is normally used in constructing samples of telephone households. This is a truncated design because it only includes telephone hundreds banks with one or more listed numbers. However, this design has become widely accepted after a 1995 study found that only 3.7% of working household telephone numbers fell in the unlisted banks with no significant demographic biases.
A recent study has re-examined the coverage of 100-series banks with one or more listed telephone numbers for landline households. Fahimi and his colleagues concluded that “the coverage loss for designs based on the 1+ listed banks is closer to 20% than 4%” today. Such a coverage error would call into question the acceptability of the current RDD sampling procedures for landline households, and in combination with cell phone coverage issues, the very future of telephone surveys.
The current study attempts to replicate the Fahimi study with sample from a second vendor and a somewhat different process for classifying households and non-households. Based on a national RDD sample of 10,000 numbers from 1+ listed banks and 27,175 numbers from unlisted banks, we find that 95% of landline households are still located in 1+ listed banks. These findings would seem to support the continued viability of list-assisted RDD sampling in the design and conduct of telephone surveys.
Background on List-Assisted RDD Sampling
Telephone surveys became the dominant mode of data collection for general population surveys in the United States in the 1980’s. The proportion of U.S. households with no (landline) telephone had fallen to about 10% by the early 1970’s. The publication of the Mitofsky-Waksberg RDD sampling procedure in 1978 (Waksberg 1978) established an accepted standard for the sampling of telephone households.
During the 1970’s and 1980’s, market research companies frequently conducted telephone surveys using list-assisted RDD sampling. Restricting the sampling frame to the banks with listed numbers makes the sampling procedure at least as efficient as the Mitofsky-Waksberg method. Moreover, the list-assisted RDD procedures provided an efficient method of drawing an element sample of telephone households. However, adoption of list-assisted RDD sampling was inhibited by concerns about the unknown coverage of listed banks and the potential sampling bias associated with the excluded population of households.
In 1995, a seminal study of the coverage and bias of list-assisted RDD sampling was conducted. After excluding categories that were not available for general residential usage, the investigators divided the remaining frame of telephone numbers from the Bellcore file into two strata. The first stratum consisted of all telephone numbers in 100-banks that had at least one listed residential telephone number. The second stratum was the zero-listed stratum, containing telephone numbers in 100-banks that had no listed residential telephone numbers (Brick et al. 1995).
The investigators drew a single-stage, epsem sample of 10,000 telephone numbers from the zero-listed stratum. These numbers were dialed by interviewers to determine whether they were residential. Out of the 10,000 telephone numbers in the zero banks, only 135 were found to be residential. This was a residential hit rate of 1.4% in the zero-listed stratum. Based on the estimated proportion of residential telephone numbers in the zero-listed banks and the estimated proportion of residential telephone numbers in the listed stratum from other studies, they estimated that 3.7% of all telephone households were not covered when the sample was restricted to the listed stratum. The authors concluded: “The results from this research indicate that the truncated, list-assisted RDD sampling method is efficient and the estimates from the design are not subject to important coverage bias.” Consequently, the truncated list-assisted RDD sampling procedure emerged as the standard sampling method for telephone surveys of the general population.
Unlisted Blocks Emerge as a Potentially Serious Problem for RDD Sampling
In 2008, a very different set of findings about the coverage of list-assisted RDD samples was reported by Fahimi and his colleagues. Marketing Systems Group (MSG) drew samples of telephone numbers from “three strata that collectively constitute the entire pool of available landline telephone numbers.” One stratum (1+ listed banks) included all telephone numbers in 100-series banks that have at least one listed number. This is directly equivalent to the “listed stratum” in the earlier study. A second stratum (zero listed banks) consisted of telephone numbers in 100-series banks that have no listed number but are part of telephone exchanges (NXXs) with at least one listed number. Finally, a third stratum consisted of the remaining telephone numbers in plain old telephone service (OPOTS) 100-series and mixed-use banks from exchanges with no listed numbers.
A sample of approximately 20,000 numbers was drawn from the 1+ banks stratum and nearly 10,000 each from the zero banks and OPOTS strata. These numbers were dialed up to 9 times to determine their household status using MSG’s CSS-attended screening service. Those numbers whose status was undetermined after nine dialing attempts were processed through two additional database matches. After the nine calling attempts, they reported that approximately 7% of the sample remained undetermined. After the additional matching processes, they reported that less than 3% of the sample remained undetermined.
The estimated hit rate for residential households was 30.8% in the 1+ listed banks, 4.0% in the zero listed banks, and 2.7% in the remaining OPOTS. When the household hit rate in each of the three strata was applied to the number of telephone numbers in each stratum, the authors concluded that the percentage of residential numbers in 1+ banks has dropped from 96% in 1995 to 80% in 2008. Most of the coverage loss (14% of residential numbers) was found in the zero banks where there were no listed numbers in the 100-series but one or more listed numbers in the exchanges. The remaining OPOTS numbers accounted for another 4% of households.
Based on these estimates, the authors concluded in 2008: “These changes have greatly reduced the utility of 100-series banks for constructing RDD sampling frames. Consequently, continuing to sample from a frame that contains only 1+ listed 100-banks entails a much larger coverage loss than suggested by previous studies.” “Telephone samples that ignore cell phones and use the standard 1+ listed design can exclude over 30% of the population. The potential for substantial coverage bias in this situation cannot be ignored.”
Current Study: A Second Look at Unlisted Hundred Banks
We undertook the current study to replicate the findings of the Fahimi study with a different sample. We also wanted to explore whether households reached in unlisted banks might be represented in listed banks, and to describe the characteristics of households reached in unlisted banks compared to listed banks. We attempted to replicate the sample design used by Fahimi, Kulp and Brick using a second sample vendor. They used Marketing Systems Group (MSG) as the sample vendor for their survey. We used Survey Sampling, Inc. (SSI) for our study. Both organizations draw their samples based on information from the Telcordia (formerly Bellcore) TPM Data Source, and use the same list compiler for determining listed numbers, so we would anticipate general agreement on the definition of the strata and the size of the population within each stratum. However, some counts might vary depending on when the last updates of the sampling frame were done. It is also possible that differences in timing and proprietary validation rules could cause some variation in the codes used to define eligible NXXs and thousand banks.
Telephone numbers in the United States consist of ten digits. The first three digits are the area code or NPA. The next three digits are designated as the NXX, which is often called the exchange, prefix or central office code. The N ranges from 2 to 9, while the X ranges from 0 to 9. Within the last four digits, the first digit identifies the thousand bank (Xxxx) and the first two digits identify the hundreds bank (XXxx). The current practice of list-assisted random digit dialing in the United States is to: (a) construct a frame of all NPA-NXX numbers which are available for residential household numbers; (b) restrict the frame to hundreds series banks with one or more listed numbers; and (c) randomly select a sample of these hundreds banks with listed numbers and append a two-digit random number to complete the ten-digit telephone number.
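Steps (b) and (c) can be sketched in Python. This is a minimal illustration rather than any vendor's actual procedure: the function name draw_list_assisted_sample, the representation of a hundreds bank as an eight-digit string (NPA, NXX, and the first two digits of the line number), and the example banks are all assumptions made for this sketch.

```python
import random

def draw_list_assisted_sample(listed_banks, n, seed=None):
    """Draw n ten-digit numbers from hundreds banks with 1+ listed numbers.

    listed_banks: eight-digit strings (NPA + NXX + first two line digits),
    a hypothetical representation of the truncated frame from step (b).
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        bank = rng.choice(listed_banks)        # step (c): pick a listed bank at random
        suffix = f"{rng.randrange(100):02d}"   # two-digit random number, 00 through 99
        sample.append(bank + suffix)
    return sample

# Made-up banks, for illustration only.
banks = ["21255501", "21255547", "31255512"]
numbers = draw_list_assisted_sample(banks, n=5, seed=1)
```

Because every bank holds exactly 100 numbers, choosing a bank uniformly and then a suffix uniformly gives each number in the listed banks the same chance of selection on each draw. The sketch samples with replacement, so duplicates are possible.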
For this study, we initially constructed a sampling frame that included all valid NXX’s or thousand banks available for residential numbers in the Telcordia database. At the time the sample was drawn, there were a total of 766,540 thousand banks or 7,665,400 hundred banks in the sampling frame. The frame was stratified into 1+ listed hundreds banks, zero listed hundreds banks, and the remainder OPOTS. The 1+ listed hundreds bank frame was ordered by state and county, area code, exchange and hundred bank, and a systematic epsem sample of 10,000 telephone numbers was generated. Known business numbers were pre-identified but were not removed from the frame or sample. The zero listed and OPOTS frames were ordered by area code, exchange and hundred bank, and systematic epsem samples of 10,000 telephone numbers were generated from each frame.
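A systematic epsem draw from an ordered frame can be sketched as follows. This is one common textbook approach (random start, fixed fractional interval) and is not necessarily the vendor's implementation; the function name systematic_sample and the idea of returning zero-based positions in the sorted frame are assumptions of this sketch.

```python
import random

def systematic_sample(frame_size, n, seed=None):
    """Return n zero-based positions chosen systematically from an ordered frame.

    A random start inside the first interval, followed by a fixed fractional
    interval, gives every position the same selection probability n / frame_size.
    """
    if not 0 < n <= frame_size:
        raise ValueError("n must be between 1 and frame_size")
    interval = frame_size / n
    start = random.Random(seed).random() * interval
    # Clamp guards against a floating-point edge case at the very end of the frame.
    return [min(int(start + i * interval), frame_size - 1) for i in range(n)]

# 10,000 positions from a frame the size of the 1+ listed stratum.
positions = systematic_sample(281_647_100, 10_000, seed=42)
```

Sorting the frame geographically before the draw, as described above, spreads the selected positions across states, area codes and exchanges in proportion to their share of the frame.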
A comparison of the universe counts from the MSG sample and the SSI sample revealed similar counts for the listed hundred banks (2.9 versus 2.8 million). However, the MSG counts from the RDD zero hundred banks and OPOTS (6.1 million) were substantially higher than the SSI counts (4.8 million). The difference, however, could be accounted for by the inclusion of non-Telcordia banks in the MSG sample. Non-Telcordia banks are banks in “pooled” prefixes for which there is no 1000-block record on the Telcordia file. Thousand Block Pooling designates a pool of prefixes that are assigned a thousand telephone lines at a time by the Pool Administrator to potentially different companies. Any 1000-block in a pooled exchange that did not appear on the Telcordia file was presumed to be unassigned or not currently in use and was therefore excluded from the SSI frames. However, rather than under-represent any unlisted banks, we added a fourth stratum from the non-Telcordia banks to our study. An additional 10,000 numbers were selected from this stratum in the same manner as from the other two zero listed strata. With the inclusion of the non-Telcordia banks in the SSI sample, the total number of listed and unlisted banks was roughly equivalent for the MSG (9.03 million) and SSI (9.17 million) sampling frames (Figure 1). The 1.5% difference in the total sampling frame (SSI larger) and the 3.7% difference in 1+ banks (SSI smaller) are small and probably the result of the timing of the updates of the two sampling frames. However, if there were a bias, then we would expect to find more households in the 1+ banks in the MSG sample, where 1+ banks represent a slightly larger proportion of the total sampling frame.
A sample of 10,000 numbers was drawn by SSI for each of the four non-overlapping sample strata: 1+ listed, zero listed, OPOTS, and non-Telcordia. As a result of a selection error, 2,825 of the numbers drawn in the non-Telcordia sample were found to be invalid and were dropped from the sample for that stratum. Hence, a total sample of 37,175 telephone numbers across the four strata was drawn and fielded.
These numbers were dialed by interviewers at Abt SRBI Inc. using a predictive dialer, which should be equivalent to the Genesys CSS process. In order to classify the status of as many of these numbers as possible, we increased the contact attempts to reach a household and interview a respondent compared to the earlier study. A total of 11 contact attempts were made to reach an individual with whom to conduct screening for household status.
This is where we expanded on Fahimi’s procedures by adding a brief interview regarding the nature of the telephone numbers reached. The interview explicitly confirmed whether the number reached was residential, business, or some other category. The informant interview then went on to collect some additional information about the nature of the phone number and the household. This brief interview allowed us to expand on the information collected by Fahimi, and to delineate a major difference in the estimated distribution of household phone numbers between “listed” and “unlisted” telephone bank strata.
When contact was made at a sampled number, the maximum number of attempts was increased from 11 to 20 in order to complete the informant screening interview. The samples were drawn from frames that represented the May 2008 Telcordia file and June 2008 list-assisted frame. The samples were later matched to the most recent Telcordia file and list-assisted frame in order to append current Telcordia and list frame information for analysis. The survey was conducted between July and October 2008.
Findings on Household Coverage in Listed and Unlisted Banks
The vast majority of all numbers dialed were classified as non-residential or “bad numbers” prior to the interviewer administered screening question. The proportion of bad numbers ranged from 58% in the 1+ listed stratum to nearly 90% in the zero banks and the remaining OPOTS strata. Not surprisingly, virtually all of the numbers (99%) dialed in the non-Telcordia banks were bad numbers (Figure 2).
There were a total of 522 numbers (1.4%) that were “no answer” on each of 11 attempts over the course of several weeks of interviewing. The number of permanent no-answers was somewhat higher in the listed banks (285) than the zero banks (108), the OPOTS (128), and the non-Telcordia strata (1). These “permanent no answers” are likely to be a mix of unassigned numbers, unattended numbers (e.g., public phones, seasonal or unoccupied locations) and systematically unanswered numbers (e.g., screening by Caller ID).
We allocated the permanent no-answers by stratum, proportionally to the ratio of “presumed good numbers” and known “bad numbers”. Excluding the permanent no-answers from the total sample, the proportion of presumed good numbers was 40% in the listed stratum, compared to 12% in the zero banks and OPOTS strata and less than 1% in the non-Telcordia stratum. We used this proportion to allocate the permanent no answers between the bad numbers and the presumed good numbers by stratum. The estimated good numbers from the permanent no answers were added to the other presumed good numbers to yield an estimated number of potential residential numbers per stratum.
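The proportional allocation described here amounts to a one-line calculation per stratum. The sketch below assumes a hypothetical helper name, allocate_no_answers, and uses made-up round counts for the example call, since the text reports only rounded proportions for each stratum.

```python
def allocate_no_answers(presumed_good, known_bad, no_answers):
    """Split permanent no-answers in proportion to the dispositions already known.

    Returns (estimated good numbers, estimated bad numbers) among the no-answers.
    """
    good_share = presumed_good / (presumed_good + known_bad)
    est_good = no_answers * good_share
    return est_good, no_answers - est_good

# Hypothetical stratum: 400 presumed good, 600 known bad, 50 permanent no-answers.
est_good, est_bad = allocate_no_answers(400, 600, 50)
```

Using the rounded 40% share reported for the listed stratum, the 285 permanent no-answers there would contribute roughly 0.40 × 285 ≈ 114 estimated good numbers; the exact figure depends on the unrounded share.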
The presumed good numbers were dialed up to 20 times in order to classify them as household or non-household. The basis of the classification was an interviewer administered question to anyone answering the phone: “Have I reached a private residence?” The responses were: Yes, private residence or household; No, business; No, dormitory or group home; No, other; or Refused. A case was treated as interviewer resolved when an interviewer obtained a response to this question from a live informant at the number.
Among the interviewer resolved cases in the listed banks, the vast majority (73%) were private residences, while only 25% were businesses. By contrast, the vast majority of interviewer resolved numbers were businesses in both the zero banks (89%) and the OPOTS banks (86%). Only 5.3% of interviewer resolved numbers in the zero banks and 7.4% of resolved numbers in OPOTS were households. By contrast, while there were very few potential household numbers in the non-Telcordia stratum, 50% of the resolved cases were households. Most of the “other, non-household” responses in all four strata could be classified as business (e.g., police station, hospital, military base, conference line, etc.) or group home (e.g., college, school), while none of them were households (Figure 3).
The large difference between the listed and unlisted strata is not surprising, especially for the zero banks. These are frequently banks in which active phone numbers exist but are not listed as residential (and probably not listed at all). They are primarily business numbers, which are in fact sold in blocks of 100 (or 1000) by telephone companies as “direct inward dial” (DID) lines that give individual direct numbers to workers served by a company telephone system. In large part, such numbers are answered not with a company (or even departmental) name, but by an individual. They will only be identifiable as business numbers if the informant is asked directly, as we found by asking about the type of phone in the interviewer resolution process.
There were a total of 6,346 presumed good numbers out of the total sample of 37,175. In addition, another 143 out of the permanent no answers were allocated as estimated good numbers. Hence, there was a total of 6,489 potential household numbers out of the initial sample of 37,175 numbers that needed to be resolved by interviewer screening.
Among these potential residential numbers, we were able to positively resolve the household status on the basis of the interviewer administered question in more than 2 out of 5 cases. The resolution rate among potential residential numbers was approximately the same for the listed banks (42%) and the zero banks (43%), but somewhat lower in the OPOTS (36%) and non-Telcordia (30%) numbers. The unresolved numbers after interviewer screening represented 23.4% of the numbers in the 1+ RDD sample, but only 6.7% of the numbers in the zero banks, 7.7% of the numbers in the OPOTS, and 0.6% of the numbers in the non-Telcordia sample. In total, 5.5% of the numbers were unresolved in the unlisted banks after interviewer screening.
The standard approach to estimating the number of eligible households in the sample for purposes of response rate calculation is to apply the eligibility rate in the resolved cases to the unresolved cases. In the case of the listed banks, 72.7% of the 1,700 resolved numbers were determined to be households. When this eligibility rate is applied to the 2,344 unresolved cases in this stratum, we would estimate that 1,704 would be eligible households. Combining the 1,236 known households and the 1,704 estimated eligible households in the unresolved sample yields an estimated household rate of 29.4% in the 1+ listed stratum (Figure 4). This household rate is consistent with the 28% to 29% residential hit rates reported in the 2007 National Household Education Survey and the 2006 National Immunization Survey, which used national 1+ list-assisted telephone samples (Fahimi, Kulp, and Brick 2008a, 2008b).
Using the identical procedure, we apply the 5.3% household rate in the 508 resolved cases in the zero banks sample to the 672 unresolved cases. This yields an estimated 36 households in addition to 27 resolved households. The total number of actual and estimated households in the zero banks is 63 out of 10,000 numbers. Applying the same procedure to the OPOTS, we find a total of 89 actual and estimated households out of the 10,000 numbers dialed. Finally, the total number of actual and estimated households in the non-Telcordia sample is 33 out of 7,175. Since we used the identical procedures for estimating the number of households in the zero banks, OPOTS and non-Telcordia strata that produced the expected rate in the listed banks, we believe these estimates are credible.
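The eligibility-rate calculation for the listed and zero bank strata can be reproduced from the counts given in the text. The function name estimate_households is an assumption of this sketch, and rounding the estimate to a whole number is a choice made here to match the reported figures.

```python
def estimate_households(known_households, resolved, unresolved):
    """Apply the household rate among resolved numbers to the unresolved numbers.

    Returns (known plus estimated households, estimated households among unresolved).
    """
    rate = known_households / resolved
    estimated = round(rate * unresolved)
    return known_households + estimated, estimated

# Listed banks: 1,236 households among 1,700 resolved numbers; 2,344 unresolved.
listed_total, listed_est = estimate_households(1236, 1700, 2344)

# Zero banks: 27 households among 508 resolved numbers; 672 unresolved.
zero_total, zero_est = estimate_households(27, 508, 672)

# Household rate per number dialed in the listed stratum sample of 10,000.
listed_rate = listed_total / 10_000
```

The listed stratum comes out at 2,940 actual and estimated households (a rate of 29.4%), and the zero banks at 63, in line with the figures reported above.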
When the estimated household rate of 29.4% is applied to the population of 281,647,100 telephone numbers in hundreds banks in the listed stratum, it yields an estimated 82,804,247 eligible household numbers.
When the estimated household rate of 0.63% in the zero banks is applied to the 256,260,000 telephone numbers in hundreds banks in the stratum, it yields an estimated 1,614,438 eligible household numbers in those banks.
When the estimated household rate of 0.89% in the OPOTS banks is applied to the 228,633,000 telephone numbers in hundreds banks in that stratum, it yields an estimated 2,034,834 eligible household numbers in those banks. Finally, the same procedure yields an estimated 692,981 eligible household numbers in the non-Telcordia banks.
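The projection from sample to frame is the sample household rate multiplied by the number of telephone numbers in the stratum. The sketch below reproduces the three strata for which the text gives frame sizes; the function name project_households is an assumption, and the non-Telcordia stratum is omitted because its frame size is not stated here.

```python
def project_households(sample_households, sample_size, frame_numbers):
    """Scale the sample household rate up to the full stratum frame."""
    return round(sample_households * frame_numbers / sample_size)

# (actual + estimated households, sample size, telephone numbers in the stratum)
listed = project_households(2940, 10_000, 281_647_100)
zero = project_households(63, 10_000, 256_260_000)
opots = project_households(89, 10_000, 228_633_000)
```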
Thus, our findings suggest that 82.8 million household numbers out of a potential total of 87.1 million household telephone numbers are located in 1+ listed hundreds banks. Consequently, these findings suggest that approximately 95.0% of working residential telephone numbers in the United States are found in 100-series banks with one or more listed numbers. By contrast, 5.0% of working residential landline telephone numbers are located in zero banks, OPOTS and non-Telcordia banks, and hence would be excluded from any sampling frame based on listed hundreds banks (Figure 5).
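The coverage figure follows directly from the four stratum estimates reported above. A short check, with the dictionary layout and variable names chosen only for this sketch:

```python
# Estimated household numbers by stratum, as reported in the text.
estimates = {
    "1+ listed": 82_804_247,
    "zero": 1_614_438,
    "OPOTS": 2_034_834,
    "non-Telcordia": 692_981,
}

total = sum(estimates.values())
coverage = estimates["1+ listed"] / total   # share reachable from 1+ listed banks
non_coverage = 1 - coverage                 # share lost by truncating the frame
```

The total comes to about 87.1 million household numbers, of which the listed stratum accounts for roughly 95.0%.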
Database Matching for Resolved and Unresolved Numbers
The Fahimi study conducted nine dialings of sampled numbers as the first step in determining household status. It is not clear from the paper whether a formal screening assessment similar to our screening question was administered or whether interviewers only classified numbers as businesses/non-residential on the basis of telephone responses (e.g., “This is Acme Construction. How can I help you?”). It is also not clear how answering machines and voice mail were handled for classification purposes during the dialing. However, the cases that were unresolved after the initial dialings were subsequently compared to two commercial databases to improve the resolution rate.
Although we believe that an extended interviewer administered screening is the more reliable approach to determining household status, we also submitted our resolved and unresolved telephone numbers to a commercial database match process for comparison purposes. The matching process that we used was Allant’s Prevalence Reverse Append (PRA). They report over 640 million feed records per month from more than twenty sources, including telecommunications carriers, caller ID providers, directory compilers, major consumer marketing companies, data compilers and directory assistance providers. They claim 60 million records not found in white pages or Directory Assistance sourced databases. The unresolved and resolved telephone numbers from all four strata were submitted to this database search for consumer name.
Among the 2,661 interviewer resolved numbers, approximately 38.0% were found to have complete or partial secondary matches on consumer names in the PRA search. There were 988 consumer matches in the resolved listed bank numbers (58.1%). This compares to 1,236 households identified by interviewers in those numbers. There were 11 consumer matches in the resolved zero bank numbers (2.2%), compared to 27 households identified by interviewers in those numbers. There were 8 consumer matches in the resolved OPOTS numbers (1.8%), compared to 32 households identified by interviewers in those numbers. Finally, there were 4 consumer matches in the resolved non-Telcordia numbers (20.0%), compared to 10 households identified by interviewers in those numbers (Figure 6). In short, the database match yielded fewer households than interviewer resolution in all strata, but the ratio of interviewer to database identified households was much higher in the unlisted banks.
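The ratio of interviewer identified households to database consumer matches can be computed from the counts above. The sketch simply divides the two counts per stratum; it assumes nothing about whether the matched numbers are the same ones the interviewers classified as households, which the counts alone cannot show.

```python
# Households identified by interviewers among resolved numbers, by stratum.
interviewer = {"1+ listed": 1236, "zero": 27, "OPOTS": 32, "non-Telcordia": 10}

# Consumer-name matches from the database search on the same resolved numbers.
database = {"1+ listed": 988, "zero": 11, "OPOTS": 8, "non-Telcordia": 4}

ratios = {stratum: interviewer[stratum] / database[stratum] for stratum in interviewer}
```

The ratio is about 1.25 in the listed banks, against roughly 2.5 in the zero and non-Telcordia banks and 4.0 in OPOTS.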
Among the 3,685 unresolved numbers (not counting the permanent no answers estimated to be good numbers in Figure 4), approximately 36.7% were found to have complete or partial secondary matches on consumer names in the PRA search. The proportion of telephone numbers with consumer matches is actually slightly higher in unresolved cases (59.8%) than in resolved cases (58.1%) in the listed banks. By contrast, the proportion of matched consumer names is higher in resolved than in unresolved cases for numbers from the zero banks (2.2% versus 1.8%), OPOTS (1.8% versus 0.5%), and non-Telcordia (20.0% versus 6.5%) strata. So, while the proportion of households among resolved numbers appears to be a relatively good predictor of the proportion of households among the unresolved numbers, based on consumer name matching comparisons between the two samples, it may somewhat overestimate the number of households in unlisted banks compared to listed banks.
Since our interviewer administered screening achieved a higher household rate for resolved numbers and a higher estimated rate for unresolved numbers than the database matching, we believe it is a more reliable indicator. Indeed, since many of the sources for the database matches come from published directories, there is a bias against households in the unlisted banks. If we had used database matching in place of interviewer administered screening, or to estimate the rate in unresolved numbers, then the difference between our estimates of the number of households in unlisted banks and those of Fahimi would be even greater.
This project was undertaken to confirm and expand the findings of Fahimi and his colleagues that approximately twenty percent of residential landline household numbers were not covered by the current practice of using hundred bank series with one or more listed numbers. We hoped to test whether transferred numbers (or phantom numbers) or other household numbers in listed banks might mitigate the apparent problem. We also wished to examine the differences in households in listed and unlisted banks to determine the amount of bias associated with the exclusion of unlisted banks.
However, our findings suggest a much smaller coverage error (five percent) from the exclusion of unlisted hundreds banks from RDD landline sampling frames than reported by Fahimi and his colleagues. Our five percent non-coverage is only slightly higher than the proportion of households (3.7%) found in unlisted banks in the 1995 study. Although we used different sources for the sampling frames, our population counts for the total number of banks and the banks with one or more listed numbers are almost identical. Our findings on the household rate in the listed banks are similar and consistent with the literature. We used the same procedure for estimating the number of households in the unlisted banks that we used to estimate the household rate in the listed banks. Consequently, we believe that the difference between the two studies in the estimated number of households in unlisted banks is more likely a result of the procedures for estimating residential households than of the sampling frame or sample.
These findings would seem to support the continued viability of list-assisted RDD sampling in the design and conduct of telephone surveys, at least in terms of coverage error. However, given the difference in the findings of these two large-scale studies, and the absence of any “smoking gun” that would explain the differences, additional research on this issue is needed. In the meantime, however, we believe that it is not necessary to abandon listed hundreds banks for listed thousand banks, or list-assisted RDD sampling altogether, until this difference is resolved to the satisfaction of the telephone research community.
A longer version of this article was subsequently published in Public Opinion Quarterly.