Transitioning the FDA Food Safety and Nutrition Survey from RDD to ABS

Martine Ferguson; Amy M. Lando; Fanfan Wu; Linda Verrill

doi:10.29115/SP-2022-0003

1. INTRODUCTION

Household probability surveys are useful for monitoring self-reported behaviors and beliefs over time.

Telephone random digit dialing (RDD) surveys suffer from persistently low and declining response rates (Pierannunzi et al. 2019; Link et al. 2008), increasing costs per interview (Guterbock et al. 2011), and longer field periods (Guterbock et al. 2011). In response, researchers conducting RDD surveys have been experimenting with different strategies to improve survey participation and quality. One strategy employed by many U.S. federal and state government-sponsored, health-focused surveys is to transition RDD phone surveys to address-based mail or mixed-mode (mail and web) surveys. Examples include the Health Information National Trends Survey (HINTS; Finney Rutten et al. 2012; Peytchev, Ridenhour, and Krotki 2010), the Behavioral Risk Factor Surveillance System (BRFSS; Link et al. 2006), and the California Health Interview Survey (CHIS; Wells 2020).

While transitioning survey modes has become relatively common in the past two decades, significant care must be taken to understand the potential shifts in biases that may occur because of the change in mode of administration and to determine whether survey trends can be accurately reported. The two main sources of potential bias when transitioning from RDD to address-based sampling (ABS) are (1) sampling bias, resulting from differences in the makeup of the survey samples and differences in unit (sampled person) non-response between the two modes (RDD vs. ABS) and (2) measurement bias, resulting from the use of different data collection modes: phone interviews (RDD), online responses (ABS-web), and pencil and paper responses (ABS-paper).

This paper systematically examines each form of potential survey bias (see Table 1 for the list of biases examined) and serves as a guide for survey researchers seeking to transition survey modes, using a U.S. Food and Drug Administration (FDA) household probability survey to structure the discussion. We begin with a comparison of sampling bias due to sample composition differences and respondent non-response (section 3.1). We then assess measurement bias by testing for mode measurement equivalence (section 3.2); straightlining (section 3.3); and finally, social desirability and acquiescence (section 3.4). By comprehensively examining biases, researchers can draw conclusions about the overall success of the survey transition while also pinpointing specific questions that may require additional work before making trend comparisons.

2. DATA

2.1. Survey populations and sampling methods

Since the 1980s, the FDA regularly conducted national probability, cross-sectional, RDD interviewer-administered surveys to track consumer knowledge, exposure to and understanding of key food safety and nutrition messages, and related reported behaviors. In 2019, the FDA combined questions from the previous surveys and administered the FDA Food Safety and Nutrition Survey (FSANS) as an ABS, self-administered web and paper (paper and pencil) survey.

To assess the effects of the changes in survey administration mode, undistorted by temporal effects, the 2019 FSANS used a mixed-mode, parallel design, with participants assigned to ABS or RDD. This mixed-mode, parallel design provided an opportunity for mode measurement bias testing that few studies offer (Couper 2011).

Questions about food safety, health, and diet were included in the 2019 FSANS. There were two versions for each survey mode: one that focused on food safety and one that focused on nutrition. Two versions were needed to include all questions of interest without burdening respondents with very long survey questionnaires. The RDD and ABS modes were designed to take 15 minutes and 20 minutes, respectively, to complete. The respondents for all versions of the survey were English- or Spanish-speaking non-institutionalized adults (≥18 years old) living in the 50 U.S. states and the District of Columbia.

Prior to conducting the survey, three rounds of cognitive interviews and a pretest were conducted to enhance survey understandability, minimize respondent fatigue, and ensure data integrity (completeness).

For details regarding the RDD and ABS sampling methods, please refer to the Supplemental Material (S.1). The RDD data were collected from October 14, 2019, through December 22, 2019, and yielded a sample of 834 respondents, 415 of whom were randomly assigned the Food Safety (FS) version and 419 of whom were assigned the Nutrition (N) version. The ABS data were collected from October 1, 2019, through November 2, 2019, and yielded a sample of 4,398 respondents, 2,227 of whom were randomly assigned the FS version and 2,171 of whom were assigned the N version.

Survey data were weighted to account for sampling design and non-response. The sampling weights were calculated, separately for the RDD and ABS samples, to control for differential probabilities of selection (within household and across socio-demographic group).

The survey questions can be found in the 2019 FSANS report at https://www.fda.gov/food/science-research-food/2019-food-safety-and-nutrition-survey-report.

3. METHODS AND RESULTS

In this section, we present the methods used to assess sampling bias and the results of each. We then present the methods used to assess mode measurement biases (each adjusted for any sampling bias found) and their results. Table 1 summarizes the methods used. Throughout this discussion, unless stated otherwise, data are weighted (i.e., sampling weights have been applied), all statistical analyses were performed using SAS 9.4 (SAS Institute, Cary, NC), and variances were estimated using Taylor series linearization (TSL). TSL is a method often used to compute the variance of an estimate from a complex sample: the point estimate is approximated by a linear function using a Taylor expansion, and the variance of this linear approximation is then used as the variance estimate for the point estimate (Woodruff 1971).
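To make the linearization idea concrete, the following is a minimal sketch for one estimator only, a weighted mean. It is written in Python rather than the SAS used by the authors, and it assumes a single stratum with every respondent treated as their own primary sampling unit, sampled with replacement. The function name and the example values are hypothetical and are not taken from the FSANS.

```python
def tsl_variance_weighted_mean(y, w):
    """Linearization variance of the weighted mean sum(w*y)/sum(w).

    Assumptions (illustrative only): one stratum, each respondent is
    their own PSU, with-replacement sampling.
    """
    n = len(y)
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # First-order Taylor term of the ratio for each respondent.
    z = [wi * (yi - mean) / total_w for wi, yi in zip(w, y)]
    z_bar = sum(z) / n  # zero up to rounding error
    variance = n / (n - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return mean, variance


# Hypothetical responses and sampling weights.
mean, variance = tsl_variance_weighted_mean([1, 2, 3, 4], [1.5, 0.5, 1.0, 2.0])
```

With equal weights the result reduces to the familiar sample variance divided by n, which is a quick sanity check on the linearized form.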

Table 1. Summary of biases examined.

Bias Source	Bias	Description	In This Paper
Sampling bias	Sample composition	Differences in sample selection are due to the use of different survey sampling frames for RDD and ABS samples (Link et al. 2008). Differences in coverage (i.e., coverage bias) can arise from different modes potentially attracting different kinds of respondents to take the survey (Sterrett et al. 2017).	Section 3.1.1
Sampling bias	Respondent non-response	Potential differences in respondent non-response between modes have to do with potential differences in the number and type of invited sampled persons who chose not to participate in the survey (Groves and Peytcheva 2008).	Section 3.1.2
Measurement bias	Mode measurement non-equivalence	Differences due to respondents answering a question differently because of the way the question is presented (Hox, De Leeuw, and Klausch 2017, chap. 23); the differences may also be situational and/or motivational (Hox, De Leeuw, and Chang 2012).	Section 3.2
Measurement bias	Straightlining or satisficing	Satisficing is the tendency to provide satisfactory but not optimal answers to reduce effort (Krosnick 1991). Straightlining, a kind of satisficing, is responding with identical ratings to a series of questions (Zhang and Conrad 2014).	Section 3.3
Measurement bias	Acquiescence bias	The tendency of the respondent to favor the ‘yes’ or ‘agree with’ answer regardless of the content of the question (Heerwegh and Loosveldt 2011).	Section 3.4
Measurement bias	Social desirability bias	The bias caused by participants under-reporting socially undesirable behaviors and/or over-reporting socially desirable behaviors to comply with social norms; it is a major source of response bias in survey research (DeMaio 1984; Kreuter, Presser, and Tourangeau 2008).	Section 3.4

3.1. Sampling Bias: Sample Composition and Unit Non-response

3.1.1. Sample Composition

Methods: Bivariate logistic regression, unweighted and weighted, was used to test for differential sample composition across mode. Since the ABS-web and ABS-paper were from the same sampling frame, the RDD sample was compared to the combined ABS-web and ABS-paper sample. Mode was regressed on each respondent characteristic: age, race and Hispanic origin, education, gender, income, urbanicity, region, and home ownership.
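For a single 0/1 respondent characteristic, the slope from regressing mode on that characteristic has a closed form: it is the log odds ratio of the 2x2 table of mode by characteristic, and the mode sample sizes cancel. The sketch below is in Python rather than the SAS used by the authors, and it applies this identity to the unweighted shares of male respondents reported in Table 4 (51.0% for RDD, 37.2% for combined ABS). The function name, the ABS = 1 coding, and the treatment of each category as a 0/1 indicator are assumptions made for illustration.

```python
import math


def logit_slope_binary(share_in_abs, share_in_rdd):
    """Slope from regressing mode (ABS = 1, RDD = 0) on a 0/1 characteristic.

    With one binary predictor the maximum-likelihood slope equals the log
    odds ratio of the 2x2 table, so only the within-mode shares are needed.
    """
    odds_ratio = (share_in_abs / share_in_rdd) / (
        (1.0 - share_in_abs) / (1.0 - share_in_rdd)
    )
    return math.log(odds_ratio)


# Unweighted share of male respondents from Table 4: ABS 37.2%, RDD 51.0%.
slope_male = logit_slope_binary(0.372, 0.510)
print(round(slope_male, 2))  # negative: men are under-represented in ABS
```

A slope of zero would mean the characteristic is equally common in both modes; the significance stars in Table 4 correspond to tests of that null hypothesis.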

Results: The differences in sample composition are presented in Table 4. Applying the survey sampling weights successfully corrects for the mode sampling bias for all covariates adjusted for in the weight calibration process, i.e., gender, age, race and Hispanic origin, education, census region, and urbanicity. For the respondent characteristics not accounted for in the weight calibration process (income and home ownership), the combined weighted ABS-web/paper sample included more homeowners and more high-income respondents than the weighted RDD sample.

Table 4. Sample composition: comparing ABS respondent characteristics to RDD respondent characteristics.

Respondent Characteristics	Question	Answer	Unadjusted Percentages
			Unweighted		Weighted
			RDD	Combined ABS -web/paper	RDD	Combined ABS -web/paper
Age	In what year were you born? (Converted to age and categorized)	18-30 yrs	14.2	9.2***	23.1	20.3
		31-50 yrs	26.6	26.5	32.2	35.4
		51-60 yrs	18.0	18.8	16.6	17.5
		>60 yrs	41.2	45.5**	28.1	26.7
Race and Hispanic origin	What is your race? Are you Hispanic or Latino? (Aggregated and categorized)	White	67.6	75.5***	65.3	65.4
		Non-Hispanic Black	8.8	7.4	10.3	11.7
		Non-Hispanic Other	12.1	10.0	8.8	7.3
		Hispanic	11.6	7.2**	15.6	15.7
Education	What is the last grade or year of school that you have completed?	HS or less than HS	28.5	24.2*	40.5	40.0
		Some college	29.3	28.3	30.5	31.0
		College graduate	42.2	47.5**	29.0	29.0
Gender	How do you describe yourself? Male or Female.	Male	51.0	37.2***	48.7	48.7
Gender	How do you describe yourself? Male or Female.	Female	49.0	62.8	51.3	51.3
Income	What was your total household income before taxes during the past 12 months?	Less than $25K	17.8	15.2	22.1	18.0
		$25K - $49,999	23.2	22.9	23.9	25.1
		$50K - $99,999	34.8	32.2	33.8	30.6
		$100K+	24.2	29.8**	20.2	26.2*
Urbanicity	Urban or rural zip code (Mapped using Rural-Urban Commuting Area Codes (RUCA))	Urban	82.0	85.6	83.6	84.9
Urbanicity		Rural	18.0	14.4**	16.4	15.1
Region	Census Bureau-designated regions (Mapped using zip code/state)	Northeast	14.7	19.0**	18.2	17.8
		Midwest	20.5	24.7*	21.6	21.0
		South	36.6	34.4	36.5	37.7
		West	28.2	21.8***	23.7	23.5
Home ownership	Do you: Own your own home, Rent your home, or Have some other arrangement?	No/Other	41.6	26.3***	47.2	37.4**
Home ownership		Yes	58.4	73.7	52.8	62.6

The regression slope was estimated for each respondent characteristic and category (Answer).
Each slope was tested for a significant difference from 0 (t-test): * p<.05, ** p<.01, *** p<.0001.

3.1.2. Unit Non-response

Methods: Non-response bias arises when survey respondents have different characteristics than the non-respondents, and those characteristics are correlated with survey estimates. Although, by definition, survey responses are unknown for non-respondents and thus non-response bias cannot be assessed directly, respondent characteristics of RDD and ABS (web/paper combined) were compared to the general U.S. population characteristics, as reported by the 2014-2018 American Community Survey (ACS; United States Census Bureau, n.d.), to ascertain potential non-response bias. Response rates in this paper were calculated using the American Association for Public Opinion Research (AAPOR) Response Rate 3 (RR3) formulation.
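As a reminder of how RR3 is formed, the sketch below codes the AAPOR RR3 formula: complete interviews divided by all known-eligible cases plus the estimated eligible share e of the cases with unknown eligibility. The parameter names are illustrative, and the disposition counts in the example are invented; they are not the FSANS dispositions.

```python
def aapor_rr3(complete, partial, refusal, non_contact, other,
              unknown_household, unknown_other, e):
    """AAPOR Response Rate 3.

    e is the estimated proportion of unknown-eligibility cases that are
    actually eligible (0 <= e <= 1).
    """
    known_eligible = (complete + partial) + (refusal + non_contact + other)
    estimated_eligible_unknowns = e * (unknown_household + unknown_other)
    return complete / (known_eligible + estimated_eligible_unknowns)


# Invented disposition counts, for illustration only.
rate = aapor_rr3(complete=100, partial=0, refusal=50, non_contact=50,
                 other=0, unknown_household=400, unknown_other=0, e=0.5)
print(rate)  # 0.25
```

Because e scales the unknown-eligibility cases, RR3 falls between the most conservative rate (e = 1) and the rate that ignores unknowns altogether (e = 0).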

Results: The RDD RR3 was 6.6% and the ABS RR3 was 28.1%. There were indications of non-response bias related to age, education, gender, and race/ethnicity for both RDD and ABS (information available upon request). Notably, the RDD respondents trended slightly more male than the U.S. population, while ABS respondents trended more female than the U.S. population. Raking of the sampling weights to the ACS demographic control totals was performed to reduce these observed non-response biases in RDD and ABS.
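Raking (iterative proportional fitting) adjusts the weights one margin at a time until the weighted totals match every set of control totals. The following Python sketch shows the mechanics under simplifying assumptions: two margins, invented respondents, and invented control totals. It is not the authors' weighting code, and the function name and data layout are hypothetical.

```python
def rake(weights, categories, targets, n_iter=100, tol=1e-10):
    """Rake weights to marginal control totals.

    weights    : starting weight per respondent
    categories : {margin: [category label per respondent]}
    targets    : {margin: {category: control total}}
    """
    w = list(weights)
    for _ in range(n_iter):
        max_change = 0.0
        for margin, labels in categories.items():
            # Current weighted total in each category of this margin.
            totals = {}
            for wi, label in zip(w, labels):
                totals[label] = totals.get(label, 0.0) + wi
            # Scale every respondent so the margin hits its control total.
            for i, label in enumerate(labels):
                factor = targets[margin][label] / totals[label]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w


# Invented example: five respondents, two margins.
cats = {"gender": ["M", "M", "F", "F", "F"],
        "age": ["young", "old", "young", "old", "old"]}
ctrl = {"gender": {"M": 50.0, "F": 50.0},
        "age": {"young": 40.0, "old": 60.0}}
raked = rake([1.0] * 5, cats, ctrl)
```

Each pass fixes one margin exactly and slightly disturbs the others, which is why the adjustment is repeated until the scaling factors are all close to 1.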

3.2. Mode Measurement Equivalence

Methods: The three modes (RDD, ABS-web, and ABS-paper) were assessed for measurement equivalence (ME). ME (also known as measurement invariance) is defined as “whether or not, under different conditions of observing and studying phenomena, measurement operations yield measures of the same attribute” (Horn and McArdle 1992). ME was assessed using multiple group confirmatory factor analysis (MGCFA) through a series of increasingly stringent models (Table 2), i.e., increasing number of constraints (Martinez-Gomez, Marin-Garcia, and Giraldo O’Meara 2017): (1) configural invariance; (2) metric invariance; (3) partial scalar invariance; (4) strict scalar invariance (Hox, De Leeuw, and Zijlmans 2015); and (5) latent factor variance invariance.

Table 2. Mode measurement equivalence: levels of invariance (sorted from least to most stringent).


Level of invariance	Interpretation	Number of constraints	Constraint: factor loadings equal	Constraint: intercepts equal	Constraint: residual variances equal	Constraint: latent variable variances equal
Configural (base model)	Ascertains whether a relationship exists between observed variables and their underlying latent construct. In other words, is the same general pattern of factor loadings present for each mode (RDD, ABS-web, ABS-paper) (Hox, De Leeuw, and Zijlmans 2015).	0
Metric	Respondents from the three modes attribute the same meaning to the latent variable (van de Schoot, Lugtig, and Hox 2012).	1	x
Partial scalar	Respondents from the three modes attribute the same meaning to both the latent variable (i.e., equal factor loadings) and to the observed survey questions (i.e., equal intercepts) (van de Schoot, Lugtig, and Hox 2012).	2	x	x
Strict scalar	Latent variables are measured with the same precision across mode (van de Schoot, Lugtig, and Hox 2012).	3	x	x	x
Latent factor variance	Latent variables have the same precision across mode (Xu 2012).	4	x	x	x	x

Model fits are generally considered good when the Root Mean Square Error of Approximation is low (RMSEA <.08) and the Comparative Fit Index is high (CFI >.90) (van de Schoot, Lugtig, and Hox 2012). Further information about the model fit criteria can be found in the Supplemental Material (S.2.1). As Barrett (2007) suggests, we calculated confidence intervals around the fit indices to parallel the logic of statistical inference of the chi-square test.
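For readers less familiar with these two indices, the sketch below gives their common textbook single-group forms in Python. These are assumptions about the general definitions, not the exact quantities reported in this paper: SEM software typically applies multi-group and robust (scaled) corrections, and some implementations divide by N rather than N - 1. The chi-square values in the example are invented.

```python
import math


def rmsea(chi2, df, n):
    """Textbook single-group RMSEA from a model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Textbook CFI comparing the model with the baseline (null) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    if d_base == 0.0:
        return 1.0
    return 1.0 - d_model / d_base


# Invented fit statistics, for illustration only.
example_rmsea = rmsea(chi2=100.0, df=50, n=501)
example_cfi = cfi(chi2_model=100.0, df_model=50, chi2_base=1050.0, df_base=50)
```

Both indices are built from the excess of the chi-square over its degrees of freedom, which is why they tend to move together as constraints are added.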

The five models are tested in sequence, checking at each step that model fit does not worsen, using the Satorra-Bentler scaled chi-square difference test (SBSD) (Satorra and Bentler 2001). A non-significant SBSD test accompanied by a drop in CFI of at most .01 and an increase in RMSEA of at most .015 (Chen 2007) indicates that the additional constraint does not cause a significant decrease in model fit. The model constraint can then be retained, and the level of invariance to which the model corresponds is supported. Otherwise, that level of invariance is not established.
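The sequential rule can be written as a short check. The Python sketch below applies it to the mean fit indices reported for the Nutrition Awareness model in Table 5a, treating the SBSD criterion as met when more than 80% of subsample tests are non-significant, as the note to that table states. The function names are hypothetical, and the table note indicates that the criteria were weighed with judgment, so a mechanical rule like this one need not reproduce every published conclusion.

```python
def constraint_retained(sbsd_nonsignificant, cfi_prev, cfi_new,
                        rmsea_prev, rmsea_new,
                        max_cfi_drop=0.01, max_rmsea_increase=0.015):
    """True if adding the constraint does not meaningfully worsen fit."""
    # Round to 3 decimals so float noise does not flip a borderline case.
    cfi_drop = round(cfi_prev - cfi_new, 3)
    rmsea_increase = round(rmsea_new - rmsea_prev, 3)
    return (sbsd_nonsignificant
            and cfi_drop <= max_cfi_drop
            and rmsea_increase <= max_rmsea_increase)


def highest_level(levels, min_share_ns=0.8):
    """Walk the ordered levels and stop at the first failed constraint."""
    established = levels[0][0]
    for prev, cur in zip(levels, levels[1:]):
        if not constraint_retained(cur[1] > min_share_ns,
                                   prev[2], cur[2], prev[3], cur[3]):
            break
        established = cur[0]
    return established


# (level, share of N.S. SBSD tests, mean CFI, mean RMSEA) from Table 5a.
nutrition = [
    ("configural", None, 0.94, 0.089),
    ("metric", 0.98, 0.93, 0.081),
    ("partial scalar", 0.94, 0.92, 0.081),
    ("strict scalar", 0.67, 0.86, 0.096),
    ("factor invariance", 0.01, 0.75, 0.13),
]
print(highest_level(nutrition))  # partial scalar
```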

To avoid confounding sample compositional differences with mode measurement bias, the observed survey questions in the ME models were adjusted for mode selection effects using multiple propensity scores (Spreeuwenberg et al. 2010); i.e., adjusted for sampling bias. A discussion of the propensity scores can be found in the Supplemental material (S.2.2).

The manifest survey questions, measured on an ordinal or Likert-type scale, that comprise the latent variables for each ME model (Food Safety Perception, Nutrition Awareness, and Food Handling Behavior) are presented in Table 3. Figures 1, 2, and 3 illustrate the theoretical constructs of the three ME models. Most latent variables are represented by at least three survey questions. Further details regarding the inclusion of survey questions are in the Supplemental Material (S.2.3).

Figure 1. Measurement model for Food Safety Perception.

Square boxes are observed survey questions, circles are latent variables, (β) are the factor loadings and represent the strength of the relationship between the latent variable and the observed question, (Int) are the intercepts and represent the conditional mean of the observed question when the latent variable is 0, ε is the residual variance and represents reliability, and double-headed single-lined black arrows are residual variances. Each survey question was also regressed on the propensity scores (PS), but to avoid cluttering the plot, the PS regressions are not explicitly plotted. A blow-up example for one of the items is shown.
Covariances between latent variables were assumed and covariances between observed questions, which were suggested either by the context of the questions or by the fit of the model, were included. The covariances are drawn with double-headed double-lined gray arrows.

Figure 2. Measurement model for Nutrition Awareness.

Note: Although the latent variable ‘Calories’ is represented by only two items (only two calorie-specific questions were posed to the respondents), which may result in unstable estimates, the model fit was sufficiently good to proceed with our proposed models (Hemphill 2003).

Figure 3. Measurement model for Food Handling Behavior.

Note: Although the latent variable ‘Hand Washing’ is represented by only two items (only two handwashing questions were posed to the respondents), which may result in unstable estimates, the model fit was sufficiently good to proceed with our proposed models (Hemphill 2003).

MGCFA was implemented using the cfa function in the R lavaan package (Rosseel 2012). Only complete cases (i.e., Blank/Refused excluded) were included in the ME models. Yoon and Lai (2018) indicate that the power to detect a violation of invariance decreases as the ratio of the mode sample sizes increases. The ABS web survey had 1,374 food safety and 1,393 nutrition cases; the ABS paper survey had 853 food safety and 778 nutrition cases; and the RDD survey had 415 food safety and 419 nutrition cases. To overcome this imbalance, a subsampling procedure was employed. One hundred bootstrap samples of size 415 for the Food Safety Perception and Food Handling Behavior ME models and 419 for the Nutrition ME model were randomly chosen from the ABS-web and ABS-paper data frames, and the measurement equivalence analysis was run using each of these 100 subsamples and the full RDD sample (Yoon and Lai 2018). Measures of fit (CFI, RMSEA) were calculated for each of the 100 runs, and their means and percentile-based 95% confidence intervals (2.5th percentile, 97.5th percentile) were then calculated across the 100 runs. The 100 SBSD test p-values were adjusted for multiplicity using the Bonferroni adjustment and the p.adjust function in the R stats package (R Core Team 2019), and the percentage of non-significant p-values (p_Bonferroni>.05) was calculated.
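The subsampling loop, the percentile interval, and the Bonferroni adjustment can be sketched as follows. This is a Python illustration, not the authors' R code. It assumes subsamples are drawn without replacement, which the text does not specify, and it uses a placeholder statistic (a simple mean over row indices) where the paper computes a fit index from an MGCFA run. All function names are hypothetical.

```python
import random


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of a sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def subsample_summary(larger_group, target_size, statistic,
                      n_runs=100, seed=2019):
    """Mean and percentile-based 95% interval of a statistic over subsamples."""
    rng = random.Random(seed)
    stats = sorted(statistic(rng.sample(larger_group, target_size))
                   for _ in range(n_runs))
    mean = sum(stats) / n_runs
    return mean, (percentile(stats, 0.025), percentile(stats, 0.975))


# Placeholder data: row indices standing in for the 1,374 ABS-web
# food safety cases, subsampled down to the RDD size of 415.
abs_web_rows = list(range(1374))
mean_stat, interval = subsample_summary(
    abs_web_rows, 415, lambda rows: sum(rows) / len(rows))
```

Matching the subsample size to the smallest group keeps the group sizes balanced in every run, which is the point of the procedure described above.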

Results: Latent factor variance invariance, the highest form of mode equivalence, was established for the Food Safety and Food Handling models, and partial scalar invariance, the third highest form of mode equivalence, was established for the Nutrition model (Table 5a). The inability to establish strict scalar equivalence for the Nutrition model suggests difference(s) in variances (precision) of the manifest variables across mode. Table 5b presents the coefficients of variation for the questions comprising the Nutrition model. In general, ABS-paper, and to a lesser extent ABS-web, had higher coefficients of variation than RDD.

Table 3. Survey questions included in mode measurement equivalence, straightlining, acquiescence, and social desirability bias analyses.

Observed Question		Variable Name	ABS Scoring	RDD Scoring	Included in Analysis for
Question Stem (if matrix question)	Question Item				Mode Measurement Equivalence		Straightlining	Social Desirability	Acquiescence
					Latent Variable	ME Model
How likely do you think it is that the following foods contain bacteria or other germs that could make people sick?	Raw chicken	A5a	1 = Don't know 2 = Not at all likely 3 = 2 4 = 3 5 = 4 6 = Very likely	Same as ABS scoring	Specific Food (risk perceptions about specific foods)	Food Safety Perception	Battery Group 1
	Raw eggs	A5Dv1
	Raw vegetables	A5Ev1
	Raw shellfish	A5Fv1
	Raw beef	A5bV1
	Raw fish	A5Gv1
	Raw fruit	A5Cv1
How likely are you to get sick if you ate food that was handled in each of the following ways?	If you forget to wash your hands before you begin cooking.	F10A	1 = Not at all likely 2 = 2 3 = 3 4 = 4 5 = Very likely	Same as ABS scoring	Behavior (perception about risk-related behaviors hand washing & cooking)		Battery Group 2	Battery Group 2
	If you eat raw vegetables that touched raw chicken.	F10B
	If you eat chicken that is not thoroughly cooked.	F10C
	If you eat chicken that was left at room temperature for more than 2 hours after it was cooked.	F10D
How common do you think it is for people in the United States to get food poisoning because of…	the way food is prepared in their home?	A1	1 = Not very common 2 = Somewhat common 3 = Very common	Same as ABS scoring	General (perception about general foodborne illness risks)		Battery Group 3
	the way food is prepared at restaurants?	A2
	food being contaminated with bacteria?	A4
How strongly do you disagree or agree with each of the following statements?	If I eat a healthy diet I can reduce my chance of getting heart disease.	dl_1	1 = Don't know^ 2 = Strongly disagree 3 = Somewhat disagree 4 = Neither agree nor disagree^ 5 = Somewhat agree 6 = Strongly agree	Same as ABS scoring	Healthy Diet	Nutrition Awareness	Battery Group 4	Battery Group 4	Battery Group 4
	If I eat a healthy diet I can reduce my chance of getting cancer.	dl_2
	I am confident that I know how to choose healthy foods.	dl_3
	Eating a healthy diet is important for my long-term health.	dl_4
	Thinking about yourself, about how many calories do you need to consume in a day to maintain your current weight?	CBQ645	1 = Don’t know 2 = Less than 1000 calories, More than 3000 calories 3 = 1000 – 3000 calories	Same as ABS scoring	Calories (caloric intake)			CBQ645
	In general, do you think that you consume too few, too many, or about the right amount of calories?	dietcal	1 = Don't know 2 = Too few/many calories 3 = About the right amount of calories	Same as ABS scoring				dietcal
	Before you begin preparing food, how often do you wash your hands with soap?	D4	1 = Rarely 2 = Some of the time 3 = Most of the time 4 = All of the time	Same as ABS scoring	Hand Washing (during food prep)	Food Handling Behavior		D4
	After you have cracked open raw eggs, what do you usually do?	D11	1 = Continue cooking without washing hands 2 = Rinse or wipe hands 3 = Wash hands with soap 4 = Never handle "Something else" recoded as missing, 1, 2, 3 or 4 depending on the text specification	1 = Continue cooking without washing hands 2 = Rinse or wipe hands 3 = Wash hands with soap 4 = Never handle or Do not prepare at beginning of the meal (raw meat/chicken/fish)				D11
Over the past 12 months, how often did you use a food thermometer to test for doneness when you prepare the following foods?	Whole chickens or turkeys	thermwholechicken	1 = Don't own/Don't know if have or use food thermometer / Never use food thermometer* 2 = Sometimes use food thermometer 3 = Often use food thermometer 4 = Always use food thermometer 5 = Didn't cook food in past 12 months	Same as ABS scoring	Food Thermometer (use during cooking)		Battery Group 5	Battery Group 5
	Beef, lamb, or pork roasts	H8a
	Chicken parts such as breasts or legs	H8b
	Baked egg dishes such as quiche, custard, or bread pudding	H8c
	Hamburgers made from beef	H8d
Have you heard of …	Salmonella as a problem in food?	F1	1 = No 2=Yes	Same as ABS scoring			Battery Group 6		Battery Group 6
	Listeria as a problem in food?	F4
	Campylobacter as a problem in food?	F5
	Norovirus as a problem in food?	F6
	E. coli as a problem in food?	F7
How do you use calorie information when deciding what to order? Do you use it to…	Do you ever use the calorie information on menus or menu boards to decide what to order?	restcal_use	1 = No 2=Yes	Same as ABS scoring			Battery Group 7		Battery Group 7
	Avoid ordering high-calorie menu items.	restcal_avoidhical
	Avoid ordering something that would leave you hungry.	restcal_avoidhungry
	Decide on a smaller portion size.	restcal_smallerp
	Decide on a larger portion size.	restcal_largerp
	Order fewer items.	restcal_feweritem
	Order more items.	restcal_moreitem
	Share the meal with someone else.	restcal_sharemeal
	Save part of the meal for later.	restcal_savemeal

* Blank/Refused (scored as 9) included in the straightlining analysis;
^ Excluded from acquiescence analysis. A battery group is a group of related or connected questions, such as matrix or grid questions.

Table 5a. Assessment of mode measurement equivalence (ME) models (assessed over 100 subsamples): Percent of non-significant (N.S.) Satorra-Bentler scaled chi-square difference (SBSD) tests, mean Comparative Fit Index (CFI), mean Root Mean Square Error of Approximation (RMSEA), drop in CFI and increase in RMSEA.

ME Model	Level of equivalence	% N.S. SBSD p-values^a	Mean CFI ^b	Mean RMSEA ^c	Drop in CFI ^d	Increase in RMSEA ^e
Food Safety Perception	configural		0.86 (.82, .90)	0.10 (.089, .11)
	metric	0.80	0.84 (.81, .89)	0.097 (.086, .11)	-0.02	-0.003
	partial scalar	1.00	0.83 (.79, .88)	0.096 (.084, .11)	-0.01	-0.001
	strict scalar	0.96	0.82 (.78, .87)	0.092 (.082, .10)	-0.01	-0.004
	factor invariance	0.93	0.81 (.75, .86)	0.094 (.083, .11)	-0.01	0.002
Nutrition Awareness	configural		0.94 (.90, .97)	0.089 (.066, .12)
	metric	0.98	0.93 (.89, .96)	0.081 (.062, .10)	-0.01	-0.008
	partial scalar	0.94	0.92 (.87, .96)	0.081 (.063, .10)	-0.01	0.000
	strict scalar	0.67	0.86 (.78, .93)	0.096 (.070, .12)	-0.06	0.015
	factor invariance	0.01	0.75 (.65, .83)	0.13 (.11, .15)	-0.11	0.034
Food Handling Behavior	configural		0.98 (.97, .99)	.053 (.033, .072)
	metric	0.92	0.98 (.96, 1.00)	.049 (.027, .072)	0.00	-0.004
	partial scalar	0.77	0.97 (.95, .99)	.056 (.040, .074)	-0.01	0.007
	strict scalar	0.89	0.97 (.94, .99)	.056 (.038, .073)	0.00	0.000
	factor invariance	0.80	0.96 (.94, .98)	.059 (.041, .077)	-0.01	0.003

^a percent of nonsignificant SBSD tests; ^b average CFIs (95% CI); ^c average RMSEAs (95% CI); ^d drop in average CFI from previous equivalence level; ^e increase in average RMSEA from previous equivalence level.
Equivalence level established if % N.S. p-values >.8, drop in CFI ≤ .01, increase in RMSEA ≤.015, CFI>.90, RMSEA<.08. Greater importance placed on % N.S. p-values.
Full-sample model fits: for all models, p-values for SBSD: <0.0001. Hence, the theoretical SEM constructs are appropriate.
Although the Food Safety CFI of .81 is less than the criterion of .90, the upper 95% bound of .86 was close enough to .9, and the additional criteria of SBSD nonsignificance and low RMSEA were deemed sufficient to establish factor invariance.

Table 5b. Coefficients of variation (95% CI) of observed questions for Nutrition ME model (full sample).

Question	Latent variable	RDD	ABS-web	ABS-paper
dl_1	Healthy Diet	17.6 (16.5, 19.0)	20.9 (20.1, 21.7)*	24.2 (23.0, 25.6)*
dl_2		26.2 (24.4, 28.2)	26.5 (25.5, 27.6)	31.3 (29.7, 33.2)*
dl_3		17.6 (16.5, 19.0)	20.7 (19.9, 21.5)	23.9 (22.7, 25.3)*
dl_4		14.5 (13.6, 15.6)	19.2 (18.5, 20.0)*	22.1 (21.0, 23.4)*
CBQ645	Calories	31.6 (29.5, 34.2)	28.1 (27.0, 29.2)*	34.6 (32.7, 36.6)
dietcal	Calories	24.0 (22.4, 25.8)	26.3 (25.3, 27.4)	30.5 (29.0, 32.2)*

Confidence intervals were calculated using the Kelley method (Kelley 2021) and implemented using the cv_versatile function in the R cvcqv package (Beigy 2019). Nonoverlapping 95% confidence intervals indicate significant differences in variability from RDD (denoted by *).

3.3. Straightlining

Methods: “Respondents who take ‘mental shortcuts,’ are said to ‘satisfice,’ by which it is meant that they do not (properly) perform all the necessary cognitive steps to answer a survey question” (Heerwegh and Loosveldt 2011). As shown in Table 1, one indicator of satisficing associated with speeding through grid or matrix questions on a self-administered survey is straightlining (Zhang and Conrad 2014). Low within-respondent variability may be a measure of low data quality in the form of straightlining and can be used to assess how attentively a respondent answered the survey (Cernat and Revilla 2020). Straightlining was assessed via the Standard Deviation of Battery Method (SDBM) (Kim et al. 2019). SDBM was calculated for each respondent by “battery group,” i.e., a group of related or connected questions, such as matrix or grid questions (Table 3). For each battery group, linear regression was performed to assess the effect of mode on SDBM while adjusting for the covariates gender, race, age, education, income, and urbanicity, and the adjusted means were compared across mode using two-tailed t-tests. Ninety-five percent confidence intervals were also calculated around the adjusted SDBM means assuming normality.

$\text{SDBM}_{g} = \alpha_{g} + \beta_{g}X + \varepsilon\quad \text{where}\ g\ \text{is the battery group,}\ g = 1, 2, \ldots, 7$
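The outcome on the left-hand side of this regression is computed per respondent and battery. A minimal Python sketch follows, assuming the sample standard deviation is used (the text does not say whether the sample or population form was applied) and using invented response vectors. The function name is hypothetical.

```python
import statistics


def sdbm(responses):
    """Standard deviation of one respondent's answers within one battery.

    Returns None when fewer than two answers are available.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    if len(responses) < 2:
        return None
    return statistics.stdev(responses)


# Invented answers to a four-item battery scored 1 to 5.
straightliner = sdbm([5, 5, 5, 5])   # identical ratings, no variability
differentiator = sdbm([2, 4, 4, 5])  # varied ratings
print(straightliner)  # 0.0
```

A respondent who gives the same rating to every item has an SDBM of zero, so lower values point toward straightlining.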

Results: ABS-web and ABS-paper respondents had a greater tendency to straightline (i.e., lower SDBM) than RDD respondents for battery group 6 only (p_web =.0007, p_paper =.0006) (Table 6). This can most likely be explained by the way the questions were presented. The RDD survey posed each question in battery 6, “Have you heard of…,” individually for each pathogen, whereas the ABS survey posed one mark-all-that-apply question with a check box for each pathogen.

Table 6. Results of straightlining (Standard Deviation of Battery Method [SDBM]), acquiescence, social desirability bias and visual display bias analyses, comparing modes RDD, ABS-web and ABS-paper.

Battery Group^	Straightlining			Acquiescence^&			Social Desirability Bias^&
	Adjusted mean SDBM (95% CI)			Adjusted mean acquiescence (95% CI)			Adjusted mean (95% CI)
	RDD	ABS-web	ABS-paper	RDD	ABS-web	ABS-paper	RDD	ABS-web	ABS-paper
1	1.4 (1.3, 1.5)	1.5 (1.4, 1.5)	1.5 (1.4, 1.6)	N/A			N/A
2	1 (0.9, 1.1)	1 (0.9, 1.1)	1 (0.9, 1.1)	N/A			3.7 (3.5, 3.8)	3.9 (3.8, 4.0)**	3.7 (3.6, 3.9)
3	0.5 (0.4, 0.6)	0.4 (0.4, 0.5)	0.4 (0.4, 0.5)	N/A			N/A
4	0.6 (0.5, 0.7)	0.7 (0.6, 0.7)	0.7 (0.6, 0.8)	5.6 (5.5, 5.7)	5.3 (5.2, 5.4)***	5.2 (5.1, 5.4)***	5.5 (5.4, 5.6)	5.1 (5.0, 5.2)***	5.1 (4.9, 5.2)***
5	0.6 (0.5, 0.7)	0.6 (0.5, 0.7)	0.5 (0.4, 0.6)	N/A			1.7 (1.6, 1.9)	1.9 (1.8, 2.0)*	1.7 (1.6, 1.8)
6	0.8 (0.6, 0.9)	0.5 (0.5, 0.6)**	0.5 (0.5, 0.6)**	1.6 (1.6, 1.6)	1.5 (1.5, 1.5)***	1.5 (1.4, 1.5)***	N/A
7	0.9 (0.8, 1.1)	0.9 (0.8, 1)	1 (0.9, 1.2)	1.3 (1.3, 1.4)	1.2 (1.2, 1.2)***	1.2 (1.1, 1.2)***	N/A
Non-Battery questions
D4	N/A			N/A			3.7 (3.6, 3.8)	3.7 (3.6, 3.7)	3.7 (3.6, 3.8)
D11							2.2 (2.1, 2.3)	2.3 (2.2, 2.3)	2.2 (2.2, 2.3)
CBQ645							2.4 (2.3, 2.6)	2.4 (2.3, 2.5)	2.3 (2.2, 2.4)
dietcal							2.4 (2.3, 2.5)	2.3 (2.2, 2.3)**	2.2 (2.1, 2.3)***

N/A, Did not meet criteria for inclusion in analysis; ^ Please refer to Table 3 for questions included in each battery group.
^& Blank/Refused excluded for acquiescence and social desirability bias. Don’t know/Neither agree nor disagree further excluded from acquiescence bias.
A higher adjusted mean SDBM indicates a lower straightlining tendency; higher adjusted means equate to higher acquiescence and higher social desirability.
95% confidence intervals were calculated around the adjusted means assuming normality.
Significant difference from RDD: * p<.05, ** p<.01, *** p<.0001. Acquiescence and social desirability: 1-tailed t-test: (i.e., Ha1: µRDD> µABS-web, Ha2: µRDD> µABS-paper); Straightlining: 2-tailed t-test.

Methods: Acquiescence is the tendency of a respondent to select the “yes” or “agree” option regardless of the content of the question or topic (Heerwegh and Loosveldt 2011). Social desirability is the tendency of respondents to overstate positive behaviors and understate negative ones (Andersen and Mayerl 2017). In the context of the FSANS questions, the socially desirable responses were those corresponding to higher levels of knowledge regarding healthful eating or safe food handling practices. Questions were coded or re-coded such that higher scores represented higher acquiescence and higher social desirability (Table 3).

The hypothesis that RDD (interviewer-administered) fostered higher acquiescence than ABS (self-administered) was assessed by calculating the acquiescence for “battery groups” of questions answered on either a “yes/no” scale or a Likert agreement scale, namely battery groups 4, 6 and 7 (Table 3) (Kim et al. 2019). “Don’t know,” blank, “Neither agree nor disagree,” and “Refused” (RDD only) responses were excluded from the analysis as they are not informative of acquiescence in any direction. Social desirability bias was assessed on battery groups 2, 4, and 5, and non-battery questions CBQ645, dietcal (self-reported calorie consumption, Table 3), D4 and D11.

To assess if RDD respondents presented higher acquiescence or social desirability bias, the average score (value) for each battery group or question was linearly regressed on mode. Because acquiescence or social desirability may be related to respondent characteristics (Heerwegh and Loosveldt 2011), gender, age, race, education, income, and urbanicity were included as covariates in the model.
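The scoring and regression steps described above can be sketched as follows. This is a minimal, unweighted illustration and not the authors' implementation: the response codes, the toy respondents, the single covariate (age) and the helper names `battery_score`, `design_row`, `solve` and `ols` are all assumptions made for the example, and a real analysis would include the full covariate set and the survey weights.

```python
from statistics import mean

# Hypothetical codes for responses that carry no acquiescence information.
EXCLUDED = {"dont_know", "neither", "refused", None}

def battery_score(responses):
    """Average one respondent's informative item scores within a battery."""
    kept = [r for r in responses if r not in EXCLUDED]
    return mean(kept) if kept else None

def design_row(mode, age):
    """Intercept, ABS-web dummy, ABS-paper dummy (RDD is the reference), age."""
    return [1.0, float(mode == "ABS-web"), float(mode == "ABS-paper"), float(age)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Toy respondents (made-up values): mode, age, item responses for one battery.
respondents = [
    ("RDD", 25, [6, 5, 6]),
    ("RDD", 60, [6, 6, "refused", 6]),
    ("ABS-web", 35, [5, 5, 6]),
    ("ABS-web", 50, [5, "neither", 5]),
    ("ABS-paper", 45, [5, 5, 5]),
    ("ABS-paper", 70, [6, 5, "dont_know", 5]),
]
rows = [design_row(mode, age) for mode, age, _ in respondents]
scores = [battery_score(items) for _, _, items in respondents]
beta = ols(rows, scores)
print(dict(zip(["intercept", "ABS-web", "ABS-paper", "age"], beta)))
```

With RDD as the reference category, negative coefficients on the two ABS dummies would correspond to lower adjusted scores in the self-administered modes, which is the direction tested by the one-tailed hypotheses.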

Results: As hypothesized, RDD fostered higher acquiescence than ABS web and paper (Table 6). RDD also fostered greater social desirability than ABS for battery group 4 (the battery comprising the latent variable Healthy Diet) and for dietcal.

4. DISCUSSION AND CONCLUSION

To maintain survey quality, many probability-based surveys, including the FDA Food Safety and Nutrition Survey (FSANS), have transitioned from RDD to ABS methodology. Using the 2019 FSANS survey, this paper provides a thorough cross-mode analytical comparison of sampling and measurement biases between the RDD and ABS (web and paper) modes and serves as a guide to survey researchers seeking to transition survey modes.

The most notable difference between the RDD and ABS sample compositions is home ownership, with the ABS sample containing significantly more homeowners than the RDD sample. Since the early 1990s, renters have been much less likely than homeowners to complete mailed Census questionnaires (Word 1997). The ABS paper option was included to encourage renters to respond to the survey, but they were still underrepresented, suggesting that other strategies to oversample or recruit renters may be warranted in future ABS surveys.

Overall, the transition between RDD and ABS was successful. Latent factor variance equivalence, the highest level of mode measurement equivalence examined, was established for the Food Safety and Food Handling models, and partial scalar equivalence was established for the Nutrition Awareness model. This demonstrates that these questions were successfully transitioned from an interviewer-administered telephone mode to a self-administered written survey without changing the meaning of the questions or the underlying theoretical concepts the questions were measuring.

As expected, there were some minor differences in mode measurement bias between the RDD and ABS survey modes. Consistent with the literature, RDD respondents were more likely to acquiesce, and for one group of questions related to Healthy Diet, RDD respondents were also more likely to provide socially desirable answers (Heerwegh and Loosveldt 2011). We hypothesize that for the Healthy Diet questions, the RDD acquiescence bias and the social desirability bias likely worked in conjunction to contribute to the lower variability observed in the RDD ME Nutrition Awareness model, for which only partial scalar equivalence was established.

The many data quality checks that were implemented before and during data collection, including writing questions in as neutral a manner as possible, conducting cognitive tests of the survey instruments, pretesting all data collection methods, and removing respondents who sped through the survey, were helpful in achieving mode equivalence.

Finally, in situations where data cycles differ by survey administration mode and one or more biases are found, a trend analysis may still be conducted as long as remedial actions are taken. One action is to include the cycle or survey mode as an indicator variable in predictive models in order to compare variables of interest after adjusting for mode (i.e., Type 3 or partial analyses). Another action, in cases where strict scalar or latent factor variance equivalence is not established (i.e., the modes have different precision or variability), is to further adjust the survey sampling weights by giving greater weight to the survey mode with higher precision.
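One possible reading of the precision-based weight adjustment is inverse-variance weighting, sketched below. The paper does not specify a formula, so the choice of 1/variance as the precision measure, the normalization to a mean factor of 1, the function names `precision_factors` and `adjust_weights`, and the toy scores are all assumptions made for illustration.

```python
from statistics import variance

def precision_factors(scores_by_mode):
    """Return one multiplicative factor per mode, proportional to 1/variance.

    Factors are scaled so that their average across modes equals 1.
    """
    precision = {mode: 1.0 / variance(vals) for mode, vals in scores_by_mode.items()}
    avg = sum(precision.values()) / len(precision)
    return {mode: p / avg for mode, p in precision.items()}

def adjust_weights(records, factors):
    """Multiply each respondent's sampling weight by the factor for their mode."""
    return [(mode, weight * factors[mode]) for mode, weight in records]

# Toy scores (made-up values): the RDD scores vary less than the ABS-web scores.
scores_by_mode = {"RDD": [1, 2, 3], "ABS-web": [0, 2, 4]}
factors = precision_factors(scores_by_mode)

# Toy (mode, sampling weight) records, also made up.
records = [("RDD", 10.0), ("ABS-web", 10.0)]
print(adjust_weights(records, factors))
```

In this toy case the mode with the smaller variance receives the larger factor, so its respondents count for more in the adjusted estimates.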

If mode equivalence is not established and remedial actions are not implemented, proceeding with a trend analysis may lead to finding differences over time which are not true differences (i.e., false significance) but are instead due to a lack of mode invariance. For example, if a mode exhibits greater precision (as in RDD for the FSANS Nutrition Awareness questions), differences may be found due to non-homogeneity of variances (heteroskedasticity) and not true differences in means (Frost 2017).
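The effect of unequal variances on a comparison of means can be illustrated with summary statistics alone. The sketch below contrasts the pooled-variance t statistic, which assumes equal variances, with the Welch statistic, which does not. The group means, standard deviations and sample sizes are made-up values and are not FSANS results.

```python
from math import sqrt

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic that assumes both groups share one variance."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))

def welch_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic that allows each group its own variance."""
    return (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical groups: a large, precise group and a small, noisy group.
large_precise = (3.0, 1.0, 200)  # mean, standard deviation, sample size
small_noisy = (2.0, 3.0, 20)

print("pooled t:", pooled_t(*large_precise, *small_noisy))
print("Welch t: ", welch_t(*large_precise, *small_noisy))
```

When the larger variance belongs to the smaller group, as here, the pooled statistic understates the standard error of the difference, so a test that ignores the heteroskedasticity can report significance that the Welch test does not support.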

While there were some minor imbalances between the two survey modes, we find that the FSANS RDD telephone and ABS web and paper modes are acceptably equivalent to justify the transition from RDD to ABS and to maintain continuity in tracking trends over time. Our findings highlight the importance of comprehensively assessing survey biases, so that researchers can feel confident about the overall success of the survey transition while also pinpointing specific questions (or topics) that may require additional work before making trend comparisons. Additionally, for researchers planning new surveys, our findings suggest that ABS web and paper surveys offer many advantages over RDD phone surveys, such as higher response rates and lower tendencies for acquiescence.

Funding

The authors are with the Office of Analytics and Outreach, Center for Food Safety and Applied Nutrition, US Food and Drug Administration. This work was funded by the U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition.

Acknowledgements

The authors want to thank Jennifer Berktold, PhD, Hyunshik Lee, PhD, Michael Jones, Jonathan Wivagg and supporting staff at Westat for their support during this project.
