Design Considerations for Live Video Survey Interviews

Michael F. Schober; Frederick G. Conrad; Andrew L. Hupp; Kallan M. Larsen; Ai Rene Ong; Brady T. West

doi:10.29115/SP-2020-0014

Even before the COVID-19 pandemic, survey research that relies on in-person interviewing had been facing significant challenges: declining response rates, increasing costs, waning trust in survey organizations, and trends toward increasing remote and mediated interaction among the public (see, e.g., Schober 2018). With the advent of COVID-19, a number of survey researchers, particularly those who carry out longitudinal studies that provide essential information for public policy and decision-making, urgently need to consider alternatives to in-person^[1] interviewing for collecting data. The potential for live video interviewing in qualitative research (e.g., Janghorban, Roudsari, and Taghipour 2014), telepsychiatry (e.g., De Las Cuevas et al. 2006), and medical intervention research (e.g., Marhefka, Lockhart, and Turner 2020) has been under investigation for some time, but its potential for large-scale surveys has been explored to a more limited extent (see, e.g., Anderson 2008; Endres and Hillygus 2019; Jeannis et al. 2013). The increase in use of live video calls and meetings during the pandemic (at least among members of the public who have access to the technology and sufficient connectivity) makes it particularly timely to consider whether and when live video interviews might plausibly substitute for in-person survey data collection today.

In preparing to collect data for a methodological experiment comparing data quality in live video interviews and two self-administered survey modes (Conrad et al. 2020), we encountered several choice points and new issues that live video interviewing raises. This study was designed to compare response quality and respondent subjective experience in live two-way video interviews, recorded-video “interviews” in which video recordings of interviewers asking questions are embedded in a self-administered online survey (Fuchs 2009; Fuchs and Funke 2007; Haan et al. 2017; Krysan and Couper 2003), and a typical textual web survey. In this study, fielded from August 2019 to March 2020, a total of 1,104 US-based online panel members completed interviews in which they were randomly assigned to answer the same 36 questions borrowed from ongoing US government and social scientific surveys in one of three modes. Items were selected to allow measures of conscientious responding: giving precise (vs. rounded) answers, differentiating answers to batteries of questions (vs. straight-lining), and providing socially undesirable responses to sensitive questions (e.g., about participants’ sexual behaviors).
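As a rough illustration of how indicators of this kind could be computed from response data, the following Python sketch flags straight-lining within a battery and rounding in a numeric answer, then summarizes each flag by mode. It is not the operationalization used in the study: the rule that a nonzero multiple of 5 counts as "rounded," the field names (`mode`, `battery`, `hours_tv`), and the toy records are all assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def is_straightlined(battery: Sequence[int]) -> bool:
    """True when every item in a battery received the identical answer."""
    return len(battery) > 1 and len(set(battery)) == 1


def is_rounded(answer: float, base: int = 5) -> bool:
    """Assumed heuristic: a nonzero answer that is a multiple of `base`."""
    return answer != 0 and answer % base == 0


def rate_by_mode(records: List[dict], flag: str) -> Dict[str, float]:
    """Share of respondents in each mode for whom `flag` is True."""
    counts = defaultdict(lambda: [0, 0])  # mode -> [flagged, total]
    for rec in records:
        counts[rec["mode"]][0] += int(rec[flag])
        counts[rec["mode"]][1] += 1
    return {mode: flagged / total for mode, (flagged, total) in counts.items()}


# Made-up toy records, for illustration only (not data from the study).
toy = [
    {"mode": "live_video", "battery": [1, 3, 2, 4], "hours_tv": 7},
    {"mode": "live_video", "battery": [2, 2, 2, 2], "hours_tv": 10},
    {"mode": "web", "battery": [3, 3, 3, 3], "hours_tv": 20},
    {"mode": "web", "battery": [5, 5, 5, 5], "hours_tv": 12},
]
for rec in toy:
    rec["straightlined"] = is_straightlined(rec["battery"])
    rec["rounded"] = is_rounded(rec["hours_tv"])

print(rate_by_mode(toy, "straightlined"))
print(rate_by_mode(toy, "rounded"))
```

A real analysis would need item-specific definitions (for example, which batteries are eligible and which rounding bases are plausible for each question) and appropriate models for comparing modes.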

Based on our experience in designing this study, collecting the data, and learning from nine seasoned professional interviewers about their experience conducting from 27 to 40 live video interviews each, we summarize our take on design considerations and practical questions for researchers who are interested in carrying out live video interviews. We first address questions about respondent access and participation, video platform(s), and recruiting respondents and scheduling video interviews. We then address design considerations involving interviewers: their screen configurations, their visual and auditory environment, and how they should be trained to handle the special requirements of video. As we see it, all these issues need to be considered simultaneously; choices on one front constrain options on others. For example, choosing a platform that only some members of the public have access to (e.g., FaceTime, which will only work for Apple device users) necessarily limits researchers’ ability to generalize beyond those users and also shapes how interviewers need to be trained, what devices interviewers can use, and whether a virtual background is an option.

Respondent considerations: Who has access and who will participate?

Not all potential respondents have access to video communication: respondents must have a stable internet connection and a computer or mobile device with a working camera and microphone. Furthermore, not all potential respondents with video access are necessarily willing to participate in a video interview or comfortable doing so, nor to go through the extra steps (e.g., a video connection test or downloading a new app) that participating in a video interview might require. Whether a study should restrict its population to only those with the right equipment and connectivity, provide the needed equipment to those without it, or allow video as an option for those who have access raises various questions: scientific (e.g., might coverage or nonresponse error bias estimates? will video interviews produce the same kinds of interviewer effects observed in in-person settings [West and Blom 2017]?); ethical (e.g., is it fair to exclude those without the needed resources, or to burden respondents who are uncomfortable using video?); and budgetary and logistical (e.g., is it feasible to provide the needed equipment to those without it?).

Questions about access to video interviewing are more complicated than they at first seem, and perhaps newly so in the current moment: while video clearly excludes some populations, other previously excluded populations may now be included. Some people who are chronically or newly unwilling to participate in an in-person interview might now agree to participate via video. Some people who need sensory assistance may find advantages in video over other modes—for example, making it possible for a physically distant interviewer and deaf respondent to communicate by signing (unlike on the phone), or allowing a respondent to unobtrusively increase the volume without the potential awkwardness of having to ask the interviewer to speak up. Also, some participants who are not native speakers of the survey language may benefit from having visual access to the interviewer (Wenz, Al Baghal, and Gaia 2020).

To our knowledge, studies that directly address these scientific, ethical, and logistical questions have not yet been conducted. In our study, we found it took longer to recruit online panelists who were used to doing web surveys into our live video interviews compared to our two self-administered modes, but it is not yet clear how this pattern will extend into the current era in which live video communication is so widespread for at least some members of the public.

Video platform considerations: Which one and how many?

A consequential decision is which video platform(s) (e.g., Zoom, Skype, Microsoft Teams, BlueJeans, FaceTime, WhatsApp, WeChat) will be supported, as this decision has important implications for what interviewers and respondents will need: which devices (laptops, tablets, phones); operating systems; browsers or apps (that might need to be downloaded to participate in a study); cameras; or headsets. The decision also affects how interviewers will need to be trained and which respondents can easily participate in the interview. For example, including FaceTime allows sample members who only use FaceTime to participate but excludes Windows and Android users if FaceTime is the only option, and does not easily allow web-based scheduling. Requiring respondents to download an unfamiliar app could reduce participation in biasing ways.

Selecting a single platform (the choice we made in our study) is simpler for survey organizations, and for respondents comfortable with the chosen platform, but it may cause problems for respondents unfamiliar with that platform. Supporting multiple platforms has the benefit of reducing barriers to participation (more respondents are included) but it also requires interviewers to provide broader technical support to respondents, and back-end designers to ensure compatibility with the survey organization’s sample management systems and survey software. Platforms vary in their usability and who their users are, their costs (must additional licenses be purchased?), how scheduling interviews might work, how similarly their desktop and mobile variants function, whether they support virtual backgrounds, and whether they support video/audio recording for training, quality control, and/or transcription (if a project requires this and respondents consent). Supporting multiple platforms might lead to a proliferation of operational decisions (what if one platform supports video recording and another does not?) and potential data-analytic complexities (e.g., should one adjust estimates to account for platform effects?) on all these fronts. In our case, we chose a single platform that allowed desktop respondents to participate from their browser without being required to download an app, which also simplified interviewer training. We speculate that providing more, and more familiar, platform alternatives might have sped up recruitment, but discovering whether this is the case will require systematic testing.

Another consideration in selecting a platform is how well it supports universal use: for example, whether it supports closed captioning, lip-reading, or hands-free use for those who need them.

Recruitment considerations: Scheduled or on-demand interviews?

Video interviewing raises new questions about how respondents should be recruited and how they should join interviews. Expecting respondents to be willing to accept an incoming video call without prior warning or scheduling (the video analog to a telephone cold call or doorstep invitation) seems unlikely to be workable; most respondents probably will not accept incoming calls from strangers, and researchers probably will not be able to assemble relevant cold-call video contact information such as a platform-specific username (e.g., for Skype) or a phone number that allows video contact (e.g., for FaceTime). Two plausible procedures are (1) scheduling video interviews in advance for a particular time slot (the procedure we used in our study) or (2) assigning video interviewers to be available to conduct on-demand video interviews initiated by the respondent, most likely during particular hours. Either way, interviewers do not need to recruit sample members or engage in refusal conversion; all interviews will be with respondents who signed up for an appointment or initiated the call themselves, which is an advantage over typical in-person or telephone interviews.

Scheduling interviews—which is not a routine practice in other interview modes—can be initiated by either the researchers or the respondents: researchers could propose to a sample member (e.g., through an email or postal mail invitation) one or more upcoming time slots for a video interview, or sample members could choose a time slot available to them on a calendar. For survey centers, this approach facilitates supervision and advance scheduling of interviewers, though it brings along with it a potential for no-shows and inefficient use of interviewer time that would not typically occur in other interview modes.

In the on-demand approach, interviewers would be available (on standby) during designated time slots for respondents to join a video call. While this approach may well be attractive for respondents whose schedules match the available interview times, it will require careful coordination and oversight by the researchers (e.g., adaptive shift scheduling models based on observed patterns) to assure that enough interviewers are available to meet the demand at any moment. Researchers will need to avoid, on one extreme, respondents having to wait an unreasonable time in a queue, and on the other extreme interviewers being on the clock without respondents to interview. A simple solution might be for interviewers to carry out tasks from other projects while waiting.
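To make the staffing trade-off concrete, the following Python sketch estimates how many on-demand interviewers would be needed to keep the chance that a respondent has to wait below a chosen threshold. It uses the standard Erlang C queueing formula, which is our illustrative choice rather than a method from the study, and it rests on strong assumptions: respondents arrive at random (Poisson arrivals), interview lengths are exponentially distributed, and no one abandons the queue. The function names and the example numbers are hypothetical.

```python
import math


def erlang_c(offered_load: float, interviewers: int) -> float:
    """Probability that an arriving respondent must wait (Erlang C).

    offered_load = arrival rate * mean interview duration (in erlangs).
    Returns 1.0 when staffing cannot keep up with demand.
    """
    if offered_load >= interviewers:
        return 1.0
    # Erlang B blocking probability via the usual stable recursion.
    blocking = 1.0
    for n in range(1, interviewers + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    utilization = offered_load / interviewers
    return blocking / (1.0 - utilization * (1.0 - blocking))


def min_interviewers(arrivals_per_hour: float,
                     mean_interview_minutes: float,
                     max_wait_prob: float) -> int:
    """Smallest staff size whose waiting probability is at most max_wait_prob."""
    load = arrivals_per_hour * mean_interview_minutes / 60.0
    staff = max(1, math.floor(load) + 1)
    while erlang_c(load, staff) > max_wait_prob:
        staff += 1
    return staff


# Hypothetical shift: 4 respondents joining per hour, 30-minute interviews.
for threshold in (0.2, 0.1):
    print(threshold, min_interviewers(4, 30, threshold))
```

In practice the arrival rate would itself need to be estimated per time slot from observed join patterns, which is where adaptive shift scheduling would come in.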

Based on evidence from studies in other modes (e.g., McGonagle and Sastry 2019) and our own experience with video interviews, we suspect that the scheduling approach may work better for participants who have already agreed to participate in an ongoing study than for newly invited sample members in cross-sectional studies.

Interviewer considerations: How should the screen(s) be configured?

Video interviewers need to be able to see the respondent (and vice versa) and interact with survey software to read questions and record answers (unless they are using a paper questionnaire). Until custom video interviewing software is designed, this in most cases will mean having one video window open as well as another window—on the same or a second screen—for the survey software. An important design consideration is where these screens and windows should be placed and sized relative to each other and to the camera that is transmitting the interviewer’s image: should the interviewer appear to the respondent to be looking “directly” at them while reading the question (because they are looking more or less directly into the camera), or look “up” at the respondent (away from the survey software) while the respondent is answering, as is common in in-person interviews? We chose the latter in our study, in which our interviewers used one desktop screen, and we placed the survey software (Blaise 5) window below the video (BlueJeans) window, but some interviewers reported having wished for the former.

Different solutions may be appropriate for different studies—where an interviewer is looking during responses to sensitive questions may matter in ways that are less of a concern for nonsensitive questions—and so testing the feel that alternative placements create before launching a study is likely to be time well spent. The size of the interviewer’s screen(s) and the interviewer’s distance from the camera will affect the respondent’s experience of where the interviewer is looking, as well as how large and “zoomed-in” the interviewer’s image will loom in the respondent’s view—and this can vary depending on the video platform. For in-person interviews that depend on show cards, video screen sharing is a plausible adaptation, although other features of video platforms might also fulfill the same functions—for example, sending response options in the platform’s chat window. Because different platforms implement screen sharing differently, if at all, and screen sharing can change how the two parties appear to one another (e.g., the interviewer may now appear as a thumbnail video, or disappear), additional testing will be needed to decide among the alternatives. Screen sharing may work less well on small screens for respondents on mobile devices.

Interviewer considerations: Visual background and auditory environment

Video interviews raise new questions about how standardized an interviewer’s background visuals and auditory environment need to appear to respondents and about the extent to which interviewers’ settings may (intentionally or not) give cues about the interviewer or the survey organization that could bias responses. Even in a call center, video can enable respondents to see what else is going on in the survey center—e.g., who is walking by, what the center looks like—in addition to the background sounds that telephone respondents can experience. In our study, we chose to standardize the background for our call-center interviewers by conducting all interviews from the same workspaces (cubicles) with neutral tones and in an area with little traffic, but one could also imagine using a standard virtual background, perhaps even with the organization’s or sponsor’s logo.

If video interviewers are distributed (interviewing from home, with appropriate technology and connectivity), this raises questions about what kinds of home or outdoor settings are appropriate for a study—what potentially biasing signals they send about the interviewers’ or the researchers’ political affiliation, cultural background, socioeconomic status, family structure, etc. The interviewer’s environment could also lead to unexpected distractions during an interview, for example a child walking in or a dog barking, which presumably ought to be minimized. Potential solutions include standard backgrounds (virtual or even physical) and noise-canceling audio equipment.

Interviewer considerations: Training

Few survey interviewers are likely to already have much experience collecting data via video. The most plausible scenario is that survey organizations will retrain interviewers who currently conduct telephone or in-person interviews to use video. In the case of our study, we recruited experienced telephone interviewers who had previously undergone training in standardized interviewing (the University of Michigan’s General Interviewer Training) to conduct our video interviews—because they were available and our budget did not support employing field interviewers.

Video interaction is different from voice-only interactions, in that interviewers’ facial expressions and visual reactions can be seen by the respondent and potentially affect their answers. It is also different from physically copresent face-to-face interaction, in that it is mediated, with closer-up views of the interlocutor’s face than in most in-person settings, often a self-view window, and the need to attend to camera positioning to provide evidence of gaze. Sensitizing interviewers to how their gaze, facial expressions, and reactions might be perceived by respondents in video (perhaps via platform-specific practice sessions) seems crucial. If screen sharing is needed in a study, interviewers will need to be trained in the correct use of screen sharing tools; if virtual backgrounds are used, interviewers may need training on how not to “disappear” into the background while asking questions.

Professional behavior. Interviewers will need guidance on what their organization sees as appropriate professional behavior in video interviews, for example when it is okay for them to take a break, to address a family interruption, or to eat or drink during an interview. They will need guidance on appropriate professional attire, especially if like most telephone interviewers they are not used to being seen during interviews. Interviewers will need guidance on handling respondents’ distraction, whether or not it is acceptable for respondents to turn off their cameras, and how to react if respondents behave in ways the interviewer might find inappropriate (e.g., being less than fully clothed). Interviewers should understand when and how mandatory reporting laws may apply in video, for example if they observe that someone in the household is in danger. The logistics of monitoring video interviews for quality control will also need to be thought through: is full video recording practically and ethically feasible? Would audio-recording or supervisor monitoring (virtual or in-person at the call center) be preferable? How would such monitoring affect participation and data quality?

Technical problems. Interviewers also need to be trained on how they should handle technical problems that will inevitably arise in video-mediated interviews: what level of assistance and troubleshooting they are expected to provide and when they can and should enlist others with more technical skills. In our study, we trained interviewers on a small set of frequently encountered issues and fixes that often work (e.g., advising the respondent to check that the microphone or camera was properly connected, to exit and reconnect to the video call to eliminate an echo, or even to fully reboot their device), but we also provided protocols for soliciting additional technical support if a connection failed and for how an interview that encountered technical failure could be rescheduled. It seems likely that interviewers who use video more—and perhaps even the video platform(s) used in a particular survey—will be able to handle technical problems more easily with less training.

Conclusion

As we see it, video communication is here to stay, and we expect that it is likely to have an ongoing role in personal interactions, education, and remote work for the foreseeable future. As such, we expect that survey researchers will continue to have good reasons to seriously consider live video for survey data collection. Much will be learned from currently planned efforts to supplement or replace in-person interviewing in large-scale surveys^[2] about which implementations are practical and produce data of the quality provided by in-person interviews. Lessons learned from these efforts will inform how survey researchers think about how and when live video might fit into the suite of data collection modes they support moving forward—whether as one of several modes in mixed mode studies or as the only mode.

We do not imagine that our list of design considerations is exhaustive; new considerations are likely to emerge as video technologies develop, as access to video changes, and as norms of and expectations about video use—which may vary for different populations—evolve. Our anecdotal experience is that many people who until recently would have been uninterested in or overwhelmed by the prospect of trying out video communication have gotten used to the idea—whether from professional necessity or their personal experiences with family and friends—but it is unclear how evolving experiences and norms from business meetings or classrooms or casual conversations with friends will apply to survey interviews. It is also clear that there are massive inequities in who has access to reliable internet connections and devices on which video can work, as well as in comfort with the technology and the ability to troubleshoot when there are problems. For survey researchers interested in exploring video data collection, these inequities raise serious questions about potential coverage and nonresponse error.

Far more methodological research will be needed to verify when and how video survey interviewing is viable for which populations, and which features have which effects (see, e.g., Feuer and Schober 2015, 2019; Sun and Conrad 2019). It will be important to evaluate when and how findings and advice from qualitative video interviewing research, for example on participant recruitment and data quality (e.g., Forrestal, D’Angelo, and Vogel 2015; Janghorban, Roudsari, and Taghipour 2014; Lobe 2017), apply to structured interviews to collect quantitative data, particularly as video and communication technology use have so radically changed. We see attending to the design considerations listed here—and documenting the effects of different design choices—as complementing evolving guidance on the use of live video in other domains of research (e.g., Marhefka, Lockhart, and Turner 2020) and as critical to discovering which implementations of video interviewing lead to the best data quality, cost effectiveness, and the best respondent experience.

Acknowledgments

We gratefully acknowledge financial support from NSF grants SES-1825113 and SES-1825194 (Methodology, Measurement, and Statistics program) and NIA grant P30 AG012846 (Michigan Center on the Demography of Aging), and access to participants through NCATS grant UL1TR002240 (Michigan Institute for Clinical & Health Research). We also thank Pooja Varma-Laughlin and nine interviewers from the University of Michigan Survey Research Center for conducting video interviews, allowing us to observe them and providing feedback.

[1] We use “in-person” for physically copresent interviews rather than the long-standing label “face to face” (e.g., Dialsingh 2008; Sturgis et al. 2020; Wenz, Al Baghal, and Gaia 2020; Williams and Brick 2018) to reflect the fact that live video also involves faces in the interaction and may even emphasize participants’ faces more than occurs in person.

[2] See, e.g., https://electionstudies.org/announcement-to-the-american-national-election-studies-anes-user-community and https://the-sra.org.uk/common/Uploaded files/Research Matters Magazine/sra-research-matters-june-2020-edition.pdf