The Evolution of Audio Recording in Field Surveys

M. Rita Thissen; Sridevi Sattaluri; Emily McFarlane; Paul P. Biemer

doi:10.29115/SP-2008-0018

By taking advantage of new technology, survey managers can boost the effectiveness, efficiency and quality of data collection. One such technology is computer audio-recorded interviewing (CARI), a way to ensure the quality of data through unobtrusive digital recording. This article reviews the evolution of CARI technology, with an emphasis on its feasibility for routine use with field surveys.

Traditional methods of monitoring field staff include live observation or re-contacting the respondent to confirm the interview’s authenticity and inquire about the professionalism of the interviewer. With CARI the process is much easier. Sound files can be created electronically without the need for external equipment and can be transmitted along with the usual response data files. Because the recording process is “invisible” once consent has been given, it can provide a faithful representation of the reality of in-person data collection.

Audio recording supports quality control for field surveys (Biemer et al. 2000) and telephone surveys (Basson 2005), including those invoking web instruments (Suresh 2005). The technology provides a potent tool for deterring and detecting falsification, providing performance feedback and enabling study of questionnaire item effectiveness.

Audio Recording Equipment, Past and Present

From the marketing of the Dictaphone in 1907 (Nuance Communications Inc 2005) to the availability of miniature recorders embedded in portable electronic devices today (Dwyer et al. 1998), people have been using audio recording to capture voices for later review. While the early recorders were useful for journalistic interviews, they were not practical for large-scale research surveys. The introduction of cassette tapes improved convenience (Stockdale 2002) but introduced logistical problems and disrupted the flow of interviewing.

As computers began offering built-in sound cards, digital audio recording became feasible for field surveys. In 1999, CARI was first deployed on a national field survey (Biemer et al. 2000), the result of innovative technical work by RTI developers R. Suresh, A. Bethke and P. Cooley. Use of CARI has spread since then as the utility of the approach has been confirmed.

Many laptops now have built-in microphones, sound cards and more disk space. Recording with internal microphones requires no additional equipment, and offers no distraction. Feedback from respondents and interviewers indicates that most people forget about recording when the microphone is hidden (Biemer et al. 2000). Depending on microphone placement and laptop settings, audio fidelity from internal microphones is adequate to capture voices within about 8 feet of the laptop at a quality level that allows a listener to distinguish multiple voices and discern the spoken content.

Benefits of Audio Recording for Quality Assurance and Questionnaire Evaluation

Among the advantages offered by CARI, perhaps the most compelling are confirming the authenticity of data at a reduced cost compared to traditional verification methods and providing detailed “observation” of interviewer performance (see Table 1). CARI can act as a deterrent to curbstoning (fabrication of interview data) and a means of detecting poor interviewing technique. The presence of CARI might reduce cheating if interviewers are aware of being recorded.

Table 1 Performance issues found in review of 5600 cases.

Count	% of Cases	Problem Definition
13	0.2	Authenticity Questionable
217	3.9	Reading – Minor Deviation
72	1.3	Reading – Major Deviation
73	1.3	Recording Errors
44	0.8	Unprofessional Behavior
86	1.5	Inappropriate Probing
79	1.4	Feedback not Neutral
1	0.01	Incorrect Incentive Provided

Using CARI, a survey may reduce effort and costs. CARI monitoring may replace verification calls or re-interviews. It reduces respondent burden and allows confirmation of interviews from households which lack telephones or are hard to contact. However, it remains desirable to follow up a sample of the cases since some respondents may refuse to allow audio recording after consenting to the interview, and interviewers may take advantage of that option to prevent detection of poor interviewing habits or curbstoning.

Another benefit of CARI is to provide a method for identifying questionnaire problems and data collection difficulties in interviewer-respondent interactions. CARI offers a unique opportunity to listen to the interview exactly as it took place, without the interference of personal observation. Using CARI allows questionnaire specialists to evaluate the success of the survey items in eliciting the desired information and the success of the interviewer in faithfully capturing responses (Mitchell et al. 2008).

Audio File Formats

Many audio file formats have been developed over the years, including WAV, MP3, RealMedia, AIFF, CD Audio and others. The sound recording algorithm affects the audio file size, quality, playback software, platform requirements, cost and licensing. The size of a particular recorded file depends on the parameters selected in its creation (see Table 2). File compression techniques can reduce the space needed to store the audio files.

Table 2 File sizes and quality for uncompressed wav files.

Bandwidth	Sampling Rate	Channels	Sound Quality	MB per Minute
8 bit	11.25 KHz	1	Low	0.66
16 bit	11.25 KHz	1	Medium	1.31
8 bit	22.5 KHz	1	Medium	1.31
16 bit	22.5 KHz	1	High	2.63
16 bit	44.1 KHz	1	Very High	5.25
16 bit	44.1 KHz	2	Very High	10.5
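
As a rough illustration of how the recording parameters drive file size, uncompressed PCM audio grows linearly with sample size, sampling rate and number of channels. The Python sketch below is not part of the original study; the function name is hypothetical, 1 MB is assumed to mean 10^6 bytes, and the small WAV header is ignored, so its output is approximate and may differ slightly from published figures that use other rounding conventions.

```python
def pcm_mb_per_minute(bits_per_sample: int, sample_rate_hz: float, channels: int) -> float:
    """Approximate size of one minute of uncompressed PCM audio in MB (10**6 bytes)."""
    bytes_per_second = sample_rate_hz * (bits_per_sample / 8) * channels
    return bytes_per_second * 60 / 1_000_000


# Parameter combinations as listed in the text:
# (bits per sample, sampling rate in Hz, channels).
settings = [
    (8, 11_250, 1),
    (16, 11_250, 1),
    (8, 22_500, 1),
    (16, 22_500, 1),
    (16, 44_100, 1),
    (16, 44_100, 2),
]

for bits, rate, channels in settings:
    size = pcm_mb_per_minute(bits, rate, channels)
    print(f"{bits:>2} bit  {rate / 1000:>6.2f} kHz  {channels} ch  {size:5.2f} MB/min")
```

Doubling any one parameter doubles the file size, which is why a modest sampling rate and a single channel are attractive when files must be stored on a laptop and transmitted from the field.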

Integrating Audio Recording with Survey Software and Information Systems

CARI has been implemented on survey instruments in a variety of languages including Blaise (Statistics Netherlands, Statistical Informatics Department, n.d.; Thissen and Rodriguez 2004), CASES (University of California, Berkeley, n.d.; Biemer et al. 2000) and ASP.NET (Microsoft) (Suresh 2005). Many telephone systems used by call centers offer the capability for recording. One of the challenges of incorporating audio recording is to make the process unnoticeable to the interviewer. The recording process must not slow the system or provide any visual or audible clue as to when it starts and stops.

Once a survey instrument has been enabled with CARI technology, survey information systems (Thissen and Rodriguez 2004) must also be expanded to handle the audio data files. From a case management and data security perspective, CARI files are simply response data stored in a different format. The files can be transferred to the central servers using dialup transmission, broadband, or removable media like flash drives. The choice of transmission option may depend on the size of files being transmitted.

For CARI to be used during an interview, most states and countries require that participants give express consent for the interview to be recorded. Because audio files could potentially have personally identifying information, and given the heightened consciousness of confidentiality and security concerns, audio files are best treated as sensitive data. Encryption can be used to ensure protection of the files in transit and storage.

After audio files are received at a central location, the monitoring process may be as simple as opening up the files and making notes. However, manual case management is impractical for all but the smallest of surveys, and it is best to build an interface for reviewing the files and storing evaluations.
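
One way to picture the storage side of such a review interface is a small record type for each evaluation plus a tally of flagged problems. The Python sketch below is purely illustrative and does not describe RTI's system: the class, field and function names are hypothetical, and the problem categories are simply borrowed from Table 1.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Problem categories borrowed from Table 1 (an assumption about how a
# review system might code them).
PROBLEM_CODES = (
    "Authenticity Questionable",
    "Reading – Minor Deviation",
    "Reading – Major Deviation",
    "Recording Errors",
    "Unprofessional Behavior",
    "Inappropriate Probing",
    "Feedback not Neutral",
    "Incorrect Incentive Provided",
)


@dataclass
class Evaluation:
    """One reviewer's assessment of the recordings for one case (hypothetical)."""
    case_id: str
    reviewer: str
    problems: List[str] = field(default_factory=list)
    notes: str = ""

    def __post_init__(self):
        # Reject codes that are not in the agreed list.
        unknown = set(self.problems) - set(PROBLEM_CODES)
        if unknown:
            raise ValueError(f"Unknown problem codes: {sorted(unknown)}")


def summarize(evaluations: List[Evaluation]) -> Dict[str, Tuple[int, float]]:
    """Return {problem: (number of cases flagged, percent of reviewed cases)}."""
    cases = {e.case_id for e in evaluations}
    flagged = Counter()
    seen = set()
    for e in evaluations:
        for problem in e.problems:
            # Count a case once per problem, even if several reviewers flag it.
            if (e.case_id, problem) not in seen:
                seen.add((e.case_id, problem))
                flagged[problem] += 1
    return {p: (n, round(100 * n / len(cases), 1)) for p, n in flagged.items()}


reviews = [
    Evaluation("case-001", "reviewer-A", ["Inappropriate Probing"]),
    Evaluation("case-001", "reviewer-B", ["Inappropriate Probing"]),
    Evaluation("case-002", "reviewer-A"),
    Evaluation("case-003", "reviewer-A"),
    Evaluation("case-004", "reviewer-B", ["Recording Errors"], notes="static"),
]
print(summarize(reviews))
```

Keeping one record per reviewer per case allows a questionable case to be assigned to several reviewers while the summary still counts it once.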

Operational Results

In this section, we present a brief discussion of RTI’s experiences with CARI technology.

At RTI, files are recorded with Windows Sound Recorder from Blaise or CASES, resulting in file sizes of about one MB per recorded minute. Use of the LAME compression algorithm (The LAME Project, n.d.) yields an average compression ratio of approximately 11:1 without noticeable loss of audio quality, reducing storage to roughly 100 KB per recorded minute. Recording directly to a compressed format produces a more compact file but requires more processing power, which introduces a time lag and a visible indication of recording and thus limits its usefulness.
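
The storage arithmetic behind these figures is simple enough to sketch. The Python snippet below is illustrative only: the function names are hypothetical, 1 MB is taken as 1000 KB, and the per-case example assumes three 30-second recordings are kept for each case, which is an assumption made here for the sake of the calculation.

```python
def compressed_kb_per_minute(raw_mb_per_minute: float, compression_ratio: float) -> float:
    """KB needed per recorded minute after compression (1 MB taken as 1000 KB)."""
    return raw_mb_per_minute * 1000 / compression_ratio


def case_storage_kb(clips: int, clip_seconds: float, kb_per_minute: float) -> float:
    """KB needed to store a given number of clips of equal length."""
    return clips * clip_seconds / 60 * kb_per_minute


# About 1 MB per minute uncompressed, compressed at roughly 11:1.
per_minute = compressed_kb_per_minute(1.0, 11)

# Assumed for illustration: three 30-second recordings per case.
per_case = case_storage_kb(clips=3, clip_seconds=30, kb_per_minute=per_minute)

print(f"{per_minute:.1f} KB per recorded minute")
print(f"{per_case:.0f} KB per case")
```

At these rates the audio for a case is small enough that the choice of transmission method depends mainly on how many cases are sent at once.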

Audio files collected by RTI were considered adequate in quality if voices could be heard plainly and understood. Problems included background noise, static, faintness of voices, key tapping, hum and other recording problems which interfered with detection of vocal content, but generally the quality was acceptable. See Table 3.

Table 3 Example of CARI sound file quality distribution

Sound Quality	Number of Interviews
1 – Poor	4
2 – Passable	5
3 – Adequate	21
* – Acceptable	48
4 – Good	49
5 – Excellent	37

*Raters were allowed to leave the score blank for Acceptable.

We estimated the minimum number of CARI audio files required for making consistent monitoring evaluations. A total of 165 interviews were coded by three independent reviewers for a pair-wise comparison of rater convergence, in which each rater listened to recordings from two groups of 55 interviews. From this, we concluded that three audio files, each 30 seconds long, would be adequate for verification purposes.
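
A pair-wise comparison of this kind can be illustrated with simple agreement statistics. The text does not say which measure was used, so the Python sketch below is an assumption: it computes raw percent agreement and Cohen's kappa for each pair of raters over the interviews they both reviewed. The rater labels, interview identifiers and codes are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(codes_a, codes_b):
    """Share of items on which two raters assigned the same code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both raters used one identical code throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes: each rater reviews two of three groups, so every
# group is heard by exactly two raters.
ratings = {
    "rater_1": {"g1-01": "ok", "g1-02": "problem", "g2-01": "ok", "g2-02": "ok"},
    "rater_2": {"g2-01": "ok", "g2-02": "problem", "g3-01": "ok", "g3-02": "ok"},
    "rater_3": {"g1-01": "ok", "g1-02": "problem", "g3-01": "problem", "g3-02": "ok"},
}

for r1, r2 in combinations(ratings, 2):
    shared = sorted(ratings[r1].keys() & ratings[r2].keys())
    a = [ratings[r1][i] for i in shared]
    b = [ratings[r2][i] for i in shared]
    print(r1, r2, len(shared), percent_agreement(a, b), round(cohens_kappa(a, b), 2))
```

Repeating such a comparison while varying the number and length of recordings reviewed per case is one way to judge how little audio is needed before raters stop converging.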

A theoretical cost-analysis model was created to compare the expected costs of operating traditional verification processes with those of CARI systems at the “steady state,” in which the systems had already been implemented. The analysis suggests that the cost of verification is 10% to 40% lower for CARI than for the traditional approach, depending on the levels of traditional and CARI review. Results from production surveys confirm that prediction, with the savings due to reduced labor.

CARI provides added benefits of reduction in respondent burden, the opportunity to review cases which are too old for re-contact and the option of having multiple reviewers for a questionable case or performance problem.

Visions of the Future

Looking forward, we see expanded use of CARI in field surveys for monitoring survey quality and as an integral part of data collection. Advances in digital signal processing may allow automation of some monitoring activities such as detecting files with just one voice, applying speech analytics or leveraging “pausology” (O’Connell and Kowal 1983). CARI can be used to collect open-ended responses (Mitchell et al. 2008). At some future time, recordings may be transcribed automatically to text for behavior coding. Research is underway on speech-to-text conversion in uncontrolled surroundings (Ming, Hazen, and Glass 2006), which may broaden the applicability of automatic transcription to home environments. These research areas will enhance CARI’s usefulness for years to come.

Acknowledgements

The authors would like to acknowledge the work of our colleagues at RTI in the studies mentioned here. A version of this paper was presented at the 62nd Conference of the American Association for Public Opinion Research, Anaheim, CA, 2007.