Introduction
The American Driving Survey (ADS), also known as the National Light Vehicle Use Survey, is a nationally representative study that continuously gathers data on the driving exposure of different groups of drivers. The ADS is sponsored by the American Automobile Association Foundation for Traffic Safety (AAAFTS) and managed by the Urban Institute, with data collected by Social Science Research Solutions (SSRS).
This report includes ADS data collected between May 21, 2013 and December 31, 2015. For completeness, we briefly summarize our ADS approach and protocols. A detailed description of the ADS methods and strategy can be found in the AAAFTS report American Driving Survey 2014–2015.
The ADS is a telephone interview that uses a random sample of both landlines and cell phones. The survey instrument begins with a household roster, which is administered to an adult respondent. If the respondent reports that one or more drivers live in the household, the program randomly selects the driver or drivers who are asked to complete the second part of the instrument, the Trip/Driver Interview. Selection uses a probability procedure that gives teenage drivers, drivers over 75 years of age, and those who report driving every day a higher chance of being selected.
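The selection step described above can be sketched as a weighted random draw from the household roster. This is a minimal illustration, not the ADS selection program: the boost values, field names, and roster below are hypothetical, since the actual selection probabilities are not given in this brief.

```python
import random

# Hypothetical relative weights; the actual ADS selection probabilities are not given here.
TEEN_OR_OLDER_BOOST = 2.0   # assumed boost for drivers aged 16-19 or over 75
DAILY_DRIVER_BOOST = 1.5    # assumed boost for drivers who report driving every day


def selection_weight(person):
    """Return the relative chance that a rostered driver is picked."""
    weight = 1.0
    if 16 <= person["age"] <= 19 or person["age"] > 75:
        weight *= TEEN_OR_OLDER_BOOST
    if person["drives_every_day"]:
        weight *= DAILY_DRIVER_BOOST
    return weight


def selection_probabilities(roster):
    """Map each driver's name to the probability of being selected."""
    drivers = [p for p in roster if p["is_driver"]]
    total = sum(selection_weight(d) for d in drivers)
    return {d["name"]: selection_weight(d) / total for d in drivers}


def select_driver(roster, rng):
    """Draw one driver with probability proportional to the assumed weights."""
    drivers = [p for p in roster if p["is_driver"]]
    if not drivers:
        return None  # no drivers in the household, so no Trip Interview
    weights = [selection_weight(d) for d in drivers]
    return rng.choices(drivers, weights=weights, k=1)[0]


# Made-up household roster for illustration only.
roster = [
    {"name": "teen", "age": 17, "is_driver": True, "drives_every_day": False},
    {"name": "commuter", "age": 45, "is_driver": True, "drives_every_day": True},
    {"name": "occasional", "age": 44, "is_driver": True, "drives_every_day": False},
    {"name": "child", "age": 12, "is_driver": False, "drives_every_day": False},
]

probabilities = selection_probabilities(roster)
chosen = select_driver(roster, random.Random(2015))
print(probabilities)
print(chosen["name"])
```

Because the selection probabilities are unequal by design, an analysis would typically weight each selected driver by the inverse of his or her selection probability so that oversampled groups do not dominate the estimates.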
Table 1 provides detailed information on the number of households that participated in the study, the number of persons 16 or older living in these households, and how many of them were drivers. The table also shows how many drivers were sampled to fill out the trip interview, how many of them finished it, and how many trips were reported. In addition, the table provides the 2014 through 2015 response rates (using the AAPOR RR3 formula) and the length of the survey. Because data are collected continuously throughout the year, we break the data down by season.
Table 1 shows that we collected a total of 16,130 driving trips, and for most of them we were able to estimate the miles driven. However, there were 601 driving trips for which the driver either could not provide an estimate of the miles or duration driven or gave a response in which the miles driven were inconsistent with the duration of the trip. The focus of this methods brief is to learn more about the drivers who had difficulty recalling or reporting the miles they drove the previous day and to consider questionnaire changes that could improve the quality of the recall data.
Frequency and Impact of Recall Problems
Table 2 shows both the number and percentage of driving interviews and driving trips that required imputation. Figure 1 displays graphically the breakdown of driving reports that required data editing or imputation compared with those that did not. The data are based on the 7,913 ADS driver interviews in which the driver completed a 24-hour report of the driving trips taken the day before the interview. Anyone age 16 or older in the United States was eligible for the driver interview if they lived in a household with a landline or had a cell phone and if they or a household member reported that they drove ‘almost every day,’ ‘sometimes’ or ‘rarely’. Around 6 percent of all driving interviews included at least one driving trip that required imputation to estimate the miles driven. However, in only about 2 percent of driving interviews did every reported driving trip require imputing the miles driven.
From the 7,913 driving interviews, a total of 16,130 unique driving trips were reported. Of these, 601 (almost 4 percent) were trips in which imputation was needed to determine the miles or duration of the trip. In most cases (92 percent), imputation was needed because the driver could not estimate the miles driven, or could report when the trip started or when it ended but not both. Thus, much of the imputation involves estimating miles driven from a known trip duration, or trip duration from known miles driven.
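One simple way to fill in a trip's missing miles from its known duration, or its missing duration from known miles, is to apply a typical speed taken from complete trips. The sketch below assumes a median-speed rule. This brief does not specify the actual ADS imputation model, so the function name, field names, and trip values are all hypothetical.

```python
from statistics import median


def impute_trips(trips):
    """Fill missing miles or minutes using the median speed of complete trips.

    Assumption: one overall median speed is applied to every incomplete trip.
    """
    complete = [t for t in trips if t["miles"] is not None and t["minutes"]]
    if not complete:
        raise ValueError("need at least one complete trip to estimate speed")
    mph = median(t["miles"] * 60 / t["minutes"] for t in complete)

    result = []
    for trip in trips:
        trip = dict(trip, imputed=False)  # copy so the input is not modified
        if trip["miles"] is None and trip["minutes"]:
            trip["miles"] = mph * trip["minutes"] / 60
            trip["imputed"] = True
        elif trip["minutes"] is None and trip["miles"] is not None:
            trip["minutes"] = trip["miles"] * 60 / mph
            trip["imputed"] = True
        result.append(trip)
    return result


# Made-up trips for illustration only.
trips = [
    {"miles": 10, "minutes": 20},
    {"miles": 30, "minutes": 40},
    {"miles": 5, "minutes": 15},
    {"miles": None, "minutes": 60},   # duration known, miles missing
    {"miles": 15, "minutes": None},   # miles known, duration missing
]

filled = impute_trips(trips)
for trip in filled:
    print(trip)
```

A production imputation would likely condition the speed on trip characteristics such as time of day or region rather than use a single overall median.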
Table 3 compares the estimates of duration and miles driven when respondents whose trip data required imputation are included or excluded. We show this comparison because average miles driven is one of the most important estimates to come out of ADS analyses. The key finding is that, although fewer than 4 percent of trips required imputation, including the imputed trips increases the estimate of annual miles driven by almost 1,000 miles. This increase indicates that the trips requiring imputation are not random occurrences and are more likely to be longer trips. This is somewhat intuitive: miles driven are harder to estimate on longer trips, and longer trips are also more likely to be unusual trips that drivers do not routinely make.
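The direction of this effect can be illustrated with a toy calculation. The sketch below annualizes one day of reported miles per driver with and without the imputed trips. The trip values are invented for illustration and are not ADS data, and the unweighted annualization is an assumption that ignores the survey weights an actual analysis would use.

```python
def annualized_miles_per_driver(trips, n_drivers, include_imputed=True):
    """Scale one day of reported miles per driver to a 365-day year."""
    kept = [t for t in trips if include_imputed or not t["imputed"]]
    daily_miles = sum(t["miles"] for t in kept) / n_drivers
    return daily_miles * 365


# Invented one-day trip reports from four drivers; the single long trip is
# the one whose miles had to be imputed.
toy_trips = [
    {"miles": 8, "imputed": False},
    {"miles": 12, "imputed": False},
    {"miles": 5, "imputed": False},
    {"miles": 10, "imputed": False},
    {"miles": 60, "imputed": True},
]

with_imputed = annualized_miles_per_driver(toy_trips, n_drivers=4)
without_imputed = annualized_miles_per_driver(
    toy_trips, n_drivers=4, include_imputed=False
)
print(with_imputed, without_imputed)
```

Because the trip needing imputation is much longer than the others, dropping it removes a disproportionate share of total miles, which is the pattern the comparison in Table 3 points to.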
Recall Differences by Demographics and Type of Trip
Table 4 shows differences in the percentage of drivers with missing miles driven or missing trip duration by various subgroups. Figure 2 displays the groups that required more imputation than average (above the overall average of 6.1 percent), followed by the groups that required less.
The likelihood of having to impute miles driven was almost three times greater for female drivers than for male drivers, but there was almost no gender difference in the need to impute trip duration. We find higher rates of imputation for both miles and duration among African American drivers and lower rates among White drivers. Teenage drivers and drivers age 75 or older also had higher rates of imputation for both miles driven and trip duration. Education was not a factor in imputation rates for trip duration but was a factor for miles driven, with less educated drivers having higher rates of imputation.
Table 5 shows differences in the percentage of drivers with missing miles driven or missing trip duration by other factors: whether the trip took place on a weekend or a weekday, the time of year the trip took place, the length of the trip, how often the person drives, and the region of the country in which the driver lives. Figure 3 displays the factors that required more imputation than average (above the overall average of 6.1 percent), followed by the factors that required less.
Driving reports for weekend and weekday travel did not differ in the percentage of estimates that needed editing or imputation. This was somewhat surprising, since one would expect weekday driving to be more routine and thus easier to report. We did observe a higher percentage of driving reports during the winter that needed to be edited or imputed, and a lower percentage among people who live in the Northeast region of the country. Trip length was by far the most important factor in whether a driver had trouble recalling trip miles or duration: longer trips, defined as trips of more than 20 miles, were much more likely to require editing or imputation.
Discussion
A key factor in deciding whether to impute missing data is whether the data are missing at random. In our driving study, few respondents had difficulty recalling the length or duration of driving trips, but the trips for which that information was missing were not a random subset of trips. It was therefore important to impute values for the missing data, especially since the reported estimates were aggregate variables that summed information across all trips taken. Had we not imputed and simply aggregated across fewer trips, the estimates would have been biased, since the trips with missing information differed considerably from the typical driving trip.
Some of the differences in the missing trip information can be attributed to the characteristics of the respondent: women, African Americans, and both younger and older drivers were all more likely to be unable to estimate how many miles a trip covered or how long it lasted. However, the bigger story is the type of trip being reported. Respondents had a greater chance of difficulty reporting miles driven or duration for longer trips. Because these longer trips are invaluable in estimating overall miles driven or time spent driving, replacing the missing data with imputed values based on other trip information was essential for any analysis of driving behavior.
As with the ADS, we would expect that many if not most studies that collect recall data will find that some groups of people have a harder time providing the requested information. It can therefore be useful for recall studies to determine who has difficulty providing responses and why. Knowing who has difficulty could suggest ways of improving the wording or tailoring the questions. It may even be worth developing questions that collect alternative information that would help with imputing or interpreting a respondent’s responses. For instance, in our travel study, we learned that longer trips posed a greater recall challenge, which could be eased by collecting more information on longer or unusual driving trips.
Another possibility for collecting difficult recall information is to ask respondents how confident they are in their responses, and then to collect more information from those who report lower confidence. Finally, consider developing flexible probes that interviewers can tailor to help some respondents or use for unusual events. It would be important to test whether these probes lead to less missing or unusable data without introducing bias in the responses collected. For instance, in the second year of data collection, we started checking the implied speed of each trip, in miles per hour, based on the reported miles and duration. For trips where the implied speed was less than 5 miles per hour or more than 65 miles per hour, we asked respondents to verify their responses. We found that this reduced the number of trips that required imputation without introducing any evident bias in the responses.
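The speed check described above can be sketched as a small function that flags a trip for verification. The 5 and 65 miles-per-hour thresholds come from the text. The function names, and the choice to also flag trips with a zero or negative duration, are assumptions made for illustration.

```python
def implied_mph(miles, minutes):
    """Return the average speed implied by the reported miles and duration."""
    if minutes <= 0:
        return None  # speed is undefined without a positive duration
    return miles * 60 / minutes


def needs_verification(miles, minutes, low=5.0, high=65.0):
    """Flag trips whose implied speed is below `low` or above `high` mph."""
    mph = implied_mph(miles, minutes)
    if mph is None:
        return True  # assumption: an unusable duration is also worth a probe
    return mph < low or mph > high


# Made-up reports: (miles, minutes)
reports = [(10, 20), (1, 30), (100, 60), (65, 60)]
flags = [needs_verification(miles, minutes) for miles, minutes in reports]
print(flags)
```

In an interview program, a flagged trip would trigger a follow-up prompt asking the respondent to confirm or correct the reported miles and times while the trip is still fresh in mind.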