Polling error for the national popular vote in the 2020 US pre-election polls was the highest in 40 years, and no mode of interviewing or method of sampling was unambiguously more accurate (AAPOR 2021). This poor result for pollsters at the 2020 US election occurred amid a spate of well-documented polling failures in several countries in recent years (Cornesse et al. 2020). Online panels, now the dominant method of surveying voters, are well placed to help reduce the level of error in pre-election polls.
Adjusting (i.e., balancing or weighting) a sample so that respondents' recalled vote choice in the previous election is consistent with the actual vote distribution at that election is a common method of improving survey-based estimates of voting intentions in pre-election polls in many parts of the world (Cabrera-Álvarez and Escobar 2019; Durand and Johnson 2021; Wells 2019), though less common in the US (AAPOR 2021). Despite the widespread use of this practice, there has been relatively little recent research into the efficacy of past vote weighting, and the research that has been done tends to show mixed results (Durand, Deslauriers, and Valois 2015).
However attractive it may appear to balance or weight a pre-election poll sample so that it is representative of the population's voting choices at the last election, such an approach is not without potential respondent-related measurement error. Recall of past vote may be inaccurate due to: (1) memory failure, (2) the tendency of voters to misreport how they previously voted to reconcile it with how they currently intend to vote, and (3) social desirability (Durand, Deslauriers, and Valois 2015). In addition, some respondents will not have voted in the previous election.
Furthermore, the implications of weighting by an unreliable measure of past vote can be substantial. Wells (2019) conducted an experiment in which he reweighted YouGov polling data for the 2017 UK general election using a reported past vote measure collected immediately after that election (at which time 41% reported voting for Labour) and a reported past vote measure collected from the same respondents two years later in 2019, at which time only 33% reported having voted for Labour in 2017. When used as weighting variables and aligned to past vote benchmarks, the effect these two measures had on the subsequent estimates of voting intentions was relatively large in the context of pre-election polling. When the 2017 (short-term) measure of recalled past vote was used as the weighting variable, estimated support for Labour was 21%; when the 2019 (long-term) measure was used, estimated support for Labour increased to 24%.
Our research adds to the literature on past vote weighting by discussing how more reliable measures of past vote choice can be gathered and used to improve estimates of voting intentions. The focus of the current research is to compare alternative methods available to online panel providers for measuring respondents' past voting behavior, with a view to offering practical guidance on how panel providers can contribute to efforts to reduce bias in the pre-election polls conducted on their panels.
Data and Methods
The data used for this study are from an Australian pre-election poll undertaken by the Australian National University (ANU) in April 2019, one month prior to the 2019 federal election in Australia. This ANUpoll was conducted on a sample drawn from Australia's only national probability-based online panel, Life in Australia™.[1] A total of 2,686 panel members aged 18 years and older were invited to take part in the survey, and 2,054 (76.5%) completed the questionnaire, for a cumulative response rate (covering panel recruitment and completion of this questionnaire) of 8.6% (Callegaro and DiSogra 2008).
In our view, despite the Australian electoral system using "compulsory" preferential voting,[2] there are sufficient similarities between the Australian electoral system and polling practices and those of many other countries, particularly countries with single-member, multi-party constituencies, to make these results broadly applicable to online pollsters. Despite compulsory voting, the voting-age population turnout at Australia's most recent national election, 76%, was not too dissimilar from the 62.6% voting-age population turnout in the US and 62.3% in the UK (DeSilver 2022). As such, predicting the proportion of the eligible population that will cast a valid vote is almost as significant an issue for Australian pollsters as it is elsewhere, suggesting that these findings are broadly applicable.
Measuring past vote
Three alternative measures of recalled past vote choice were created as weighting variables and added to the survey data: one based on respondents' short-term recall of their past vote choice (collected three months after the 2016 election), another based on their long-term recall (collected 34 months after the 2016 election), and a third, blended measure whereby a random half of respondents were assigned their short-term measure of their 2016 vote choice and the remainder their long-term measure. These measures of past vote in 2016 were each, in turn, incorporated into a standard survey weighting solution and aligned to voting benchmarks from the 2016 election. The impact of each weighting solution on the bias and variance of the resultant ANUpoll estimates of voting intentions for the impending 2019 election is then compared against the actual 2019 election outcome.
Short-term recall of past vote: At the initial recruitment stage of the panel in October 2016, panelists were asked about their vote choice in the preceding federal election, which had been held just a few months earlier, on July 2, 2016. The question wording and response options are provided in Appendix 2. The responses to this question were merged with the ANUpoll dataset and formed our short-term recall measure of past vote.
Long-term recall of past vote: The long-term recall measure of past vote was collected almost three years after the 2016 election and was available for a large subset of the same panelists. The question asked of these panelists is also provided in Appendix 2. The responses to this question were merged with the ANUpoll data and formed our long-term recall measure of past vote.
Blended recall measure of past vote: This measure was created to simulate a situation very likely to confront online panel providers. Owing to unit and item nonresponse, panel attrition, and panel replenishment, even if a short-term measure of past vote choice is routinely collected from all panelists immediately following each election, it is unlikely that all of these panelists will still be responding to survey requests when the next pre-election polling cycle is underway. This means that, for a substantial portion of panelists, the available measure of past vote will have been collected at varying points in the election-to-election cycle. While we could not simulate this situation exactly, we approximated it by creating the blended measure of past vote from a combination of the short-term and long-term recall measures and appending this "blended" measure to the data set.
As per Durand, Deslauriers, and Valois (2015) and Dassonneville and Hooghe (2017), we found that for Life in Australia™ panelists, in aggregate, the accuracy of their recalled past vote (measured as the average error between the recalled vote and the actual election result) diminishes over time. The average absolute error of our short-term recall measure relative to the 2016 election outcome was 2.3 percentage points (pp), increasing to 3.0 pp for the long-term measure (data not shown).[3] This level of inconsistency between short-term and long-term past vote recall measures is of a similar magnitude to that reported in the literature cited by Durand, Deslauriers, and Valois (2015).
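For concreteness, the average absolute error here is simply the mean of the party-level absolute differences between recalled vote shares and the official result. The following R sketch illustrates the calculation using made-up first-preference shares, not the actual 2016 figures.

```r
# Illustrative first-preference shares (%); these are placeholder values,
# not the actual 2016 recalled or official figures.
recalled <- c(Coalition = 41.0, Labor = 35.5, Greens = 11.0, Other = 12.5)
official <- c(Coalition = 42.0, Labor = 34.7, Greens = 10.2, Other = 13.1)

# Average absolute error in percentage points.
mean(abs(recalled - official))  # 0.8 pp for these placeholder values
```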
Approach to weighting
To investigate the relative impact of weighting by these three measures of past vote choice (short-term, long-term, and blended) on estimates of voting intentions, it was necessary to incorporate these measures into an appropriate weighting solution. Seven weighting scenarios were evaluated:
Weight 1: Age by education, sex, and geography (state by capital city/rest of state). This is our baseline weight and the main point of comparison for the other six weights. Educational attainment was included in this baseline weighting solution because a failure to do so was seen as one of the contributing factors to the less accurate state-level polls at the 2016 US presidential election (AAPOR 2017).
Weight 2: Age by education, sex, geography, and short-term recall of past vote.
Weight 3: Age by education, sex, geography, and long-term recall of past vote.
Weight 4: Age by education, sex, geography, and blended estimate of past vote.
Weights 5 to 7: Weighting only by short-term recall of past vote (Weight 5), only by the long-term recall measure of past vote (Weight 6), and only by the blended measure of past vote (Weight 7), as single-factor weighting solutions.
Each weight was calculated using the rake procedure from the survey package in R (Lumley 2004, 2010, 2020).
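As a minimal sketch of this step (assuming toy data and placeholder benchmark counts rather than the actual ANUpoll variables or Australian benchmarks), raking with the survey package looks like this:

```r
library(survey)

set.seed(1)
# Toy respondent file standing in for the ANUpoll data; the variable names
# and categories here are illustrative assumptions.
anupoll <- data.frame(
  sex       = factor(sample(c("Male", "Female"), 500, replace = TRUE)),
  past_vote = factor(sample(c("Coalition", "Labor", "Greens", "Other"),
                            500, replace = TRUE)),
  wt        = 1
)

# Survey design with no clustering and equal starting weights.
des <- svydesign(ids = ~1, weights = ~wt, data = anupoll)

# Population margins: one data frame per raking variable, with a Freq
# column holding the benchmark counts (placeholder figures).
pop_sex  <- data.frame(sex = c("Female", "Male"), Freq = c(9.9e6, 9.6e6))
pop_vote <- data.frame(past_vote = c("Coalition", "Greens", "Labor", "Other"),
                       Freq = c(5.7e6, 1.4e6, 4.7e6, 1.7e6))

# Rake to the margins: a two-factor analogue of the weights above.
w <- rake(des,
          sample.margins     = list(~sex, ~past_vote),
          population.margins = list(pop_sex, pop_vote))
```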
The population benchmarks for the demographic variables (age, sex, geography, and educational attainment) were compiled from the Australian Bureau of Statistics 2016 Census counts and the March 2019 Estimated Residential Population figures. The past vote weighting benchmarks were compiled from Australian Electoral Commission results (see Appendix 3, Tables A1 and A2 for details).
Error metrics
Measures of bias. Two measures of bias were used. The first is the weighted average absolute error of the 2019 Australian national election primary vote estimates compared with the election results.[4] The second measure is peculiar to Australian polling and reflects the preferential voting system used in Australia.[5] It captures the bias in the poll-based estimate as the average absolute error of the two-party-preferred (2PP) vote.
Measure of variance. The variance introduced by the weights is measured using the design effect (i.e., deff) calculated by Taylor series linearization by the svymean procedure in the survey package in R (Lumley 2020).[6]
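Continuing the raking sketch above (where w is the raked design and past_vote stands in for the 2PP vote variable), the linearized design effect is requested directly from svymean:

```r
# Weighted proportions with Taylor series linearized design effects;
# `w` is the raked design from the earlier sketch.
svymean(~past_vote, design = w, deff = TRUE)
```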
Overall error measure. Mean square error (MSE) (Korn and Graubard 1999) combines bias and variance to assess the impact of weighting on the total survey error as follows:
$$\mathrm{MSE} = B^2 + V$$
where B is the primary measure of bias (in this case the 2PP bias) and V is a measure of variance estimated from the data set. Korn and Graubard (1999) estimate the deff using the variance of the weights; however, given that the assessment of accuracy in this research is focused on a single measure (i.e., the 2PP vote), the Taylor series linearized deff is used. The root mean square error (RMSE) is reported so that the result is on the original percentage scale.
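A worked illustration of the calculation, with placeholder values for B and V rather than results from the study:

```r
# Placeholder bias and variance values, for illustration only.
B    <- 2.4            # average absolute 2PP error, in percentage points
V    <- 0.3            # linearized variance of the 2PP estimate (pp^2)
rmse <- sqrt(B^2 + V)  # RMSE = sqrt(MSE), back on the percentage scale
rmse                   # ~2.46 pp
```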
Simulations
To estimate the degree to which the different voting intentions estimates produced by the different weighting solutions were due to sampling variation, 10,000 samples were drawn by re-sampling the original data at random, with replacement, to the same sample size. Each weighting scheme was calculated for each re-sample to obtain estimates for all weighting options. The reported 95% confidence intervals were computed from the re-samples as:
$$CI = t^* \pm 1.96 \cdot se^*$$
where CI is the confidence interval, t* is the average estimate across the re-samples, and se* is the standard deviation of the estimates across the 10,000 re-samples.
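A compressed sketch of this resampling scheme follows; estimate_2pp() is a hypothetical stand-in for re-running one full weighting solution on a re-sample and returning its 2PP estimate, and anupoll is the toy data from the earlier sketch.

```r
# Hypothetical wrapper: in the actual procedure this would re-rake the
# re-sampled data under one weighting scheme and return the weighted 2PP
# estimate; here it is a placeholder computation on the toy data.
estimate_2pp <- function(dat) {
  mean(dat$past_vote %in% c("Labor", "Greens")) * 100
}

n_boot <- 10000
boots  <- replicate(n_boot, {
  resample <- anupoll[sample(nrow(anupoll), replace = TRUE), ]
  estimate_2pp(resample)
})

t_star  <- mean(boots)  # average estimate across re-samples
se_star <- sd(boots)    # standard deviation of the re-sampled estimates
ci <- t_star + c(-1.96, 1.96) * se_star
```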
Probabilities represent the proportion of re-samples in which one weighting scheme produces superior results to another, adjusted to be two-tailed, i.e.:
$$p = \begin{cases} 2p^* & \text{if } p^* \le .5 \\ 2(1 - p^*) & \text{if } p^* > .5 \end{cases}$$
where p* is the proportion of re-samples in which one weighting scheme is superior to the other. Probabilities were adjusted for multiple comparisons with the technique described by Benjamini and Hochberg (1995), via the p.adjust function in R.
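The two-tailed conversion and the Benjamini-Hochberg adjustment can be sketched as follows (the p* values are illustrative):

```r
# Proportion of re-samples in which scheme A beats scheme B, for several
# pairwise comparisons (illustrative values only).
p_star <- c(0.98, 0.62, 0.03)

# Two-tailed conversion per the formula above.
p_two <- ifelse(p_star <= 0.5, 2 * p_star, 2 * (1 - p_star))

# Benjamini-Hochberg adjustment for multiple comparisons.
p.adjust(p_two, method = "BH")
```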
Results
Table 1 shows the results for each error metric based on the original data, as well as the average for each metric from the simulations alongside their 95% confidence intervals (simulated results in brackets). Table A3 (see Appendix 4) shows the probabilities associated with the null hypothesis, which in this case is that the RMSE of the 2PP vote (i.e., the final column of Table 1) for one weighting solution equals the RMSE of a comparative weighting solution. Table 1 in conjunction with Table A3 shows whether the survey estimates generated by the various weighting solutions meet the threshold for statistical significance, adjusted for two-tailed probabilities.
Weight 1 (age by education, sex, and geography) produces a primary vote estimate with a weighted average absolute error of 2.58 pp and an average absolute 2PP error of 4.08 pp. This weighting solution inflates the variance by a deff of 1.41 and has an RMSE (which encapsulates both bias and variance) of 4.29 pp.
Incorporating the short-term recall measure of past vote into Weight 1 reduces the weighted average absolute error on the primary vote to 1.41 pp, compared with 2.95 pp when the long-term measure is used. The short-term measure also outperforms the long-term measure on the average absolute 2PP estimate, with an error of 2.41 pp compared with 4.45 pp for the long-term measure. The short-term past vote recall measure also has a significantly lower RMSE than the long-term recall measure (as shown in Table A3).
These comparisons demonstrate that, in this instance, adding a short-term past vote adjustment to the baseline weighting solution results in markedly less biased estimates than adding the long-term recall measure, and generally results in better estimates than the solution without any past vote adjustment, although this latter comparison fails to meet the p<.05 threshold of statistical significance (p=.075).
A comparison of our baseline weighting solution with Weight 4 (age by education, sex, geography, and the blended measure of past vote) shows that the blended measure produces a prima facie (but not statistically significant) reduction in bias for the weighted average absolute primary vote, the average absolute 2PP vote, and the resultant RMSE. However, Weight 4 is significantly more biased than the solution using the short-term recall measure of past vote.
Weighting by the short-term recall measure on its own (Weight 5) results in the best overall weighting solution (RMSE 1.10 pp), with the blended estimate of past vote on its own (Weight 7) the next best (RMSE 2.03 pp). The strong performance of these single-factor weighting solutions gives pause for thought when the goal is to produce the estimate of voting intentions with the least possible bias (and preferably the least variance).
Discussion
The results of our study replicate those of Dassonneville and Hooghe (2017), Durand, Deslauriers, and Valois (2015), and Wells (2019) in showing that how and when past vote data are collected makes a difference. Our findings show that adding a short-term recall measure of past vote choice to a standard weighting solution produced less biased estimates of voting intentions, with a tolerable increase in variance, compared with other past vote measures. The likely reason is that the short-term recall measure of past vote is less affected by respondent-related measurement error than either the long-term or the blended recall measure.
A practical implication of this research is that panel providers have an important role to play in ensuring they capture the best possible measure of past vote choice from their panelists. This could be achieved if panel providers routinely collect past vote choice as a profiling variable for all active panelists very soon after each election, and again when recruiting new panelists between elections. Panel providers could also consider quarantining a segment of their panel for pre-election polling and making extra efforts to reduce churn in this segment so as to maximize the number of panel members for whom they have a short-term recall measure of past vote.
In conclusion, incorporating into standard election poll weighting solutions a past vote adjustment based on short-term recall of respondents' vote choice at the previous election resulted in less biased estimates of voting intentions than using either a blended or a long-term recall measure of past vote, and also resulted in better estimates than solutions without any past vote adjustment. However, adding a blended or long-term recall measure of past vote to a multi-factor weighting solution did not consistently produce a better outcome than weighting by age by education, sex, and geography alone. If a single-factor approach to weighting is contemplated, then using just a short-term or blended recall measure of past vote choice seems worthy of further exploration.