Research Background
Crowdsourcing is an emerging, non-probability-based sampling and recruitment approach that seeks to leverage the reach and utility of “crowds” to accomplish data-collection tasks. Crowdsourcing is defined as “the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call” (Howe 2006). The rapid adoption of smartphones and related technologies provides survey researchers with a quick and convenient way to leverage crowdsourcing approaches. For example, mobile panelists can be asked to answer surveys and to collect information such as location or pictures. Moreover, the approach may be one way to better reach traditionally hard-to-reach demographics (e.g., younger people and racial/ethnic minority groups).
Crowdsourcing is a method used by companies and organizations seeking solutions from the public to technical problems (e.g., computer programming) or ideas for marketing strategies (e.g., product campaigns). In the past 2 years, survey researchers have begun to leverage crowdsourcing as a new tool to collect data from online users (Kittur, Chi, and Suh 2008; Kleemann, Voß, and Rieder 2008; Whitla 2009). For example, researchers have posted surveys on a crowdsourcing website and invited the “crowd” to complete a survey and get paid (Behrend et al. 2011). The key difference between crowdsourcing and a typical online opt-in panel is that with the former, assignments can vary in nature and typically include surveys as well as other forms of data capture (such as going to a store and taking a picture of a product). In contrast, online opt-in panelists are typically limited to completing surveys (a repetitive task). While crowdsourcing has considerable potential uses for survey research, little empirical work has been published in this area.
We present results of a pilot study conducted in June 2012 examining the viability of crowdsourcing TV viewing surveys via a mobile application called Gigwalk. To learn whether mobile crowdsourcing is a viable method of data collection, we evaluated the sample composition, data quality, and overall compliance. Additionally, three experimental conditions were tested, varying the length of the data collection period and the incentive amount and examining the effects on respondents’ cooperation.
Methods
Gigwalk, a third-party opt-in panel recruitment and management vendor, was used in this test. Gigwalk has built its own mobile application, which panelists download and use to complete various crowdsourcing tasks. In the Gigwalk application, registered users (panelists) are offered various tasks or “gigs” to complete, mostly based on their geographical locations. Panelists can choose to opt in to one or more “gigs” available to them. These gigs can come from various companies or researchers and involve many different types of tasks, survey-related or not. Once a panelist submits the completed task and the submission is accepted, the promised payment is transferred to his or her associated PayPal account.
In the Nielsen study, a national sample of 300 respondents was selected from the Gigwalk panel (though panelists were more likely to be from metropolitan areas) through a pre-qualification online survey collecting key demographic information. The pre-qualification survey was launched and available to all panelists 1 week prior to the data collection period. The qualified respondents were asked to report their viewing every time they watched a TV program for more than 5 minutes. The TV viewing survey collected what respondents were watching, where they were watching, and with whom (6–8 questions in total).
To test the effect of varying field-period and incentive combinations on respondent cooperation, the 300 respondents were randomly assigned to one of the following three conditions, with 100 respondents per condition (a schematic sketch of the assignment follows the list):
- Group 1: 1-day condition with $5 incentive: report TV viewing for 1 day (respondents could select any day during the specified week);
- Group 2: 3-day condition with $10 incentive: report TV viewing for 3 days (respondents could select Thu-Sat, Sat-Mon, or Mon-Wed during the specified week);
- Group 3: 7-day condition with $15 incentive: report TV viewing for all 7 days of the specified week.
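For illustration, the random assignment described above can be sketched as follows. This is a minimal sketch assuming simple random assignment without stratification; the study does not specify the exact procedure, and the respondent IDs, seed, and condition labels are hypothetical.

```python
import random

# Minimal sketch of the random assignment described above, assuming simple
# random assignment without stratification (the study does not specify the
# procedure); respondent IDs and condition labels here are hypothetical.
respondent_ids = list(range(1, 301))           # 300 pre-qualified respondents
conditions = [
    ("1-day, $5 incentive", 100),
    ("3-day, $10 incentive", 100),
    ("7-day, $15 incentive", 100),
]

random.seed(2012)                              # fixed seed for reproducibility
random.shuffle(respondent_ids)

assignment = {}
start = 0
for label, size in conditions:
    for rid in respondent_ids[start:start + size]:
        assignment[rid] = label
    start += size

# Each condition group ends up with exactly 100 respondents.
```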
Results
1. Sample composition. Sixty-seven percent of the recruited respondents completed at least one TV viewing survey during the study period. When comparing across the three conditions, the 7-day group had the highest cooperation (76% of respondents completed at least one TV viewing survey), followed by the 3-day group (65%) and the 1-day group (59%).
Table 1 presents the demographic distributions for the 300 recruited respondents and the 200 respondents who completed at least one viewing survey. The overall sample composition skewed toward younger, well-educated, and racially/ethnically diverse individuals. The same trends were also observed among respondents who completed one or more surveys. (Some variations exist, but none is statistically significant.)
When comparing the response propensities associated with demographic characteristics across condition groups, results show that male respondents and respondents under age 35 were more likely to cooperate in the longer-duration groups than in the shorter-duration groups, while Black respondents were more likely to cooperate in the shorter-duration groups than in the longer-duration groups. As for education level, there were no consistent trends in its association with response propensities across groups.
2. Number of completed TV viewing surveys across days. On average, 2.4 surveys were completed per day per respondent. When comparing the three conditions, the 7-day group had the most surveys completed: 2.5 surveys per day per respondent, compared with 2.2 for the 1-day group and 2.3 for the 3-day group. Note that the effects of incentive and measurement length observed here are confounded with demographic characteristics and possibly other latent variables. As for the effect of field period, respondents tended to submit more surveys at the start of the week than on the weekend (as shown in Figure 1). It should be noted that the pattern in the 1-day group deviates most from the overall level, most likely due to the small cell sizes across days.
3. Comparison of viewing level between Gigwalk and Nielsen. To better understand the validity of the data collected, we compared the daily viewing hours reported in this study with those collected by the Nielsen TV Diary (based on a probability sample covering more than 200 designated market areas and using a paper diary for 1-week data collection). Table 2 shows the weekly viewing hours reported by the 7-day condition group in this study and those from the Nielsen TV Diary.
The total viewing time reported in the 7-day group was 37% lower than that reported in the diary. When controlling for age and race/ethnicity, all demographic groups reported fewer viewing hours than in the diary except those aged 35 years and under, whose average viewing hours were 9% higher than in the diary. It is difficult to infer whether respondents were underreporting or overreporting their TV consumption because (1) the sample size is small, especially for certain subgroups such as those aged 50 and above, which affects data reliability; and (2) other behavioral factors that might be associated with TV consumption, such as smartphone ownership, were not taken into account. (Respondents who own smartphones use their time differently from people who do not.) Note that no significance tests were conducted on the viewing-hours differences because of the small sample size.
Discussion
The respondents recruited from the Gigwalk panel were generally very cooperative in responding to the survey task. In fact, the respondents in the 7-day condition were the most compliant, though there is potentially a strong incentive effect: Gigwalk panelists typically pick the highest-paying “gigs” regardless of the perceived burden of the task. The discrepancies observed in the reported viewing levels when compared with the benchmark data from Nielsen may also be due to differences in mode, sampling approach, and incentive amounts. It is unclear whether latent behavioral variables such as smartphone ownership contributed to the differences.
Based on the results presented here, we conclude that mobile crowdsourcing is a promising approach for research questions that can be answered with non-probability sampling approaches (Baker et al. 2013). It is certainly worth exploring further given that survey administration time and cost can be significantly lower than with traditional methods. In addition, the Gigwalk mobile panel has broader coverage of younger cohorts as well as the Hispanic cohort. Several considerations arise when using mobile crowdsourcing for survey research: (1) the coverage of the platforms being considered, such as compatible devices and operating systems; (2) the demographic characteristics and geographic coverage of the panel; for example, we found that the panelists in our study tended to be young, well-educated, and residing in metropolitan areas; (3) the competing tasks available on the platform that might affect survey cooperation; researchers can leverage historical data from the service vendor to determine a competitive and optimal combination of task burden and incentive amount; and (4) the survey-programming capabilities of the platform, which can significantly constrain questionnaire design.