Mechanical Turk: Potential Concerns and Their Solutions

How to Mitigate Concerns about Mechanical Turk

With contributions by Teresa Kline

Mechanical Turk (“MTurk”) offers researchers clear benefits over traditional respondent platforms: it is a significantly faster and less expensive way to collect data. There are, however, some concerns about data collected via MTurk. In this last installment of our series, we explore three main issues:

Respondent “non-naivete”
Representativeness and selection bias
Response quality

Respondent Non-Naivete

One potential concern with collecting data via MTurk (as with online panels) is that respondents who complete many surveys or experiments can become bored and pay less attention to their tasks, or answer questions the way they believe the requester wants. This problem is known as respondent “non-naivete,” and such respondents are often called “professional respondents.” Professional respondents can bias results.

Many Turkers have done hundreds of experiments and surveys; the MTurk website sets no limits on the number of human-intelligence tasks (“HITs”) a Turker can complete. Research has suggested that 10% of Turkers are responsible for 40% of all experimental responses.

Although respondent non-naivete is something to keep in mind, it’s not a reason to abandon MTurk. Researchers can mitigate it by excluding individuals who have completed a high number of HITs from the respondent pool. When creating a new task, the MTurk requester website lets researchers add qualifications that specify who may complete the task, including limits based on the number of HITs a worker has completed. Additionally, researchers can monitor Turkers’ IP addresses via external survey platforms to prevent respondents from participating more than once in a single survey or experiment.
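As a rough sketch of how this qualification step might look when a task is created through the MTurk API rather than the requester website, the snippet below builds a requirement that caps the number of approved HITs a worker may have. The function name and the cap of 500 are our own illustrative assumptions. The qualification type ID is the one we understand the MTurk documentation to list for “number of HITs approved,” which is the closest system qualification to a count of completed HITs; check it against the current documentation before use.

```python
# System qualification type ID for "number of HITs approved", as we
# understand the MTurk API documentation to list it. Verify before use.
NUMBER_HITS_APPROVED = "00000000000000000040"


def build_hit_cap_requirement(max_hits_approved):
    """Build a requirement list that excludes very experienced workers.

    Hypothetical helper: it only assembles the data structure and does
    not contact MTurk.
    """
    return [
        {
            "QualificationTypeId": NUMBER_HITS_APPROVED,
            "Comparator": "LessThanOrEqualTo",
            "IntegerValues": [max_hits_approved],
            # Workers who do not qualify cannot see, preview, or accept the task.
            "ActionsGuarded": "DiscoverPreviewAndAccept",
        }
    ]


# Illustrative cap only; the right threshold depends on the study.
requirements = build_hit_cap_requirement(500)
```

The resulting list would be passed as the QualificationRequirements argument when creating a HIT, for example through the boto3 MTurk client’s create_hit call.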

Representativeness and Selection Bias

A common criticism of convenience samples (that is, samples drawn from easy-to-reach populations) is that they are not representative of the general population. Turkers, however, are a very diverse group. Turkers in the United States are typically slightly younger and more educated than the U.S. population as a whole. Despite being more educated, Turkers tend to have lower incomes, although the distribution closely follows that of the U.S. population. One study compared a sample of 551 Turkers against the 2008–09 American National Election Studies (“ANES”) Panel Study and found that the MTurk sample was slightly more female (60% versus 58%), slightly less educated (14.9 years versus 16.2 years), and a similar percentage white (83.5% versus 83.0%). On the other hand, the MTurk sample was notably younger (32.3 years old versus 49.7 years old) than the ANES sample, and rates of life milestones such as marriage and home ownership, along with income, were also lower (Berinsky, Huber, & Lenz, 2012).

As noted in our first post, even with these demographic differences, an MTurk sample is more representative of the general U.S. population than a traditional sample of university students (Paolacci, Chandler, & Ipeirotis, 2010). Researchers can address selection bias by using geolocation to enforce geographic restrictions (to ensure that no one geographic area self-selects into the survey more than others) and by tracking IP addresses (to ensure that no individual completes the survey more than once).
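The two checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular survey platform: the record fields (worker, ip, region) and the per-region quota are hypothetical, and it assumes the survey platform has already recorded each respondent’s IP address and geolocated region. IP addresses are an imperfect proxy for individuals, since several people can share one address.

```python
from collections import Counter


def select_responses(responses, max_per_region):
    """Keep the first response per IP address, up to a quota per region.

    `responses` is a list of dicts in submission order, each with the
    hypothetical keys "worker", "ip", and "region".
    """
    seen_ips = set()
    per_region = Counter()
    kept = []
    for response in responses:
        # Drop repeat submissions from an IP address we have already seen.
        if response["ip"] in seen_ips:
            continue
        seen_ips.add(response["ip"])
        # Drop responses from a region that has already filled its quota.
        if per_region[response["region"]] >= max_per_region:
            continue
        per_region[response["region"]] += 1
        kept.append(response)
    return kept


sample = [
    {"worker": "A", "ip": "1.1.1.1", "region": "NY"},
    {"worker": "B", "ip": "1.1.1.1", "region": "NY"},  # repeat IP
    {"worker": "C", "ip": "2.2.2.2", "region": "NY"},
    {"worker": "D", "ip": "3.3.3.3", "region": "NY"},  # NY quota already full
    {"worker": "E", "ip": "4.4.4.4", "region": "CA"},
]
kept = select_responses(sample, max_per_region=2)
print([r["worker"] for r in kept])  # prints ['A', 'C', 'E']
```

In practice the quota would be set in proportion to each region’s share of the target population rather than as a single fixed number.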

Response Quality

Another potential concern with using MTurk is the quality of the responses received. A 2019 study found that about 20% of respondents circumvented the location requirements (for example, taking a survey aimed at New York residents while living in California) or took the survey multiple times. At least 5% to 7% appeared to engage in satisficing (doing the minimum required, such as selecting the first response option for a series of survey questions) or trolling (giving intentionally outrageous or misleading responses). The study found that low-quality responses had a small impact on the results of an experiment, reducing the effectiveness of an experimental treatment by about 10%. While low-quality responses are a concern for all crowdsourcing platforms, research shows that they are more prevalent on MTurk than on other platforms and may be increasing.

To address these concerns, researchers using MTurk should include quality-control questions that test respondent attention and detect trolling and satisficing; implement quality filters and let workers know that they will receive a bonus payment for quality work; be mindful of compensation rates (setting rates either too low or too high can cause problems) and of Turker reputation; and include only workers with a high percentage of accepted HITs.
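A quality filter of the kind described above could be sketched as follows. This is an illustration under our own assumptions, not a standard procedure: the function name, the field names, and the rule that identical answers across a grid of questions indicate satisficing are all hypothetical choices. Flagged responses are best reviewed rather than dropped automatically, because identical answers can sometimes be sincere.

```python
def flag_low_quality(response, attention_item, expected_answer, grid_items):
    """Return a list of quality flags for one survey response.

    `response` maps question names to answers. `attention_item` is a
    quality-control question with a single correct `expected_answer`.
    `grid_items` names a series of questions sharing one response scale.
    """
    flags = []
    # Attention check: the respondent was told which option to select.
    if response.get(attention_item) != expected_answer:
        flags.append("failed_attention_check")
    # Straightlining: the same option chosen for every question in the series.
    grid_answers = [response[item] for item in grid_items]
    if len(grid_answers) > 1 and len(set(grid_answers)) == 1:
        flags.append("straightlining")
    return flags


careful = {"attn": "agree", "q1": 1, "q2": 4, "q3": 2}
careless = {"attn": "disagree", "q1": 1, "q2": 1, "q3": 1}
print(flag_low_quality(careful, "attn", "agree", ["q1", "q2", "q3"]))
# prints []
print(flag_low_quality(careless, "attn", "agree", ["q1", "q2", "q3"]))
# prints ['failed_attention_check', 'straightlining']
```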

Overall, MTurk is a fast and inexpensive way to recruit survey or experiment participants. Summit has used this platform to help our clients reach a nationally representative respondent pool for surveys and experiments. As we discussed in our second post, MTurk has been used extensively in academic research and even in legal cases. There are some concerns to keep in mind about data collection via MTurk, but the benefits, which include flexibility, speed, and low cost, outweigh those concerns if researchers are careful to take the mitigating steps we describe in this post.
