Mechanical Turk Use in Psychological Experiments and the Courts

November 18, 2020 Kami Ehrich

Summit Mechanical Turk series

With contributions by Teresa Kline

In this post, we’ll look at some examples of how Mechanical Turk (“MTurk”) has been used in academia and litigation. In the academic realm, MTurk has often been used for surveys, psychological experiments, and behavioral experiments.

As we discussed in the first post of this series, MTurk has a large, diverse respondent pool that doesn’t have the same limitations as the traditional university-student subject pools. In fact, MTurk has been found to be more representative of the population and provide the same quality results as the university-student subject pool. Researchers have replicated several classic experiments in behavioral economics using MTurk, including the following:

  • The Prisoner’s Dilemma: Participants imagine they are one of two prisoners in solitary confinement with no means of communicating with the other. Prosecutors lack sufficient evidence to convict the pair on the principal charge, but they have enough to convict both on a lesser charge, so they offer each prisoner a bargain. Each prisoner can either betray the other by testifying that the other committed the crime or cooperate with the other by remaining silent. The typical result is that a prisoner chooses to protect him- or herself at the expense of the other prisoner.
  • Public-Goods Game: Participants are asked to contribute funds to a public pool, which is augmented by some value based on the amount contributed. The augmented funds are then distributed evenly among all participants. Participants are more likely to increase their initial contributions if their neighbors were also high contributors and more likely to decrease their initial contributions if their neighbors were low contributors.
  • The Everest Experiment: Participants are asked to estimate the height of Mount Everest as greater or less than a series of measurements. For instance, some are asked whether the mountain is greater or less than 2,000 feet, while others are asked if it is greater or less than 45,000 feet. People who are given the higher anchor (45,000 feet) tend to provide larger estimates than those who are given the lower anchor (2,000 feet).
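
To make the payoff rule in the Public-Goods Game concrete, here is a minimal sketch in Python. The function name, the endowment of 10, and the multiplier of 2 are illustrative assumptions, not values taken from the studies mentioned above.

```python
def public_goods_payoffs(contributions, endowment=10.0, multiplier=2.0):
    """Return each participant's payoff for one round of a public-goods game.

    endowment and multiplier are hypothetical values chosen for illustration.
    """
    if not contributions:
        raise ValueError("at least one participant is required")
    if any(c < 0 or c > endowment for c in contributions):
        raise ValueError("each contribution must be between 0 and the endowment")

    # The public pool is augmented in proportion to the amount contributed.
    pool = sum(contributions) * multiplier
    # The augmented pool is split evenly among all participants.
    share = pool / len(contributions)
    # Each participant keeps whatever was not contributed, plus an equal share.
    return [endowment - c + share for c in contributions]


# Two participants contribute everything, two contribute nothing.
payoffs = public_goods_payoffs([10, 10, 0, 0])
print(payoffs)
```

Under these assumed numbers, the participants who contribute nothing end the round with more than those who contribute everything, which is the tension the game is designed to expose.
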
Shifting to litigation, MTurk has been used in at least 10 cases over the past seven years. In Louisville Marketing v. Jewelry Candles, data from an MTurk market-research survey that asked participants about jewelry in candles was admitted as evidence.

Cases that have used MTurk span multiple industries and include large corporations such as Apple, Microsoft, Netflix, and LinkedIn as well as government agencies. Some decisions of note:
  • FTC v. National Urological Group
  • Brighton Collectibles v. Believe Production
  • Ebin v. Kangadis Food
  • Fagan v. Neutrogena
  • Saint-Jean v. Emigrant Mortgage
  • Wikimedia Foundation v. National Security Agency
While MTurk’s use in most of these cases has involved surveys on consumer preferences or brand recognition, experts have also used MTurk to test the validity of procedures such as police lineups. Judges and opposing experts in these cases have generally focused not on the characteristics of the MTurk participants and their representativeness, but on the practices used to implement the surveys or experiments. Overall, MTurk results have been treated in much the same way that surveys from other sources, such as the university-student pool, are treated.

In the third and final part of our blog series, we’ll walk through concerns specific to MTurk (such as the quality of the data and the representativeness of the population) and how to mitigate them.
