Predictive Analytics in Enforcement: Why the IRS Does Random Audits

August 20, 2015 Amy Deora

*This is the fifth installment of our new blog series: Predictive Analytics in Enforcement. See our previous posts: "What is Predictive Analytics?", "Searching for Regulatory Violations", "Technical Challenges with Non-Random Investigative Data", and "Population Uncertainty and Lack of Information".*

Agencies charged with enforcing laws or regulations use many different methods to determine who or what to investigate. Good investigators will always rely on tips, hunches, and suspicious activities. But over the past decade, agencies have started using predictive models and other types of data analytics to determine what types of people, businesses, or other entities might be most likely to engage in all types of wrongdoing, including lawbreaking, noncompliance, or fraud.

In previous weeks, we discussed the technical challenges of using data from an investigative or enforcement process to model some types of compliance in an overall population. In this post, we explore why organizations use random sampling to gain a better overall picture of activity in the population.

Most agencies have years of detailed records on their investigative activities, including information on who was investigated and what their findings were. These records provide rich—but limited—sources of information. To address these limitations, many organizations use random investigations to learn more about the overall population that they are charged with monitoring.

An example enforcement scenario: Let's say I run a state office responsible for ensuring that gas stations charge customers their posted rates. My office investigates gas stations based on consumer tips, geographic areas and chains with a history of problems, and plain old investigator hunches. The governor wants to know: How widespread is overcharging in this state? And how can we do a better job of catching the offenders?

Using statistical analysis of administrative data: If I have information about the entire population of gas stations, I can use my investigative data to help estimate overall noncompliance. For example, my investigative records show that 11% of the very small gas retailers we investigated in a particular city were violators. I can apply that rate to all very small gas retailers in that city, then do something similar for every other group in the population. There are several fancy statistical techniques for doing this, but they all rest on some key assumptions. I assume (1) that my data capture every characteristic of my population that affects noncompliance, and (2) that my investigators catch, at least to some extent, every type of noncompliance. This estimate is fairly cheap and easy to produce, but my result will come with a lot of caveats.
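To make the group-by-group approach concrete, here is a minimal Python sketch of one simple version of it, a stratified (cell-based) estimate. The `Stratum` class, the group names, and all counts are hypothetical, chosen so that the very small stations match the 11% rate above. It is an illustration, not the office's actual method, and it is only as trustworthy as assumptions (1) and (2).

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One population group, e.g. station size within a city (hypothetical)."""
    name: str
    population: int    # number of stations in this group
    investigated: int  # stations in this group that investigators examined
    violations: int    # of those, how many were found overcharging

    @property
    def observed_rate(self) -> float:
        # Violation rate among *investigated* stations in this group.
        return self.violations / self.investigated


def stratified_estimate(strata: list[Stratum]) -> float:
    """Apply each group's investigated rate to the whole group, then combine.

    Assumes (1) the groups capture every characteristic that affects
    noncompliance and (2) investigators detect every type of violation.
    """
    total_population = sum(s.population for s in strata)
    estimated_violators = sum(s.population * s.observed_rate for s in strata)
    return estimated_violators / total_population


# Hypothetical two-group population of 1,000 stations.
strata = [
    Stratum("very small stations, City A", population=400, investigated=100, violations=11),
    Stratum("large stations, City A", population=600, investigated=50, violations=2),
]
print(f"Estimated overall rate: {stratified_estimate(strata):.1%}")  # prints 6.8%
```

With these made-up numbers, the estimate is 44 violators among the very small stations (400 × 11%) plus 24 among the large ones (600 × 4%), or 68 out of 1,000. If investigators only ever look at certain kinds of station within a group, the observed rate, and therefore the estimate, inherits that bias.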

Using random sampling: The most bulletproof way to estimate the overall rate of stations that overcharge customers is to investigate a random sample of gas stations in the state. These investigations will not be based on any tip or suspicion: which stations get investigated will have nothing to do with how likely they are to be violators, and the sample will be statistically designed to represent the overall population of gas stations. This gives me the answer I would feel most confident sharing with the governor. Information from this random sample can also:

  • Demonstrate how good investigators are at catching violators. If investigators are doing a very good job, we would expect a large difference between the overall rate of noncompliance in the state (measured through the random sample) and the share of investigator-selected stations that turn out to be violators.
  • Develop predictive models that can help target investigative resources. As discussed in previous posts, a model that predicts noncompliance raises the same issues I face when asking whether my investigative data can support a population-level compliance estimate. Because random-sample data is not shaped by tips or hunches, it sidesteps much of that problem.
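A minimal Python sketch of the first use above: estimate the population rate from the random sample, then compare it with the hit rate of targeted investigations. It assumes a simple random sample and uses a normal-approximation (Wald) confidence interval, which is a rough choice when the sample is small or the rate is near zero. The function names and all counts are hypothetical.

```python
import math


def proportion_with_ci(violators: int, sampled: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate a population rate from a simple random sample.

    Returns (rate, lower, upper) using a normal-approximation (Wald)
    interval; z=1.96 gives roughly 95% coverage. Bounds are clipped to [0, 1].
    """
    p = violators / sampled
    se = math.sqrt(p * (1 - p) / sampled)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


def targeting_lift(targeted_hits: int, targeted_total: int, population_rate: float) -> float:
    """How many times more often targeted investigations find a violator
    than picking a station at random would."""
    return (targeted_hits / targeted_total) / population_rate


# Hypothetical: 24 violators in a random sample of 400 stations, and
# 90 violators found in 300 tip- or hunch-driven investigations.
rate, lower, upper = proportion_with_ci(24, 400)
lift = targeting_lift(90, 300, rate)
print(f"{rate:.1%} (95% CI {lower:.1%} to {upper:.1%}); lift {lift:.1f}x")
# prints: 6.0% (95% CI 3.7% to 8.3%); lift 5.0x
```

A lift well above 1 means targeted investigations turn up violators far more often than random selection would, which is the "large difference" described in the first bullet. A lift near 1 would suggest tips and hunches are adding little.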

Random investigations can provide enforcement organizations with a key missing piece of the puzzle, one that helps them make better use of the investigative data they already have. So if you, a perfectly law-abiding citizen, are ever selected for, say, a random IRS audit of your tax return, just remember: yes, it’s annoying, but you’re doing your part to help catch the bad guys.
