*This is the fourth installment of our new blog series: Predictive Analytics in Enforcement. See our previous posts: What is Predictive Analytics?, Searching for Regulatory Violations, and Technical Challenges with Non-Random Investigative Data.*
Last week, we discussed the technical challenges with using non-random investigative data for enforcement. In this post, we will explore the challenges of population uncertainty and information gaps.
Challenge: Uncertainty about the population under regulatory enforcement
Many agencies identify and collect information on all organizations under their regulatory purview only periodically, or only for the specific organizations they investigate. For instance, an agency may know that all businesses with employees fall under its regulatory enforcement, but may not know exactly how many businesses that includes or which businesses they are. For these agencies, a predictive model that focuses only on estimating an organization’s probability of violation will be of limited utility, because its results cannot be used to target organizations that do not appear in the agency’s data.
Resolution: Leverage the explanatory power of the model’s predictors