Revolutionizing Survey Data Quality at the U.S. Department of Agriculture’s National Agricultural Statistics Service with IDEAL
Challenge: Survey data quality is crucial for any agency responsible for official statistics. At the U.S. Department of Agriculture (USDA) National Agricultural Statistics Service (NASS), maintaining that quality required hundreds of analysts to spend hours hand-correcting survey responses and filling in missing data for more than 200 annual surveys. Analysts were burdened with repetitive tasks such as replacing missing values with zeros and manually re-entering data from one system into another. This process was time-consuming, inconsistent, and prone to errors.
Solution: Summit designed, developed, and implemented Imputation, Deterministic Edits, And Logic (IDEAL), an enterprise-wide application that automates the editing and imputation of survey responses as part of a once-in-a-generation modernization effort. IDEAL fills in missing data with probable values and resolves conflicting data with internally consistent values. The application includes a user interface that allows NASS to manage validation rules across multiple surveys. The IDEAL system was implemented using a robust and scalable technology stack:
- Open-source R packages: IDEAL leverages open-source R packages for statistical analyses, including a flexible expectation-maximization algorithm for imputation and a linear programming algorithm, developed by Statistics Netherlands, for error resolution.
- Cloud environment: IDEAL is deployed in USDA’s FedRAMP-compliant Azure Cloud environment, which ensures nationwide accessibility and the ability to handle computationally intensive tasks.
- Database services: IDEAL uses an on-premises Microsoft SQL Server database for initial data ingestion and Azure Data Factory pipelines for statistical analyses. The backend connects to the database through a web API, and the system also exposes an API that provides details of the pipelines running in Azure Data Factory.
- Cybersecurity: IDEAL includes role-based access controls that prevent unauthorized access and protect data privacy.
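To make the editing and imputation steps concrete, here is a minimal Python sketch of the general idea. It is not IDEAL's implementation, which relies on R packages. The field names, the single balance rule, and the helper functions are all hypothetical, and a simple median fill stands in for the expectation-maximization imputation. The linear programming step for error resolution is not attempted here.

```python
from statistics import median

# Hypothetical field names for an illustrative acreage record.
PARTS = ("corn_acres", "soy_acres")
TOTAL = "total_acres"
FIELDS = PARTS + (TOTAL,)


def is_clean(record):
    """True if no field is missing or negative and the parts sum to the total."""
    values = [record.get(f) for f in FIELDS]
    if any(v is None or v < 0 for v in values):
        return False
    return sum(record[p] for p in PARTS) == record[TOTAL]


def deterministic_edit(record):
    """Fill a single missing field when the balance rule determines it exactly."""
    edited = dict(record)
    missing = [f for f in FIELDS if edited.get(f) is None]
    if len(missing) != 1:
        return edited  # nothing missing, or not uniquely determined
    field = missing[0]
    if field == TOTAL:
        edited[TOTAL] = sum(edited[p] for p in PARTS)
    else:
        others = sum(edited[p] for p in PARTS if p != field)
        derived = edited[TOTAL] - others
        if derived >= 0:  # leave the gap if the rule would imply a negative value
            edited[field] = derived
    return edited


def impute(records):
    """Median fill for remaining gaps (a stand-in for model-based imputation)."""
    medians = {
        p: median(r[p] for r in records if r.get(p) is not None) for p in PARTS
    }
    result = []
    for r in records:
        filled = dict(r)
        for p in PARTS:
            if filled.get(p) is None:
                filled[p] = medians[p]
        if filled.get(TOTAL) is None:
            filled[TOTAL] = sum(filled[p] for p in PARTS)
        result.append(filled)
    return result


def share_clean(records):
    """Share of records passing every rule (one assumed reading of a clean rate)."""
    return sum(is_clean(r) for r in records) / len(records)


# Made-up example responses; None marks a missing answer.
records = [
    {"corn_acres": 100, "soy_acres": 50, "total_acres": 150},
    {"corn_acres": 80, "soy_acres": None, "total_acres": 120},
    {"corn_acres": None, "soy_acres": 60, "total_acres": None},
]

edited = [deterministic_edit(r) for r in records]
final = impute(edited)
print(share_clean(records), share_clean(final))
```

In this sketch, the second record is repaired by the rule alone, while the third needs a statistical fill. A record that reports a total but no parts can still fail the balance rule after the median fill, so it would remain flagged for analyst review.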
Result: The development and deployment of IDEAL have been an overwhelming success. Within a short period, IDEAL matured from a proof of concept to a working prototype and was deployed to assist with the September 2024 Agricultural Product Survey. Through automated edits, IDEAL raised the clean rate from a 50% baseline to 93%, significantly reducing the time analysts spend on manual corrections.
In addition, IDEAL has received accolades for its success. Summit presented “Rule-Based Data Validation and Reconciliation of Survey Responses” at the 2023 Federal Committee on Statistical Methodology conference and showcased IDEAL at the 2024 Joint Statistical Meetings, highlighting NASS’s intellectual leadership.
The IDEAL project demonstrates Summit’s ability to deliver innovative solutions that enhance efficiency and accuracy in survey data management. By automating repetitive tasks and improving data quality, IDEAL allows NASS to redirect resources to high-value analyses, ultimately benefiting the entire agricultural sector with timely and accurate official statistics.