Initial Assessment of Artificial Intelligence and Machine Learning in Third-Party Software for Government-Sponsored Enterprises
Challenge: Government-sponsored enterprises are accustomed to understanding the scope of risks within internally developed models. Today, however, nearly every line of business relies on software from third-party vendors. That software can contain artificial intelligence (AI) or machine learning (ML) models whose risks are similar to, but distinct from, those evaluated by traditional model risk management (MRM) practices. Consequently, enterprises must define, inventory, and assess these risks and develop a standardized process that enables a thorough review and understanding of risks without stifling business use or draining time and resources every time software is updated or a vendor rolls out a new product. A significant complication is that most of these models are proprietary: information is limited to what the vendor will share, short of performing a full validation in-house. Summit was tasked with assisting the government-sponsored enterprise MRM team and providing feedback to shape this process.
Solution: To properly understand each model and identify the risks involved, Summit started by gathering information on several aspects of the software. First, and most important, was determining whether the software truly contained an AI/ML model; many marketing teams attach buzzwords without real justification. Once a model was identified, the next step was to determine its type. Different types of models inherently carry different types of risks, which helped reveal the questions to pose about the software or the model's usage. For example, natural language processing models can amplify bias present in the language they were trained on, while architectures that use boosting or other techniques to update themselves over time raise questions about where the company's data is housed and whether that data could be extracted from the model.
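To illustrate how such an intake might be structured, the sketch below shows a minimal, hypothetical Python record for a third-party software review. The field names, model types, and associated risk questions are illustrative assumptions for this write-up, not the enterprise's actual schema.

    from dataclasses import dataclass, field

    # Hypothetical mapping from model type to the kinds of risk questions
    # that type tends to raise; illustrative only.
    RISK_QUESTIONS_BY_TYPE = {
        "nlp": [
            "What corpus was the model trained on?",
            "Could the training language introduce or amplify bias?",
        ],
        "boosted_or_online_learning": [
            "Where is company data housed during retraining?",
            "Could company data be extracted from the updated model?",
        ],
    }

    @dataclass
    class ThirdPartySoftwareIntake:
        """One intake record for the initial AI/ML assessment (hypothetical schema)."""
        vendor: str
        product: str
        contains_ai_ml: bool           # Does the software truly contain a model?
        model_type: str | None = None  # e.g., "nlp"; None if no model is present
        risk_questions: list[str] = field(default_factory=list)

        def __post_init__(self):
            # Attach the type-specific questions the MRM team would pose.
            if self.contains_ai_ml and self.model_type:
                self.risk_questions = RISK_QUESTIONS_BY_TYPE.get(self.model_type, [])

    # Example: a product marketed as "AI" that contains no actual model
    # generates no model-specific risk questions.
    record = ThirdPartySoftwareIntake("AcmeSoft", "SmartDocs", contains_ai_ml=False)
    assert record.risk_questions == []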
Although the MRM team could glean inherent risks from the information above, the core understanding of the risks these models posed to the company came from understanding how they were being used. A model used for business process improvement (such as recommending the next word in a sentence) is lower risk because a human must evaluate and choose whether to apply the model's output. A model that triggers additional processes without human intervention, such as deciding whether a user should receive a two-factor authentication prompt when certain criteria are met, is higher risk. Other factors that raise a model's risk include output that cannot easily (or at all) be evaluated for accuracy, training data whose collection could create legal exposure, and enterprise data being used for later training or exposed by downstream processes. Throughout this process, the MRM team communicated with model owners, the employees responsible for serving as stewards and points of contact between the company and the vendor.
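The usage-based factors described above lend themselves to a simple rule-based tiering. The sketch below is one hypothetical way to encode them in Python; the factor names, weights, and tier thresholds are assumptions for illustration, not the MRM team's actual methodology.

    from dataclasses import dataclass

    @dataclass
    class UsageProfile:
        """Usage-based risk factors from the initial assessment (hypothetical)."""
        human_reviews_output: bool            # Does a person decide whether to act?
        triggers_actions_automatically: bool  # e.g., issuing a two-factor prompt
        output_easily_evaluated: bool         # Can accuracy be checked at all?
        training_data_provenance_clear: bool  # Is data collection documented?
        enterprise_data_reused: bool          # Used for later training or exposed
                                              # by downstream processes?

    def risk_tier(profile: UsageProfile) -> str:
        """Assign a coarse tier from usage factors; rules are illustrative only."""
        score = 0
        if not profile.human_reviews_output:
            score += 2
        if profile.triggers_actions_automatically:
            score += 2
        if not profile.output_easily_evaluated:
            score += 1
        if not profile.training_data_provenance_clear:
            score += 1
        if profile.enterprise_data_reused:
            score += 2
        if score >= 4:
            return "high"
        return "medium" if score >= 2 else "low"

    # A next-word recommender with human review and clear data provenance.
    recommender = UsageProfile(True, False, True, True, False)
    print(risk_tier(recommender))  # -> "low"

    # A model that silently triggers two-factor authentication prompts.
    auth_trigger = UsageProfile(False, True, False, True, True)
    print(risk_tier(auth_trigger))  # -> "high"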
Once the software is thoroughly researched and the risks are understood, the findings are presented to the model owner and the VP over the business area where the software is being deployed. Once all parties, including MRM and the owners, agree, the information is formally confirmed and documented. If the initial assessment finds that full validation is necessary, additional steps are taken to understand the model's full scope and evaluate its outputs. If the risk is low enough, the initial assessment is sufficient to approve usage, with the understanding that changes to either the usage of the model or the model itself would require an additional assessment. The frequency of these assessments is reduced by categorizing the model's capabilities as functional libraries. For example, a model's functional library might be optical character recognition: if the model is updated but still performs the same task, no additional assessment is necessary. Beyond documentation, further action is needed only when new methods are added to the software's functional library.
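A sketch of that update-trigger logic appears below, representing a functional library as a set of capability names. The helper and its decision rule are illustrative assumptions, not the enterprise's actual tooling.

    def reassessment_needed(before: set[str], after: set[str]) -> tuple[bool, set[str]]:
        """Compare a product's functional library before and after an update.

        Returns (needs_new_assessment, newly_added_capabilities). Only additions
        to the functional library trigger a new assessment; an update that keeps
        the same capabilities requires documentation only. Illustrative logic.
        """
        added = after - before
        return (len(added) > 0, added)

    # Version 2.0 still only performs optical character recognition:
    # documentation only, no new assessment.
    print(reassessment_needed({"optical_character_recognition"},
                              {"optical_character_recognition"}))
    # -> (False, set())

    # Version 3.0 adds document summarization: a new assessment is required.
    print(reassessment_needed({"optical_character_recognition"},
                              {"optical_character_recognition", "summarization"}))
    # -> (True, {'summarization'})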
Result: With the support of the Summit team, the government-sponsored enterprise MRM team now has the infrastructure in place to support innovation within the business through a process for quickly understanding the risks associated with AI/ML models contained within third-party software. This infrastructure reduces unnecessary friction when adopting new or upgraded software, so the rest of the business can focus on implementation and the benefits the software brings. It also alleviates anxiety around risk, as nearly all current models have been inventoried and rated. The next steps for the project are to continue supporting incoming work and to extend the process to third-party models that contain generative AI, which increases the complexity of evaluating software for risk.