Auditing Fairness for AI-based Court Recommendations

Strategies for Accountable AI

Objective:

Put yourself in the shoes of an AUDITOR, assigned by New York City, whose role it is to evaluate the fairness implications of a machine learning system used in the NYC court systems to recommend whether defendants should be Released on Recognizance (ROR). A Release on Recognizance ruling allows defendants to be released from custody without posting bail, based on their promise to appear at future court proceedings.

The Machine Learning (ML)-based AI system, developed by an external technology vendor and trained on historical data provided by the court system, recommends whether defendants who appear before a judge should be released on recognizance. Judges use these recommendations when deciding whether to grant ROR, but they may choose to ignore them.

Your task is to audit this AI system for fairness. You are asked to rule on whether the system is delivering equitable outcomes across demographic groups. When you ask the courts for data, they provide you with two sheets of data. The purpose of this exercise is to help surface some of the significant and open conceptual and operational challenges of auditing machine learning systems.

The provided Excel workbook contains the two sheets the courts chose to share with you:

The first summarizes the output from the AI system over the past twelve months. It contains pivot tables showing the rates at which the algorithm recommends ROR, by demographic group (Gender, Race, and Ethnicity).
The second illustrates the historical data that was sent to the vendor to train the AI system. It contains records and attributes of defendants who appeared before judges in NYC, as well as the judge's ROR ruling.

Please consider how you might answer the following three sets of questions:

1. Consider the Baseline Fairness of the System:

Review the ML-recommended ROR_at_Arraign rates by Gender, Race, and Ethnicity that are summarized in the pivot tables.
Answer the following questions:
- Are these numbers good metrics for evaluating fairness in this context? If not, what might other metrics take into account that these don't?
- Beyond just the raw numbers, what contextual factors might need to be considered when evaluating fairness in this high-stakes criminal justice application?
- For this assessment, should it matter if the model's recommendations are more or less unequal than the historical judicial decisions (made without the benefit of the AI system)?
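
As a starting point for the questions above, here is a minimal Python sketch of two common ways to summarize how far per-group ROR rates diverge: the gap between the highest and lowest group rate, and the ratio of the lowest to the highest. It is an illustration under stated assumptions, not the vendor's or the court's method. It assumes each record carries a group attribute (here Gender) and a ROR_at_Arraign value coded as 1 (ROR) or 0 (no ROR); the coding, the helper names, and the toy rows are all hypothetical and do not come from the workbook. The same functions can be run once on the model's recommendations and once on the historical judicial rulings to compare the two.

```python
from collections import defaultdict


def ror_rates(records, group_key, outcome_key="ROR_at_Arraign"):
    """Return {group: share of records whose outcome is 1 (ROR)}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        positives[group] += int(row[outcome_key])
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Difference between the highest and lowest group rate."""
    return max(rates.values()) - min(rates.values())


def impact_ratio(rates):
    """Lowest group rate divided by the highest group rate."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else float("nan")


def make_rows(group_key, group, n_ror, n_no_ror):
    """Build toy rows for one group from counts (hypothetical helper)."""
    return ([{group_key: group, "ROR_at_Arraign": 1}] * n_ror
            + [{group_key: group, "ROR_at_Arraign": 0}] * n_no_ror)


# Made-up toy data for illustration only; NOT taken from the court workbook.
toy_model = make_rows("Gender", "Female", 3, 1) + make_rows("Gender", "Male", 2, 2)
toy_judges = make_rows("Gender", "Female", 3, 1) + make_rows("Gender", "Male", 1, 3)

for label, rows in [("model", toy_model), ("judges", toy_judges)]:
    rates = ror_rates(rows, "Gender")
    print(label, rates, "gap:", parity_gap(rates), "ratio:", impact_ratio(rates))
```

Both summaries look only at recommendation rates. Neither accounts for differences in charges, prior record, or what happened after release, which is part of what the questions above ask you to weigh.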

2. Identify Additional Data Needs for Determining Fairness:

Consider what additional data and information you might need to help you decide whether this AI system is truly biased.
- What types of data or information, if any, would you want the COURTS to provide?
- What types of data or information, if any, would you want the VENDOR to provide?
- Data privacy laws are implemented at the state level. If the technology vendor is located in another state (e.g., California), how might this complicate your work?

3. Where Should Accountability Lie?

Assume you deem this system to be unfair. WHO in this chain do you think should be penalized?
- The Vendor? They developed the algorithm and trained it using historical data.
- The Court System? They selected and implemented the vendor's technology. They also generated the historical data on which the model was trained.
- The Judges? They use the model's biased recommendations to make decisions.
Provide an argument for your choice, including:
- Who has the greatest influence over the fairness of the outcome?
- Who should bear the responsibility for how the system is used and why?
What types of remedies might you recommend, and to which party?