Lab · Accountable AI 30 MIN · BREAK
Hands-on

Auditing fairness for an AI court-recommendation system.

Strategies for Accountable AI · Wharton Executive Education

You are an auditor, assigned by New York City, to evaluate a machine-learning system used by NYC courts to recommend whether defendants should be Released on Recognizance (ROR) — released without bail on a promise to appear.

The system was built by an outside vendor and trained on historical court data. Judges see its recommendations and may follow or ignore them. Your job: decide whether the system is delivering equitable outcomes across demographic groups, and what should happen if it isn't.

30 MINUTES
02 MIN Look at the data below.
12 MIN Question 1. Are these the right fairness metrics?
12 MIN Question 2. What additional data would you need?
04 MIN Hold Question 3 for the group debrief when we return.

The data

What the court shared with you.

When you asked the court for data, they gave you exactly what's below — the AI system's ROR_at_Arraign recommendation rates over the past twelve months, alongside the historical rate at which judges granted ROR for the same demographic groups (the data used to train the model).

Release-on-Recognizance rates by race · last 12 months

Race                        AI recommendation rate    Judges' historical rate
White                       64%                       68%
Asian / Pacific Islander    62%                       65%
Hispanic                    53%                       51%
Black                       49%                       44%

Illustrative figures patterned on real ROR-disparity findings. Largest gap shown in red.

What to notice

The AI recommends ROR for Black defendants at 49% vs. 64% for White defendants — a 15-point gap. The judges' historical gap was larger (24 points). The AI narrowed the disparity, but did not eliminate it. Bring that observation into Question 1.
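
The gap arithmetic above can be reproduced in a few lines. The sketch below is not part of the court's data or the vendor's system; it only restates the table, using White defendants as the reference group (an assumption made here for illustration, following the comparison in the paragraph above). The function names are hypothetical. Alongside the percentage-point gap it also prints each group's rate as a ratio of the reference rate, which is one other common way to express the same disparity.

```python
# Rates copied from the table above, in percent.
rates = {
    "White": {"ai": 64, "judges": 68},
    "Asian / Pacific Islander": {"ai": 62, "judges": 65},
    "Hispanic": {"ai": 53, "judges": 51},
    "Black": {"ai": 49, "judges": 44},
}


def gap_vs_reference(rates, source, reference="White"):
    """Percentage-point gap: reference group's rate minus each other group's rate."""
    ref = rates[reference][source]
    return {g: ref - r[source] for g, r in rates.items() if g != reference}


def ratio_vs_reference(rates, source, reference="White"):
    """Each group's rate divided by the reference group's rate, rounded to 2 places."""
    ref = rates[reference][source]
    return {g: round(r[source] / ref, 2) for g, r in rates.items() if g != reference}


ai_gaps = gap_vs_reference(rates, "ai")
judge_gaps = gap_vs_reference(rates, "judges")
ai_ratios = ratio_vs_reference(rates, "ai")
judge_ratios = ratio_vs_reference(rates, "judges")

for group in ai_gaps:
    print(
        f"{group}: AI gap {ai_gaps[group]} pts (ratio {ai_ratios[group]}), "
        f"judges gap {judge_gaps[group]} pts (ratio {judge_ratios[group]})"
    )
```

Whether a point gap or a ratio is the better summary is itself part of Question 1; the code only makes the two views easy to compare.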

Pre-break

01 Is this the right way to measure fairness?

12 MIN · INDIVIDUAL OR PAIR

Looking at the ROR_at_Arraign rates by Gender, Race, and Ethnicity in the AI recommendations sheet:

02 What's missing to make the decision?

12 MIN · INDIVIDUAL OR PAIR

The table above isn't enough to render a defensible audit. What more would you ask for, and from whom?

For the group debrief — after the break

03 Where should accountability lie?

GROUP DISCUSSION

Assume you've judged the system unfair. Who in this chain should be penalized?

Make your argument. Consider:

Want to dig deeper after class? The full dataset (≈7,000 defendant records with demographics, charges, and rulings) is available as an Excel workbook: Fairness_Assignment_Data.xlsx.
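
If you do open the workbook, a first step is usually to recompute the group rates yourself. The sketch below is one way to do that once the rows are loaded into plain Python dictionaries (for example after exporting a sheet to CSV). It rests on assumptions: the column names `Race` and `ROR_at_Arraign` are taken from this handout, but their exact spelling in the workbook and the 1/0 coding of the outcome are not confirmed here. The function name is hypothetical, and the toy rows are invented purely to show the mechanics; they are not drawn from the dataset.

```python
from collections import defaultdict


def ror_rate_by_group(records, group_key="Race", outcome_key="ROR_at_Arraign"):
    """Return {group: (ROR rate, number of records)} for a list of row dicts.

    Assumes the outcome column is coded 1 (ROR) or 0 (no ROR).
    """
    granted = defaultdict(int)
    total = defaultdict(int)
    for row in records:
        group = row[group_key]
        total[group] += 1
        granted[group] += int(row[outcome_key])
    return {g: (granted[g] / total[g], total[g]) for g in total}


# Toy rows for illustration only; NOT taken from Fairness_Assignment_Data.xlsx.
toy_records = [
    {"Race": "Group A", "ROR_at_Arraign": 1},
    {"Race": "Group A", "ROR_at_Arraign": 0},
    {"Race": "Group A", "ROR_at_Arraign": 1},
    {"Race": "Group A", "ROR_at_Arraign": 1},
    {"Race": "Group B", "ROR_at_Arraign": 0},
    {"Race": "Group B", "ROR_at_Arraign": 1},
]

rates_by_group = ror_rate_by_group(toy_records)
for group, (rate, n) in rates_by_group.items():
    print(f"{group}: {rate:.0%} ROR across {n} records")
```

Returning the record count next to each rate is deliberate: a rate computed from a small group is much noisier than one computed from a large group, and the summary table alone does not show group sizes.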

Prasanna Tambe · Wharton · Bias in Machine Learning Systems