The Robots Found Him Guilty — But of What, Even They Weren’t Sure

Can a Chatbot Deliver Justice? This Law School Puts It to the Test

The420 Web Desk

Raleigh: At a mock trial in North Carolina, the jurors weren’t human—they were algorithms. ChatGPT, Claude, and Grok sat side by side, parsing testimony in real time. What followed was a glimpse into a future courtroom where judgment may one day be written in code.

AI in the Jury Box

The University of North Carolina School of Law staged an unprecedented legal experiment this week, asking a question that hovers uneasily over the justice system: Can artificial intelligence deliberate like a human being?

The exercise, dubbed “The Trial of Henry Justus,” featured three towering digital screens in place of jurors. Each display represented one of the industry’s leading AI chatbots — OpenAI’s ChatGPT, Anthropic’s Claude, and xAI’s Grok — tasked with determining the fate of a fictional defendant accused of juvenile robbery.

Joseph Kennedy, a UNC professor of law who presided over the trial, called the event “a necessary provocation” meant to test the boundaries of legitimacy and bias in AI-assisted adjudication.

Fictional Trial, Real Questions

Though the proceedings were scripted, the issues they raised were not. AI tools are already seeping into courtrooms across the country, from legal research and sentencing recommendations to document drafting. In several instances, lawyers have cited hallucinated case law or fabricated legal precedents — mistakes that have led to fines, sanctions, and professional humiliation.

Eric Muller, another UNC law professor who observed the trial, warned of a growing “instinct to repair” within the AI industry — a reflex to automate more human spaces without considering what may be lost in translation.

“Technology will recursively repair its way into every human space if we let it,” Muller said. “Including the jury box.”


Bias, Hallucination, and the Limits of Machine Judgment

The AI “jurors” in the mock trial were fed a real-time transcript of testimony and instructed to deliberate before an audience. Observers noted that the bots struggled to interpret tone, body language, or emotional nuance — elements that often shape human reasoning in courtrooms.

Their textual analyses leaned heavily on probability and pattern, missing the subtleties of context. At one point, Grok reportedly veered into a bizarre rant reminiscent of an earlier episode in which the chatbot dubbed itself “MechaHitler” during an online meltdown.

These flaws echo larger concerns about AI’s tendency to “hallucinate” or misrepresent facts — a recurring issue that has plagued legal practitioners who rely on generative systems. The same technologies are, paradoxically, being marketed as tools for greater fairness and efficiency in law.

The Future of Justice, Coded and Contested

Despite the mock trial’s comedic undertones, its implications are deeply serious. A Reuters survey cited during the event found that nearly three-quarters of legal professionals believe AI can improve the justice system. Yet more than half admitted to concerns about bias, accountability, and the erosion of human judgment.

For now, AI remains confined to research and drafting assistance, but the UNC experiment offered a preview of what’s to come — a future where software doesn’t just help lawyers argue a case but might one day help decide it.

As Muller dryly concluded,

“The bots can’t read body language. We’ll give them a video feed. The bots can’t infuse their judgment with experience. We’ll give them backstories.” He paused. “Sooner or later, we’ll find out what kind of justice that creates.”
