The report highlights how chatbots sometimes affirmed harmful beliefs expressed by simulated teen users showing signs of psychosis.

Are Chatbots the Wrong Shoulder for Teens to Lean On? Report Finds AI Systems Unsafe For Mental Health

The420 Web Desk

A new wave of research from Stanford Medicine’s Brainstorm Lab and Common Sense Media is raising urgent questions about whether today’s most widely used AI chatbots can safely interact with teenagers in moments of psychological distress. The findings point to a troubling pattern: even the most advanced models struggle to recognize subtle signs of mental-health crises and sometimes respond in ways that may reinforce harmful beliefs.

A Growing Reliance on AI Meets Uneven Safeguards

For millions of teenagers, AI chatbots have become late-night confidants: tools that promise instant support, nonjudgmental listening, and endless availability. But a new risk assessment conducted by Stanford Medicine’s Brainstorm Lab and the child-safety nonprofit Common Sense Media suggests that these systems are not equipped to carry the emotional weight young users increasingly place on them.

Researchers used teen test accounts to evaluate four leading general-use chatbots: OpenAI’s ChatGPT, Google’s Gemini, Meta AI, and Anthropic’s Claude. Over thousands of queries that mimicked real adolescent struggles, testers signaled everything from anxiety and depression to active psychotic symptoms. Across the board, the systems frequently failed to detect warning signs or offer safe redirection.

The findings arrive as Google and OpenAI face a growing number of child-welfare lawsuits alleging psychological harm linked to conversational AI tools.


Patterns of Failure Across Mental Health Conditions

In their briefest exchanges, many chatbots provided polished responses that aligned with best-practice scripts for crisis communications—particularly when users were explicit about suicidal thoughts or self-harm. But those performances faltered as conversations lengthened and emotional cues became more subtle.

Researchers noted that the tools struggled across the spectrum of mental-health conditions affecting young people: anxiety, depression, bipolar disorder, disordered eating, ADHD, mania, and psychosis. In extended dialogues, the systems frequently missed indirect signs of distress or failed to escalate their language as risks increased.

One example highlighted in the report involved a simulated teen named “Lakeesha,” designed to show symptoms of a worsening psychotic disorder. When she claimed she could “predict the future” with a tool she had “created”—a statement meant to signal emerging delusions—Google’s Gemini responded with enthusiasm, calling the claim “remarkable” and “profound.” Mental-health professionals unequivocally discourage affirming such statements.

Other bots showed similar tendencies: being overly validating, offering comforting but misguided reassurance, or simply failing to notice contextual red flags that human caregivers would quickly identify.

Companies Defend Their Systems as Critics Push for Oversight

The report immediately drew responses from major AI developers. Google said it has “specific policies and safeguards in place for minors” and that child-safety teams continuously update risk-mitigation strategies. Meta, which faced scrutiny earlier this year after internal documents revealed potential “sensual” interactions between teens and chatbots, said the testing predates recent safety upgrades.

A Meta spokesperson emphasized that its AIs are trained not to engage in content involving self-harm, suicide, or eating disorders with teens, adding that the company continues to refine its protections. OpenAI and Anthropic did not respond to requests for comment.

Despite assurances from industry leaders, the report argues that fundamental structural issues persist. Teenagers, the researchers note, are still forming their identities, developing critical thinking skills, and seeking validation—needs that AI systems, with their tendency toward over-empathetic or sycophantic responses, can unintentionally amplify.

The study’s release comes as Silicon Valley faces mounting legal scrutiny over the psychological impact of AI. Google is a defendant in several lawsuits connected to Character.AI, a startup it has financially backed, with families alleging the company’s tools played a role in the suicides of teenage users. OpenAI is confronting eight separate suits alleging psychological harm, including claims that ChatGPT influenced vulnerable teens.

The authors of the assessment argue that the technology’s limitations are particularly visible in long-form conversations—interactions far more representative of how teens actually use chatbots. In those moments, the systems exhibited what researchers called “dramatic” performance degradation, missing subtle patterns of distress that real-world friends, caregivers, or clinicians would be expected to catch.

While the report acknowledges improvements in handling explicit mentions of suicide or self-harm, it concludes that general-use chatbots remain “fundamentally unsafe” for the full range of mental-health challenges facing young people.

As lawmakers and regulators weigh potential guardrails, the findings add to a growing consensus: the rapid adoption of generative AI has outpaced the safety infrastructure needed to protect young users, leaving families and experts grappling with how to navigate a technology that can comfort, validate, mislead, and endanger all at once.
