The US government’s AI security centre will conduct pre-deployment evaluations of frontier artificial intelligence models from Google, Microsoft and xAI to assess whether their advanced capabilities pose cybersecurity risks. The plan, announced by the National Institute of Standards and Technology’s Center for AI Standards and Innovation, marks a significant step in federal efforts to examine powerful AI systems before public release.
Pre-Deployment Testing Planned
NIST said the new partnerships will help the agency and technology companies exchange information, support voluntary product improvements and give the government a clearer understanding of what advanced AI models can do.
CAISI Director Chris Fall said independent and rigorous measurement science is essential to understanding frontier AI and its national security implications. An interagency task force at CAISI will allow officials from across the government to test the models, including in classified settings.
Industry and Government Collaboration Emphasised
Microsoft’s chief responsible AI officer, Natasha Crampton, said companies cannot carry out evaluations tied to national security and public safety entirely on their own. She said such assessments require close cooperation between industry and governments with technical and security expertise.
Crampton said Microsoft would apply what it learns directly to the way it designs, tests and deploys AI, while also sharing best practices to strengthen AI testing more broadly.
Shift Follows Concerns Over Claude Mythos
The move comes after Anthropic announced that its latest model, Claude Mythos, was too dangerous to release publicly because of its reported ability to find serious software vulnerabilities. The development prompted the Trump administration to reconsider its earlier hands-off approach to AI security reviews.
Alongside the voluntary CAISI evaluations, the administration is also considering mandatory government reviews of all new AI models. However, it remains unclear what standards CAISI will use for its assessments.
Devin Lynch, a former director for cyber policy and strategy implementation at the White House Office of the National Cyber Director, said capability assessments depend on the threat models behind them. He said CAISI would need to define and publish what it is testing for, not only the companies it is testing with.