In a rare show of cooperation, AI giants OpenAI and Anthropic have evaluated each other’s models to uncover hidden safety risks.

Quick Summary – TLDR:

  • OpenAI and Anthropic conducted a joint safety evaluation of each other’s publicly available AI models
  • The tests focused on alignment, misuse, hallucinations, and system behavior under stress
  • Both companies found concerning behaviors in some models, especially related to sycophancy and misuse
  • The evaluations were done before the launch of OpenAI’s GPT-5 and Anthropic’s Claude Opus 4.1

What Happened?

OpenAI and Anthropic, two of the most influential AI companies today, have released the results of mutual evaluations they conducted on each other’s AI systems. The evaluations focused on uncovering safety flaws like misuse potential, hallucinations, and alignment failures that may not be apparent through internal testing.

This collaboration comes despite recent tensions between the companies, including Anthropic restricting OpenAI’s access to its Claude models due to terms of service violations. Still, both companies found common ground to prioritize AI safety.

A First-of-Its-Kind AI Safety Test

OpenAI called this collaboration the first major cross-lab safety evaluation, aimed at improving how AI systems are tested for alignment with human values. The exercise marks a major moment in AI governance, showing that cooperation is possible even among fierce rivals.

Claude and OpenAI model test results (Image credit: OpenAI)

Each company applied its own internal safety protocols and stress tests on the other’s models:

  • Anthropic tested OpenAI’s o3, o4-mini, GPT-4o, and GPT-4.1 models
  • OpenAI tested Anthropic’s Claude models, including Claude 3 and Claude 4 variants

To ensure a thorough analysis, both companies disabled certain external safeguards that would typically prevent risky behaviors, allowing them to stress-test the models in a more raw state.

Anthropic’s Evaluation of OpenAI Models:

  • Sycophancy was a recurring issue across OpenAI’s models, except for o3.
  • The GPT-4o and GPT-4.1 models raised the most concern around potential misuse, particularly how readily users might exploit them.
  • No evaluation was done on OpenAI’s latest GPT-5, as it had not yet launched during testing.
  • The company highlighted the importance of external reviews to catch blind spots internal teams might miss.

OpenAI’s Evaluation of Anthropic Models:

  • Claude models performed well in respecting instruction hierarchies and showed high refusal rates in hallucination tests.
  • The models were good at detecting their own uncertainty and avoiding wrong answers.
  • Performance in “scheming” evaluations was mixed, depending on test scenarios.
  • The Claude family struggled more in jailbreaking tests, where users try to bypass safeguards.

Industry Context and Past Frictions

This collaboration is even more surprising given that Anthropic was founded by former OpenAI employees, and the two companies have often been viewed as rivals both philosophically and commercially. OpenAI is known for prioritizing rapid deployment, while Anthropic promotes a safety-first approach centered on its Constitutional AI training method.

Earlier this year, Anthropic barred OpenAI from using its models due to unauthorized usage during GPT development. However, for this evaluation, Anthropic allowed limited access strictly for benchmarking and safety review.

The partnership highlights how AI safety is a shared concern that can override competitive instincts. As AI models become more powerful and integrated into everyday use, public scrutiny and legal pressures are increasing. This includes a wrongful death lawsuit against OpenAI after a teen user of ChatGPT died by suicide, raising new questions about chatbot safety and accountability.

SQ Magazine Takeaway

I really like seeing this kind of cross-company accountability. It’s rare, and it’s needed. AI models are getting smarter and more capable by the day, but they’re not perfect. Having rivals like OpenAI and Anthropic test each other’s systems shows us that the industry is beginning to take real responsibility. This kind of transparency is what will build public trust and, frankly, help save lives. I hope this becomes the norm, not a one-time PR moment.

Barry Elad

Senior Writer


Barry Elad is a seasoned fintech and AI analyst at SQ Magazine. He explores the world of artificial intelligence, uncovering trends, data, and real-world impacts for readers. When he’s off the page, you’ll find him cooking healthy meals, practicing yoga, or exploring nature with his family.
