Kimi K2.5, the latest AI model from Moonshot AI, has quietly entered the scene and already outpaced top-tier US models across several AI benchmarks.
Quick Summary – TLDR:
- Kimi K2.5 introduces powerful new features including visual coding, multimodal reasoning, and agent swarm execution.
- The model scored higher than GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro in key benchmarks like Humanity’s Last Exam and BrowseComp.
- With four performance modes and seamless tool integration, it transforms workflows for developers, coders, and office professionals.
- Kimi K2.5 is fully open source and signals China’s rising dominance in AI innovation.
What Happened?
Moonshot AI has released Kimi K2.5 without the fanfare typically seen with major model launches. But the quiet rollout belies its capabilities. The model, now live on kimi.com, integrates visual analysis, agent-style reasoning, and tool support while scoring industry-leading results across multiple benchmarks.
🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
🔹 Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
🔹 Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
🔹 Code with Taste: turn chats,… pic.twitter.com/wp6JZS47bN
— Kimi.ai (@Kimi_Moonshot) January 27, 2026
A Stronger, Smarter AI from Moonshot
Moonshot’s Kimi K2.5 represents a major upgrade over its predecessor, Kimi K2. The new version adds native vision and multimodal support, letting users upload images for tasks like 3D modeling or layout interpretation. Early user feedback describes the leap in reasoning performance as dramatic.
Some standout features include:
- Visual analysis: K2.5 handles complex visual tasks such as converting apartment layouts into 3D models and interpreting dynamic images.
- Coding capabilities: It excels in front-end and visual coding, even reconstructing websites from video clips and debugging visually.
- Tool integration: From logic puzzles to advanced programming, K2.5 walks through step-by-step problem-solving using integrated tools.
Agent Swarm: A New Paradigm
The highlight of K2.5 is its Agent Swarm mode, a beta feature that enables the model to self-direct up to 100 sub-agents for parallel task execution. These agents specialize in roles like fact-checking, physics research, or content synthesis.
According to Moonshot AI, this swarm can decompose complex tasks and run subtasks simultaneously, improving speed and efficiency. Performance benchmarks reveal:
- 50.2% on Humanity’s Last Exam (HLE-Full)
- 74.9% on BrowseComp with context
- 77.1% on DeepSearchQA
- 78.4% on BrowseComp with Agent Swarm mode
This design drastically reduces task completion time, reportedly shortening execution by up to 4.5x compared with single-agent systems.
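Moonshot has not published the internals of Agent Swarm, but the general fan-out pattern it describes, one coordinator decomposing a task and awaiting parallel sub-agents, can be sketched in a few lines of Python. Everything below (the `run_subagent` helper, the role names, the hard-coded decomposition) is illustrative, not Moonshot's API:

```python
import asyncio

# Hypothetical illustration of the fan-out pattern behind an "agent swarm":
# a coordinator splits one task into role-specific subtasks, runs them
# concurrently, and merges the results. This is not Moonshot's implementation.

async def run_subagent(role: str, subtask: str) -> str:
    # Stand-in for a real model call (e.g. a chat request with a
    # role-specific system prompt and its own tool access).
    await asyncio.sleep(0.1)  # simulate network / inference latency
    return f"[{role}] result for: {subtask}"

async def agent_swarm(task: str) -> str:
    # A production system would let the model decompose the task itself;
    # the split here is hard-coded for clarity.
    subtasks = {
        "fact-checker": f"Verify the claims in: {task}",
        "researcher": f"Gather background on: {task}",
        "synthesizer": f"Draft a summary of: {task}",
    }
    # Launch all sub-agents at once instead of one after another.
    results = await asyncio.gather(
        *(run_subagent(role, sub) for role, sub in subtasks.items())
    )
    # Merge the parallel outputs into a single answer.
    return "\n".join(results)

if __name__ == "__main__":
    print(asyncio.run(agent_swarm("the Kimi K2.5 release")))
```

In a real deployment the decomposition would come from the model itself and each sub-agent would be a separate model request with its own tools; the sketch only shows why parallel dispatch shortens the critical path.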
Coding and Productivity in Focus
Beyond benchmarks, K2.5 is engineered for professional productivity. It can create interactive websites, debug autonomously, and produce documents, spreadsheets, and presentations through natural conversation.
Paired with Kimi Code, developers can now use the model in their favorite IDEs like VSCode and Zed. K2.5 is particularly well-tuned for image and video inputs and supports real-time updates across workflows.
Notable capabilities include:
- End-to-end handling of office documents (Word, Excel, LaTeX).
- Producing long-form documents up to 10,000 words.
- Visual inspection for autonomous code debugging.
- Integrations through an open-source terminal interface.
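For developers who want to script against the model rather than use the chat UI, Moonshot's developer platform has historically exposed an OpenAI-compatible API. Assuming that holds for K2.5, a visual debugging request like the one described above might look like the sketch below; the base URL, model identifier, and image message schema are placeholders to verify against the current Kimi documentation:

```python
import base64
from openai import OpenAI

# Hedged sketch: assumes an OpenAI-compatible endpoint. The base URL, the
# model identifier, and the image message schema below are assumptions to
# check against Moonshot's current API docs before use.
client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",          # placeholder credential
    base_url="https://api.moonshot.ai/v1",    # assumed endpoint
)

# Encode a local screenshot of the broken page so the model can "see" the bug.
with open("broken_layout.png", "rb") as f:
    screenshot = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="kimi-k2.5",  # hypothetical model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "This page renders incorrectly. Find the CSS bug and propose a fix."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{screenshot}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```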
China’s Open Source Momentum
Kimi K2.5 signals China’s growing influence in AI. Backed by Alibaba, Moonshot AI has pushed out a model that outperforms closed models like GPT-5.2 and Gemini 3 Pro, while remaining fully open source.
This challenges the notion that only proprietary systems can lead AI development. With benchmarks like 76.8% on SWE-Bench Verified and 73.0% on SWE-Bench Multilingual, Kimi K2.5 sets new records in open AI coding tools.
SQ Magazine Takeaway
I’m honestly impressed by how fast Moonshot AI is moving. The leap from Kimi K2 to K2.5 is massive. This model isn’t just trying to keep up with OpenAI or Anthropic. It’s competing head-on, and in many areas, pulling ahead. What excites me most is how accessible this power is. An open-source, benchmark-beating AI that can handle code, documents, visuals, and reasoning? That’s not just good news for China, it’s good news for everyone who wants AI to be more open, usable, and practical.