If you blinked this month, you probably missed a dozen major AI announcements. February 2026 has been a firehose of releases, funding rounds, and policy developments. This roundup is designed to help you catch up on what actually shipped, what got announced, and what’s now available—without the hype or hot takes.
Let’s dig in.
OpenAI: The Agentic Coding Stack
OpenAI spent February shipping a cohesive “agent stack” for developers and enterprises.
GPT-5.3-Codex launched on February 5th as OpenAI’s new flagship coding model. The official announcement positions it as a “general-purpose coding agent” designed for long-running tasks with tool use. OpenAI reports it’s approximately 25% faster than GPT-5.2-Codex and cites performance on benchmarks including SWE-Bench Pro, Terminal-Bench, OSWorld, and GDPval.
A week later, GPT-5.3-Codex-Spark arrived on February 12th—a smaller, real-time variant built in partnership with Cerebras. The headline number: over 1,000 tokens per second on Cerebras’ Wafer Scale Engine 3. The model is text-only at launch with a 128k context window, targeting interactive coding workflows where latency matters.
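To get a feel for what that latency target means in practice, here is a minimal streaming sketch using the standard OpenAI Python SDK. The model identifier and its availability on the chat completions endpoint are assumptions for illustration, not details confirmed in the announcement.

```python
# Minimal streaming sketch with the OpenAI Python SDK.
# Assumption: the Spark variant is reachable via chat completions
# under the identifier "gpt-5.3-codex-spark".
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.monotonic()
stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # assumed identifier
    messages=[{"role": "user", "content": "Write a Python function that parses ISO 8601 dates."}],
    stream=True,
)

chunks_seen = 0
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
        chunks_seen += 1  # rough proxy: roughly one token per chunk

elapsed = time.monotonic() - start
print(f"\n~{chunks_seen / elapsed:.0f} chunks/sec over {elapsed:.2f}s")
```

If the reported throughput holds, a 500-token patch streams back in roughly half a second, which is what makes the model plausible for keystroke-adjacent coding workflows.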
Beyond the models, OpenAI introduced Frontier, described as a platform for building, deploying, and managing “AI coworkers.” It emphasizes shared context across agents, onboarding and feedback loops, and identity and permissions management.
Other OpenAI developments this month:
- Lockdown Mode and “Elevated Risk” labels (February 13th) — New security controls for ChatGPT, Atlas, and Codex designed to mitigate prompt injection and data exfiltration risks. Lockdown Mode limits browsing to cached content.
- Model retirements — OpenAI retired GPT-4o, GPT-4.1, and several mini variants from ChatGPT on February 13th, noting only 0.1% of daily users still relied on GPT-4o.
- Advertising tests — OpenAI began testing ads in the free version of ChatGPT in early February.
- Sora updates — Image-to-video for personal photos (with consent attestation and automatic stylization for realistic persons), plus “Extensions” for continuing scenes.
Anthropic: Opus 4.6 and a Massive Funding Round
Anthropic’s February centered on two major announcements.
Claude Opus 4.6 launched February 5th with a focus on long-horizon agentic work, codebase-scale reliability, and a 1M token context window (currently in beta on the developer platform). Anthropic cites performance on GDPval-AA and BrowseComp benchmarks. Pricing sits at $5/$25 per million input/output tokens.
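At those rates, per-request cost is simple arithmetic. A minimal sketch, assuming a hypothetical agentic run with a 200k-token prompt and a 20k-token response:

```python
# Back-of-the-envelope cost estimate at Opus 4.6 list prices
# ($5 per million input tokens, $25 per million output tokens).
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 25.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical codebase-scale run: large context in, long patch out.
print(f"${request_cost(200_000, 20_000):.2f}")  # -> $1.50
```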
On February 12th, Anthropic announced a $30 billion Series G funding round led by GIC and Coatue, valuing the company at $380 billion post-money. The company reports run-rate revenue of $14 billion and projects break-even by 2028—reportedly two years ahead of some competitors.
Anthropic also published updates to its Responsible Scaling Policy on February 10th, including discussion of capability-threshold determinations for Opus 4.6 and a sabotage risk report. The company explicitly states it will remain ad-free.
Google: Deep Think and Agentic Commerce
Google’s February focus split between specialized reasoning capabilities and commercial AI infrastructure.
Gemini 3 Deep Think received a major upgrade with availability for Google AI Ultra subscribers via the Gemini app and selective early API access. Google cites performance across benchmarks including Humanity’s Last Exam, ARC-AGI-2, and Codeforces Elo. A separate DeepMind research post describes an internal math research agent (“Aletheia”) using iterative generation-verification-revision loops.
On the commerce side, Google announced the Universal Commerce Protocol (UCP)—a standard for how AI agents interact with businesses for secure payments and digital identity. This enables what Google calls “agentic commerce,” allowing purchases from retailers like Etsy and Wayfair directly within the Gemini app or Google Search.
Automated Review for Conductor launched February 11th, validating code quality and compliance with user-defined guidelines.
Open-Weights Models: The Trillion-Parameter Class
Recent weeks brought several large-scale open-weights releases that narrow the gap with proprietary frontier models.
Kimi K2.5 (Moonshot AI)
Kimi K2.5 launched January 27th with 1 trillion total parameters (32 billion active), trained on 15 trillion mixed vision and text tokens. Notable features include:
- Agent Swarm mode — Coordinates up to 100 specialized sub-agents for large-scale tasks (a conceptual sketch follows this list)
- Coding with Vision — Analyzes video of websites to reconstruct underlying logic in frontend code
- 2 million token context window
- Pricing starting at $0.15 per million input tokens
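Moonshot has not published how Agent Swarm is implemented, so the snippet below is only a conceptual sketch of the fan-out/fan-in pattern the feature implies: a coordinator splits work across specialized sub-agents running concurrently, then merges their results. The `call_subagent` helper is a hypothetical stand-in, not Moonshot's API.

```python
# Conceptual fan-out/fan-in sketch of a "swarm" coordinator.
# Hypothetical: call_subagent() stands in for whatever call actually
# dispatches a task to a specialized sub-agent.
import asyncio

async def call_subagent(role: str, task: str) -> str:
    """Placeholder for a real sub-agent call (LLM request, tool run, etc.)."""
    await asyncio.sleep(0.1)  # simulate network/model latency
    return f"[{role}] finished: {task}"

async def run_swarm(task: str, roles: list[str]) -> list[str]:
    # Fan out: each specialized sub-agent works on its slice concurrently.
    subtasks = [call_subagent(role, f"{task} ({role} portion)") for role in roles]
    # Fan in: gather results for the coordinator to merge.
    return await asyncio.gather(*subtasks)

results = asyncio.run(run_swarm(
    "audit the payment service",
    ["static-analysis", "test-writer", "docs-reviewer"],
))
print("\n".join(results))
```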
Qwen3-Max-Thinking (Alibaba)
Alibaba’s Qwen3-Max-Thinking is a 1 trillion parameter reasoning model featuring “Experience-Cumulative Test-Time Scaling” (TTS)—a mechanism allowing the model to refine answers by learning from its own reasoning attempts during generation.
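Alibaba has not detailed how Experience-Cumulative TTS works internally, but the description maps onto a generate-critique-revise loop in which lessons from earlier attempts are fed back into later ones. The sketch below illustrates that loop in the abstract; `generate` and `critique` are toy stand-ins, not Qwen internals.

```python
# Abstract generate-critique-revise loop with an "experience" buffer,
# illustrating the idea behind experience-cumulative test-time scaling.
# generate() and critique() are toy stand-ins for model/verifier calls.

def generate(question: str, experience: list[str]) -> str:
    # A real system would call the model here, folding the accumulated
    # experience notes back into the prompt.
    return f"attempt {len(experience)} at: {question}"

def critique(question: str, answer: str) -> tuple[bool, str]:
    # A real system would verify the answer (tests, a checker, or a second
    # model pass) and distill a lesson from any failure.
    ok = answer.startswith("attempt 2")
    return ok, f"'{answer}' was not sufficient; try a different decomposition"

def solve(question: str, max_rounds: int = 4) -> str:
    experience: list[str] = []          # lessons accumulated across attempts
    answer = generate(question, experience)
    for _ in range(max_rounds):
        ok, lesson = critique(question, answer)
        if ok:
            break
        experience.append(lesson)       # "learn" from the failed attempt
        answer = generate(question, experience)
    return answer

print(solve("prove the inequality holds for all n"))
# -> "attempt 2 at: prove the inequality holds for all n" after two revisions
```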
Reported benchmark scores include:
- GPQA Diamond (PhD Science): 92.8%
- IMO-AnswerBench (Math): 91.5%
- LiveCodeBench (Coding): 91.4%
- HLE with Search: 49.8%
Mistral 3
Mistral AI released the Mistral 3 family under Apache 2.0 license. Mistral Large 3 is a sparse mixture-of-experts model with 675 billion total parameters, trained on 3,000 NVIDIA H200 GPUs. The Ministral 3 series (3B, 8B, and 14B sizes) targets edge computing, derived through “Cascade Distillation” from the Mistral Small 3.1 parent model.
Mistral also announced a $1.43 billion investment in new data centers in Sweden for European AI infrastructure.
GLM-5 (Z.ai)
GLM-5 released under MIT license with 744B parameters (40B active), trained on 28.5T tokens. It integrates DeepSeek Sparse Attention for cost reduction while preserving long-context capability, with extensive benchmark tables and deployment recipes (vLLM/SGLang, FP8 variants) in the model card.
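The model card points at vLLM and SGLang for serving; as a rough illustration of what that looks like, here is a minimal offline-inference sketch using vLLM's public Python API. The repository id, tensor-parallel degree, and context length below are placeholder assumptions, not values taken from the GLM-5 card.

```python
# Minimal vLLM offline-inference sketch for a large MoE checkpoint.
# Assumptions: the repo id, tensor-parallel degree, and max_model_len
# are placeholders, not the GLM-5 model card's exact configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-5",      # assumed repo id
    tensor_parallel_size=8,      # shard the MoE weights across 8 GPUs
    quantization="fp8",          # use FP8 to cut memory and serving cost
    max_model_len=32_768,        # trim context to fit available KV-cache memory
)

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Summarize what sparse attention buys at long context."], params)
print(outputs[0].outputs[0].text)
```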
Video Generation: Seedance 2.0
ByteDance launched Seedance 2.0 on February 13th, featuring what they describe as a “unified multimodal audio-video joint generation architecture.” Key specifications:
- Up to 4K resolution (2K native)
- 4-15 second clips (up to 20s via extension)
- Quad-modal input: text, image (up to 9), video (up to 3), audio (up to 3)
- Enhanced physics-aware training for gravity, momentum, and collision
The “@ Mention Reference System” allows creators to link uploaded assets to specific narrative instructions, enabling camera techniques like dolly shots, whip pans, and the Hitchcock zoom.
Audio and Voice
Mistral’s Voxtral Transcribe 2, released February 10th, offers streaming transcription with configurable latency (including a 200ms option for the Realtime model). The Voxtral Realtime weights are released under Apache 2.0.
Suno V5 now includes “Stem Export,” allowing download of up to 12 individual tracks (lead vocals, backing vocals, drums, etc.) in high-fidelity WAV format—enabling hybrid workflows where AI generates stems and humans mix in a DAW.
Retell AI launched “Retell Assure,” an automated monitoring layer that checks AI-driven phone calls for quality and compliance across the platform’s 40+ million monthly calls.
Infrastructure and Investment
The scale of AI infrastructure investment continues to climb:
| Organization | Investment | Purpose |
|---|---|---|
| Anthropic Series G | $30 billion | Frontier research, break-even by 2028 |
| Oracle Expansion | $50 billion | Global AI-specialized data centers |
| Meta CapEx (2026) | $115-135 billion | Personal Superintelligence and internal efficiency |
| Mistral Sweden | $1.43 billion | Independent European AI infrastructure |
NVIDIA reported $51.2 billion in data center revenue and is promoting “Tokenomics”—the argument that moving from Hopper to Blackwell GPUs cuts AI inference costs by 10x through software optimizations and the NVFP4 format.
Meta announced nuclear energy projects targeting up to 6.6 GW of power for US-based AI operations.
Alphabet is reportedly issuing a 100-year sterling-denominated bond to support AI investment.
Policy and Regulation
The Colorado AI Act (SB24-205) took effect February 1st, establishing consumer-protection duties and “reasonable care” expectations for deployers of high-risk AI systems.
The CLEAR Act was introduced in the US Senate, targeting transparency requirements for copyrighted works used in training generative AI models.
The UN General Assembly approved a 40-member scientific panel on AI impacts on February 14th.
Export controls remain active, with Reuters reporting on US Commerce Department licensing guardrails for NVIDIA’s advanced AI chip sales to China.
Benchmark Watch
The GAIA (General AI Assistant) benchmark remains the primary measure for agentic systems. Current top performers:
| Agent | Primary Model | Overall Accuracy | Level 3 (Hard) |
|---|---|---|---|
| HAL Generalist Agent | Claude Sonnet 4.5 | 74.55% | 65.39% |
| HAL Generalist Agent | Claude Opus 4.1 High | 68.48% | 53.85% |
| HF Open Deep Research | GPT-5 Medium | 62.80% | 38.46% |
Top overall accuracy for these agents has stabilized around 74%, suggesting the “agentic loop”—reasoning through errors and using tools correctly—remains the primary bottleneck.
What’s Next
The pattern is clear: February’s releases aren’t isolated model drops—they’re cohesive product stacks oriented around agent capabilities, enterprise governance, and specialized modalities. The infrastructure race continues with massive capital deployment and even nuclear power projects.
For those building with AI: the open-weights ecosystem now includes serious trillion-parameter contenders with real deployment engineering (streaming endpoints, FP8 recipes, tool-call parsers). The gap between “open” and “frontier” continues to narrow on many tasks.
For everyone else: the shift from “chatbot” to “agent” is no longer theoretical. The systems shipping now are designed to do work, not just answer questions.
This roundup synthesizes publicly available announcements and documentation from February 1-14, 2026. Links point to original sources where available.
