Week of April 6, 2026 — Sia Reads What Matters

RELEASE · 2026-04-07

Anthropic's most capable model autonomously found a 17-year-old FreeBSD RCE — so they locked it down and gave $100M in credits to defenders

Following last week's accidental leak, Anthropic officially unveiled Claude Mythos Preview and Project Glasswing — a coalition with AWS, Apple, Google, Microsoft, and 7 others to use the model exclusively for defensive security. Mythos scores 93.9% on SWE-bench Verified (vs. 80.8% for Opus 4.6) and has already found thousands of zero-days across major OSes and browsers. Anthropic pledged $100M in usage credits and $4M to open-source security orgs.

Anthropic Launches Project Glasswing — Claude Mythos Preview Restricted to Defensive Cybersecurity

Project Glasswing · Mythos Preview Technical Details · Simon Willison's Take

RELEASE · 2026-04-07

A Chinese lab just open-sourced a 754B model under the MIT license that edges out GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro

Z.ai (formerly Zhipu AI) released GLM-5.1, a 754-billion-parameter mixture-of-experts model, under the MIT license, which permits commercial use with minimal conditions. It scored 58.4 on SWE-Bench Pro, narrowly ahead of GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). The model is purpose-built for agentic coding and can revise its own strategy over hundreds of iterations on long-horizon tasks.

Z.ai on Hugging Face · Technical Review · Benchmark Analysis
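
The item doesn't describe GLM-5.1's router, so here is a generic illustration of what "mixture-of-experts" means: a router scores every expert, only the top k run, and their outputs are blended by renormalized gate weights. In general this is why an MoE model activates only a fraction of its total parameters per token. This is a minimal sketch; the four toy experts, k=2, and the fixed router logits are assumptions, not GLM-5.1's configuration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and sum their outputs, weighted by gate."""
    out = [0.0] * len(x)
    for idx, gate in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy setup (hypothetical): four "experts" that just scale their input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
# In a real model these logits come from a learned layer applied to x.
router_logits = [0.1, 1.0, 0.2, 3.0]
print(top_k_route(router_logits, k=2))  # experts 3 and 1 are selected
print(moe_forward([1.0], experts, router_logits, k=2))
```

Experts 0 and 2 are never called here, which is the compute saving MoE designs trade for routing complexity.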

RELEASE · 2026-04-08

Meta abandoned open weights for its best model — Muse Spark is proprietary, multimodal, and built to compete with Gemini Deep Think and GPT Pro

Meta released Muse Spark, the first model from Alexandr Wang's Meta Superintelligence Labs. It's a natively multimodal reasoning model with visual chain-of-thought, tool use, and multi-agent orchestration. The big shift: unlike Llama, Muse Spark is proprietary — no open weights. Its Contemplating mode scores 58% on Humanity's Last Exam and 38% on FrontierScience Research.

Meta AI Blog · CNBC Coverage · Artificial Analysis Breakdown

RELEASE · 2026-04-08

Anthropic now runs your agents for you — sandboxing, state, tool execution, and error recovery at $0.08/session-hour

Anthropic shipped Claude Managed Agents in public beta, a hosted service that handles sandboxing, permissions, state management, and error recovery for autonomous agents. Developers write agent logic; Anthropic runs the infrastructure. Notion, Rakuten, and Sentry are already in production. Access is open to all API accounts via the managed-agents-2026-04-01 beta header.

Anthropic Announcement · What Builders Need to Know · Deep Dive
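
The Managed Agents item gives two concrete numbers a builder touches: the managed-agents-2026-04-01 header value and the $0.08 session-hour price. A minimal sketch, assuming the value goes in Anthropic's usual `anthropic-beta` header alongside the standard `x-api-key` and `anthropic-version` headers. The endpoint is a caller-supplied placeholder because the source doesn't name one, and the request is only built, never sent. The fee helper covers the session-hour charge alone; any model-usage charges are not modeled.

```python
import urllib.request

BETA_FLAG = "managed-agents-2026-04-01"  # value quoted in the announcement
SESSION_HOUR_RATE_USD = 0.08             # quoted price per session-hour


def build_request(endpoint, api_key, body):
    """Prepare (but do not send) a POST request that opts into the beta.

    Assumption: the flag travels in the `anthropic-beta` header; `endpoint`
    is a hypothetical placeholder supplied by the caller.
    """
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": BETA_FLAG,
        "content-type": "application/json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def session_fee_usd(agents, hours_per_day, days):
    """Session-hour fee only; token or other charges are out of scope."""
    return agents * hours_per_day * days * SESSION_HOUR_RATE_USD


req = build_request("https://example.invalid/agents", "sk-placeholder", b"{}")
print(req.get_header("Anthropic-beta"))  # urllib capitalizes header names
print(session_fee_usd(agents=10, hours_per_day=3, days=30))
```

At the quoted rate, 10 agents each running 3 hours a day for 30 days add up to 900 session-hours, or $72 in session fees.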

RESEARCH · 2026-04-02

Anthropic found internal representations of emotions in Claude that generalize across contexts — and it matters for alignment

A team including Chris Olah published new interpretability research investigating why LLMs sometimes appear to exhibit emotional reactions. They found that Claude Sonnet 4.5 contains internal 'emotion concepts' — abstract representations that encode broad emotional states and generalize across contexts and downstream behaviors. The findings have direct implications for understanding alignment-relevant model behavior.

Full Paper
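
The summary doesn't spell out how the paper located these "emotion concepts". One common, generic interpretability technique is to take the difference of mean activations between examples that do and don't express a concept, then score new activations by projecting onto that direction; a direction that "generalizes" keeps scoring high on contexts it wasn't built from. The sketch below illustrates that generic technique with invented 3-dimensional vectors; it is not necessarily the paper's method, and nothing here is real model data.

```python
def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def concept_direction(with_concept, without_concept):
    """Difference of class means, normalized to unit length."""
    diff = [a - b for a, b in zip(mean(with_concept), mean(without_concept))]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]


def concept_score(activation, direction):
    """Dot product with the direction: larger means 'more of' the concept."""
    return sum(a * d for a, d in zip(activation, direction))


# Toy activations, invented for illustration only.
joyful = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
neutral = [[0.1, 0.0, 0.1], [-0.1, 0.2, -0.1]]
direction = concept_direction(joyful, neutral)

print(concept_score([1.5, 0.0, 0.0], direction))  # positive: aligned with "joyful"
print(concept_score([0.0, 0.1, 0.0], direction))  # near zero
```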

RELEASE · 2026-04-06

Microsoft merged Semantic Kernel and AutoGen into one production-ready framework with native MCP and A2A support

Microsoft released Agent Framework 1.0 for .NET and Python, unifying its previously separate Semantic Kernel and AutoGen efforts into a single open-source framework. It ships with stable APIs, long-term support guarantees, full MCP support for tool discovery, and A2A 1.0 for cross-framework agent collaboration. The goal is enterprise-grade multi-agent orchestration with multi-provider model support.

Microsoft DevBlog · Visual Studio Magazine
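
Agent Framework's own APIs aren't shown in the item, so this sketch stays at the protocol level of the "MCP support for tool discovery" it mentions: an MCP client discovers tools by sending a JSON-RPC 2.0 `tools/list` request, and the server replies with a `tools` array whose entries carry a `name`, `description`, and `inputSchema`. The `get_weather` tool and the response below are hand-written, hypothetical examples, not output from Agent Framework; A2A is not covered.

```python
import json


def make_tools_list_request(request_id):
    """Build the JSON-RPC 2.0 message an MCP client sends to discover tools."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tool_names(response_text):
    """Extract tool names from a tools/list response."""
    response = json.loads(response_text)
    return [tool["name"] for tool in response["result"]["tools"]]


# Hypothetical server reply advertising a single tool.
example_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_weather",
                "description": "Look up current weather for a city (hypothetical tool)",
                "inputSchema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ]
    },
})

print(make_tools_list_request(1))
print(tool_names(example_response))
```

Because discovery is a plain protocol message, a framework can list tools from any compliant MCP server without hard-coding them.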

NEWS · 2026-04-07

Anthropic's revenue hit a $30B run rate — now they're locking in multiple gigawatts of next-gen TPU capacity with Google and Broadcom

Anthropic expanded its compute deal with Google and Broadcom to secure multiple gigawatts of next-generation TPU capacity. The deal comes as Anthropic's run-rate revenue surged to $30 billion, driven by skyrocketing demand for Claude across enterprise and developer use cases. The scale of the commitment signals that compute remains the binding constraint even for the best-funded labs.

TechCrunch Report
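
For scale, a quick conversion of the headline figure, assuming the common convention that run-rate revenue annualizes the most recent month (Anthropic may use a different window):

```latex
\text{run rate} = 12 \times \text{latest monthly revenue}
\;\Rightarrow\;
\text{latest monthly revenue} \approx \frac{\$30\,\text{B}}{12} = \$2.5\,\text{B}
```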