
OpenAI's Codex now drives your Mac, not just your code
OpenAI shipped a Codex update that can pilot desktop apps with a cursor, generate images in-line, and run parallel agents. It's the opening move in a real Claude Code fight.
The model layer moves weekly. We follow capability jumps (SWE-bench, CursorBench, long-context), the regressions the marketing decks don’t mention, and the widening gap between what labs claim and what independent testers measure. We also cover the open-weights side closely — when a 35B MoE on a laptop out-draws a frontier API, that’s the kind of story you won’t read on a lab blog.
6 articles in this topic

Alibaba's Qwen 3.6-35B-A3B is a 35B-param mixture-of-experts with only 3B active. Apache 2.0, runs on consumer GPUs, and it's already winning real tasks.

Anthropic's Opus 4.7 is state-of-the-art on SWE-bench and CursorBench, but independent tests show regressions on long-context retrieval and thematic reasoning.

Google shipped a native Swift Gemini app for macOS with screen sharing, voice, and Deep Research. Here's what it does, what it doesn't, and how it stacks up.

OpenAI's new cybersecurity-tuned model can reverse-engineer binaries and analyze malware. It's restricted to verified defenders through the Trusted Access program.

Anthropic just shipped Routines: Claude Code sessions as cron jobs, webhooks, and GitHub-event reactors. Here's what they replace, what they don't, and one rule to follow.