devtake.dev
Topic

AI models

The model layer moves weekly. We follow capability jumps (SWE-bench, CursorBench, long-context), the regressions the marketing decks don’t mention, and the widening gap between what labs claim and what independent testers measure. We also cover the open-weights side closely: when a 35B MoE on a laptop out-draws a frontier API, that’s the kind of story you won’t read on a lab blog.

78 articles in this topic

Anthropic's announcement artwork for the Fable 5 and Mythos 5 access suspension, a soft gradient panel with the Claude wordmark.
AI·

Days after opening Fable 5 to the public, a US government order forced Anthropic to pull it

A Commerce Department export directive forced Anthropic to disable Fable 5 and Mythos 5 for all users, days after opening Fable 5 to the public.

Gemini Intelligence interface on an Android phone
Android·

Gemini Intelligence turns Android 17 into an agent that drives your apps

Google's Android Show pitched Gemini Intelligence and AppFunctions, an MCP-style way for the assistant to call functions inside your apps. Here's how it works and what to watch.

A MacBook Pro beside a Surface Book, both open on a white surface, USB-C ports in view
AI·

Running a coding agent fully on Apple Silicon, no cloud, is now an off-the-shelf stack

A popular Hacker News how-to walked through a fully local coding agent on Apple Silicon. Here's the realistic 2026 stack: runner, model, and harness.

Anthropic's announcement artwork for Claude Fable 5 and Claude Mythos 5, a soft gradient panel with the Claude wordmark.
AI·

Claude Fable 5 is Anthropic's first public Mythos-class model. It tops SWE-Bench Pro at 80.3%.

Claude Fable 5 hits 80.3% on SWE-Bench Pro and ships on Bedrock and Copilot at $10/$50 per million tokens. On paid plans it is free only through June 22.

A hand holds a smartphone showing the Claude Mythos app logo against a dark backdrop with Anthropic's orange burst symbol.
Policy·

Anthropic is sending Mythos 5, the model it called too dangerous, to cyberdefenders and the US government

Mythos 5 is the same model as Fable 5 with cyber safeguards lifted, going to Project Glasswing defenders and, Anthropic says, ~150 orgs across 15+ countries.

Apple Intelligence branding from Apple's WWDC 2026 announcement
Apple·

Apple rebuilt Siri on Google's Gemini and is paying $1 billion a year for it

At WWDC 2026 Apple shipped Siri AI, rebuilt on a custom Google Gemini model running on its own servers. Here are the catches behind the demo.

Abstract cybersecurity illustration of a glowing padlock over a circuit board, representing data protection
AI·

OpenAI added a Lockdown Mode to ChatGPT to blunt prompt-injection attacks

OpenAI shipped Lockdown Mode in ChatGPT to cut off the data-exfiltration step of prompt-injection attacks. Here's what it actually restricts and who should turn it on.

The South Facade of the White House in Washington, with the fountain and South Lawn in the foreground.
Policy·

Sriram Krishnan is leaving the White House AI job to build an outside policy institution

Sriram Krishnan, the a16z partner who co-wrote the AI Action Plan, leaves his White House senior AI advisor role at the end of June 2026. Here's what changes.

The White House in Washington, D.C., where the executive order was signed
Policy·

Trump dropped the mandatory AI model review after Silicon Valley pushed back

Trump's June 2 AI executive order asks for a voluntary 30-day model review, down from a mandatory 90-day one. Here's what got cut and who pushed.

OpenAI's Codex branding over a code background, illustrating Codex expanding across the ChatGPT app.
AI·

OpenAI is putting Codex in every ChatGPT app, with six business plugins for non-coders

On June 2 OpenAI said Codex is coming to the ChatGPT app everywhere within weeks, and shipped six role-specific plugins for sales, analytics, design, and finance teams.

The Stanford Law School building on Stanford University's campus
AI·

Stanford tested AI against law professors. The pros picked the AI 75% of the time.

A blinded Stanford Law study had 16 professors grade AI tutoring answers against their own. Here's what the 75% win rate actually measures, and what it doesn't.

Anthropic's announcement artwork for Claude Opus 4.8, a soft gradient panel with the Claude wordmark.
AI·

Claude Opus 4.8 flags the bugs it writes four times more often than Opus 4.7

Anthropic's Opus 4.8 posts 69.2% on SWE-Bench Pro, lets code flaws slip 4x less often, and ships parallel subagents in Claude Code. Here's what matters.

A source-code editor open to C++ code, evoking the debate over AI-written contributions to open source
Open Source·

SQLite won't accept AI-written code, but QEMU just opened the door to it

Two of the most cautious C projects split on AI contributions in the same week. The real fight is over copyright provenance and who cleans up the slop.

A developer's Emacs session in a Linux terminal, editing C source alongside a shell
AI·

Hacker News is obsessed with durable Postgres workflows and a game about clicking yes

Six dev-tooling and AI posts that climbed Hacker News in late May 2026: durable execution on plain Postgres, LLM code smells, a permission-fatigue game, Rust 1.96, and more.

DuckDuckGo's 'No AI' search promotion, the page the company points users to when they want AI features turned off.
Web·

Google said people love AI search. DuckDuckGo's installs jumped 30% the next week.

DuckDuckGo's US downloads climbed about 30% and its no-AI search page saw 28% more visits the week after Google's I/O push. The backlash is now measurable.

A software engineer at a laptop, the kind of AI-assisted coding workflow whose token costs blew through Uber's annual budget.
AI·

Uber blew its entire 2026 AI coding budget in four months. Its COO can't prove it paid off.

Uber exhausted its full-year Claude Code budget by April. Adoption hit 84%, heavy users burn $2,000 a month, and COO Andrew Macdonald can't connect the spend to shipped features.

DeepSeek social card with the company's wordmark on a navy background
AI·

DeepSeek locked in the 75% V4-Pro cut. The API now undercuts every Western frontier model.

On May 23 DeepSeek told customers the V4-Pro discount becomes its standard price after May 31. Output drops from $3.48 to $0.87 per million tokens.

Microsoft building exterior sign on a clear day.
AI·

Microsoft is canceling Claude Code for its engineers. They have until June 30 to switch to Copilot CLI.

Internal Claude Code licenses end June 30, 2026, for Microsoft's Experiences + Devices group. Engineers move to GitHub Copilot CLI instead.

Anthropic Project Glasswing announcement card with glasswing butterfly motif.
AI·

Anthropic's Glasswing logged 10,000 vulnerabilities in a month. Most are still waiting on a patch.

Anthropic says Project Glasswing's first month produced over 10,000 critical- and high-severity vulns. Verifying and patching them is the limiting step.

Portrait of Andrej Karpathy, whose January 26 X thread on agentic coding was distilled into the viral CLAUDE.md file.
AI·

Karpathy posted four notes about Claude Code. The CLAUDE.md they spawned has 110K GitHub stars.

Forrest Chang turned Andrej Karpathy's January coding thread into a 70-line CLAUDE.md. It now has 110,000+ stars and has trended on GitHub for 28 weeks.

Diagram of an artificial neural network with input, hidden, and output layers
AI·

Andrej Karpathy joined Anthropic. The OpenAI founding member's job: use Claude to train Claude.

Karpathy started this week at Anthropic on Nick Joseph's pre-training team. His mandate is using Claude to accelerate Claude's own training.

Lead image from the Axios story about Anthropic's $15B SpaceX compute deal
AI·

SpaceX's S-1 revealed who's paying for Colossus. Anthropic just locked in $45B through 2029.

Anthropic is paying SpaceX $1.25 billion a month for Colossus 1 and 2 capacity. The contract runs through May 2029 and books about 83% of SpaceX's revenue.

An Alibaba booth at a Chinese technology trade expo, with the company's logo above a display floor.
Hardware·

Alibaba's new Zhenwu M890 chip is 3x faster and aimed straight at agent workloads

Alibaba showed the Zhenwu M890 at its Cloud Summit on May 19. 144 GB of memory, 800 GB/s interchip bandwidth, and Qwen3.7-Max riding on top.

A header image for The Android Show: XR Edition with the Android logo and an XR headset silhouette.
Hardware·

Google and Samsung set Fall 2026 for Android XR glasses. Gentle Monster and Warby Parker are doing the frames.

The Android Show confirmed Fall 2026 for Google and Samsung's first AR glasses, plus three new features for the Galaxy XR headset that launched in October.

An illustration of the Claude Code deeplink vulnerability, showing a malicious URL handler triggering a shell prompt.
Security·

A bad command-line parser turned every claude-cli:// link into a remote shell

Joernchen of 0day.click found a deeplink RCE in Claude Code. Anthropic shipped the fix in 2.1.118 the same week.

Elon Musk speaking at the World Economic Forum.
Policy·

A federal jury took two hours to throw out Elon Musk's lawsuit against Sam Altman and OpenAI.

On May 18 a nine-juror panel rejected every claim Musk filed against OpenAI in 2024. Judge Yvonne Gonzalez Rogers had told the courtroom she was ready to dismiss on the spot.

Anthropic announcement card with node shapes on coral background.
AI·

Anthropic bought Stainless, the startup that builds every official SDK for OpenAI and Google.

Anthropic announced May 18 it acquired SDK generator Stainless, reportedly for over $300M. The same toolchain still powers OpenAI's, Google's, and Cloudflare's official clients.

OpenAI's Codex inside the ChatGPT mobile app, showing a Codex review on a phone screen.
AI·

OpenAI's Codex moved into the ChatGPT mobile app. You can approve a diff from the train now.

OpenAI shipped Codex remote control inside the ChatGPT app for iPhone, iPad, and Android on May 14. Pair via QR; the agent runs on your laptop, the review moves to your phone.

Cerebras corporate site Open Graph card.
Hardware·

Cerebras priced its IPO at $185 and closed at $311. Andrew Feldman and Sean Lie became billionaires.

Cerebras raised $5.55 billion on May 14 and closed its first day at a $95 billion market cap. The wafer-scale AI chip maker shipped the year's biggest tech IPO.

GitHub Open Graph card for oven-sh/bun pull request #30412, the Rust rewrite merge.
Open Source·

Bun's million-line Rust rewrite is now mainline. 99.8% of tests pass and 13,000 unsafe blocks remain.

Jarred Sumner merged the Bun-in-Rust PR on May 14, ending Zig as Bun's runtime language. Binary shrinks 3-8 MB; one analysis counted 13,000 unsafe blocks.

Anthropic Object Store opengraph illustration in clay tones
AI·

Anthropic shipped Claude for Small Business with 15 prebuilt agents. Daniela Amodei is pitching the corner-store owner.

Anthropic announced Claude for Small Business on May 13 with QuickBooks, HubSpot, Canva, and DocuSign hooks. The pitch: 15 ready-to-run agents and a 10-city tour.

Google Googlebook laptop promotional thumbnail showing the device and Gemini branding
AI·

Google's Magic Pointer turns the cursor into a Gemini prompt. The first Googlebooks ship this fall.

Google announced Googlebook on May 12: a premium laptop tier above Chromebook, with a Gemini-aware cursor called Magic Pointer. Acer, ASUS, Dell, HP, and Lenovo are in.

Cactus Compute YouTube thumbnail showing the team behind Needle
AI·

Cactus Compute distilled Gemini into a 26M tool-calling model. The trick: no feed-forward layers.

Needle is a 26M-parameter function caller distilled from Gemini 3.1 Flash-Lite. The Simple Attention Network drops MLPs and runs at 6,000 tok/s prefill on edge silicon.

Cyera Research disclosure illustration for the Bleeding Llama vulnerability in Ollama's model execution pipeline
Security·

A crafted Ollama model file leaks the whole server's memory. 300,000 instances are exposed.

Cyera disclosed CVE-2026-7482 on May 1, a CVSS 9.1 unauthenticated heap read in Ollama. Three API calls dump prompts, env vars, and API keys from any open instance.

The Register's coverage of Bun's experimental Zig-to-Rust port
Open Source·

Jarred Sumner rewrote 960,000 lines of Bun from Zig to Rust in six days. He might throw it all away.

Bun's creator used Claude to port the JavaScript runtime from Zig to Rust, hitting 99.8% test compatibility. He says there's a 'very high chance' it gets scrapped.

Illustration accompanying ChinaTalk's investigation into grey-market Claude API proxy networks
AI·

Chinese proxy networks sell Claude API access at 90% off. They harvest every prompt that passes through.

A ChinaTalk investigation reveals how 'transfer stations' resell Anthropic API access using stolen credentials, model substitution, and prompt harvesting.

The DELEGATE-52 project repository on GitHub, showing Microsoft's benchmark for testing LLM document editing fidelity
AI·

Microsoft tested 19 LLMs as document editors. Even the best ones corrupted 25% of the content.

The DELEGATE-52 benchmark tests AI editing across 52 professional domains. Frontier models corrupt a quarter of document content over long workflows.

Illustration representing DOGE and government technology
Policy·

A judge killed DOGE's grant purge. The 'review process' was asking ChatGPT 'Is this DEI?'

A federal judge restored $100M+ in grants after two DOGE staffers used ChatGPT to flag 97% of NEH grants as DEI, including an HVAC repair and Holocaust research.

An iPhone displaying AI-related features
Apple·

Apple is turning iOS 27 into an AI model marketplace. ChatGPT loses its exclusive slot.

Bloomberg reports Apple will let users choose Gemini, Claude, or ChatGPT across Siri, Writing Tools, and Image Playground via a new Extensions framework this fall.

A mathematics lecture hall with equations on blackboards
AI·

Timothy Gowers gave GPT-5.5 an open math problem. It returned a novel proof in 17 minutes.

The 1998 Fields Medal winner reports GPT-5.5 Pro produced a novel proof for an unsolved math problem in 17 minutes, and says the era of owning theorems is ending.

Aerial view of farmland where a data center project is planned
Policy·

A Michigan town voted against a $16B data center. The lawsuit was filed two days later.

Saline Township rejected rezoning for a 1.4 GW OpenAI-Oracle data center. Related Digital sued in 48 hours, and construction is underway.

Cartoon Claude Code terminal flexing two muscular arms against a terracotta background
AI·

Anthropic doubled Claude Code's limits by renting 220,000 GPUs from xAI

Anthropic doubled Claude Code's 5-hour limits, killed peak-hours throttling, and raised Opus API tiers. The capacity comes from xAI's Colossus 1, via a SpaceX deal.

A smartphone screen showing the Snapchat app interface
AI·

Perplexity's $400M Snapchat search deal is dead. Snap pulled it from guidance.

Snap revealed in its Q1 2026 earnings that its November $400M deal to put Perplexity inside Snapchat 'amicably ended' before any broader rollout shipped.

Stylized GitHub Copilot mascot melting into glowing puddles in front of a wall of flames — a visual metaphor for the steep multiplier hike on annual plans.
AI·

GitHub Copilot's Claude Opus multiplier jumps to 27x on June 1. Monthly plans dodge the hike.

GitHub's new model multiplier table for Copilot Pro and Pro+ annual plans lands June 1. Opus 4.6 goes from 3x to 27x. Sonnet 4.6 goes from 1x to 9x.

Anthropic CEO Dario Amodei photographed at Bloomberg House during the World Economic Forum.
AI·

Anthropic is fielding offers at a $900B valuation. The round closes in two weeks and tops OpenAI.

Preemptive bids put Anthropic at $850B-$900B with a $50B raise. Run rate hit $30B in March, up from $9B at year-end 2025.

Google logo press image used by 9to5Google for Alphabet's Q1 2026 earnings coverage
AI·

Alphabet hit $109.9B in Q1 and is starting to sell TPUs to outside data centers

Alphabet posted $109.9B Q1 2026 revenue with Cloud up 63% and a $460B backlog. Sundar Pichai said Google will sell TPUs to select customers running them in their own data centers.

Title card for Boris Cherny's 'Mastering Claude Code in 30 Minutes' Anthropic workshop talk.
AI·

Anthropic just dropped its Claude Code workshop tapes. The playbook is better than the marketing.

Boris Cherny on Claude Code, Applied AI on prompting, Erik Schluntz on vibe coding in prod. Three Code with Claude tapes hit YouTube ahead of the 2026 conference.

Warp terminal product screenshot from the company's website.
Open Source·

Warp's terminal is now open source. The cloud agent platform Oz is the actual product.

Warp released its 36k-star Rust client on GitHub under AGPLv3 on April 28. OpenAI is the founding sponsor and Oz keeps the bills paid.

AWS marketing illustration of an interconnected machine-learning workflow.
AI·

OpenAI's models are on AWS Bedrock the day after Microsoft lost exclusivity

Amazon shipped Bedrock Managed Agents powered by OpenAI on April 28, plus Codex on Bedrock. Altman tells Stratechery the runtime matters as much as the model.

Anthropic Claude generic brand graphic shown in promotional material for enterprise customers.
AI·

Disney built an AI leaderboard. One employee called Claude 460,000 times in nine days.

Leaked internal Disney screenshots show 4,800 product and tech staff burning 3.1 billion Claude tokens and 13.3 billion Cursor tokens across nine April workdays.

GitHub Octocat mark on a dark gradient, the cover graphic on the GitHub Blog post announcing the Copilot billing change.
AI·

GitHub Copilot kills premium requests on June 1. Token billing arrives, fallback models do not.

On June 1 every Copilot plan switches to GitHub AI Credits priced per token. Code completions stay free. Fallback models and credit rollover do not.

Microsoft and OpenAI logos paired on a navy gradient backdrop.
AI·

Microsoft and OpenAI just rewrote their deal. Exclusivity is dead, and so is the AGI clause.

Microsoft loses exclusive rights to OpenAI's models. The revenue share now caps at 2030 and stops depending on AGI. Here's what actually changed and who it benefits.

Arcee AI Trinity branding from the Trinity-Large-Thinking blog post.
Open Source·

Arcee's Trinity-Large-Thinking is a 399B open MoE that costs 96% less than Opus

Arcee released Trinity-Large-Thinking on April 1: a 399B-param sparse MoE with 13B active, Apache 2.0 weights, $0.88 per million output tokens, and PinchBench just behind Opus 4.6.

Security·

A malicious GGUF file owns your SGLang server: CVE-2026-5760 is an unpatched 9.8

SGLang's reranker renders chat templates without a sandbox. Load a hostile GGUF, hit /v1/rerank, and the attacker has Python on your inference box. No patch yet.

AI·

OpenAI just retired SWE-bench Verified. The headline coding benchmark of 2025 is officially saturated.

OpenAI says SWE-bench Verified is saturated and contaminated, and 60% of remaining problems are unsolvable. Here's what comes next, and why every coding leaderboard is suspect.

A padlock chained to a smartphone displaying a lock icon, illustrating data privacy.
AI·

OpenAI's Privacy Filter is a 1.5B PII redactor that ships under Apache 2.0. Here's what it actually does.

OpenAI released Privacy Filter on April 22 as an open-weight on-device model for masking eight types of PII. F1 of 96%. Runs in a browser. Here's the catch.

Illustration of an AI-driven chip design process from IEEE Spectrum's coverage.
AI·

An AI agent built a working RISC-V CPU from a 219-word prompt in 12 hours. Here's what it actually did.

Verkor's Design Conductor agent went from a 219-word spec to a tape-out-ready RISC-V core called VerCore in 12 hours. The catch: it's still a Celeron.

Anthropic Project Glasswing branding from Anthropic's news page.
AI·

A Discord group guessed Anthropic's URL pattern and walked into Claude Mythos

Bloomberg reports a small group accessed Anthropic's locked-down Mythos model the same day it launched, using credentials from a third-party contractor and educated URL guessing.

Google Cloud Next 2026 keynote branding from the Google Cloud blog.
Apple·

Google's Cloud chief just confirmed the deal: Gemini will power the new Siri this year

At Cloud Next 2026, Thomas Kurian named Apple as a customer and Gemini as the engine behind 'a more personalized Siri coming later this year.' Apple has stayed silent.

Cerebras Systems brand image from the Cerebras website.
Hardware·

Cerebras files for an IPO again, this time with $510M in revenue and a $10B OpenAI deal in its pocket

Cerebras filed an S-1 on April 17 listing as 'CBRS,' targeting roughly $23B at the prior private mark. The OpenAI inference deal is the line item that changed the story.

Aikido Security illustration of the GPT-Proxy backdoor.
Security·

Malicious npm and PyPI packages turn dev servers into Chinese LLM proxies

Aikido found a stage-2 Go binary inside two health-check-themed packages. It runs an OpenAI-compatible router that sends Claude, GPT, and Gemini traffic through Chinese aggregators.

DeepSeek social card from the V4 API documentation release post.
AI·

DeepSeek V4 lands: 1.6T-param open MoE, 1M-token context, and SWE-bench within 0.2 of Opus 4.6

DeepSeek shipped V4-Pro and V4-Flash under MIT on April 24. V4-Pro hits 80.6% on SWE-bench Verified. V4-Flash is $0.14 in / $0.28 out.

Anthropic brand illustration used on the Anthropic newsroom.
AI·

Google is putting up to $40B into Anthropic. That's five days after Amazon's $5B.

Google committed $10B upfront and up to $40B total at a $350B valuation, plus five gigawatts of Google Cloud capacity. It's Anthropic's second multibillion-dollar deal in a week.

Anthropic Engineering postmortem cover image.
AI·

Anthropic admits three Claude Code bugs quietly tanked quality for six weeks

Anthropic's April 23 postmortem names three bugs that degraded Claude Code between March 4 and April 20. Usage limits are being reset for every subscriber.

OpenAI's GPT-5.5 model launch with ChatGPT and Codex interfaces
AI·

OpenAI shipped GPT-5.5 seven weeks after 5.4. API tokens now cost twice as much.

OpenAI released GPT-5.5 (codename Spud) on April 23. The API runs at $5/$30 per million tokens, double GPT-5.4, with Pro at $30/$180.

OpenAI workspace agents launch graphic
AI·

OpenAI's Workspace Agents kill Custom GPTs and take the fight straight to Claude Code

Workspace Agents for ChatGPT Business, Enterprise, Edu, plus Teachers launched April 22. Team-shared, cloud-run, Codex-powered. Free until May 6, then credit-based.

Mozilla Firefox 150 security announcement cover graphic
Open Source·

Mozilla fixed 271 Firefox bugs that Claude Mythos found. Its own tests caught 22.

Firefox 150 shipped Monday with 271 security fixes from Anthropic's Project Glasswing. Mozilla CTO Bobby Holley says Mythos matches elite human researchers.

GitHub Copilot announcement cover graphic
AI·

GitHub Copilot paused new signups and kicked Opus out of Pro. Here's what actually changed.

GitHub froze Copilot Pro/Pro+/Student signups on April 20 and moved Claude Opus 4.7 behind the $39 Pro+ tier. Agent workflows broke the old math.

Anthropic illustration from the Amazon compute deal announcement.
AI·

Amazon puts another $5B into Anthropic. Anthropic promises $100B back to AWS.

Amazon added $5B (up to $20B) to its Anthropic stake. Anthropic committed $100B+ to AWS over 10 years and 5 GW of Trainium capacity.

Illustration for Anthropic's Project Glasswing, a cybersecurity program powered by Claude Mythos Preview
AI·

NSA is running Anthropic's Mythos. The Pentagon says Anthropic is a supply-chain risk.

Axios reports the NSA is using Anthropic's unreleased Mythos model even though the Defense Department has blacklisted Anthropic. One government, two positions.

Cloudflare Unweight tensor compression announcement social graphic
Open Source·

Cloudflare open-sourced a lossless LLM compressor that shaves 22% off model weights

Unweight is Cloudflare Research's new BF16 weight compressor. 22% smaller bundles, 13% smaller inference footprint, 30-40% throughput overhead, BSD license.

Anthropic's Claude Design announcement illustration, a quill on a cactus-green background
AI·

Anthropic shipped Claude Design. Figma stock dropped 7% the same day.

Anthropic launched Claude Design on April 17, a prompt-to-prototype tool that exports to Canva, not Figma. Figma's stock closed down 7% on the same day.

Screenshot of the updated OpenAI Codex Mac app with background computer-use panel
AI·

OpenAI's Codex now drives your Mac, not just your code

OpenAI shipped a Codex update that can pilot desktop apps with a cursor, generate images in-line, and run parallel agents. It's the opening move in a real Claude Code fight.

Header card from Simon Willison's 'Qwen3.6 beats Opus' post comparing pelican SVGs
AI·

Qwen3.6-35B-A3B: the open MoE beating Opus 4.7 on Simon Willison's laptop

Alibaba's Qwen3.6-35B-A3B is a 35B-param mixture-of-experts with only 3B active. Apache 2.0, runs on consumer GPUs, and it's already winning real tasks.

Claude Opus 4.7 launch artwork from the Anthropic news post
AI·

Claude Opus 4.7 is here, and the long-context benchmarks got worse

Anthropic's Opus 4.7 is state-of-the-art on SWE-bench and CursorBench, but independent tests show regressions on long-context retrieval and thematic reasoning.

Google Gemini app running on a Mac desktop showing the mini chat interface
AI·

Google Gemini finally has a Mac app, and it's gunning for ChatGPT's desktop lead

Google shipped a native Swift Gemini app for macOS with screen sharing, voice, and Deep Research. Here's what it does, what it doesn't, and how it stacks up.

Abstract visualization of cybersecurity and AI defense systems
AI·

OpenAI launches GPT-5.4-Cyber for defensive security, opens access to thousands

OpenAI's new cybersecurity-tuned model can reverse-engineer binaries and analyze malware. It's restricted to verified defenders through the Trusted Access program.

Claude wordmark on Anthropic's introducing-Routines announcement
AI·

Claude Code Routines: what they actually do, and when to use them over GitHub Actions

Anthropic just shipped Routines: Claude Code sessions as cron jobs, webhooks, and GitHub-event reactors. Here's what they replace, what they don't, and one rule to follow.