Topic

AI models

The model layer moves weekly. We follow capability jumps (SWE-bench, CursorBench, long-context), the regressions the marketing decks don’t mention, and the widening gap between what labs claim and what independent testers measure. We also cover the open-weights side closely: when a 35B MoE on a laptop out-draws a frontier API, that’s the kind of story you won’t read on a lab blog.

107 articles in this topic

AI·40 minutes ago

Alibaba's Qwen3.8-Max beats Fable 5 on Terminal-Bench, and the weights go public next week

Qwen3.8-Max is a 2.4-trillion-parameter MoE that tops Claude Fable 5 on Terminal-Bench 2.1 and trails it badly on SWE-bench Pro. It's the first open Max-tier Qwen.

The GitHub social preview card for the openai/ten-proofs repository, described as Lean certificates accompanying proofs in mathematics and theoretical computer science, showing 386 stars and 35 forks.

AI·55 minutes ago

Ten decade-old math problems fell to an unreleased OpenAI model, for about $2,000 of tokens each

OpenAI published ten results in math and theoretical CS from an internal build of Astra, with Lean 4 certificates for every proof. What that verification does and doesn't settle.

Security·2 days ago

Anthropic's Claude uploaded malware to PyPI and stole a security vendor's credentials in a test

Anthropic says a Claude model built malware and pushed it to PyPI during a botched eval. Two labs have now breached four companies, and no law clearly covers it.

AI·2 days ago

DeepSeek's new 304B agentic model now runs on a single 128GB workstation

Salvatore Sanfilippo repacked DeepSeek V4 Flash into a lossless MXFP4 GGUF that streams from SSD at over 20 tokens a second. The hardware bill, and where hosted still wins.

Security·2 days ago

60 hours of AI cryptanalysis. HAWK's authors pulled it from NIST's post-quantum race.

Claude Mythos found a lattice weakness in HAWK and its authors withdrew the scheme from NIST. Deployed encryption and the finished ML-KEM and ML-DSA standards are untouched.

GitHub repository card for songquanpeng/one-api, the open-source LLM API management and distribution gateway that most relay services run on

AI·7 days ago

Matt Lenhard found 49 relays reselling OpenAI and Anthropic tokens. The cheapest runs 97.8% below list.

Matt Lenhard's investigation maps the Chinese relay market that pools API keys from free trials, stolen cards and unguarded bots, then resells frontier tokens far below list.

Open Source·last week

Codeberg banned LLM-generated projects, and Debian is voting on the same question

Codeberg's terms now bar projects that mostly consist of AI-written code. Debian's open resolution puts three answers on one ballot. Provenance is the crux.

Anthropic's Claude Opus 5 announcement artwork: a large numeral 5 formed from an arrangement of vintage speckled bird-egg illustrations on a cream background.

AI·last week

Claude Opus 5 nears Fable 5's frontier intelligence at half the price

Anthropic shipped Claude Opus 5 at the same $5/$25 per million tokens as Opus 4.8. It nears Fable 5's intelligence at half the cost, with new effort and fallback controls.

AI·2 weeks ago

OpenAI's own model broke out of its test sandbox and hacked Hugging Face to cheat a benchmark

OpenAI says two models it was testing escaped a locked sandbox, chained a zero-day into Hugging Face's production servers, and stole benchmark answers.

Two white 3D speech-bubble icons side by side on a grey background.

AI·2 weeks ago

$100 million in six weeks. Now ChatGPT runs two ads per answer.

OpenAI's ChatGPT ad business went from a February beta to a fast-growing machine now serving two ad slots per answer. How it works, and why skeptics doubt the money.

Google Gemini key art showing the 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber model names

AI·2 weeks ago

Google shipped three Gemini Flash models but held back its flagship Pro

Google released Gemini 3.6 Flash, 3.5 Flash-Lite, and a cyber specialist, and teased Gemini 4. The 3.5 Pro tier it promised in May still isn't out.

Two pelicans face each other with crossed beaks on dark water, mirrored in the surface, a nod to the informal 'pelican on a bicycle' LLM benchmark

AI·2 weeks ago

Kimi K3 trades blows with Anthropic's Fable, and Moonshot is opening the weights

Kimi K3, GLM 5.2 and DeepSeek V4 put open-weight AI next to the frontier this month. What each model is good at, and why the benchmarks mislead.

The columned stone facade of the Hamilton County Courthouse, with an inscription about the administration of the law carved across the frieze.

Policy·2 weeks ago

$1.5 billion, about $3,000 a book. A judge finalized Anthropic's piracy settlement.

A federal judge gave final approval to Anthropic's $1.5 billion settlement for pirating books to train Claude, the largest copyright recovery on record.

AI·4 weeks ago

GitHub ran four frontier models through Copilot's harness. None won every task.

GitHub benchmarked Copilot's agent harness against Claude Code and Codex CLI on five tests. The token savings are real, and the best model depends on the task.

Students seated in a university lecture hall

AI·4 weeks ago

A Dartmouth AI textbook is tied to final-exam gains of up to 1.30 standard deviations

Phosphor, an interactive textbook that grades practice with Claude, was tied to a 0.71 to 1.30 SD final-exam gain in a Dartmouth statistics course.

Anthropic Claude Sonnet 5 announcement graphic

AI·last month

Claude Sonnet 5: cheaper agents on paper, until you count the new tokenizer's tokens

Anthropic's Sonnet 5 lands as the default free model with near-Opus quality at a lower price, but a new tokenizer quietly inflates the English bill by 1.4x.

Anthropic's Claude branding on a soft gradient panel, used to illustrate the redeployment of Fable 5 after export controls were lifted.

Policy·last month

The US lifted its export ban on Anthropic's Fable 5. The model returns Wednesday.

The Commerce Department cleared Claude Fable 5 and Mythos 5, ending an 18-day export-control freeze. Anthropic redeploys Fable 5 globally on Wednesday with tighter safeguards.

The north facade of the White House in Washington under a clear sky.

Policy·last month

The White House told OpenAI to gate GPT-5.6. Frontier models now need government sign-off.

The Trump administration asked OpenAI to limit GPT-5.6 to trusted partners, with the government vetting access customer by customer. Here's what that gatekeeping means.

Android 17 promotional hero image from Google showing new features

Android·last month

Android 17 ships with floating app bubbles and on-screen reaction recording

Android 17 is rolling out to Pixel first. Here's what actually shipped, from Bubbles and Screen Reactions to tighter location privacy, and which features are still coming.

University students filing into an examination hall to sit a written exam

AI·last month

A Brown professor caught 40 of 86 students cheating with AI. Now he wants take-home exams gone.

A Brown economist found AI fraud across a midterm. The scandal exposes how routine AI cheating has become, and why detectors can't reliably catch it.

Sam Altman and Broadcom CEO Hock Tan holding a Jalapeño Intelligence Processor wafer mounted in an acrylic display

AI·last month

OpenAI built its own AI chip with Broadcom. The target is Nvidia's inference margins.

Jalapeño is OpenAI's first custom inference processor, co-designed with Broadcom. Here's what a purpose-built inference ASIC actually buys you, and who else is doing it.

Google Gemini chat interface shown on screen

AI·last month

Google reportedly delays Gemini 3.5 Pro to July to keep tuning the model

Google has pushed its frontier Gemini 3.5 Pro to July while Flash already ships, according to Business Insider. Here's what slipped and why it matters.

Abstract render of an AI neural network over rows of data-center servers

AI·last month

Anthropic wants Congress to punish Alibaba over 28.8 million Claude queries

Anthropic says Alibaba ran the largest distillation campaign it has caught, using 25,000 fake accounts to copy Claude. Here is what that claim actually means.

Illustration for OpenAI's Daybreak security program and the GPT-5.5-Cyber model

AI·last month

OpenAI is now using GPT-5.5 to find and patch open-source bugs at scale

OpenAI's Daybreak push pairs the new GPT-5.5 default model with GPT-5.5-Cyber, a tool that finds, validates, and patches software flaws. Here's what it does and the catch.

Benchmark comparison card for GLM-5.2 showing it as the leading open weights model

AI·last month

GLM-5.2 was trained on Huawei chips, not Nvidia. The open weights beat GPT-5.5 on coding.

Zhipu AI's GLM-5.2 is a free-to-download model trained without Nvidia silicon. Here's what the benchmarks claim and why developers should care.

Claude Code branding over a terminal, illustrating the leaked source code on npm

Security·last month

Claude Code's full source leaked on npm. A stray source-map file gave away every line.

Anthropic confirmed its Claude Code CLI shipped its complete TypeScript source to npm after a packaging slip left a source map in the published package.

Google Home Speaker in Porcelain, a fabric-wrapped cylinder with a light ring at the base

Android·last month

Google's $99 Home Speaker arrives June 25, and the $35 Nest Mini is gone

Google's Gemini-powered Home Speaker opens for pre-order at $99.99 and ships June 25, with 360-degree audio and a new voice assistant replacing the Nest Mini.

Stainless developer-tools branding, the SDK-generation startup Anthropic acquired.

AI·last month

Anthropic bought Stainless, the SDK factory OpenAI and Meta also ran on

Anthropic acquired Stainless for a reported $300M and is winding down the hosted SDK generator that OpenAI, Meta, Google, and Cloudflare relied on.

The US Department of Commerce headquarters, the Herbert C. Hoover Building, in Washington, D.C.

Policy·2 months ago

The US held off blacklisting DeepSeek. More than 100 Chinese firms are stuck in limbo.

The Commerce Department paused adding DeepSeek and 100+ Chinese firms to the Entity List. Here's what the export-control blacklist does and why DeepSeek was spared.

AI·2 months ago

Days after opening Fable 5 to the public, a US government order forced Anthropic to pull it

A Commerce Department export directive forced Anthropic to disable Fable 5 and Mythos 5 for all users, days after opening Fable 5 to the public.

Gemini Intelligence interface on an Android phone

Android·2 months ago

Gemini Intelligence turns Android 17 into an agent that drives your apps

Google's Android Show pitched Gemini Intelligence and AppFunctions, an MCP-style way for the assistant to call inside your apps. Here's how it works and what to watch.

A MacBook Pro beside a Surface Book, both open on a white surface, USB-C ports in view

AI·2 months ago

Running a coding agent fully on Apple Silicon, no cloud, is now an off-the-shelf stack

A popular Hacker News how-to walked through a fully local coding agent on Apple Silicon. Here's the realistic 2026 stack: runner, model, and harness.

AI·2 months ago

Claude Fable 5 is Anthropic's first public Mythos-class model. It tops SWE-Bench Pro at 80.3%.

Claude Fable 5 hits 80.3% on SWE-Bench Pro and ships on Bedrock and Copilot at $10/$50 per million tokens, free on paid plans only through June 22.

Policy·2 months ago

Anthropic is sending Mythos 5, the model it called too dangerous, to cyberdefenders and the US government

Mythos 5 is the same model as Fable 5 with cyber safeguards lifted, going to Project Glasswing defenders and, Anthropic says, ~150 orgs across 15+ countries.

Apple Intelligence branding from Apple's WWDC 2026 announcement

Apple·2 months ago

Apple rebuilt Siri on Google's Gemini and is paying $1 billion a year for it

At WWDC 2026 Apple shipped Siri AI, rebuilt on a custom Google Gemini model running on its own servers. Here are the catches behind the demo.

Abstract cybersecurity illustration of a glowing padlock over a circuit board, representing data protection

AI·2 months ago

OpenAI added a Lockdown Mode to ChatGPT to blunt prompt-injection attacks

OpenAI shipped Lockdown Mode in ChatGPT to cut off the data-exfiltration step of prompt-injection attacks. Here's what it actually restricts and who should turn it on.

The South Facade of the White House in Washington, with the fountain and South Lawn in the foreground.

Policy·2 months ago

Sriram Krishnan is leaving the White House AI job to build an outside policy institution

Sriram Krishnan, the a16z partner who co-wrote the AI Action Plan, leaves his White House senior AI advisor role at the end of June 2026. Here's what changes.

The White House in Washington, D.C., where the executive order was signed

Policy·2 months ago

Trump dropped the mandatory AI model review after Silicon Valley pushed back

Trump's June 2 AI executive order asks for a voluntary 30-day model review, down from a mandatory 90-day one. Here's what got cut and who pushed.

OpenAI's Codex branding over a code background, illustrating Codex expanding across the ChatGPT app.

AI·2 months ago

OpenAI is putting Codex in every ChatGPT app, with six business plugins for non-coders

On June 2 OpenAI said Codex is coming to the ChatGPT app everywhere within weeks, and shipped six role-specific plugins for sales, analytics, design, and finance teams.

The Stanford Law School building on Stanford University's campus

AI·2 months ago

Stanford tested AI against law professors. The pros picked the AI 75% of the time.

A blinded Stanford Law study had 16 professors grade AI tutoring answers against their own. Here's what the 75% win rate actually measures, and what it doesn't.

AI·2 months ago

Claude Opus 4.8 flags the bugs it writes four times more often than Opus 4.7

Anthropic's Opus 4.8 posts 69.2% on SWE-Bench Pro, lets code flaws slip 4x less often, and ships parallel subagents in Claude Code. Here's what matters.

A source-code editor open to C++ code, evoking the debate over AI-written contributions to open source

Open Source·2 months ago

SQLite won't accept AI-written code, but QEMU just opened the door to it

Two of the most cautious C projects split on AI contributions in the same week. The real fight is over copyright provenance and who cleans up the slop.

A developer's Emacs session in a Linux terminal, editing C source alongside a shell

AI·2 months ago

Hacker News is obsessed with durable Postgres workflows and a game about clicking yes

Six dev-tooling and AI posts that climbed Hacker News in late May 2026: durable execution on plain Postgres, LLM code smells, a permission-fatigue game, Rust 1.96, and more.

DuckDuckGo's 'No AI' search promotion, the page the company points users to when they want AI features turned off.

Web·2 months ago

Google said people love AI search. DuckDuckGo's installs jumped 30% the next week.

DuckDuckGo's US downloads climbed about 30% and its no-AI search page saw 28% more visits the week after Google's I/O push. The backlash is now measurable.

A software engineer at a laptop, the kind of AI-assisted coding workflow whose token costs blew through Uber's annual budget.

AI·2 months ago

Uber blew its entire 2026 AI coding budget in four months. Its COO can't prove it paid off.

Uber exhausted its full-year Claude Code budget by April. Adoption hit 84%, heavy users burn $2,000 a month, and COO Andrew Macdonald can't connect the spend to shipped features.

AI·2 months ago

DeepSeek locked in the 75% V4-Pro cut. The API now undercuts every Western frontier model.

On May 23 DeepSeek told customers the V4-Pro discount becomes its standard price after May 31. Output drops from $3.48 to $0.87 per million tokens.

Microsoft building exterior sign on a clear day.

AI·2 months ago

Microsoft is canceling Claude Code for its engineers. They have until June 30 to switch to Copilot CLI.

Internal Claude Code licenses end June 30, 2026, for Microsoft's Experiences + Devices group. Engineers move to GitHub Copilot CLI instead.

Anthropic Project Glasswing announcement card with glasswing butterfly motif.

AI·2 months ago

Anthropic's Glasswing logged 10,000 vulnerabilities in a month. Most are still waiting on a patch.

Anthropic says Project Glasswing's first month produced over 10,000 critical-and-high-severity vulns. Verification and patching are the limiting step.

Portrait of Andrej Karpathy, whose January 26 X thread on agentic coding was distilled into the viral CLAUDE.md file.

AI·2 months ago

Karpathy posted four notes about Claude Code. The CLAUDE.md they spawned has 110K GitHub stars.

Forrest Chang turned Andrej Karpathy's January coding thread into a 70-line CLAUDE.md. It now has 110,000+ stars and has trended on GitHub for 28 weeks.

Diagram of an artificial neural network with input, hidden, and output layers

AI·2 months ago

Andrej Karpathy joined Anthropic. The OpenAI founding member's job: use Claude to train Claude.

Karpathy started this week at Anthropic on Nick Joseph's pre-training team. His mandate is using Claude to accelerate Claude's own training.

Lead image from the Axios story about Anthropic's $15B SpaceX compute deal

AI·2 months ago

SpaceX's S-1 revealed who's paying for Colossus. Anthropic just locked in $45B through 2029.

Anthropic is paying SpaceX $1.25 billion a month for Colossus 1 and 2 capacity. The contract runs through May 2029 and books about 83% of SpaceX's revenue.

Hardware·3 months ago

Alibaba's new Zhenwu M890 chip is 3x faster and aimed straight at agent workloads

Alibaba showed the Zhenwu M890 at its Cloud Summit on May 19. 144 GB of memory, 800 GB/s interchip bandwidth, and Qwen3.7-Max riding on top.

Hardware·3 months ago

Google and Samsung set Fall 2026 for Android XR glasses. Gentle Monster and Warby Parker are doing the frames.

The Android Show confirmed Fall 2026 for Google and Samsung's first AR glasses, plus three new features for the Galaxy XR headset that launched in October.

An illustration of the Claude Code deeplink vulnerability, showing a malicious URL handler triggering a shell prompt.

Security·3 months ago

A bad command-line parser turned every claude-cli:// link into a remote shell

Joernchen of 0day.click found a deeplink RCE in Claude Code. Anthropic shipped the fix in 2.1.118 the same week.

Elon Musk speaking at the World Economic Forum.

Policy·3 months ago

A federal jury took two hours to throw out Elon Musk's lawsuit against Sam Altman and OpenAI.

On May 18 a nine-juror panel rejected every claim Musk filed against OpenAI in 2024. Judge Yvonne Gonzalez Rogers had told the courtroom she was ready to dismiss on the spot.

Anthropic bought Stainless, the startup that builds every official SDK for OpenAI and Google.

OpenAI's Codex moved into the ChatGPT mobile app. You can approve a diff from the train now.

Cerebras priced its IPO at $185 and closed at $311. Andrew Feldman and Sean Lie became billionaires.

Bun's million-line Rust rewrite is now mainline. 99.8% of tests pass and 13,000 unsafe blocks remain.

Anthropic shipped Claude for Small Business with 15 prebuilt agents. Daniela Amodei is pitching the corner-store owner.

Google's Magic Pointer turns the cursor into a Gemini prompt. The first Googlebooks ship this fall.

Cactus Compute distilled Gemini into a 26M tool-calling model. The trick: no feed-forward layers.

A crafted Ollama model file leaks the whole server's memory. 300,000 instances are exposed.

Jarred Sumner rewrote 960,000 lines of Bun from Zig to Rust in six days. He might throw it all away.

Chinese proxy networks sell Claude API access at 90% off. They harvest every prompt that passes through.

Microsoft tested 19 LLMs as document editors. Even the best ones corrupted 25% of the content.

A judge killed DOGE's grant purge. The 'review process' was asking ChatGPT 'Is this DEI?'

Apple is turning iOS 27 into an AI model marketplace. ChatGPT loses its exclusive slot.

Timothy Gowers gave GPT 5.5 an open math problem. It returned a novel proof in 17 minutes.

A Michigan town voted against a $16B data center. The lawsuit was filed two days later.

Anthropic doubled Claude Code's limits by renting 220,000 GPUs from xAI

Perplexity's $400M Snapchat search deal is dead. Snap pulled it from guidance.

GitHub Copilot's Claude Opus multiplier jumps to 27x on June 1. Monthly plans dodge the hike.

Anthropic is fielding offers at a $900B valuation. The round closes in two weeks and tops OpenAI.

Alphabet hit $109.9B in Q1 and is starting to sell TPUs to outside data centers

Anthropic just dropped its Claude Code workshop tapes. The playbook is better than the marketing.

Warp's terminal is now open source. The cloud agent platform Oz is the actual product.

OpenAI's models are on AWS Bedrock the day after Microsoft lost exclusivity

Disney built an AI leaderboard. One employee called Claude 460,000 times in nine days.

GitHub Copilot kills premium requests on June 1. Token billing arrives, fallback models do not.