#open-weights

Alibaba's Qwen3.8-Max beats Fable 5 on Terminal-Bench, and the weights go public next week

Qwen3.8-Max is a 2.4-trillion-parameter MoE that tops Claude Fable 5 on Terminal-Bench 2.1 and trails it badly on SWE-bench Pro. It's the first open Max-tier Qwen.

AI·2 days ago

DeepSeek's new 304B agentic model now runs on a single 128GB workstation

Salvatore Sanfilippo repacked DeepSeek V4 Flash into a lossless MXFP4 GGUF that streams from SSD at over 20 tokens a second. The hardware bill, and where hosted still wins.

Two pelicans face each other with crossed beaks on dark water, mirrored in the surface, a nod to the informal 'pelican on a bicycle' LLM benchmark

AI·2 weeks ago

Kimi K3 trades blows with Anthropic's Fable, and Moonshot is opening the weights

Kimi K3, GLM 5.2 and DeepSeek V4 put open-weight AI next to the frontier this month. What each model is good at, and why the benchmarks mislead.

Benchmark comparison card for GLM-5.2 showing it as the leading open weights model

AI·last month

GLM-5.2 was trained on Huawei chips, not Nvidia. The open weights beat GPT-5.5 on coding.

Zhipu AI's GLM-5.2 is a free-to-download model trained without Nvidia silicon. Here's what the benchmarks claim and why developers should care.

The US Department of Commerce headquarters, the Herbert C. Hoover Building, in Washington, D.C.

Policy·2 months ago

The US held off blacklisting DeepSeek. More than 100 Chinese firms are stuck in limbo

The Commerce Department paused adding DeepSeek and 100+ other Chinese firms to the Entity List. Here's what the export-control blacklist does and why DeepSeek was spared.

A MacBook Pro beside a Surface Book, both open on a white surface, USB-C ports in view

AI·2 months ago

Running a coding agent fully on Apple Silicon, no cloud, is now an off-the-shelf stack

A popular Hacker News how-to walked through a fully local coding agent on Apple Silicon. Here's the realistic 2026 stack: runner, model, and harness.

Cactus Compute YouTube thumbnail showing the team behind Needle

AI·3 months ago

Cactus Compute distilled Gemini into a 26M tool-calling model. The trick: no feed-forward layers.

Needle is a 26M-parameter function caller distilled from Gemini 3.1 Flash-Lite. The Simple Attention Network drops MLPs and runs at 6,000 tok/s prefill on edge silicon.

Arcee AI Trinity branding from the Trinity-Large-Thinking blog post.

Open Source·3 months ago

Arcee's Trinity-Large-Thinking is a 399B open MoE that costs 96% less than Opus

Arcee released Trinity-Large-Thinking on April 1: a 399B-param sparse MoE with 13B active, Apache 2.0 weights, $0.88 per million output tokens, and a PinchBench score just behind Opus 4.6.

A padlock chained to a smartphone displaying a lock icon, illustrating data privacy.

AI·3 months ago

OpenAI's Privacy Filter is a 1.5B PII redactor that ships under Apache 2.0. Here's what it actually does.

OpenAI released Privacy Filter on April 22 as an open-weight on-device model for masking eight types of PII. F1 of 96%. Runs in a browser. Here's the catch.

DeepSeek social card from the V4 API documentation release post.

AI·3 months ago

DeepSeek V4 lands: 1.6T-param open MoE, 1M-token context, and SWE-bench Verified within 0.2 points of Opus 4.6

DeepSeek shipped V4-Pro and V4-Flash under MIT on April 24. V4-Pro hits 80.6% on SWE-bench Verified. V4-Flash is $0.14 in / $0.28 out.

Header card from Simon Willison's 'Qwen3.6 beats Opus' post comparing pelican SVGs

AI·4 months ago

Qwen 3.6-35B-A3B: the open MoE beating Opus 4.7 on Simon Willison's laptop

Alibaba's Qwen 3.6-35B-A3B is a 35B-param mixture-of-experts with only 3B active. Apache 2.0, runs on consumer GPUs, and it's already winning real tasks.