Arcee's Trinity-Large-Thinking is a 399B open MoE that costs 96% less than Opus
Arcee released Trinity-Large-Thinking on April 1: a 399B-parameter sparse MoE with 13B active, Apache 2.0 weights, $0.88 per million output tokens, and a PinchBench rank just behind Claude Opus 4.6.
Arcee, a 30-person U.S. startup, just released a 399-billion-parameter open-weights reasoning model that rents for less than a dollar per million output tokens. It’s called Trinity-Large-Thinking, and the weights have been on Hugging Face under Apache 2.0 since April 1.
That’s the news. The why-it-matters is the price-performance line: a model that ranks #2 on PinchBench, behind only Claude Opus 4.6, at roughly 96% lower output cost than Opus, and you can self-host it.
What we know
- Architecture: sparse Mixture-of-Experts, 399B total parameters with 13B active per token, routed 4-of-256 across experts. Pre-trained on 17 trillion tokens with the Muon optimizer plus a load-balancing technique Arcee calls SMEBU. Interleaved local/global attention with gating for long context. (A minimal routing sketch follows this list.)
- Context window: 512K tokens per the model card; OpenRouter serves 262,144.
- Pricing on Arcee’s API: $0.23 per million input tokens, $0.88 per million output tokens, per Artificial Analysis. Anthropic charges $5 input / $25 output for Opus 4.6, so the blended rate works out to roughly 96% cheaper (arithmetic after this list).
- Performance: PinchBench rank #2, just behind Opus 4.6. On the Artificial Analysis Intelligence Index, it scores 32 (median for comparable open-weights models is 28). Output speed is 118 tokens/second, time-to-first-token 1.10 seconds.
- License: Apache 2.0. Weights downloadable, commercial use allowed, fine-tunable, self-hostable.
- Training compute: 2,048 Nvidia B300 Blackwell GPUs for pretraining (a single 33-day run). Post-training on 1,152 H100s.
- Predecessor traction: Trinity-Large-Preview served 3.37 trillion tokens on OpenRouter in its first two months, briefly the #1 most-used open model in the U.S., per VentureBeat.
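For readers who haven’t touched sparse MoE internals, here is what 4-of-256 routing means in practice: a small router scores all 256 experts for each token and only the top 4 actually run. A minimal sketch, with the hidden size and the softmax-over-winners normalization as my assumptions, not Arcee’s published design:

```python
# Minimal top-k expert routing sketch: score 256 experts, activate the top 4.
# Dimensions and the renormalization choice are illustrative assumptions.
import torch
import torch.nn.functional as F

NUM_EXPERTS, TOP_K, D_MODEL = 256, 4, 1024  # d_model is a made-up size

def route(tokens: torch.Tensor, router_w: torch.Tensor):
    """tokens: (batch, d_model); router_w: (d_model, num_experts)."""
    logits = tokens @ router_w                      # (batch, 256) expert scores
    top_vals, top_idx = logits.topk(TOP_K, dim=-1)  # keep the 4 best experts
    weights = F.softmax(top_vals, dim=-1)           # renormalize over the winners
    return top_idx, weights                         # which experts fire, and how much

tokens = torch.randn(8, D_MODEL)
router_w = torch.randn(D_MODEL, NUM_EXPERTS)
experts, weights = route(tokens, router_w)
print(experts.shape, weights.sum(dim=-1))  # (8, 4); each row's weights sum to 1
```

Only the 4 winning experts’ FFN weights touch the token, which is how a 399B model runs at roughly 13B-model compute per token.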
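And a quick sanity check on the 96% figure. The per-token prices are from the article; the 3:1 input-to-output blend is my assumption, since the weighting behind the blended rate isn’t published here:

```python
# Back-of-envelope check on the "roughly 96% cheaper" claim.
# Prices per million tokens from the article; the 3:1 blend is an assumption.
TRINITY_IN, TRINITY_OUT = 0.23, 0.88
OPUS_IN, OPUS_OUT = 5.00, 25.00

def blended(inp, out, ratio=3):  # ratio = input tokens per output token
    return (ratio * inp + out) / (ratio + 1)

discount = 1 - blended(TRINITY_IN, TRINITY_OUT) / blended(OPUS_IN, OPUS_OUT)
print(f"{discount:.1%}")  # ~96.1% cheaper on this blend
```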
What we don’t know
- The full benchmark suite. Arcee’s blog leans on PinchBench. Detailed AIME, GPQA Diamond, and SWE-bench Pro numbers aren’t in the launch post. Artificial Analysis ran 10 evals but the public scorecards aren’t all visible yet.
- Hosting cost on customer hardware. The 13B-active figure cuts compute, not memory: all 399B weights must be resident, which means hundreds of gigabytes of VRAM even at low precision (see the sketch after this list). The single-node story for self-hosting will hinge on quantization, and Arcee hasn’t published a recommended deployment recipe.
- Whether the cheap API price holds. $0.88 per million output tokens looks closer to serving-cost recovery than to margin. If demand spikes, Arcee may need to raise the price to fund capacity. The Trinity-Large-Preview ramp suggests the demand is real.
- How well the 17T-token Muon-trained base generalizes outside the benchmark suites. Long-horizon agent claims need real-world tool-call traces, not just PinchBench scores. The model is “very verbose,” generating 150M output tokens during evaluation versus a 43M median, per Artificial Analysis. At $0.88 per million output tokens, that is roughly $132 versus $38 for the same eval, a gap that compounds fast on output-priced production runs.
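The VRAM point deserves the back-of-envelope treatment. Weights-only parameter-count arithmetic, ignoring KV cache, activations, and framework overhead:

```python
# Rough VRAM math for loading all 399B weights. MoE sparsity saves compute,
# not resident memory, so the full parameter count has to fit somewhere.
TOTAL_PARAMS = 399e9

for label, bits in [("FP16", 16), ("FP8", 8), ("INT4", 4)]:
    gigabytes = TOTAL_PARAMS * bits / 8 / 1e9
    print(f"{label}: ~{gigabytes:,.0f} GB just for weights")
# FP16: ~798 GB, FP8: ~399 GB, INT4: ~200 GB
```

Even the 4-bit case wants at least three 80GB-class cards before you budget a single token of KV cache, which is why the missing deployment recipe matters.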
Source attribution
The release and architecture details come from Arcee’s launch post. VentureBeat’s Carl Franzen framed the U.S.-made open-weights angle, and TechCrunch’s follow-up on April 7 put the headcount and budget on record: 30 employees, $20M committed to a single 33-day training run. Independent benchmarking is from Artificial Analysis.
What this means for you
If you’re picking a frontier-class model for production work and you care about cost or sovereignty, Trinity-Large-Thinking deserves a spot on the eval shortlist alongside DeepSeek V4 and Qwen 3.6. The combination of Apache 2.0, 512K context, and sub-dollar output pricing is rare, and the U.S. provenance matters for buyers who can’t build on Chinese-origin weights.
The catch is the same catch as with every open-weights frontier release: PinchBench rank doesn’t equal “drop-in replacement for Opus.” Run it on your own task suite. Pay attention to verbosity, because the output-token bill compounds. And remember that Arcee is a 30-person company; if your business depends on Trinity, treat the support contract like you’d treat a dependency on a tiny vendor, not on Anthropic.
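The first pass of that eval doesn’t need a framework. A minimal sketch against OpenRouter’s OpenAI-compatible chat endpoint; the model slug is a placeholder guess (check OpenRouter’s catalog for the real ID), and the prompt list stands in for your own task suite:

```python
# Minimal eval loop via OpenRouter's OpenAI-compatible API.
# The model slug is hypothetical; the prompts are stand-ins for a real suite.
import os
import requests

MODEL = "arcee-ai/trinity-large-thinking"  # placeholder slug, verify on OpenRouter
PROMPTS = ["Summarize RFC 9110 in three bullets."]  # swap in your actual tasks

for prompt in PROMPTS:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    ).json()
    usage = resp.get("usage", {})
    # Verbosity is the cost driver, so log output tokens per task.
    print(usage.get("completion_tokens"), resp["choices"][0]["message"]["content"][:80])
```

Logging completion tokens per task is the cheap way to catch the verbosity problem before it reaches your invoice.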
Worth flagging the field shape. A year ago, “open-weights frontier” meant Llama plus a couple of Chinese MoEs. As of April 2026, the list is Llama, DeepSeek V4, Qwen 3.6, and Trinity, with each one optimized for a different cost/quality envelope. The interesting question is whether Arcee can keep shipping at this pace on a 30-person team, or whether the next training run requires a fundraise that changes the company. Watch the next 90 days. If they ship a Trinity-Large-Thinking-V2 without raising, the lean-MoE thesis just got real.
Sources
- Trinity-Large-Thinking: Scaling an Open Source Frontier Agent — Arcee AI
- Arcee's new, open source Trinity-Large-Thinking is the rare, powerful U.S.-made AI model that enterprises can download and customize — VentureBeat
- I can't help rooting for tiny open source AI model maker Arcee — TechCrunch
- Trinity Large Thinking - Intelligence, Performance & Price Analysis — Artificial Analysis