Nvidia's RTX Spark laptops run 120-billion-parameter models locally, no cloud needed

Nvidia's RTX Spark laptops put a Grace Blackwell superchip and 128GB of unified memory in a notebook, aimed at running 120B-parameter models offline.

Nvidia and Microsoft just put a data-center-class superchip in a laptop. The pitch to developers is blunt: stop renting GPUs and run a 120-billion-parameter model on the machine in your bag.

The two companies announced the RTX Spark family on May 31, fusing a Grace CPU and a Blackwell GPU into one part with a single shared memory pool. If you’ve ever watched a local model crawl because your GPU ran out of VRAM, that shared pool is the spec that matters here, and it’s why this is more than another gaming-laptop refresh.

Until now, running a frontier-class open model meant a desktop tower stuffed with GPUs or a recurring cloud bill. RTX Spark is Nvidia’s bet that a third option, a laptop that runs those models in airplane mode, is finally viable, and it’s pulling every major Windows OEM along with it. For developers who handle private data or work under compliance rules that forbid shipping prompts to a third party, that’s the whole story. Whether the price and the real-world speed back up the pitch is the question this announcement leaves open.

What we know

Here are the confirmed specs, straight from Nvidia’s newsroom and corroborated by hands-on coverage:

The chip pairs a 20-core Grace CPU with a Blackwell GPU carrying 6,144 CUDA cores and fifth-generation Tensor Cores, per Nvidia.
It tops out at 128GB of unified memory, addressed by both the CPU and GPU, with memory bandwidth around 300 GB/s.
Nvidia rates it at roughly 1 petaflop of FP4 AI compute, which it markets as about 1,000 TOPS.
The headline use case: hosting reasoning models up to 120 billion parameters with up to a 1-million-token context, locally, no cloud API required.
These are Windows on Arm machines. The Grace CPU is Arm-based, co-designed with MediaTek, and CPU-to-GPU traffic rides NVLink-C2C, The Register reports.
First systems ship fall 2026 from ASUS, Dell, HP, Lenovo, Microsoft Surface and MSI, with Acer and GIGABYTE to follow.

Jensen Huang framed it as a reset of what a PC is. “The PC is being reinvented. For forty years, you launched apps. Click. Type. With RTX Spark and Windows, you ask, and the PC does the work,” the Nvidia CEO said in the announcement. Microsoft CEO Satya Nadella tied it to Windows directly: the goal, he said, is “to deliver unmetered intelligence to every home and every desk.”

What we don’t know

Price is the big blank. Nvidia announced no figure, and history isn’t encouraging: the desktop DGX Spark that this notebook descends from launched near $4,000, so a loaded 128GB laptop won’t be a casual buy.

A few other open questions:

Real-world tokens per second. A petaflop of FP4 with sparsity is a best-case lab number. Sustained inference speed on a battery, under a laptop’s thermal limits, is the figure that actually decides whether 120B models are usable or just loadable.
Battery and heat. Nobody has published runtime or fan behavior under a long agent run.
The Arm tax. Windows on Arm still breaks some x86 developer tooling. How much of your stack runs natively versus through emulation is unanswered.
Memory tiers and pricing per tier. Nvidia says “up to” 128GB; base configs reportedly start as low as 16GB, and where the price breaks fall is unclear.
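
To make the tokens-per-second question concrete, here is a rough sketch of a common rule of thumb: when generation is limited by memory bandwidth, a ceiling on speed is bandwidth divided by the bytes read per token. The function names are illustrative, and the assumption that every token reads all of a dense model's weights once is ours, not Nvidia's. Mixture-of-experts models read only a fraction of their weights per token, so they can land well above this figure, while thermals and battery limits can push real speeds below it.

```python
# Back-of-envelope ceiling on decode speed when generation is memory-bound.
# Assumption (not from Nvidia): each generated token reads every weight once,
# which holds for a dense model but not for a mixture-of-experts model.

def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed for the weights alone, ignoring quantization overhead."""
    return params * bits_per_weight / 8


def max_tokens_per_second(bandwidth_bytes_per_s: float,
                          bytes_read_per_token: float) -> float:
    """Upper bound on tokens/s if memory reads are the only bottleneck."""
    return bandwidth_bytes_per_s / bytes_read_per_token


BANDWIDTH = 300e9  # ~300 GB/s, the figure quoted for RTX Spark
dense_120b_4bit = weight_bytes(120e9, 4)  # hypothetical dense 120B model at 4-bit

ceiling = max_tokens_per_second(BANDWIDTH, dense_120b_4bit)
print(f"{ceiling:.1f} tokens/s ceiling")
```

Under those assumptions the ceiling works out to about 5 tokens per second, which is why independent sustained-load numbers matter more than the petaflop headline.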

Hype versus who actually benefits

The 1,000-TOPS number invites a misleading comparison. Coverage that sets RTX Spark’s AI compute beside Apple’s M4 Max (1,000 vs. 38 TOPS) also flags the caveat that “the conditions are very different,” because the figures use different precision and sparsity assumptions. Treat the headline spec as marketing math, not a head-to-head benchmark.

The genuine advantage isn’t peak FLOPS anyway. It’s the 128GB unified pool. A 120B model quantized to 4-bit needs roughly 60-70GB just for weights; a typical gaming laptop with 16GB of VRAM can’t hold it at all, so today you either pay for cloud inference or shard across machines. RTX Spark holds the whole thing in one address space. That’s the same architectural bet Apple made with unified memory on the Mac, scaled up for inference. For the broader money behind this race, see our coverage of the hyperscalers planning $700B in AI capex this year and the chip-stock swings that pivot on it.
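
The weight-memory arithmetic behind that 60-70GB figure can be checked with a few lines. This is a minimal sketch with illustrative function names; it counts weights only. Real quantized formats typically store extra scaling data alongside the weights, and inference also needs room for the context cache, which is likely why the practical range runs above the bare 60GB.

```python
# Weights-only memory estimate: parameters x bits per weight / 8 bits per byte.
# Ignores quantization metadata, activations, and the context (KV) cache.

def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory pool."""
    return weight_memory_gb(params_billion, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gb(120, bits)
        print(f"120B at {bits}-bit: {size:.0f} GB | "
              f"fits 16 GB VRAM: {fits_in_memory(120, bits, 16)} | "
              f"fits 128 GB unified: {fits_in_memory(120, bits, 128)}")
```

At 4-bit the weights come to 60GB, far past a 16GB card but comfortably inside a 128GB pool; at 16-bit they reach 240GB and fit in neither.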

What this means for you

If you build with local LLMs, this is the first laptop that can plausibly hold a frontier-class open model in memory without a desktop rig or a cloud bill. That’s real, and it’s narrow. You’re the target customer if you run agents on private data, iterate on prompts offline, or can’t send code and documents to a third-party API for compliance reasons. For everyone else, including most people who just want a fast laptop, this is overkill you’ll pay a steep premium for.

My read: don’t preorder. Wait for the price, then wait for one independent reviewer to publish real tokens-per-second on a 70B and a 120B model under sustained load. If those numbers hold and the machine lands under the desktop DGX Spark’s price, it’s the local-inference box to beat. If they don’t, a 128GB Mac or a cloud GPU you rent by the hour will still be the saner buy.
