devtake.dev

A malicious GGUF file owns your SGLang server: CVE-2026-5760 is an unpatched 9.8

SGLang's reranker renders chat templates without a sandbox. Load a hostile GGUF, hit /v1/rerank, and the attacker has Python on your inference box. No patch yet.

Luca Reinhardt · 5 min read · 3 sources

If you run an SGLang inference server and you load community GGUFs, stop reading and pull the box off the open internet. CERT/CC published VU#915947 on April 20, assigning CVE-2026-5760 with a CVSS of 9.8. The bug lets a malicious model file pop a shell on your server, and as of disclosure the upstream project hasn’t shipped a fix.

This is a textbook AI-supply-chain RCE. The vehicle is a .gguf file, the kind of weights you grab off Hugging Face by the hundreds, and the trigger is the everyday /v1/rerank endpoint that anyone running a Qwen3 reranker has exposed.

What we know

  • Affected component: SGLang’s reranker pipeline at /v1/rerank. The vulnerable code path is in serving_rerank.py, which loads chat templates from the model file and renders them.
  • Root cause: the rendering code uses jinja2.Environment() without sandboxing. As The Hacker News reports, this lets an attacker abuse Jinja2’s server-side template injection (SSTI) primitives to call arbitrary Python.
  • Trigger: a GGUF whose tokenizer.chat_template field contains a Jinja2 SSTI payload plus the trigger phrase "The answer can only be 'yes' or 'no'". That phrase activates SGLang’s Qwen3 reranker detection and routes the input through the rendering call.
  • Impact per CERT: arbitrary code execution in the SGLang service context, host compromise, lateral movement, data exfiltration, and denial of service.
  • CVE ID: CVE-2026-5760. CVSS v3.1: 9.8 (network, no auth, full confidentiality + integrity + availability impact).
  • Patch status: none. CERT’s coordination outreach got no vendor response, per the Cyberpress writeup.
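As a rough illustration of the trigger described above, the following sketch triages a chat-template string before it gets anywhere near a renderer. It is a heuristic, not a detector: the function name and the regular expression are assumptions made for this example, and an attacker can evade simple pattern matching by building attribute names at render time. Note that a legitimate Qwen3 reranker template will also contain the trigger phrase; on its own that only tells you the template takes the vulnerable path.

```python
import re

# Phrase the advisory coverage says activates SGLang's Qwen3 reranker detection.
TRIGGER_PHRASE = "The answer can only be 'yes' or 'no'"

# Any double-underscore ("dunder") name. Jinja2 SSTI chains usually start with
# one (for example __class__), and ordinary chat templates rarely need them.
DUNDER_NAME = re.compile(r"__[A-Za-z]\w*__")


def triage_chat_template(template: str) -> list[str]:
    """Return reasons (possibly none) why a template deserves manual review."""
    reasons = []
    if DUNDER_NAME.search(template):
        reasons.append("dunder name in template")
    if TRIGGER_PHRASE in template:
        reasons.append("contains Qwen3 reranker trigger phrase")
    return reasons
```

A template that returns both reasons is the combination worth escalating; an empty list means only that this particular heuristic found nothing.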

What we don’t know

  • Whether it’s exploited in the wild. No public IOC list, no incident report. Given how trivial the payload is to construct and how many SGLang servers sit on the public internet, treat “not yet” as wishful thinking.
  • The full version range. The advisory doesn’t enumerate the affected versions or commits. Anyone running SGLang from main should assume vulnerable until upstream says otherwise.
  • Whether other reranker endpoints are exposed. The trigger phrase is Qwen3-specific, but the underlying Jinja2 misuse may apply to other chat-template paths in the codebase. Nobody has published a fuller audit of serving_*.py yet.
  • Whether Hugging Face or any registry is scanning for malicious chat_template payloads. As of disclosure: not announced.

Source attribution

CERT/CC published the advisory (VU#915947) on April 20, 2026. Coverage followed from The Hacker News, Cyberpress, and gbhackers, all of which point to the same root cause: jinja2.Environment() rendering attacker-controlled templates without a sandbox.

The fix is one line of code: use ImmutableSandboxedEnvironment (from jinja2.sandbox) instead of Environment. CERT recommends it explicitly. At the time of writing, SGLang’s maintainers have not commented publicly on a release timeline.
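To make the difference concrete, here is a minimal standalone sketch of the two Jinja2 environments side by side. It is not SGLang’s code; the helper names and the probe template are assumptions made for this example. The probe is deliberately harmless: it walks from a string literal to object’s subclass list, the same first hop real SSTI payloads take, and only counts the entries.

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Harmless stand-in for an SSTI payload: reaches object.__subclasses__()
# through dunder attributes and reports how many classes it can see.
PROBE = "{{ ''.__class__.__mro__[1].__subclasses__() | length }}"


def render_unsandboxed(template: str) -> str:
    """Render the way the advisory describes the vulnerable path: plain Environment."""
    return Environment().from_string(template).render()


def render_sandboxed(template: str) -> str:
    """Render with the environment CERT recommends."""
    return ImmutableSandboxedEnvironment().from_string(template).render()


def is_blocked(template: str) -> bool:
    """True if the sandbox refuses the template with a SecurityError."""
    try:
        render_sandboxed(template)
    except SecurityError:
        return True
    return False
```

With the plain environment, render_unsandboxed(PROBE) returns a number, which means the template reached Python internals. With the sandboxed one, is_blocked(PROBE) is true while ordinary expressions such as {{ 1 + 1 }} still render.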

What this means for you

If you run SGLang in production, do three things this week. First, audit which GGUFs your boxes load and where they came from; assume any community-sourced model is untrusted. Second, put the inference servers behind authenticated reverse proxies and don’t expose /v1/rerank to the open internet, full stop. Third, patch the framework yourself. Replacing jinja2.Environment() with ImmutableSandboxedEnvironment is a one-liner; if you can’t wait for upstream, vendor a fix and pin the version.
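For the audit step, one option is to pull the chat template out of each GGUF and read it before the file is ever loaded by a server. The sketch below parses only the metadata header using the standard library. It assumes the little-endian GGUF v2/v3 layout (magic, version, tensor count, key/value count, then typed key/value pairs); the function names and the string-size cap are choices made for this example, not part of any official tool.

```python
import struct
from typing import BinaryIO, Optional

# Fixed-width scalar value types: GGUF type id -> struct format.
SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
STRING, ARRAY = 8, 9
MAX_STRING_BYTES = 16 * 1024 * 1024  # arbitrary cap: the file is untrusted


def _unpack(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _unpack(f, "<Q")
    if length > MAX_STRING_BYTES:
        raise ValueError("implausibly long string in GGUF metadata")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF header")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    if vtype in SCALARS:
        return _unpack(f, SCALARS[vtype])
    if vtype == STRING:
        return _read_string(f)
    if vtype == ARRAY:
        item_type = _unpack(f, "<I")
        count = _unpack(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_chat_template(f: BinaryIO) -> Optional[str]:
    """Return tokenizer.chat_template from a GGUF stream, or None if absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    if _unpack(f, "<I") < 2:
        raise ValueError("GGUF v1 uses a different layout")
    _unpack(f, "<Q")              # tensor count, not needed here
    kv_count = _unpack(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _unpack(f, "<I"))
        if key == "tokenizer.chat_template":
            return value if isinstance(value, str) else None
    return None
```

Typical use would be to open each model file in binary mode, call read_chat_template on it, and send anything that is not a template you already recognize to a human for review.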

This is the second AI-infrastructure supply-chain bomb in two weeks, after the Mercor / LiteLLM compromise and the ongoing Shai-Hulud worm in npm. The pattern is consistent: the parts of the stack that make it easy to plug models into apps are also the parts where security review hasn’t caught up. Treat any line of code that renders attacker-controlled template strings as an open door until proven sandboxed. The model file is the new email attachment, and “trust the GGUF” is exactly the wrong default.

If your SOC isn’t watching outbound connections from your inference fleet, start now. The cheapest detection isn’t a YARA rule; it’s a netflow alert when an inference box phones home to somewhere it shouldn’t.
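What that alert boils down to is an allowlist check on destination addresses. The sketch below shows the logic on already-parsed flow records; the Flow record shape and the example networks are hypothetical, and a real deployment would feed this from whatever flow exporter or firewall log you already have.

```python
import ipaddress
from typing import Iterable, NamedTuple, Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class Flow(NamedTuple):
    """One outbound connection record (hypothetical shape for this example)."""
    src: str
    dst: str
    dst_port: int


def unexpected_egress(flows: Iterable[Flow], allowed: Sequence[Network]) -> list[Flow]:
    """Return the flows whose destination is outside every allowlisted network."""
    flagged = []
    for flow in flows:
        dst = ipaddress.ip_address(flow.dst)
        if not any(dst in net for net in allowed):
            flagged.append(flow)
    return flagged


# Example allowlist: internal ranges the inference fleet is expected to reach.
ALLOWED_EGRESS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]
```

An inference box usually has a short, stable list of legitimate destinations, so anything this function returns is worth a look.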
