
OpenAI's Privacy Filter is a 1.5B PII redactor that ships under Apache 2.0. Here's what it actually does.

OpenAI released Privacy Filter on April 22 as an open-weight on-device model for masking eight types of PII. F1 of 96%. Runs in a browser. Here's the catch.

Dieter Morelli · 6 min read · 4 sources
A padlock chained to a smartphone displaying a lock icon, illustrating data privacy. (Photo: Book Catalog / CC BY 2.0 via Wikimedia Commons)

OpenAI quietly published a model card on April 22 and shipped weights on Hugging Face under Apache 2.0. The product is called Privacy Filter, and on paper it's a small thing: a 1.5-billion-parameter LLM that detects and masks personally identifiable information in unstructured text. In context, it's one of only a handful of open-weight releases in OpenAI's history, the first positioned as defensive infrastructure rather than a chatbot, and a quiet swerve into the ground Microsoft Presidio and AWS Comprehend have held for the better part of a decade.

What Privacy Filter is

Privacy Filter is an open-weight LLM specialized for one task: read text, find PII, decide whether to redact or mask it. The model card lists eight categories of detection: names, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets such as passwords or API keys. It’s a 1.5B-parameter model with a mixture-of-experts architecture that activates only ~50M parameters per inference, which is what lets it run on consumer hardware.

The deployment story is the genuinely interesting part. Help Net Security’s reporting on the launch emphasizes that the model is “small enough to be run locally,” which matters because the whole point of a privacy filter is that the data hasn’t been filtered yet. Sending unredacted PII to a cloud API to ask “is this PII?” is a structural problem; running a local model that decides before the data leaves the machine is a different shape entirely.

OpenAI ships the weights under Apache 2.0, the most permissive standard OSS license. You can fine-tune it, redistribute it, embed it in a commercial product, deploy it in a regulated workflow. The license isn’t the catch. The catches are elsewhere.

How it actually works

The model takes a passage of text and, for each PII span it detects, emits two things: a category label and the span boundaries. Whether you replace the span with <NAME>, an empty string, or a hash, or leave it in place but tag it for downstream review, is the application's call. The model just identifies and labels.
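To make that division of labor concrete, here's a minimal sketch of the application side. The (start, end, label) span format and the `mask` helper are assumptions for illustration, not the documented interface — the actual output format is whatever the model card specifies.

```python
import hashlib

# Hypothetical application-side masking over detected PII spans.
def mask(text: str, spans: list[tuple[int, int, str]], policy: str = "tag") -> str:
    # Apply spans right to left so earlier offsets stay valid as the
    # string changes length.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        if policy == "tag":
            replacement = f"<{label}>"   # "Call Carter" -> "Call <NAME>"
        elif policy == "drop":
            replacement = ""             # remove the span outright
        else:                            # "hash": stable pseudonym for joins
            replacement = hashlib.sha256(text[start:end].encode()).hexdigest()[:8]
        text = text[:start] + replacement + text[end:]
    return text

spans = [(5, 11, "NAME"), (20, 38, "EMAIL")]
print(mask("Call Carter back at carter@example.com today", spans))
# -> Call <NAME> back at <EMAIL> today
```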

The performance numbers OpenAI publishes are on PII-Masking-300k, a public benchmark that has become the de facto common eval for this category of model. Privacy Filter posts 94.04% precision and 98.04% recall, for an F1 of 96%. On a revised version of the dataset (the company doesn't fully document what was revised), the F1 climbs to 97.43%. Both numbers beat published Presidio and AWS Comprehend baselines on the same eval, but only by a few points.
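The headline number checks out arithmetically: F1 is the harmonic mean of precision and recall, so F1 = 2PR/(P + R) = 2 × 0.9404 × 0.9804 / (0.9404 + 0.9804) ≈ 0.960. Note that recall is the stronger of the two, which is the right bias for a redactor: a missed span leaks data, while an over-flagged span just costs a review.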

Where context-aware detection actually pays off is in the hard cases that benchmarks don't fully test (a quick regex demo after this list makes the contrast concrete):

  • A name that’s also a noun. “Carter went to the meeting” vs. “carter pulled the lever.” Regex layers struggle; an LLM that has seen the surrounding sentence doesn’t.
  • A date inside prose. “On the 22nd” vs. “April 22, 2026” vs. “next Tuesday.” All three are dates; the first two are explicit, the third is contextual.
  • Account numbers without obvious patterns. A 16-digit string might be a credit card number; a hyphenated digit string might be an SSN, a license plate, or a tracking number, depending on length and context.
  • Secrets in code. API keys, JWTs, and password fields look very different from natural-language PII. Privacy Filter handles them as a single category.
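A crude demo of the first two bullets, using the kind of date pattern a rules layer would ship (the regex here is mine for illustration, not Presidio's):

```python
import re

# A typical rules-layer date pattern: explicit formats only.
date_pat = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2}, \d{4}\b"
)

for s in ["April 22, 2026", "on the 22nd", "next Tuesday"]:
    print(f"{s!r}: {bool(date_pat.search(s))}")
# 'April 22, 2026': True -- the other two come back False, even though
# all three pin down a date once you know when the message was sent.
```

A model that reads the whole sentence gets all three. That's the entire pitch.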

Why you’re hearing about this now

Three things converged in the last six months. The first: enterprises adopting LLM workflows discovered that their input data is often a worse privacy hazard than the model output. ChatGPT logs from employee usage have leaked customer names, internal account numbers, even bare API keys. The second: regulators in California and the EU have been pushing on inference-time data minimization. If you can't promise your model didn't see the PII, the next-best thing is to promise the PII never reached the model uncovered. The third: OpenAI itself has been taking heat for training and inference data hygiene, and Privacy Filter is part of the answer. It's a deflection-shaped product, but the deflection happens to be useful.

The launch lands a week after OpenAI’s Workspace Agents rollout, which made team-shared, cross-document AI workflows a default in ChatGPT Business. Workspace Agents pull from Drives, calendars, and Slack channels by design. Without a redaction layer in front of those pipes, every agent is a potential exfiltration vector.

How it stacks up against the alternatives

This isn’t a greenfield product category. The state of play before April 22:

  • Microsoft Presidio. Open-source, regex-plus-transformer pipeline, mature, free. Excellent for structured PII (emails, phone numbers, SSNs). Mediocre for ambiguous mentions in prose. MIT-licensed.
  • AWS Comprehend. Hosted only. Strong precision, opaque about what's under the hood. Priced per character. The wrong shape if your concern is data leaving your network.
  • Google Cloud DLP. Hosted, similar to Comprehend.
  • Privacy Filter. Open-weight, on-device, single-model. Strong on context-dependent cases. Weaker than Presidio on short, low-context strings where regex wins.

The honest comparison is that Privacy Filter doesn’t replace Presidio for most teams; it complements it. Presidio’s regex layer is faster and more deterministic for the easy cases (email addresses, phone numbers, SSNs in known formats). Privacy Filter’s LLM layer is better for the cases Presidio misses. The right pipeline runs both: Presidio first for the cheap, deterministic catches, Privacy Filter second for the residual.
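Here's what that layering might look like as a sketch. The Presidio calls are the real presidio_analyzer API; detect_with_privacy_filter is a hypothetical stand-in for however you call your local Privacy Filter deployment.

```python
from presidio_analyzer import AnalyzerEngine  # pip install presidio-analyzer

analyzer = AnalyzerEngine()

def redact(text: str) -> str:
    # Pass 1: cheap, deterministic catches (emails, phones, SSNs in known formats).
    spans = [(r.start, r.end, r.entity_type)
             for r in analyzer.analyze(text=text, language="en")]
    # Pass 2: context-dependent residual (names-as-nouns, dates in prose).
    # detect_with_privacy_filter is hypothetical: a wrapper around your local
    # Privacy Filter deployment returning (start, end, label) triples.
    spans += detect_with_privacy_filter(text, skip=spans)
    # Naive merge: apply right to left; a real pipeline would dedupe overlaps.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text
```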

What it doesn’t do, in OpenAI’s own words

OpenAI is unusually direct about the model’s limits. Quoting the model card:

Privacy Filter can make mistakes. It may miss uncommon identifiers or ambiguous references, and it can over- or under-redact information when context is limited, especially in shorter text. In high-sensitivity areas such as legal, medical, and financial workflows, human review and domain-specific evaluation and fine-tuning remain important.

That’s a useful disclaimer because it tells you where to deploy and where not to. Three failure modes worth highlighting:

  • Short text is hard. A single SMS-length string with no surrounding context gives the model nothing to work with. Recall drops.
  • Domain-specific identifiers. A medical record number, an aircraft tail number, a court docket ID: these are PII in their domains but won’t necessarily fire the generic model.
  • Multilingual. OpenAI doesn’t publish per-language F1. The benchmark is English-heavy. Treat any non-English deployment as needing your own eval.

Which is why the Robinson+Cole privacy-law write-up ends with what you’d expect a privacy law firm to write: this doesn’t get you to compliance on its own. It’s a tool. The compliance program is the thing the tool feeds into.

How to actually deploy it

The path of least resistance, for a typical team:

  1. Pull the weights. huggingface-cli download openai/privacy-filter, or load via transformers. ~3 GB on disk.
  2. Stand up a sidecar service. Run the model behind a tiny HTTP API that takes text and returns spans + labels; most teams don't want every microservice loading the weights itself. (A minimal sketch follows this list.)
  3. Wire it into your LLM pipeline. Two patterns. Pre-LLM (redact PII before it reaches GPT/Claude/Gemini) is the safer default. Post-LLM (redact PII before it hits logs or storage, and before it's shown to anyone outside the need-to-know list) is also useful and easy to bolt on.
  4. Layer it on Presidio, not under it. Run Presidio first for cheap deterministic catches, then Privacy Filter on the residual. Cheaper and higher-recall than either alone.
  5. Eval on your data. Don't trust the 96% F1 number. Sample 200 of your own examples, hand-label them, run them through, and measure (a scoring sketch also follows the list). Most surprises live in your data, not in the benchmark.
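For step 2, a minimal sidecar might look like the following. The repo name comes from the release; the token-classification interface is an assumption for the sketch, since the real I/O contract is whatever the model card specifies.

```python
# sidecar.py -- one process owns the weights; everything else speaks HTTP.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
ner = pipeline(
    "token-classification",
    model="openai/privacy-filter",   # repo name from the release
    aggregation_strategy="simple",   # merge word pieces into whole spans
)

class Doc(BaseModel):
    text: str

@app.post("/detect")
def detect(doc: Doc):
    return [
        {"start": e["start"], "end": e["end"], "label": e["entity_group"]}
        for e in ner(doc.text)
    ]
# Run: uvicorn sidecar:app --port 8600
```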
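And for step 5, scoring is a dozen lines once you've hand-labeled a sample. Represent each document's gold and predicted spans as sets of (start, end, label) triples and count exact matches:

```python
def prf(gold: list[set], pred: list[set]) -> tuple[float, float, float]:
    # Exact span match: a hit only if start, end, and label all agree.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Watch recall per category, not just the aggregate: a model can look great overall while silently missing your domain's identifiers.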

What this means for you

If you’re a team running employee-facing LLM tools (ChatGPT Enterprise, Claude for Work, Cursor with org context), this is the layer that’s been missing from your pipeline. Apache 2.0 means there’s no licensing argument; on-device means there’s no data-egress argument; the model card is honest about limits, which makes it easier to scope. Drop it in front of any LLM ingest path that touches employee text and stop forwarding raw customer data.

If you build privacy tooling for a living, the competitive picture just shifted. Microsoft Presidio is fine. AWS Comprehend is fine. But “OpenAI’s open-weight redaction model” is going to be the default thing engineers reach for, and the differentiation game now is fine-tuning for vertical PII (medical, legal, financial) and the integrations layer (sidecars, observability, audit logs), not the base model.

My read: this is the kind of product launch that's small in scope but consequential in shape. The shape is OpenAI publishing open weights again, for a defensive use case, with honest limits. That's a useful precedent for the next time someone in the safety community asks why models aren't built to be audited from the outside. Privacy Filter is a model that asks to be audited. More like this, please.

Frequently Asked

Where can I download Privacy Filter?
On Hugging Face at openai/privacy-filter under an Apache 2.0 license. The 1.5B-parameter model has roughly 50M active parameters and runs locally on a standard laptop or directly in a browser.
What types of PII does Privacy Filter detect?
Eight categories: names, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets such as passwords or API keys. It performs context-aware detection rather than purely regex-based matching.
How does it compare to existing tools like Microsoft Presidio or AWS Comprehend?
Privacy Filter is a single open-weight LLM rather than a pipeline of regex patterns plus a transformer. The trade-off: better at ambiguous mentions (a name that's also a noun, a date in prose), worse at short or context-free strings where Presidio's regex layer is faster and more precise.
Does it handle medical, legal, or financial PII?
Yes, but OpenAI explicitly says human review and domain-specific evaluation are still required for those workflows. The model wasn't fine-tuned for HIPAA or GLBA compliance and shouldn't be deployed as the only layer in those domains.
What's the F1 score and what benchmark is it on?
F1 of 96% (94.04% precision, 98.04% recall) on the public PII-Masking-300k benchmark, rising to 97.43% on a revised version of the dataset. That's the headline; real-world performance depends heavily on whether your data looks like the benchmark.
