PIIGhost
Transparent PII anonymization for LLM agents.
piighost is a Python library that automatically detects, anonymizes, and deanonymizes sensitive entities (names, locations, etc.) in AI agent conversations. It integrates via a LangChain middleware without modifying your existing agent code.
Features
- 4-stage pipeline: Detect → Expand → Map → Replace, covering every occurrence of each entity, not just the first
- Bidirectional: reliable deanonymization via reverse spans, plus fast string-based reanonymization
- Session caching: `PlaceholderStore` protocol for cross-session persistence (SHA-256 keyed)
- LangChain middleware: transparent hooks on `abefore_model`, `aafter_model`, and `awrap_tool_call`, with zero changes to your agent code
- Protocol-based DI: every pipeline stage is a swappable protocol (detector, occurrence finder, placeholder factory, span validator)
- Immutable data models: frozen dataclasses throughout (`Entity`, `Placeholder`, `Span`, `AnonymizationResult`)
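To make the Detect → Expand → Map → Replace flow concrete, here is a minimal, library-free sketch. It is not piighost's implementation: the `Entity` and `Span` dataclasses, the `Detector` protocol, the toy `DictDetector`, and the stage functions are all hypothetical stand-ins. They only illustrate how a swappable detector, frozen data models, and an "every occurrence" expansion step could fit together.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Entity:
    """A detected entity (hypothetical shape, not piighost's own class)."""
    text: str
    label: str


@dataclass(frozen=True)
class Span:
    """Half-open character range [start, end) occupied by an entity."""
    start: int
    end: int
    entity: Entity


class Detector(Protocol):
    """Any object with a matching detect() method can be plugged in."""
    def detect(self, text: str, labels: list[str]) -> list[Entity]: ...


class DictDetector:
    """Toy detector backed by a fixed lookup table instead of a model."""
    def __init__(self, known: dict[str, str]) -> None:
        self.known = known

    def detect(self, text: str, labels: list[str]) -> list[Entity]:
        return [
            Entity(word, label)
            for word, label in self.known.items()
            if label in labels and word in text
        ]


def expand(text: str, entities: list[Entity]) -> list[Span]:
    """Stage 2: find every occurrence of each entity, not just the first."""
    spans = []
    for entity in entities:
        for m in re.finditer(rf"\b{re.escape(entity.text)}\b", text):
            spans.append(Span(m.start(), m.end(), entity))
    return sorted(spans, key=lambda s: s.start)


def build_mapping(entities: list[Entity]) -> dict[Entity, str]:
    """Stage 3: give each distinct entity a numbered placeholder per label."""
    counters: dict[str, int] = {}
    mapping: dict[Entity, str] = {}
    for entity in entities:
        if entity not in mapping:
            counters[entity.label] = counters.get(entity.label, 0) + 1
            mapping[entity] = f"<<{entity.label}_{counters[entity.label]}>>"
    return mapping


def replace(text: str, spans: list[Span], mapping: dict[Entity, str]) -> str:
    """Stage 4: substitute right to left so earlier offsets stay valid."""
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + mapping[span.entity] + text[span.end :]
    return text


def anonymize(text: str, detector: Detector, labels: list[str]) -> str:
    entities = detector.detect(text, labels)  # 1. Detect
    spans = expand(text, entities)            # 2. Expand
    mapping = build_mapping(entities)         # 3. Map
    return replace(text, spans, mapping)      # 4. Replace


detector = DictDetector({"Patrick": "PERSON", "Paris": "LOCATION"})
anonymized = anonymize(
    "Patrick lives in Paris. Patrick loves Paris.",
    detector,
    labels=["PERSON", "LOCATION"],
)
print(anonymized)
# <<PERSON_1>> lives in <<LOCATION_1>>. <<PERSON_1>> loves <<LOCATION_1>>.
```

Because `anonymize` depends only on the `Detector` protocol, a model-backed or regex-backed detector could be passed in without touching the other stages. Freezing the dataclasses also makes `Entity` hashable, which is what lets it serve as a dictionary key in the mapping stage.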
Installation
Quick example
```python
from gliner2 import GLiNER2
from piighost.anonymizer import Anonymizer, GlinerDetector

# Load the GLiNER2 model and wrap it in a detector.
model = GLiNER2.from_pretrained("fastino/gliner2-multi-v1")
detector = GlinerDetector(model=model, threshold=0.5, flat_ner=True)
anonymizer = Anonymizer(detector=detector)

# Anonymize every occurrence of the requested entity types.
result = anonymizer.anonymize(
    "Patrick lives in Paris. Patrick loves Paris.",
    labels=["PERSON", "LOCATION"],
)
print(result.anonymized_text)
# <<PERSON_1>> lives in <<LOCATION_1>>. <<PERSON_1>> loves <<LOCATION_1>>.

# Restore the original text from the anonymization result.
original = anonymizer.deanonymize(result)
print(original)
# Patrick lives in Paris. Patrick loves Paris.
```
Note: The GLiNER2 model (~500 MB) is downloaded from Hugging Face on first use.
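The round trip in the quick example depends on remembering where each placeholder landed. The sketch below shows one way span-based deanonymization and string-based reanonymization could work. It uses only the standard library, and `ReverseSpan`, `anonymize_with_spans`, `deanonymize`, and `reanonymize` are hypothetical names chosen for illustration, not piighost's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReverseSpan:
    """Where a placeholder sits in the anonymized text, and what it replaced."""
    start: int
    end: int
    original: str


def anonymize_with_spans(text, replacements):
    """Apply (start, end, placeholder) replacements and record reverse spans.

    `replacements` must be non-overlapping offsets into `text`.
    """
    out = []
    reverse = []
    cursor = 0  # position in the original text
    length = 0  # length of the anonymized text built so far
    for start, end, placeholder in sorted(replacements):
        out.append(text[cursor:start])
        length += start - cursor
        out.append(placeholder)
        reverse.append(
            ReverseSpan(length, length + len(placeholder), text[start:end])
        )
        length += len(placeholder)
        cursor = end
    out.append(text[cursor:])
    return "".join(out), reverse


def deanonymize(anonymized, reverse):
    """Restore originals right to left so the remaining offsets stay valid."""
    for span in sorted(reverse, key=lambda s: s.start, reverse=True):
        anonymized = anonymized[: span.start] + span.original + anonymized[span.end :]
    return anonymized


def reanonymize(text, mapping):
    """Fast path: plain string replacement of known originals."""
    for original, placeholder in mapping.items():
        text = text.replace(original, placeholder)
    return text


text = "Patrick lives in Paris."
anonymized, reverse = anonymize_with_spans(
    text, [(0, 7, "<<PERSON_1>>"), (17, 22, "<<LOCATION_1>>")]
)
restored = deanonymize(anonymized, reverse)
print(anonymized)  # <<PERSON_1>> lives in <<LOCATION_1>>.
print(restored)    # Patrick lives in Paris.

again = reanonymize(
    "Paris is nice, says Patrick.",
    {"Patrick": "<<PERSON_1>>", "Paris": "<<LOCATION_1>>"},
)
print(again)       # <<LOCATION_1>> is nice, says <<PERSON_1>>.
```

Restoring by recorded offsets touches only the exact ranges that were replaced, so text that merely looks like a placeholder is left alone. The string-based path is simpler and works on new text with no recorded spans, at the cost of matching any occurrence of the original string.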
Navigation
| Section | Description |
|---|---|
| Getting started | Installation and first steps |
| Architecture | Pipeline and flow diagrams |
| Examples | Basic usage and LangChain integration |
| Pre-built detectors | Ready-to-use regex patterns for common PII (US & Europe) |
| Extending PIIGhost | Build your own modules |
| API Reference | Full API documentation |