Reference Anonymizer¶
Module: piighost.anonymizer
Anonymizer¶
Performs span-based text replacement using a placeholder factory. Stateless class no internal state between calls.
Constructor¶
| Parameter | Type | Description |
|---|---|---|
ph_factory |
AnyPlaceholderFactory |
Placeholder factory for token generation (required) |
Methods¶
anonymize(text, entities) -> str¶
Replaces each detection in text with its entity's token. Replacements are applied right to left so that earlier span positions remain valid.
from piighost.anonymizer import Anonymizer
from piighost.placeholder import LabelCounterPlaceholderFactory
from piighost.models import Detection, Entity, Span
anonymizer = Anonymizer(LabelCounterPlaceholderFactory())
entity = Entity(detections=(
Detection(text="Patrick", label="PERSON", position=Span(0, 7), confidence=0.9),
))
result = anonymizer.anonymize("Patrick is nice", [entity])
# '<<PERSON:1>> is nice'
| Parameter | Type | Description |
|---|---|---|
text |
str |
Source text |
entities |
list[Entity] |
Entities whose detections should be replaced |
Returns: str the anonymized text
deanonymize(anonymized_text, entities) -> str¶
Restores the original text by replacing tokens with original detection texts, using the original positions to handle entities with multiple spelling variants.
| Parameter | Type | Description |
|---|---|---|
anonymized_text |
str |
Text containing placeholder tokens |
entities |
list[Entity] |
Same entities used during anonymization |
Returns: str the restored original text
AnyAnonymizer (Protocol)¶
Interface for all anonymizer implementations.
class AnyAnonymizer(Protocol):
ph_factory: AnyPlaceholderFactory
def anonymize(self, text: str, entities: list[Entity]) -> str: ...
def deanonymize(self, anonymized_text: str, entities: list[Entity]) -> str: ...
Detectors¶
Module: piighost.detector
AnyDetector (Protocol)¶
Interface for all entity detectors. All detectors are async.
GlinerDetector¶
Implementation using the GLiNER2 model.
| Parameter | Type | Default | Description |
|---|---|---|---|
model |
GLiNER2 |
GLiNER2 model instance (required) | |
labels |
list[str] |
Entity types to detect (required) | |
threshold |
float |
0.5 |
Minimum confidence score (0.0–1.0) |
flat_ner |
bool |
True |
Flat NER mode (no nested entities) |
from gliner2 import GLiNER2
from piighost.detector.gliner2 import Gliner2Detector
model = GLiNER2.from_pretrained("fastino/gliner2-multi-v1")
detector = Gliner2Detector(model=model, labels=["PERSON", "LOCATION"], threshold=0.5)
detections = await detector.detect("Patrick lives in Paris")
# [Detection(text='Patrick', label='PERSON', position=Span(0, 7), confidence=0.97),
# Detection(text='Paris', label='LOCATION', position=Span(17, 22), confidence=0.99)]
ExactMatchDetector¶
Detects entities by exact word matching using word-boundary regex. Useful for tests.
from piighost.detector import ExactMatchDetector
detector = ExactMatchDetector([("Patrick", "PERSON"), ("Paris", "LOCATION")])
detections = await detector.detect("Patrick lives in Paris")
RegexDetector¶
Pattern-based detection with one regex per label.
from piighost.detector import RegexDetector
detector = RegexDetector(patterns={"FR_PHONE": r"\b(?:\+33|0)[1-9](?:[\s.\-]?\d{2}){4}\b"})
detections = await detector.detect("Call me at 06 12 34 56 78")
CompositeDetector¶
Chains multiple detectors and merges their results.
from piighost.detector import CompositeDetector
detector = CompositeDetector(detectors=[gliner_detector, regex_detector])
Placeholder Factories¶
Module: piighost.placeholder
AnyPlaceholderFactory (Protocol)¶
Interface for all placeholder factories.
class AnyPlaceholderFactory(Protocol):
def create(self, entities: list[Entity]) -> dict[Entity, str]: ...
LabelCounterPlaceholderFactory¶
Default implementation: generates sequential <<LABEL:N>> tags.
from piighost.placeholder import LabelCounterPlaceholderFactory
from piighost.models import Detection, Entity, Span
factory = LabelCounterPlaceholderFactory()
person = Entity(detections=(Detection("Patrick", "PERSON", Span(0, 7), 0.9),))
location = Entity(detections=(Detection("Paris", "LOCATION", Span(17, 22), 0.9),))
tokens = factory.create([person, location])
# {person: '<<PERSON:1>>', location: '<<LOCATION:1>>'}
LabelHashPlaceholderFactory¶
Generates deterministic, opaque hash-based tags.
from piighost.placeholder import LabelHashPlaceholderFactory
factory = LabelHashPlaceholderFactory(hash_length=8)
tokens = factory.create([person])
# {person: '<<PERSON:a1b2c3d4>>'}
| Parameter | Default | Description |
|---|---|---|
hash_length |
8 |
Number of hex characters from the SHA-256 digest |
LabelPlaceholderFactory¶
All entities with the same label share the same <<LABEL>> token. No counter, no distinction between entities.
from piighost.placeholder import LabelPlaceholderFactory
factory = LabelPlaceholderFactory()
tokens = factory.create([person, location])
# {person: '<<PERSON>>', location: '<<LOCATION>>'}
Deanonymization with LabelPlaceholderFactory
Since multiple entities share the same token, deanonymization relies on original position order. This works correctly with the Anonymizer.deanonymize() method.
Data models¶
Module: piighost.models
Detection¶
A single NER result from the text.
@dataclass(frozen=True)
class Detection:
text: str # Surface form: "Patrick"
label: str # Entity type: "PERSON"
position: Span # Where it was found
confidence: float # Score (0.0–1.0)
Entity¶
Group of detections referring to the same PII.
@dataclass(frozen=True)
class Entity:
detections: tuple[Detection, ...] # All detections for this entity
# label: str (property, from first detection)
Span¶
Character position in source text.
@dataclass(frozen=True)
class Span:
start_pos: int # Inclusive start index
end_pos: int # Exclusive end index
# overlaps(other: Span) -> bool
Immutability
All models are frozen dataclasses (frozen=True) they are thread-safe and hashable.
Exceptions¶
Module: piighost.exceptions
CacheMissError¶
Raised when a cache lookup finds no entry for the given key. Used by the middleware to fall back from pipeline.deanonymize() to pipeline.deanonymize_with_ent().