Pre-built detectors usage¶
piighost ships ready-to-use regex pattern sets for the most common PII: emails, IP addresses, URLs, API keys, phone numbers, SSNs, IBANs, and more. You can use them as-is, combine them, or extend them with your own patterns.
This page walks through usage recipes. For the full catalog of available labels (Common, US, Europe), see the Pre-built detectors reference.
Single region¶
from examples.detectors.common import create_detector
from piighost.anonymizer import Anonymizer
from piighost.linker.entity import ExactEntityLinker
from piighost.resolver import MergeEntityConflictResolver, ConfidenceSpanConflictResolver
from piighost.pipeline import AnonymizationPipeline
from piighost.placeholder import LabelCounterPlaceholderFactory
detector = create_detector()
entity_linker = ExactEntityLinker()
entity_resolver = MergeEntityConflictResolver()
span_resolver = ConfidenceSpanConflictResolver()
ph_factory = LabelCounterPlaceholderFactory()
anonymizer = Anonymizer(ph_factory=ph_factory)
pipeline = AnonymizationPipeline(
    detector=detector,
    span_resolver=span_resolver,
    entity_linker=entity_linker,
    entity_resolver=entity_resolver,
    anonymizer=anonymizer,
)
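# anonymize() is a coroutine: await it inside an async function (or a REPL/notebook with top-level await).
anonymized, _ = await pipeline.anonymize("Email me at alice@example.com, server 192.168.1.42.")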
print(anonymized)
# Email me at <<EMAIL:1>>, server <<IP_V4:1>>.
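The snippet above awaits `pipeline.anonymize(...)` directly, which works in a notebook or an asyncio REPL but not at the top level of a plain script. A minimal sketch of wrapping the call in `asyncio.run` follows. `FakePipeline` is a hypothetical stand-in (not part of piighost) so the example runs without the library installed; it only assumes that `anonymize` is a coroutine returning a two-item tuple, as the recipe above suggests.

```python
import asyncio
import re


class FakePipeline:
    """Hypothetical stand-in for AnonymizationPipeline (not part of piighost)."""

    async def anonymize(self, text: str) -> tuple[str, list[str]]:
        # Illustrative only: mask anything that looks like an email address.
        found = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
        for value in found:
            text = text.replace(value, "[EMAIL]")
        return text, found


async def main() -> str:
    pipeline = FakePipeline()  # swap in the real pipeline built above
    anonymized, _ = await pipeline.anonymize("Email me at alice@example.com.")
    return anonymized


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
print(result)
```

In a real script, build the pipeline inside `main()` (or pass it in) and keep a single `asyncio.run` call at the entry point.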
Combine common + regional patterns¶
# Reuses the piighost imports from the "Single region" example above.
from examples.detectors.us import create_full_detector
# create_full_detector() merges common + US patterns via CompositeDetector
detector = create_full_detector()
span_resolver = ConfidenceSpanConflictResolver()
entity_linker = ExactEntityLinker()
entity_resolver = MergeEntityConflictResolver()
anonymizer = Anonymizer(LabelCounterPlaceholderFactory())
pipeline = AnonymizationPipeline(
    detector=detector,
    span_resolver=span_resolver,
    entity_linker=entity_linker,
    entity_resolver=entity_resolver,
    anonymizer=anonymizer,
)
anonymized, _ = await pipeline.anonymize(
    "SSN 123-45-6789, email john@example.com, card 4532-1234-5678-9012."
)
print(anonymized)
# SSN <<US_SSN:1>>, email <<EMAIL:1>>, card <<CREDIT_CARD:1>>.
Mix-and-match with PATTERNS dicts¶
from piighost.detector import RegexDetector
from examples.detectors.common import PATTERNS as COMMON
from examples.detectors.europe import PATTERNS as EU
# Cherry-pick only what you need
my_patterns = {
    "EMAIL": COMMON["EMAIL"],
    "URL": COMMON["URL"],
    "EU_IBAN": EU["EU_IBAN"],
    "FR_PHONE": EU["FR_PHONE"],
}
detector = RegexDetector(patterns=my_patterns)
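When the list of labels comes from configuration rather than being hard-coded, a small helper can fail fast on a typo and compile each pattern before a detector is built. This is a sketch under assumptions: it treats a PATTERNS dict as a mapping from label to regex string (as the recipes on this page suggest), the `COMMON` and `EU` dicts below are simplified stand-ins rather than the shipped patterns, and `pick` is a hypothetical helper name.

```python
import re

# Stand-in dicts with simplified patterns (hypothetical, not the shipped ones).
COMMON = {
    "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "URL": r"https?://\S+",
}
EU = {"FR_PHONE": r"\b0[1-9](?:[ .]?\d{2}){4}\b"}


def pick(source: dict[str, str], *labels: str) -> dict[str, str]:
    """Return only the requested labels, failing fast on unknown names."""
    missing = [label for label in labels if label not in source]
    if missing:
        raise KeyError(f"unknown labels: {missing}")
    return {label: source[label] for label in labels}


my_patterns = {**pick(COMMON, "EMAIL"), **pick(EU, "FR_PHONE")}

# Compile each pattern once to surface regex syntax errors early.
compiled = {label: re.compile(rx) for label, rx in my_patterns.items()}
print(sorted(compiled))
```

The resulting `my_patterns` dict has the same shape as the hand-written one above, so it could be passed to `RegexDetector(patterns=my_patterns)` in the same way.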
Combine with an NER (NER + regex)¶
# Reuses the piighost imports from the "Single region" example above.
from gliner2 import GLiNER2
from piighost.detector import CompositeDetector
from piighost.detector.gliner2 import Gliner2Detector
from examples.detectors.common import create_detector as create_regex
model = GLiNER2.from_pretrained("fastino/gliner2-multi-v1")
ner_detector = Gliner2Detector(model=model, labels=["PERSON", "LOCATION"], threshold=0.5)
regex_detector = create_regex()  # emails, IPs, URLs, API keys, etc.
detector = CompositeDetector(detectors=[ner_detector, regex_detector])
span_resolver = ConfidenceSpanConflictResolver()
entity_linker = ExactEntityLinker()
entity_resolver = MergeEntityConflictResolver()
anonymizer = Anonymizer(LabelCounterPlaceholderFactory())
pipeline = AnonymizationPipeline(
    detector=detector,
    span_resolver=span_resolver,
    entity_linker=entity_linker,
    entity_resolver=entity_resolver,
    anonymizer=anonymizer,
)
anonymized, _ = await pipeline.anonymize("Patrick at alice@example.com, IP 10.0.0.1.")
print(anonymized)
# <<PERSON:1>> at <<EMAIL:1>>, IP <<IP_V4:1>>.
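When an NER model and a regex detector run side by side, they can report overlapping spans for the same text (for example, the NER tagging "alice" inside an email address), which is why the pipeline takes a span resolver. The sketch below shows one plausible strategy: among overlapping spans, keep the one with the highest confidence. It illustrates the idea only and is an assumption, not the actual `ConfidenceSpanConflictResolver` implementation; `Span`, `resolve`, and the confidence values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int  # exclusive
    label: str
    confidence: float


def resolve(spans: list[Span]) -> list[Span]:
    """Greedily keep the most confident spans, dropping any that overlap a kept one."""
    kept: list[Span] = []
    for span in sorted(spans, key=lambda s: s.confidence, reverse=True):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


text = "Patrick at alice@example.com"
spans = [
    Span(0, 7, "PERSON", 0.91),    # NER: "Patrick" (assumed score)
    Span(11, 16, "PERSON", 0.55),  # NER: "alice", inside the email (assumed score)
    Span(11, 28, "EMAIL", 1.0),    # regex: full address (assumed fixed confidence)
]
resolved = resolve(spans)
print([(s.label, text[s.start:s.end]) for s in resolved])
```

With these inputs the low-confidence "alice" span is dropped because it overlaps the email span, so the address is replaced as a whole rather than split.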
Adding your own patterns¶
The pattern sets are plain dictionaries, so you can extend them or build your own:
from examples.detectors.common import PATTERNS as COMMON
my_patterns = {
    **COMMON,
    "LICENSE_PLATE_FR": r"\b[A-Z]{2}-\d{3}-[A-Z]{2}\b",
    "CUSTOM_ID": r"\bCUST-\d{6}\b",
}
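Before handing new patterns to a detector, it can help to check them against a few positive and negative samples with the standard `re` module. The sketch below uses the two custom patterns above; the sample strings are made up for illustration, and the check says nothing about how piighost applies the patterns internally (flags, anchoring), which is an open assumption here.

```python
import re

custom = {
    "LICENSE_PLATE_FR": r"\b[A-Z]{2}-\d{3}-[A-Z]{2}\b",
    "CUSTOM_ID": r"\bCUST-\d{6}\b",
}

# For each label: (strings that should match, strings that should not).
samples = {
    "LICENSE_PLATE_FR": (["Plate AB-123-CD seen"], ["ab-123-cd", "AB-1234-CD"]),
    "CUSTOM_ID": (["ref CUST-004217"], ["CUST-42", "CUST-1234567"]),
}

for label, (should_match, should_not_match) in samples.items():
    rx = re.compile(custom[label])
    for s in should_match:
        assert rx.search(s), f"{label} missed {s!r}"
    for s in should_not_match:
        assert not rx.search(s), f"{label} wrongly matched {s!r}"

matches = re.findall(custom["CUSTOM_ID"], "CUST-000001 and CUST-000002")
print(matches)
```

The `\b` word boundaries are what reject `CUST-1234567`: after six digits the next character is still a digit, so no boundary is found and the match fails.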
See also Extending PIIGhost for creating fully custom detector classes.