Pre-built regex detectors¶
PIIGhost ships with ready-to-use RegexDetector pattern sets in examples/detectors/. Each file exposes a PATTERNS dictionary and a create_detector() helper so you can plug them in with zero configuration.
Available pattern sets¶
Common (universal)¶
File: examples/detectors/common.py
| Label | Example match |
|---|---|
EMAIL |
alice@example.com |
IP_V4 |
192.168.1.42 |
IP_V6 |
2001:0db8:85a3::8a2e:0370:7334 |
URL |
https://api.example.com/v1 |
CREDIT_CARD |
4532-1234-5678-9012 |
PHONE_INTERNATIONAL |
+33 6 12 34 56 78 |
OPENAI_API_KEY |
sk-proj-abc123xyz456789ABCDEF |
AWS_ACCESS_KEY |
AKIAIOSFODNN7EXAMPLE |
GITHUB_TOKEN |
ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZ… |
STRIPE_KEY |
sk_live_ABCDEFGHIJKLMNOPQR… |
US-specific¶
File: examples/detectors/us.py
| Label | Example match | Format |
|---|---|---|
US_SSN |
123-45-6789 |
XXX-XX-XXXX |
US_PHONE |
(555) 867-5309 |
With optional +1 prefix |
US_PASSPORT |
C12345678 |
Letter + 8 digits |
US_ZIP_CODE |
90210-1234 |
ZIP or ZIP+4 |
US_EIN |
12-3456789 |
Employer Identification Number |
US_BANK_ROUTING |
021000021 |
9-digit ABA routing number |
Europe¶
File: examples/detectors/europe.py
| Label | Example match | Country |
|---|---|---|
EU_IBAN |
FR7630006000011234567890189 |
Pan-EU |
EU_VAT |
FR12345678901 |
Pan-EU |
FR_SSN |
185017512345612 |
France (INSEE) |
FR_PHONE |
06 12 34 56 78 |
France |
FR_ZIP |
75001 |
France |
DE_PHONE |
030 1234567 |
Germany |
DE_ZIP |
10115 |
Germany |
UK_NINO |
AB123456C |
UK (National Insurance) |
UK_NHS |
943-476-5919 |
UK (NHS number) |
UK_POSTCODE |
SW1A 1AA |
UK |
Quick start¶
Single region¶
from examples.detectors.common import create_detector
from piighost.anonymizer import Anonymizer
detector = create_detector()
anonymizer = Anonymizer(detector=detector)
result = anonymizer.anonymize("Email me at alice@example.com, server 192.168.1.42.")
print(result.anonymized_text)
# Email me at <<EMAIL_1>>, server <<IP_V4_1>>.
Combine common + regional patterns¶
from examples.detectors.us import create_full_detector
from piighost.anonymizer import Anonymizer
# create_full_detector() merges common + US patterns via CompositeDetector
detector = create_full_detector()
anonymizer = Anonymizer(detector=detector)
result = anonymizer.anonymize(
"SSN 123-45-6789, email john@example.com, card 4532-1234-5678-9012."
)
print(result.anonymized_text)
# SSN <<US_SSN_1>>, email <<EMAIL_1>>, card <<CREDIT_CARD_1>>.
Mix-and-match with PATTERNS dicts¶
from piighost.anonymizer import Anonymizer
from piighost.anonymizer.detector import RegexDetector
from examples.detectors.common import PATTERNS as COMMON
from examples.detectors.europe import PATTERNS as EU
# Cherry-pick only what you need
my_patterns = {
"EMAIL": COMMON["EMAIL"],
"URL": COMMON["URL"],
"EU_IBAN": EU["EU_IBAN"],
"FR_PHONE": EU["FR_PHONE"],
}
detector = RegexDetector(patterns=my_patterns)
anonymizer = Anonymizer(detector=detector)
Combine with GLiNER2 (NER + regex)¶
from gliner2 import GLiNER2
from piighost.anonymizer import Anonymizer, GlinerDetector
from piighost.anonymizer.detector import CompositeDetector
from examples.detectors.common import create_detector as create_regex
model = GLiNER2.from_pretrained("fastino/gliner2-multi-v1")
detector = CompositeDetector(
detectors=[
GlinerDetector(model=model, labels=["PERSON", "LOCATION"], threshold=0.5, flat_ner=True),
create_regex(), # emails, IPs, URLs, API keys, etc.
]
)
anonymizer = Anonymizer(detector=detector)
result = anonymizer.anonymize("Patrick at alice@example.com, IP 10.0.0.1.")
print(result.anonymized_text)
# <<PERSON_1>> at <<EMAIL_1>>, IP <<IP_V4_1>>.
Adding your own patterns¶
The pattern sets are plain dictionaries — extend them or create your own:
from examples.detectors.common import PATTERNS as COMMON
my_patterns = {
**COMMON,
"LICENSE_PLATE_FR": r"\b[A-Z]{2}-\d{3}-[A-Z]{2}\b",
"CUSTOM_ID": r"\bCUST-\d{6}\b",
}
See also Extending PIIGhost for creating fully custom detector classes.