Getting started

Installation

Requirements

  • Python 3.12+
  • uv (recommended) or pip

Basic installation

uv add piighost        # with uv (recommended)
pip install piighost   # or with pip

Development installation

git clone https://github.com/Athroniaeth/piighost.git
cd piighost
uv sync

Usage 1 — Standalone anonymization

The simplest usage: create an Anonymizer and call it directly.

from gliner2 import GLiNER2
from piighost.anonymizer import Anonymizer, GlinerDetector

# 1. Load the NER model
model = GLiNER2.from_pretrained("fastino/gliner2-multi-v1")

# 2. Create the detector
detector = GlinerDetector(model=model, threshold=0.5, flat_ner=True)

# 3. Create the anonymizer
anonymizer = Anonymizer(detector=detector)

# 4. Anonymize
result = anonymizer.anonymize(
    "Patrick lives in Paris. Patrick loves Paris.",
    labels=["PERSON", "LOCATION"],
)

print(result.anonymized_text)
# <<PERSON_1>> lives in <<LOCATION_1>>. <<PERSON_1>> loves <<LOCATION_1>>.

# 5. Deanonymize
original = anonymizer.deanonymize(result)
print(original)
# Patrick lives in Paris. Patrick loves Paris.
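
The output above shows that every occurrence of the same entity receives the same placeholder. The following pure-Python sketch illustrates that idea; it is not piighost's implementation. The helpers `replace_entities` and `restore` and the hard-coded entity list are hypothetical stand-ins for what a detector might return.

```python
import re


def replace_entities(text, entities):
    """Swap each detected value for a numbered placeholder.

    `entities` is a hypothetical list of (value, label) pairs standing in
    for detector output; it is an assumption, not the library's format.
    """
    seen = {}      # original value -> placeholder
    mapping = {}   # placeholder -> original value
    counters = {}  # label -> last index handed out
    for value, label in entities:
        if value in seen:
            continue  # a repeated entity reuses its existing placeholder
        counters[label] = counters.get(label, 0) + 1
        placeholder = f"<<{label}_{counters[label]}>>"
        seen[value] = placeholder
        mapping[placeholder] = value
    if not seen:
        return text, mapping
    # One pass over the text, longest values first, so replacements never overlap.
    values = sorted(seen, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in values))
    return pattern.sub(lambda m: seen[m.group(0)], text), mapping


def restore(text, mapping):
    """Put the original values back in place of the placeholders."""
    if not mapping:
        return text
    pattern = re.compile("|".join(re.escape(p) for p in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


source = "Patrick lives in Paris. Patrick loves Paris."
detected = [("Patrick", "PERSON"), ("Paris", "LOCATION"), ("Patrick", "PERSON")]
anonymized, mapping = replace_entities(source, detected)
print(anonymized)
print(restore(anonymized, mapping))
```

Keeping the placeholder-to-value mapping next to the anonymized text is what makes the round trip possible in this sketch.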

Available labels

The supported labels depend on the GLiNER2 model. "fastino/gliner2-multi-v1" supports "PERSON", "LOCATION", "ORGANIZATION", "EMAIL", "PHONE", among others.


Usage 2 — Session pipeline with caching

AnonymizationPipeline wraps the Anonymizer with a session cache to reuse placeholders across multiple messages.

import asyncio
from piighost.pipeline import AnonymizationPipeline

# `anonymizer` is the Anonymizer instance created in Usage 1
pipeline = AnonymizationPipeline(
    anonymizer=anonymizer,
    labels=["PERSON", "LOCATION"],
)

async def main():
    # Anonymize (async, with caching)
    result = await pipeline.anonymize("Patrick lives in Paris.")
    print(result.anonymized_text)
    # <<PERSON_1>> lives in <<LOCATION_1>>.

    # Synchronous deanonymization via string replacement
    restored = pipeline.deanonymize_text("<<PERSON_1>> lives in <<LOCATION_1>>.")
    print(restored)
    # Patrick lives in Paris.

    # Reanonymize (inverse: original → placeholder)
    reanon = pipeline.reanonymize_text("Result for Patrick in Paris")
    print(reanon)
    # Result for <<PERSON_1>> in <<LOCATION_1>>

asyncio.run(main())

SHA-256 caching

The pipeline computes a SHA-256 hash of the source text. If the same text is submitted multiple times, the cached result is returned immediately without calling the NER model.
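
As a rough illustration of hash-keyed caching, the sketch below stores results under the SHA-256 digest of the input text and skips the expensive call on a repeat. The `HashCache` class is hypothetical and only uses the standard library; the pipeline's actual cache may be structured differently.

```python
import hashlib


class HashCache:
    """Tiny in-memory cache keyed by the SHA-256 digest of the input text."""

    def __init__(self, compute):
        self._compute = compute  # expensive function, e.g. a call to the NER model
        self._store = {}         # hex digest -> cached result
        self.misses = 0          # number of times `compute` actually ran

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(text)
        return self._store[key]


# str.upper stands in for the costly detection step (an assumption for the demo).
cache = HashCache(compute=str.upper)
first = cache.get("Patrick lives in Paris.")
second = cache.get("Patrick lives in Paris.")
print(cache.misses)  # the second call was served from the cache
```

Hashing the text keeps the cache keys at a fixed size regardless of how long the messages are.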


Usage 3 — LangChain middleware

To integrate anonymization into a LangGraph agent, use PIIAnonymizationMiddleware:

from gliner2 import GLiNER2
from langchain.agents import create_agent
from langchain_core.tools import tool

from piighost.anonymizer import Anonymizer, GlinerDetector
from piighost.middleware import PIIAnonymizationMiddleware
from piighost.pipeline import AnonymizationPipeline

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to the given address."""
    return f"Email sent to {to}."

# Build the anonymization stack (same model as in Usage 1)
model = GLiNER2.from_pretrained("fastino/gliner2-multi-v1")
detector = GlinerDetector(model=model, threshold=0.5, flat_ner=True)
anonymizer = Anonymizer(detector=detector)
pipeline = AnonymizationPipeline(anonymizer=anonymizer, labels=["PERSON", "LOCATION"])
middleware = PIIAnonymizationMiddleware(pipeline=pipeline)

# Create the agent with the middleware
agent = create_agent(
    model="openai:gpt-4o-mini",
    system_prompt="You are a helpful assistant.",
    tools=[send_email],
    middleware=[middleware],
)

The middleware intercepts every agent turn automatically: the LLM sees only anonymized text, tools receive the real values, and user-facing messages are deanonymized.
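
The data flow of one agent turn can be pictured with the following self-contained sketch. It does not use LangChain or piighost, and it is not how the middleware is implemented: the fixed `PLACEHOLDERS` table, the `lookup_city` tool, and the scripted "LLM" steps are all hypothetical, chosen only to show where values are swapped.

```python
import re

# Hypothetical session mapping, as it might look once "Patrick" and "Paris" are known.
PLACEHOLDERS = {"<<PERSON_1>>": "Patrick", "<<LOCATION_1>>": "Paris"}


def _swap(text, table):
    """Replace every key of `table` found in `text` with its value, in one pass."""
    if not table:
        return text
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


def deanonymize(text):
    return _swap(text, PLACEHOLDERS)


def reanonymize(text):
    return _swap(text, {value: ph for ph, value in PLACEHOLDERS.items()})


def lookup_city(person):
    """Stand-in tool: it receives and returns real values."""
    return f"{person} lives in Paris."


def run_turn(user_message):
    # 1. User input is anonymized before the LLM sees it
    #    (a real pipeline would run the detector here, not a fixed table).
    seen_by_llm = [reanonymize(user_message)]
    # 2. The scripted "LLM" requests a tool call using a placeholder.
    tool_arg = "<<PERSON_1>>"
    # 3. The tool is called with the real value.
    tool_result = lookup_city(deanonymize(tool_arg))
    # 4. The tool output is anonymized again before going back to the LLM.
    seen_by_llm.append(reanonymize(tool_result))
    # 5. The scripted "LLM" answers, still in placeholder form.
    llm_reply = f"Here is what I found: {seen_by_llm[-1]}"
    # 6. The user-facing message is deanonymized.
    return seen_by_llm, deanonymize(llm_reply)


seen_by_llm, reply = run_turn("Where does Patrick live?")
print(seen_by_llm)
print(reply)
```

In this sketch, nothing collected in `seen_by_llm` contains a real name, while both the tool and the final reply work with the original values.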


Development commands

uv sync                      # Install dependencies
make lint                    # Format (ruff) + lint (ruff) + type-check (pyrefly)
uv run pytest                # Run all tests
uv run pytest tests/ -k "test_name"  # Run only tests whose name matches "test_name"