Proxy docs · v2.2

Promptolian Proxy

A transparent proxy between your agent and Anthropic or OpenAI. One-line change, no agent logic to touch. Tool caching runs automatically; context compression is opt-in locally (--compress) and always on in the cloud proxy.

How it works

The proxy sits between your code and the API. It intercepts every call, injects cache_control blocks on tool schemas automatically, and optionally compresses conversation history through the KV-sandwich engine before forwarding.

Your agent sees the exact same response body it would get from Anthropic or OpenAI directly. No response fields are modified; the proxy only adds its own X-Promptolian-* response headers.

One-line change: swap base_url. Your API key goes in the standard header; the proxy forwards it directly and never stores it.

Local proxy

Run on your own machine. No account needed. Sessions are stored in SQLite at ~/.promptolian/sessions.db.

bash
# Install
pip install "promptolian[proxy]"

# Tool caching only
promptolian proxy

# Tool caching + context compression
promptolian proxy --compress

# Custom port
promptolian proxy --port 8080

Point your client at it

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:3002",  # ← only change
)

# Everything else unchanged
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[...],
    messages=[...],
)

import openai

client = openai.OpenAI(
    base_url="http://localhost:3002/v1",  # ← only change
    api_key="your-openai-key",
)

resp = client.chat.completions.create(
    model="gpt-4o",
    tools=[...],
    messages=[...],
)

Cloud proxy

Skip self-hosting. proxy.promptolian.com runs 24/7 on PostgreSQL with context compression enabled. Requires a Solo or Team plan key.

client = anthropic.Anthropic(
    base_url="https://proxy.promptolian.com",
    default_headers={"X-Promptolian-Key": "pk_..."},
)

client = openai.OpenAI(
    base_url="https://proxy.promptolian.com/v1",
    api_key="your-openai-key",
    default_headers={"X-Promptolian-Key": "pk_..."},
)

curl -X POST https://proxy.promptolian.com/v1/messages \
  -H "x-api-key: YOUR_ANTHROPIC_KEY" \
  -H "X-Promptolian-Key: pk_..." \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":1024,"messages":[...]}'

Tool schema caching

Every API call normally re-sends the full tool schema, even when nothing has changed. The proxy stores schemas per session and injects Anthropic cache_control blocks automatically. Cached tokens are billed at 10% of the normal input-token rate.
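
As a rough illustration of what injecting cache_control can look like, the sketch below marks the last tool definition with Anthropic's ephemeral cache breakpoint, which covers the tool definitions up to and including that entry. The helper name inject_cache_control and the sample tools are hypothetical; this is not the proxy's actual code.

```python
import copy


def inject_cache_control(tools):
    """Return a copy of `tools` with a cache breakpoint on the last tool.

    Hypothetical helper for illustration; the proxy's internals are not
    documented here.
    """
    if not tools:
        return []
    cached = copy.deepcopy(tools)  # leave the caller's list untouched
    cached[-1]["cache_control"] = {"type": "ephemeral"}
    return cached


# Sample tool definitions (made up for this example).
tools = [
    {"name": "search_web", "description": "Search the web",
     "input_schema": {"type": "object", "properties": {}}},
    {"name": "read_file", "description": "Read a file",
     "input_schema": {"type": "object", "properties": {}}},
]
cached_tools = inject_cache_control(tools)
```

Only the final tool carries the marker, and the original list is left unmodified, so the same definitions can be reused for calls that bypass the proxy.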

How to use sessions

Pass X-Session to group calls. On the first call, include your tools as usual. On subsequent calls you can omit tools entirely — the proxy re-injects them.

python
# Call 1 — tools sent and stored by the proxy
client.messages.create(
    ...,
    tools=[search_tool, read_tool, sql_tool],
    extra_headers={"X-Session": "session-abc"},
)

# Call 2+ — omit tools, proxy re-injects from cache (~90% cheaper)
client.messages.create(
    ...,
    extra_headers={"X-Session": "session-abc"},
)

Anthropic's prompt cache TTL is 5 minutes. If a session is idle longer, the proxy re-warms it automatically on the next call: one cold call, then back to 10% billing.
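
To make the warm/cold behaviour concrete, here is a small estimate of how many tool-schema tokens are billed across a series of calls. It is a simplified model under stated assumptions: a cold call is counted at full price (any cache-write surcharge is ignored), a warm call at 10%, and each call refreshes the 5-minute window. The function name is hypothetical.

```python
CACHE_TTL_S = 300     # 5-minute prompt cache TTL, as described above
CACHED_PERCENT = 10   # cached tool tokens are billed at 10%


def billed_tool_tokens(call_times_s, tool_tokens):
    """Estimate billed tool-schema tokens for calls at the given times (seconds)."""
    billed = 0
    last_call = None
    for t in call_times_s:
        # A call is warm if the previous call happened within the TTL.
        warm = last_call is not None and (t - last_call) <= CACHE_TTL_S
        billed += (tool_tokens * CACHED_PERCENT // 100) if warm else tool_tokens
        last_call = t
    return billed


# Three quick calls, an 8-minute pause, then two more calls.
calls = [0, 60, 120, 600, 630]
with_cache = billed_tool_tokens(calls, tool_tokens=1000)
without_cache = len(calls) * 1000
```

In this example two calls are cold (the first one and the one after the pause) and three are warm, so the estimate comes to 2,300 tokens instead of 5,000.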

Context compression

Enabled with --compress locally, or always-on in the cloud proxy. The engine runs before every call, invisibly. No model changes required.

KV-sandwich architecture

structure
HEAD   → first 2 turns  → VERBATIM    task framing, constraints
MIDDLE → turns 3 to N-4 → COMPRESSED  weighted by information density
TAIL   → last 4 turns   → VERBATIM    current working state

Middle turns are scored by how much new information they add — new entities, vocabulary, delta from prior turns. Pure acknowledgements ("ok", "noted") and reformulations of earlier content are pruned first. High-density turns survive.
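
The split itself is easy to picture in code. The sketch below partitions a list of turns into head, middle and tail using the sizes from the diagram, and adds a deliberately crude novelty score (share of words not seen earlier) as a stand-in for the real density weighting, which is not specified here. Both function names are hypothetical.

```python
def sandwich_split(turns, head=2, tail=4):
    """Split turns into (head, middle, tail); middle is empty for short histories."""
    if len(turns) <= head + tail:
        return turns[:head], [], turns[head:]
    cut = len(turns) - tail
    return turns[:head], turns[head:cut], turns[cut:]


def novelty(turn, seen_words):
    """Fraction of words in `turn` not already in `seen_words` (toy metric)."""
    words = set(turn.lower().split())
    if not words:
        return 0.0
    return len(words - seen_words) / len(words)


turns = [f"turn {i}" for i in range(1, 11)]  # a 10-turn conversation
head_turns, middle_turns, tail_turns = sandwich_split(turns)
```

With ten turns, turns 1 and 2 stay verbatim, turns 3 to 6 are candidates for compression, and turns 7 to 10 stay verbatim. A pure acknowledgement whose words have all been seen before scores 0.0 under the toy metric, which is the kind of turn that would be pruned first.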

Pipeline stages

| Stage | What it does |
| --- | --- |
| delta_prune | Drop turns whose facts are already in later history |
| entity_register | Repeated values (URLs, keys, names) → §E1, §E2…, expanded back verbatim before forwarding |
| relevance_prune | BM25 + entity density scoring against current query |
| summarize | TF-IDF extractive summary; no LLM, no hallucination risk |
| session_huffman | Repeated bigrams/trigrams → §H1, §H2… symbols |
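
To give a feel for the entity_register idea, the following sketch replaces repeated URLs with §E symbols and expands them back losslessly. It is an assumption-laden illustration: it handles URLs only, assumes the text contains no literal § symbols, and uses hypothetical function names rather than the engine's real ones.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")  # URLs only; keys and names are out of scope here


def register_entities(text, min_count=2):
    """Replace URLs that occur at least `min_count` times with §E1, §E2, ..."""
    counts = Counter(URL_RE.findall(text))
    repeated = [value for value, n in counts.items() if n >= min_count]
    registry = {f"§E{i}": value for i, value in enumerate(repeated, start=1)}
    compact = text
    # Replace longer values first so a URL that prefixes another is handled safely.
    for symbol, value in sorted(registry.items(), key=lambda kv: -len(kv[1])):
        compact = compact.replace(value, symbol)
    return compact, registry


def expand_entities(compact, registry):
    """Restore the original text from the compact form."""
    # Expand longer symbols first so §E10 is not read as §E1 followed by "0".
    for symbol, value in sorted(registry.items(), key=lambda kv: -len(kv[0])):
        compact = compact.replace(symbol, value)
    return compact


text = "GET https://api.example.com/v1/items then retry https://api.example.com/v1/items once"
compact, registry = register_entities(text)
restored = expand_entities(compact, registry)
```

The round trip returns the original text exactly, which is the property that matters when symbols are expanded before forwarding.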

Benchmark results

| System | Quality score | Compression | Approach |
| --- | --- | --- | --- |
| Promptolian | 4.26 / 5 | 21.8% | Extractive · KV-sandwich |
| Anthropic built-in | 3.44 / 5 | 98.7% | LLM summarization |
| OpenAI built-in | 3.35 / 5 | 99.3% | LLM summarization |

Factory.ai 6-dimension methodology · 25 sessions · 5 task domains · May 2026.

Sensitive data detection

Every request is scanned for credentials and data dumps before being forwarded. Detection adds <1 ms overhead and never blocks a call. Only the category and risk level are stored — no message content, no reconstructable data.

| Category | Risk | What it matches |
| --- | --- | --- |
| CONNECTION_STRING | HIGH | postgres://, mysql://, mongodb://, redis:// with credentials |
| API_KEY | HIGH | sk-, AKIA, ghp_, gho_, xoxb-, AIza patterns |
| PRIVATE_KEY | HIGH | RSA / EC / OPENSSH PEM headers |
| JWT | HIGH | Three-part eyXXX.eyXXX.XXX bearer tokens |
| ENV_FILE | HIGH | 3+ consecutive KEY=value lines |
| SQL_DUMP | MEDIUM | 3+ consecutive INSERT INTO statements |
| STACK_TRACE | MEDIUM | Python Traceback header |
| CSV_DATA | MEDIUM | 3+ rows × 5+ columns CSV |
| LARGE_JSON | MEDIUM | Array of 10+ JSON objects |

Hits return X-Promptolian-Sensitive: HIGH|MEDIUM in the response header. Events are visible in the dashboard and via GET /proxy/pii-events.
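
If you want to pre-screen prompts on the client side, a rough approximation of a few of these categories can be built with regular expressions. The patterns below are illustrative guesses written for this example, not the proxy's actual rules, and the function names are hypothetical.

```python
import re

# Illustrative patterns only; the proxy's real rules are not published here.
PATTERNS = {
    "CONNECTION_STRING": ("HIGH", re.compile(
        r"\b(?:postgres|mysql|mongodb|redis)://[^\s:/@]*:[^\s@]+@")),
    "PRIVATE_KEY": ("HIGH", re.compile(
        r"-----BEGIN (?:RSA|EC|OPENSSH) PRIVATE KEY-----")),
    "JWT": ("HIGH", re.compile(r"\bey[\w-]+\.ey[\w-]+\.[\w-]+")),
    "SQL_DUMP": ("MEDIUM", re.compile(
        r"(?:^\s*INSERT INTO\b.*\n?){3,}", re.MULTILINE | re.IGNORECASE)),
    "STACK_TRACE": ("MEDIUM", re.compile(
        r"Traceback \(most recent call last\):")),
}
RISK_ORDER = {"MEDIUM": 1, "HIGH": 2}


def scan(text):
    """Return {category: risk} for every pattern that matches `text`."""
    return {cat: risk for cat, (risk, rx) in PATTERNS.items() if rx.search(text)}


def highest_risk(hits):
    """Return the highest risk level among hits, or None if there are none."""
    if not hits:
        return None
    return max(hits.values(), key=RISK_ORDER.__getitem__)


hits = scan("DATABASE_URL=postgres://admin:s3cret@db.internal:5432/app")
```

Like the proxy's detection, a scan of this kind only reports categories and a risk level; it does not need to store or return the matched text.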

Sessions

Pass X-Session to group API calls. The proxy uses the session ID to look up cached tool schemas and maintain compression state. If omitted, a new session ID is generated and returned in X-Promptolian-Session.

POST /v1/messages

Proxies to Anthropic's messages API. All Anthropic parameters forwarded unchanged. Tool caching and context compression applied transparently.

POST /v1/messages

| Header | Required | Description |
| --- | --- | --- |
| x-api-key | Yes | Your Anthropic API key, forwarded directly |
| anthropic-version | Yes | e.g. 2023-06-01 |
| X-Promptolian-Key | Cloud only | Your plan key (pk_...) |
| X-Session | No | Session ID for tool caching. Auto-generated if omitted. |
| anthropic-beta | No | Forwarded to Anthropic as-is |

POST /v1/responses

Proxies to OpenAI's responses API. The same tool caching logic applies. If no X-Session header is sent, the session falls back to previous_response_id.

POST /v1/responses

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer YOUR_OPENAI_KEY |
| X-Promptolian-Key | Cloud only | Your plan key |
| X-Session | No | Session ID |
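
The session fallback for this endpoint can be read as a simple precedence rule. The sketch below spells it out under two assumptions: that previous_response_id is used directly as the session key, and that a fresh ID is generated only when neither source is present. The function name is hypothetical and the proxy's real resolution logic may differ.

```python
import uuid


def resolve_session_id(headers, body):
    """Pick a session ID: X-Session header, then previous_response_id, then a new one.

    Returns (session_id, source) so callers can see which rule applied.
    """
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if lowered.get("x-session"):
        return lowered["x-session"], "header"
    if body.get("previous_response_id"):
        return body["previous_response_id"], "previous_response_id"
    return uuid.uuid4().hex, "generated"


explicit = resolve_session_id({"X-Session": "session-abc"}, {"previous_response_id": "resp_123"})
fallback = resolve_session_id({}, {"previous_response_id": "resp_123"})
fresh = resolve_session_id({}, {})
```

Passing X-Session explicitly is the most predictable option, since it does not depend on how response IDs are chained between calls.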

GET /proxy/health

Returns proxy status and storage mode. Use to verify the proxy is reachable.

GET /proxy/health

{
  "status": "ok",
  "mode": "local",        // or "cloud"
  "storage": "sqlite",    // or "postgresql"
  "sessions_cached": 3,
  "db_path": "~/.promptolian/sessions.db"
}
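
A startup check against this endpoint might look like the sketch below, which uses only the standard library. The function names and the 2-second timeout are choices made for this example, and the base URL assumes the local proxy address used earlier in these docs.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:3002", timeout=2.0):
    """Fetch /proxy/health and return the decoded JSON payload."""
    with urlopen(f"{base_url}/proxy/health", timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(payload):
    """True when the payload reports status "ok"."""
    return payload.get("status") == "ok"


# Offline example using the documented response shape.
sample = {"status": "ok", "mode": "local", "storage": "sqlite", "sessions_cached": 3}
healthy = is_healthy(sample)
```

In an agent's startup path you would call is_healthy(fetch_health()) and fall back to the provider's own base URL if the proxy is unreachable.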

GET /proxy/sessions

Lists all active sessions with cached tool schemas, warm/cold cache state, and token counts.

GET /proxy/sessions

{
  "session-id-abc": {
    "tool_count": 3,
    "tool_names": ["search_web", "read_file", "run_sql"],
    "tokens_cached": 360,
    "cache_warm": true,
    "last_call_ago_s": 42
  }
}
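
Once decoded, the sessions payload is a plain mapping that is easy to summarize. The sketch below totals cached tokens and lists sessions whose cache has gone cold; the function name and the second sample session are made up for illustration.

```python
def summarize_sessions(sessions):
    """Summarize a decoded /proxy/sessions payload."""
    cold = sorted(sid for sid, info in sessions.items() if not info.get("cache_warm"))
    total = sum(info.get("tokens_cached", 0) for info in sessions.values())
    return {"total_tokens_cached": total, "cold_sessions": cold}


sample_sessions = {
    "session-id-abc": {"tool_count": 3, "tokens_cached": 360,
                       "cache_warm": True, "last_call_ago_s": 42},
    # Hypothetical second session that has been idle past the cache TTL.
    "session-id-def": {"tool_count": 1, "tokens_cached": 120,
                       "cache_warm": False, "last_call_ago_s": 900},
}
summary = summarize_sessions(sample_sessions)
```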

Delete a session:

DELETE /proxy/sessions/{session_id}

GET /proxy/dashboard

Returns savings stats for the authenticated account. Team plan includes per-project breakdown in the projects array.

GET /proxy/dashboard

{
  "email": "you@example.com",
  "plan": "team",
  "status": "active",
  "sessions_active": 4,
  "tokens_saved": 1840000,
  "dollar_saved": 5.52,
  "key_limit": 10,
  "keys_used": 3,
  "projects": [/* Team only */]
}

GET /proxy/pii-events

Sensitive data detection log for your account. Scoped to your keys only — you never see other accounts' data.

GET /proxy/pii-events?limit=100

# Cloud
curl https://proxy.promptolian.com/proxy/pii-events \
  -H "X-Promptolian-Key: pk_..."

# Local
curl http://localhost:3002/proxy/pii-events

import httpx

events = httpx.get(
    "https://proxy.promptolian.com/proxy/pii-events",
    headers={"X-Promptolian-Key": "pk_..."},
    params={"limit": 50},
).json()["events"]

{
  "count": 1,
  "events": [
    {
      "session_id": "a3f9c1d2",
      "api_key_hint": "pk_xYz1...",
      "timestamp": 1748131200.0,
      "risk_level": "HIGH",
      "categories": ["CONNECTION_STRING"]
    }
  ]
}
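
For alerting, it is usually enough to count HIGH-risk categories and note when the latest one occurred. The sketch below does that on a decoded response; it assumes timestamp is Unix epoch seconds (which the sample value suggests but the docs do not state), and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def high_risk_summary(payload):
    """Return (category counts, ISO time of latest event) for HIGH-risk events."""
    counts = Counter()
    latest = None
    for event in payload.get("events", []):
        if event.get("risk_level") != "HIGH":
            continue
        counts.update(event.get("categories", []))
        ts = event.get("timestamp")
        if ts is not None and (latest is None or ts > latest):
            latest = ts
    # Assumption: timestamps are Unix epoch seconds.
    when = (datetime.fromtimestamp(latest, tz=timezone.utc).isoformat()
            if latest is not None else None)
    return dict(counts), when


sample_payload = {
    "count": 1,
    "events": [{
        "session_id": "a3f9c1d2",
        "api_key_hint": "pk_xYz1...",
        "timestamp": 1748131200.0,
        "risk_level": "HIGH",
        "categories": ["CONNECTION_STRING"],
    }],
}
category_counts, latest_seen = high_risk_summary(sample_payload)
```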

Key management

All key endpoints require X-Promptolian-Key.

Rotate a key

POST /proxy/keys/rotate

Generates a new key immediately. The old key stops working at once, so plan the config update before you rotate and apply the new key straight afterwards.

{ "api_key": "pk_newkey...", "project_name": "default" }

List keys

GET /proxy/keys

Returns all keys on your account (first 8 chars only). Team plan shows per-project breakdown.

Create a project key (Team only)

POST /proxy/keys/new
curl
curl -X POST https://proxy.promptolian.com/proxy/keys/new \
  -H "X-Promptolian-Key: pk_..." \
  -H "Content-Type: application/json" \
  -d '{"project_name": "staging-agent"}'

{ "api_key": "pk_newkey...", "project_name": "staging-agent" }

Save the returned key immediately; it is not retrievable after this response. Up to 10 project keys on the Team plan.

Response headers

Every proxied response includes these headers so you can observe caching behaviour programmatically.

| Header | Value | Description |
| --- | --- | --- |
| X-Promptolian-Session | string | Session ID used for this call |
| X-Promptolian-Cache-Hit | true / false | Whether tool schemas were served from cache |
| X-Promptolian-Tokens-Saved | integer | Tool tokens saved this call (0 on cache miss) |
| X-Promptolian-Note | string | Human-readable cache hit summary |
| X-Promptolian-Sensitive | HIGH / MEDIUM | Present only when sensitive data was detected |
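
One way to consume these headers is to parse them into a small typed record. The sketch below accepts any mapping of header names to values (for example the headers attribute of an httpx response) and treats names case-insensitively. ProxyStats and parse_proxy_headers are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ProxyStats:
    session: Optional[str]
    cache_hit: bool
    tokens_saved: int
    sensitive: Optional[str]  # "HIGH", "MEDIUM", or None when the header is absent


def parse_proxy_headers(headers: Mapping[str, str]) -> ProxyStats:
    """Extract the X-Promptolian-* values from a response header mapping."""
    h = {k.lower(): v for k, v in headers.items()}
    return ProxyStats(
        session=h.get("x-promptolian-session"),
        cache_hit=h.get("x-promptolian-cache-hit", "false").lower() == "true",
        tokens_saved=int(h.get("x-promptolian-tokens-saved", "0")),
        sensitive=h.get("x-promptolian-sensitive"),
    )


stats = parse_proxy_headers({
    "X-Promptolian-Session": "session-abc",
    "X-Promptolian-Cache-Hit": "true",
    "X-Promptolian-Tokens-Saved": "360",
})
```

Missing headers fall back to neutral defaults, so the same parser also works on responses that did not pass through the proxy.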

Errors

Proxy errors are JSON with an error field. Non-proxy errors (400, 429, etc.) are passed through from Anthropic or OpenAI unchanged.

| Status | Cause |
| --- | --- |
| 401 | Missing or invalid X-Promptolian-Key (cloud), or missing Anthropic/OpenAI key |
| 403 | Subscription inactive or plan key limit reached |
| 500 | Proxy or upstream error; check /proxy/health |