Proxy docs · v2.2
Promptolian Proxy
A transparent proxy between your agent and Anthropic or OpenAI. Setup is a one-line change, with no agent logic to touch. Tool caching and context compression run automatically.
How it works
The proxy sits between your code and the API. It intercepts every call, injects cache_control blocks on tool schemas automatically, and optionally compresses conversation history through the KV-sandwich engine before forwarding.
Your agent sees the exact same response it would get from Anthropic or OpenAI directly. No response fields are modified.
The only client change is base_url. Your API key goes in the standard header; the proxy forwards it directly and never stores it.
Local proxy
Run on your own machine. No account needed. Sessions stored in SQLite at ~/.promptolian/sessions.db.
# Install
pip install "promptolian[proxy]"
# Tool caching only
promptolian proxy
# Tool caching + context compression
promptolian proxy --compress
# Custom port
promptolian proxy --port 8080
Point your client at it
# Anthropic
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:3002",  # ← only change
)
# Everything else unchanged
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[...],
    messages=[...],
)

# OpenAI
import openai

client = openai.OpenAI(
    base_url="http://localhost:3002/v1",  # ← only change
    api_key="your-openai-key",
)
resp = client.chat.completions.create(
    model="gpt-4o",
    tools=[...],
    messages=[...],
)
Cloud proxy
Skip self-hosting. proxy.promptolian.com runs 24/7 on PostgreSQL with context compression enabled. Requires a Solo or Team plan key.
# Anthropic
client = anthropic.Anthropic(
    base_url="https://proxy.promptolian.com",
    default_headers={"X-Promptolian-Key": "pk_..."},
)

# OpenAI
client = openai.OpenAI(
    base_url="https://proxy.promptolian.com/v1",
    api_key="your-openai-key",
    default_headers={"X-Promptolian-Key": "pk_..."},
)

# curl
curl -X POST https://proxy.promptolian.com/v1/messages \
  -H "x-api-key: YOUR_ANTHROPIC_KEY" \
  -H "X-Promptolian-Key: pk_..." \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":1024,"messages":[...]}'
Tool schema caching
Every API call normally re-sends the full tool schema, even when nothing has changed. The proxy stores schemas per session and injects Anthropic cache_control blocks automatically. Cached tokens are billed at 10% of the normal input-token rate.
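As a rough illustration of what injecting cache_control can look like, the sketch below marks the last tool definition with Anthropic's ephemeral cache breakpoint, which covers the tool definitions up to and including that entry. The helper name inject_cache_control is hypothetical, and this is not the proxy's actual code.

```python
import copy


def inject_cache_control(tools: list[dict]) -> list[dict]:
    """Return a copy of `tools` with a cache breakpoint on the last entry.

    Hypothetical helper for illustration; not the proxy's real implementation.
    """
    if not tools:
        return []
    marked = copy.deepcopy(tools)  # leave the caller's list untouched
    marked[-1]["cache_control"] = {"type": "ephemeral"}
    return marked


tools = [
    {"name": "search", "description": "Search docs", "input_schema": {"type": "object"}},
    {"name": "read", "description": "Read a file", "input_schema": {"type": "object"}},
]
cached = inject_cache_control(tools)
```

Only the final tool carries the marker, so the caller's original schema list stays unchanged and can be reused across calls.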
How to use sessions
Pass X-Session to group calls. On the first call, include your tools as usual. On subsequent calls you can omit tools entirely — the proxy re-injects them.
# Call 1: tools sent and stored by the proxy
client.messages.create(
    ...,
    tools=[search_tool, read_tool, sql_tool],
    extra_headers={"X-Session": "session-abc"},
)

# Call 2+: omit tools, proxy re-injects from cache (~90% cheaper)
client.messages.create(
    ...,
    extra_headers={"X-Session": "session-abc"},
)
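To make the store-then-re-inject behaviour concrete, here is a minimal in-memory sketch of the idea. The class SessionToolStore and its apply method are hypothetical names chosen for this example; the real proxy persists sessions in SQLite or PostgreSQL and may differ in detail.

```python
import copy


class SessionToolStore:
    """In-memory stand-in for a per-session schema store (hypothetical)."""

    def __init__(self) -> None:
        self._tools: dict[str, list[dict]] = {}

    def apply(self, session_id: str, body: dict) -> tuple[dict, bool]:
        """Return (request body to forward, whether tools were re-injected)."""
        out = dict(body)
        if body.get("tools"):
            # Tools present: remember them for later calls in this session.
            self._tools[session_id] = copy.deepcopy(body["tools"])
            return out, False
        if session_id in self._tools:
            # Tools omitted: restore the stored schemas before forwarding.
            out["tools"] = copy.deepcopy(self._tools[session_id])
            return out, True
        return out, False


store = SessionToolStore()
first, hit1 = store.apply("session-abc", {"messages": [], "tools": [{"name": "search"}]})
second, hit2 = store.apply("session-abc", {"messages": []})
other, hit3 = store.apply("session-xyz", {"messages": []})
```

The second call in the same session gets its tools back, while a call under an unknown session ID is forwarded as-is.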
Context compression
Enabled with --compress locally, or always-on in the cloud proxy. The engine runs before every call, invisibly. No model changes required.
KV-sandwich architecture
| Zone | Turns | Treatment | Notes |
|---|---|---|---|
| HEAD | first 2 turns | VERBATIM | task framing, constraints |
| MIDDLE | turns 3 to N-4 | COMPRESSED | weighted by information density |
| TAIL | last 4 turns | VERBATIM | current working state |
Middle turns are scored by how much new information they add — new entities, vocabulary, delta from prior turns. Pure acknowledgements ("ok", "noted") and reformulations of earlier content are pruned first. High-density turns survive.
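The head/middle/tail split and the density scoring can be sketched as follows. This is a simplified illustration, assuming "new information" is approximated by the number of distinct words a turn adds over everything before it; the function names and the scoring rule are assumptions, not the engine's actual algorithm.

```python
import re


def words(text: str) -> set[str]:
    """Distinct lowercase word tokens in a turn."""
    return set(re.findall(r"\w+", text.lower()))


def split_sandwich(
    turns: list[str], head: int = 2, tail: int = 4
) -> tuple[list[str], list[str], list[str]]:
    """Split turns into (head, middle, tail); head and tail stay verbatim."""
    if len(turns) <= head + tail:
        return list(turns), [], []  # too short: nothing to compress
    cut = len(turns) - tail
    return turns[:head], turns[head:cut], turns[cut:]


def score_middle(head: list[str], middle: list[str]) -> list[int]:
    """Score each middle turn by how many words it adds over prior turns."""
    seen: set[str] = set()
    for turn in head:
        seen |= words(turn)
    scores = []
    for turn in middle:
        scores.append(len(words(turn) - seen))
        seen |= words(turn)
    return scores


turns = [
    "Migrate the billing service to Postgres",    # head
    "Constraints: no downtime, keep the schema",  # head
    "Understood, no downtime",                    # middle: acknowledgement
    "The orders table has 40 million rows",       # middle: new facts
    "ok",                                         # middle: acknowledgement
    "Run the dry run first",                      # tail
    "Dry run finished",                           # tail
    "Start the copy",                             # tail
    "Copy is at 50 percent",                      # tail
]
head, middle, tail = split_sandwich(turns)
scores = score_middle(head, middle)
```

In this toy conversation the turn about the orders table scores highest, so it would be the last middle turn to be pruned.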
Pipeline stages
| Stage | What it does |
|---|---|
| delta_prune | Drop turns whose facts are already in later history |
| entity_register | Repeated values (URLs, keys, names) → §E1, §E2… — expanded back verbatim before forwarding |
| relevance_prune | BM25 + entity density scoring against current query |
| summarize | TF-IDF extractive summary — no LLM, no hallucination risk |
| session_huffman | Repeated bigrams/trigrams → §H1, §H2… symbols |
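The entity_register stage in the table above can be pictured with a small round-trip sketch. It assumes the stage only handles repeated URLs and that the §E symbols do not already occur in the text; register_entities and expand_entities are hypothetical names, and the real stage also covers keys and names.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://[^\s)\"']+")


def register_entities(text: str, min_count: int = 2) -> tuple[str, dict[str, str]]:
    """Replace URLs seen at least `min_count` times with §E1, §E2, ..."""
    counts = Counter(URL_RE.findall(text))
    table: dict[str, str] = {}
    for value, n in counts.items():
        if n >= min_count:
            table[f"§E{len(table) + 1}"] = value
    # Replace longer values first so one URL that prefixes another is handled safely.
    for symbol, value in sorted(table.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(value, symbol)
    return text, table


def expand_entities(text: str, table: dict[str, str]) -> str:
    """Restore every registered symbol to its original value."""
    return re.sub(r"§E\d+", lambda m: table.get(m.group(0), m.group(0)), text)


original = (
    "Fetch https://api.example.com/v1/orders then retry "
    "https://api.example.com/v1/orders and log to https://logs.example.com/ingest once."
)
compact, table = register_entities(original)
restored = expand_entities(compact, table)
```

The URL that appears once is left alone, and expanding the compact text gives back the original string exactly.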
Benchmark results
| System | Quality score | Compression | Approach |
|---|---|---|---|
| Promptolian | 4.26 / 5 | 21.8% | Extractive · KV-sandwich |
| Anthropic built-in | 3.44 / 5 | 98.7% | LLM summarization |
| OpenAI built-in | 3.35 / 5 | 99.3% | LLM summarization |
Factory.ai 6-dimension methodology · 25 sessions · 5 task domains · May 2026.
Sensitive data detection
Every request is scanned for credentials and data dumps before being forwarded. Detection adds <1 ms overhead and never blocks a call. Only the category and risk level are stored — no message content, no reconstructable data.
| Category | Risk | What it matches |
|---|---|---|
| CONNECTION_STRING | HIGH | postgres://, mysql://, mongodb://, redis:// with credentials |
| API_KEY | HIGH | sk-, AKIA, ghp_, gho_, xoxb-, AIza patterns |
| PRIVATE_KEY | HIGH | RSA / EC / OPENSSH PEM headers |
| JWT | HIGH | Three-part eyXXX.eyXXX.XXX bearer tokens |
| ENV_FILE | HIGH | 3+ consecutive KEY=value lines |
| SQL_DUMP | MEDIUM | 3+ consecutive INSERT INTO statements |
| STACK_TRACE | MEDIUM | Python Traceback header |
| CSV_DATA | MEDIUM | 3+ rows × 5+ columns CSV |
| LARGE_JSON | MEDIUM | Array of 10+ JSON objects |
Hits return X-Promptolian-Sensitive: HIGH|MEDIUM in the response header. Events are visible in the dashboard and via GET /proxy/pii-events.
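For a feel of how pattern-based detection of this kind can work, here is a small sketch covering a few of the categories. The regular expressions are illustrative guesses, not the proxy's real rules, and the choice to report the highest risk level when several categories match is also an assumption.

```python
import re

# Illustrative patterns only; the proxy's real rules are not documented here.
REGEX_RULES = [
    ("CONNECTION_STRING", "HIGH",
     re.compile(r"\b(?:postgres(?:ql)?|mysql|mongodb|redis)://[^\s:/@]*:[^\s@]+@")),
    ("PRIVATE_KEY", "HIGH",
     re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")),
    ("STACK_TRACE", "MEDIUM",
     re.compile(r"Traceback \(most recent call last\):")),
]
ENV_LINE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=\S")
INSERT_LINE = re.compile(r"^\s*INSERT INTO\b", re.IGNORECASE)


def max_run(lines: list[str], pattern: re.Pattern) -> int:
    """Length of the longest run of consecutive lines matching `pattern`."""
    best = run = 0
    for line in lines:
        run = run + 1 if pattern.match(line) else 0
        best = max(best, run)
    return best


def scan(text: str) -> list[tuple[str, str]]:
    """Return (category, risk) pairs for every rule that matches."""
    hits = [(cat, risk) for cat, risk, rx in REGEX_RULES if rx.search(text)]
    lines = text.splitlines()
    if max_run(lines, ENV_LINE) >= 3:
        hits.append(("ENV_FILE", "HIGH"))
    if max_run(lines, INSERT_LINE) >= 3:
        hits.append(("SQL_DUMP", "MEDIUM"))
    return hits


def header_value(hits: list[tuple[str, str]]):
    """Value for X-Promptolian-Sensitive, or None when nothing matched."""
    if not hits:
        return None
    return "HIGH" if any(risk == "HIGH" for _, risk in hits) else "MEDIUM"


sql = "INSERT INTO t VALUES (1);\nINSERT INTO t VALUES (2);\nINSERT INTO t VALUES (3);"
conn = "DATABASE_URL is postgres://app:s3cret@db.internal:5432/prod"
```

Note that the scan only returns categories and risk levels, which mirrors the point above that no message content needs to be stored.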
Sessions
Pass X-Session to group API calls. The proxy uses the session ID to look up cached tool schemas and maintain compression state. If omitted, a new session ID is generated and returned in X-Promptolian-Session.
POST /v1/messages
Proxies to Anthropic's messages API. All Anthropic parameters forwarded unchanged. Tool caching and context compression applied transparently.
| Header | Required | Description |
|---|---|---|
| x-api-key | Yes | Your Anthropic API key — forwarded directly |
| anthropic-version | Yes | e.g. 2023-06-01 |
| X-Promptolian-Key | Cloud only | Your plan key (pk_...) |
| X-Session | No | Session ID for tool caching. Auto-generated if omitted. |
| anthropic-beta | No | Forwarded to Anthropic as-is |
POST /v1/responses
Proxies to OpenAI's responses API. The same tool caching logic applies. If no X-Session header is sent, the session falls back to previous_response_id.
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer YOUR_OPENAI_KEY |
| X-Promptolian-Key | Cloud only | Your plan key |
| X-Session | No | Session ID |
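The session lookup order described for this endpoint can be sketched as a small resolver: explicit X-Session header first, then previous_response_id from the request body, then a freshly generated ID. The function name resolve_session_id and the UUID format of generated IDs are assumptions made for this example.

```python
import uuid
from collections.abc import Mapping


def resolve_session_id(headers: Mapping[str, str], body: Mapping[str, object]) -> str:
    """Pick a session ID: X-Session, then previous_response_id, then a new one."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    explicit = lowered.get("x-session")
    if explicit:
        return explicit
    previous = body.get("previous_response_id")
    if isinstance(previous, str) and previous:
        return previous
    return uuid.uuid4().hex  # assumed format; the real generated ID may differ


from_header = resolve_session_id({"X-Session": "session-abc"}, {"previous_response_id": "resp_1"})
from_body = resolve_session_id({}, {"previous_response_id": "resp_1"})
generated = resolve_session_id({}, {})
```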
GET /proxy/health
Returns proxy status and storage mode. Use it to verify that the proxy is reachable.
GET /proxy/sessions
Lists all active sessions with cached tool schemas, warm/cold cache state, and token counts.
Delete a session:
GET /proxy/dashboard
Returns savings stats for the authenticated account. Team plan includes per-project breakdown in the projects array.
GET /proxy/pii-events
Sensitive data detection log for your account. Scoped to your keys only — you never see other accounts' data.
# Cloud
curl https://proxy.promptolian.com/proxy/pii-events \
-H "X-Promptolian-Key: pk_..."
# Local
curl http://localhost:3002/proxy/pii-events
# Python
import httpx

events = httpx.get(
    "https://proxy.promptolian.com/proxy/pii-events",
    headers={"X-Promptolian-Key": "pk_..."},
    params={"limit": 50},
).json()["events"]
Key management
All key endpoints require X-Promptolian-Key.
Rotate a key
Generates a new key immediately. The old key stops working at once, so have your config change ready before you rotate.
List keys
Returns all keys on your account (first 8 chars only). Team plan shows per-project breakdown.
Create a project key (Team only)
curl -X POST https://proxy.promptolian.com/proxy/keys/new \
-H "X-Promptolian-Key: pk_..." \
-H "Content-Type: application/json" \
-d '{"project_name": "staging-agent"}'
Response headers
Every proxied response includes these headers so you can observe caching behaviour programmatically.
| Header | Value | Description |
|---|---|---|
| X-Promptolian-Session | string | Session ID used for this call |
| X-Promptolian-Cache-Hit | true / false | Whether tool schemas were served from cache |
| X-Promptolian-Tokens-Saved | integer | Tool tokens saved this call (0 on cache miss) |
| X-Promptolian-Note | string | Human-readable cache hit summary |
| X-Promptolian-Sensitive | HIGH / MEDIUM | Present only when sensitive data was detected |
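One way to consume these headers programmatically is to parse them into a small typed record. The sketch below assumes the headers arrive as a plain string mapping; ProxyInfo and parse_proxy_headers are hypothetical names, and missing headers fall back to neutral defaults.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProxyInfo:
    session: Optional[str]
    cache_hit: bool
    tokens_saved: int
    note: Optional[str]
    sensitive: Optional[str]  # "HIGH", "MEDIUM", or None when absent


def parse_proxy_headers(headers: Mapping[str, str]) -> ProxyInfo:
    """Collect the X-Promptolian-* response headers into one record."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    saved = h.get("x-promptolian-tokens-saved", "0").strip()
    return ProxyInfo(
        session=h.get("x-promptolian-session"),
        cache_hit=h.get("x-promptolian-cache-hit", "false").strip().lower() == "true",
        tokens_saved=int(saved) if saved.isdigit() else 0,
        note=h.get("x-promptolian-note"),
        sensitive=h.get("x-promptolian-sensitive"),
    )


info = parse_proxy_headers({
    "X-Promptolian-Session": "session-abc",
    "X-Promptolian-Cache-Hit": "true",
    "X-Promptolian-Tokens-Saved": "1840",
})
```

With the official Python SDKs, response headers are typically reachable through the with_raw_response accessor (for example client.messages.with_raw_response.create(...)), whose result exposes headers alongside a parse() method for the usual response object.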
Errors
Proxy errors are JSON with an error field. Non-proxy errors (400, 429, etc.) are passed through from Anthropic or OpenAI unchanged.
| Status | Cause |
|---|---|
| 401 | Missing or invalid X-Promptolian-Key (cloud), or missing Anthropic/OpenAI key |
| 403 | Subscription inactive or plan key limit reached |
| 500 | Proxy or upstream error — check /proxy/health |
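A client can turn the table above into short diagnostic messages. The sketch below is one possible approach, assuming the proxy's error field is a plain string; the hint wording and the describe_error name are made up for this example, and any status not in the table is treated as passed through from the provider.

```python
import json

# Hints paraphrased from the status table; the wording is this example's own.
PROXY_HINTS = {
    401: "check X-Promptolian-Key and the provider API key",
    403: "subscription inactive or plan key limit reached",
    500: "proxy or upstream error, check /proxy/health",
}


def describe_error(status: int, body: str) -> str:
    """Build a one-line description from an HTTP status and response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    detail = payload.get("error") if isinstance(payload, dict) else None
    hint = PROXY_HINTS.get(status, "passed through from the upstream provider")
    suffix = f" ({detail})" if isinstance(detail, str) else ""
    return f"{status}: {hint}{suffix}"


message = describe_error(401, '{"error": "invalid key"}')
```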