Smart Token Discipline
for AI Agents.

Drop your tool-schema token overhead instantly. Promptolian compresses context, detects stuck loops, routes to the right model, and proxies to DeepSeek or GLM when Opus is overkill.

Try it free See the numbers

Published on Dev.to: Everyone compresses their agent's context. Nobody measures what it forgets.

Benchmark results
4.26/5
Context quality score
Measured across 25 sessions using Factory.ai's 6-dimension probe methodology. Anthropic built-in scores 3.44/5. Promptolian keeps 78% of tokens vs 1.3% for native compression.
34.6%
Tool output savings
Average token reduction on tool results across 9 agentic sessions (49 tool calls). Exact repeats become a 5-token reference. Similar content becomes a compact diff.
~90%
Schema savings
Tool schemas are re-sent on every API call by default. Promptolian adds Anthropic cache headers so they are billed at 10% of normal price after the first call.
21x
DeepSeek vs Opus
Cost ratio between DeepSeek V4 Flash ($0.07/1M tokens) and Claude Opus ($15/1M tokens). Promptolian routes to DS4 or any local model with one flag: no SDK change needed.
99%
Fact retention
Percentage of facts preserved after tool result compression. Measured against ground truth across 49 tool results. Native compression typically retains under 50% of specific facts.
Live
GitHub stars
Stars on github.com/Maurizio-L/promptolian-public. Updated live from the GitHub API.
Developer visits
Total unique developer sessions recorded on promptolian.com. Tracked via anonymous session IDs, no personal data stored.

How Promptolian works

Six persistent, expensive problems. Fixed transparently, with no code changes required in your agent.

Problem 1

Tool schema re-sending

Every API call re-sends the full JSON tool definitions. 5 tools = ~600 tokens wasted per call. This happens silently on every request, across every session.

Without Promptolian: Call 1: [system] + [tools: 600 tok] + [message] → $0.0018 Call 2: [system] + [tools: 600 tok] + [message] → $0.0018 ← wasted Call 3: [system] + [tools: 600 tok] + [message] → $0.0018 ← wasted
Fix: Proxy caches schemas, re-injects with cache_control
With Promptolian (from call 2): Call 2: [system] + [tools: 60 tok cached] + [message] → $0.00018
~90% session average savings $24/mo saved at 500 calls/day
Problem 2

Context quality loss

Built-in compression (Anthropic/OpenAI) scores 3.44/5: it loses facts to save tokens. LLM summarisers write "database connection was discussed" instead of keeping postgres://db.prod/main.

Fix: KV-sandwich architecture
HEAD turns 1–2
Session framing, constraints, personas
VERBATIM
MIDDLE turns 3–N-4
Entity-encoded, filler pruned
COMPRESSED 22%
TAIL last 4 turns
Current task state
VERBATIM
4.26/5 quality score 14.8% fact-loss rate
Problem 3

Repeated tool outputs

Agentic workflows read the same files, run the same bash commands, and fetch the same API responses across multiple turns. Each repeat costs full tokens.

Without Promptolian: Call 1: read_file("config.yaml") → 400 tokens Call 2: read_file("config.yaml") → 400 tokens ← wasted Call 3: read_file("config.yaml") → 400 tokens ← wasted
Fix: Tool result deduplication (REF + DIFF)
With Promptolian: Call 2: [TOOL_CACHE_REF: same as call #1] → 5 tokens Call 3: [TOOL_CACHE_DIFF from call #1: +port: 5433] → 12 tokens
34.6% token savings on tool outputs 99% fact retention Free: no API key needed
Problem 4

Local inference (DS4 / DeepSeek) burns context fast

Running DS4 locally is free: but a 284B model at 26 tokens/sec hits the 128K context wall in ~15 agentic turns. Thinking tokens from DeepSeek's reasoning mode accumulate silently, eating 2000+ tokens per turn. And when the session dies, the agent forgets everything.

Without Promptolian: turn 10 context breakdown: Thinking tokens (accumulated): 18,000 tok ← reasoning scratch work, useless now Actual conversation: 2,000 tok Repeated file reads: 12,000 tok ← same file read 6 times
Fix: --upstream + thinking compression + working memory
With Promptolian (--upstream http://localhost:8080): Thinking → compressed: 180 tok ✓ "[Thinking: using RS256 · callback URL must match]" Conversation: 2,000 tok ✓ File reads → deduplicated: 60 tok ✓ [TOOL_CACHE_REF: same as call #1] Working memory (next session): 80 tok ✓ "Fixed: token bug · Todo: migration script"
96% compression on code search 99% compression on logs 100% local: no data leaves your machine
Problem 5

Agents stuck in loops waste turns and hit the context wall

An agent that cannot find a file will try again. And again. And again. Each failed attempt costs tokens, pollutes the context, and brings you closer to the context wall: without making any progress. Most frameworks have no mechanism to detect or break this.

Without Promptolian: Turn 4: read_file("config.json") → file not found ← retried Turn 5: read_file("config.json") → file not found ← retried Turn 6: read_file("config.json") → file not found ← retried
Fix: Stuck-loop detection + ranked recovery strategies (no LLM)
With Promptolian: after 3 identical calls: [STUCK DETECTION: "read_file" called 3 times with identical inputs] Suggested strategies: 80% list the directory to see what files actually exist 65% check if a previous step was supposed to create this file 55% search for a similar name (.yml, .toml, .env)
Rule-based: zero LLM cost 7 error categories covered Free: no API key needed
Problem 6

Over-paying for model capacity

Most agent calls don't need Claude Opus. Simple lookups and drafting tasks get routed to the same expensive model as your hardest reasoning problems. And when you want cheaper alternatives like DeepSeek, switching SDKs breaks your whole stack.

Without Promptolian: All calls → claude-opus-4-8 $15/M tokens ← overkill for 80% of tasks Switch SDK → deepseek-chat ← breaks tool calls, streaming, auth
Fix: Complexity routing + multi-provider gateway
With Promptolian: score: 18 → haiku $0.80/M ✓ simple Q&A score: 42 → sonnet $3.00/M ✓ multi-turn score: 88 → opus $15.00/M ✓ deep reasoning model: deepseek-chat → same Anthropic SDK $0.14/M
21x cheaper with DeepSeek vs Opus No SDK change: just swap the model name
Measured, not estimated

Context quality across three systems

Factory.ai 6-dimension probe scoring · 25 sessions · May 2026

System Quality Compression Fact-loss rate
Promptolian
4.26 / 5
22% 14.8%
Anthropic built-in
3.44 / 5
98.7% 31.2%
OpenAI built-in
3.35 / 5
99.3% 33.0%
If a context failure costs your team more than 3.5 minutes to debug, Promptolian is cheaper than Anthropic built-in in total cost.

See the cost model →
Tool schema caching: interactive

See what happens on each call

Before: raw JSON schema
{ "name": "get_weather", "description": "Get current weather for a city", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "City name or coordinates" }, "units": { "type": "string", "enum": ["celsius","fahrenheit"] } }, "required": ["location"] } }
After: Promptolian DSL 69% fewer tokens
FN get_weather: "Get current weather for a city" location:str* units:str["celsius","fahrenheit"]
Before: re-sent every call
(already sent this session: re-sending full JSON again) { "name": "get_weather", ...600 tokens of JSON... }
After: cached reference ~90% session average
TOOLS:[get_weather]
Pricing

Start free. Scale when you need it.

Tool deduplication and schema caching are free. KV-sandwich context compression requires an API key.

Free
$0
Self-hosted · no key needed
  • Tool result dedup (34.6% savings)
  • Tool schema caching (~90% savings)
  • DS4 / local model support (--upstream)
  • Working memory across sessions
  • Thinking compression + transparency
  • SQLite sessions
  • Claude + OpenAI compatible
Get started
Team
$49/mo
Up to 10 keys · per-project analytics
  • Everything in Solo
  • Up to 10 API keys
  • Per-project cost breakdown
  • Priority support
Subscribe

Full pricing details, ROI calculator → pricing.html

Quick Start

One line. Works with any agent.

  1. Install
    pip install "promptolian[proxy]"
  2. Start the proxy
    python -m promptolian.proxy
  3. Point your agent at it
    Change base_url to http://localhost:3002
  4. That's it
    Tool caching + context compression enabled. No other changes to your agent code.
# Before client = anthropic.Anthropic() # After: one line change client = anthropic.Anthropic( base_url="http://localhost:3002" ) # Tool dedup + schema caching: automatic (free) # Context compression: add PROMPTOLIAN_API_KEY for cloud KV-sandwich