Promptolian: Context Reliability Proxy for AI Agents

Benchmark results

4.26/5 · Context quality score
34.6% · Tool output savings
~90% · Schema savings
21x · DeepSeek vs Opus
99% · Fact retention

Six problems, one proxy

How Promptolian works

Six persistent, expensive problems. Fixed transparently, with no code changes required in your agent.

Problem 1

Tool schema re-sending

Every API call re-sends the full JSON tool definitions. 5 tools = ~600 tokens wasted per call. This happens silently on every request, across every session.

Without Promptolian:
  Call 1: [system] + [tools: 600 tok] + [message] → $0.0018
  Call 2: [system] + [tools: 600 tok] + [message] → $0.0018 ← wasted
  Call 3: [system] + [tools: 600 tok] + [message] → $0.0018 ← wasted

Fix: Proxy caches schemas, re-injects with cache_control

With Promptolian (from call 2):
  Call 2: [system] + [tools: 60 tok cached] + [message] → $0.00018 ✓

~90% session average savings · $24/mo saved at 500 calls/day
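
The figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming an input price of $3 per million tokens (the rate implied by 600 tokens costing $0.0018) and a 30-day month. The `mark_tools_cacheable` helper is hypothetical: it only illustrates attaching an Anthropic-style `cache_control` marker to the last tool definition, and is not a description of Promptolian's internals.

```python
import copy

PRICE_PER_MTOK = 3.00  # assumed input price, USD per million tokens
SCHEMA_TOKENS = 600    # full tool definitions, from the example above
CACHED_TOKENS = 60     # effective billed tokens once cached, from the example above


def call_cost(tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Cost in USD of sending `tokens` input tokens."""
    return tokens * price_per_mtok / 1_000_000


def monthly_savings(calls_per_day: int, days: int = 30) -> float:
    """Savings in USD when every call pays the cached rate instead of the full one."""
    per_call = call_cost(SCHEMA_TOKENS) - call_cost(CACHED_TOKENS)
    return per_call * calls_per_day * days


def mark_tools_cacheable(tools: list[dict]) -> list[dict]:
    """Hypothetical helper: return a copy with a cache marker on the last tool."""
    marked = copy.deepcopy(tools)
    if marked:
        marked[-1]["cache_control"] = {"type": "ephemeral"}
    return marked


print(f"full schema per call:   ${call_cost(SCHEMA_TOKENS):.5f}")
print(f"cached schema per call: ${call_cost(CACHED_TOKENS):.5f}")
print(f"saved per month at 500 calls/day: ${monthly_savings(500):.2f}")
```

The monthly figure is an upper bound under these assumptions, since the first call of each session still pays for the full schema.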

Problem 2

Context quality loss

Built-in compression scores 3.44/5 (Anthropic) and 3.35/5 (OpenAI) because it drops facts to save tokens. LLM summarisers write "database connection was discussed" instead of keeping postgres://db.prod/main.

Fix: KV-sandwich architecture

Zone	Turns	Content	Treatment
HEAD	1–2	Session framing, constraints, personas	VERBATIM
MIDDLE	3 to N-4	Entity-encoded, filler pruned	COMPRESSED 22%
TAIL	Last 4	Current task state	VERBATIM

4.26/5 quality score · 14.8% fact-loss rate
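
The three zones can be sketched as a simple slice over the turn list. This is an illustration only: the head and tail sizes come from the description above, while `sandwich` and the `drop_filler` compressor are hypothetical stand-ins (the real entity encoding is not described here).

```python
from typing import Callable

HEAD_TURNS = 2  # turns 1-2 are kept verbatim
TAIL_TURNS = 4  # the last 4 turns are kept verbatim

Compressor = Callable[[list[str]], list[str]]


def sandwich(turns: list[str], compress: Compressor) -> list[str]:
    """Keep head and tail untouched and compress only the middle turns."""
    if len(turns) <= HEAD_TURNS + TAIL_TURNS:
        return list(turns)  # too short to have a middle zone
    head = turns[:HEAD_TURNS]
    middle = turns[HEAD_TURNS:-TAIL_TURNS]
    tail = turns[-TAIL_TURNS:]
    return head + compress(middle) + tail


def drop_filler(middle: list[str]) -> list[str]:
    """Placeholder compressor: drop turns that are pure acknowledgements."""
    filler = {"ok", "thanks", "got it"}
    return [turn for turn in middle if turn.strip().lower() not in filler]


turns = ["t1", "t2", "ok", "use postgres://db.prod/main", "thanks",
         "t6", "t7", "t8", "t9", "t10"]
print(sandwich(turns, drop_filler))
```

Note that the fact-bearing turn (the connection string) survives because the placeholder only prunes filler; it never paraphrases.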

Problem 3

Repeated tool outputs

Agentic workflows read the same files, run the same bash commands, and fetch the same API responses across multiple turns. Each repeat costs full tokens.

Without Promptolian:
  Call 1: read_file("config.yaml") → 400 tokens
  Call 2: read_file("config.yaml") → 400 tokens ← wasted
  Call 3: read_file("config.yaml") → 400 tokens ← wasted

Fix: Tool result deduplication (REF + DIFF)

With Promptolian:
  Call 2: [TOOL_CACHE_REF: same as call #1] → 5 tokens ✓
  Call 3: [TOOL_CACHE_DIFF from call #1: +port: 5433] → 12 tokens ✓

34.6% token savings on tool outputs · 99% fact retention · Free: no API key needed
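
The REF + DIFF idea can be sketched with the standard library. This is a minimal illustration under stated assumptions: the `ToolResultCache` class is hypothetical, results are keyed on tool name plus arguments, and the choice to make a changed result the new baseline is a design decision of this sketch, not something the text above specifies.

```python
import difflib
import json


class ToolResultCache:
    """Hypothetical cache that replaces repeated tool results with short markers."""

    def __init__(self) -> None:
        self._seen: dict[tuple[str, str], tuple[int, str]] = {}
        self._calls = 0

    def record(self, tool: str, args: dict, output: str) -> str:
        """Return the text to place in context for this tool result."""
        self._calls += 1
        key = (tool, json.dumps(args, sort_keys=True))
        previous = self._seen.get(key)
        if previous is None:
            self._seen[key] = (self._calls, output)
            return output  # first sighting: send in full
        earlier_call, earlier_output = previous
        if output == earlier_output:
            return f"[TOOL_CACHE_REF: same as call #{earlier_call}]"
        # Keep only added and removed lines from the line-by-line comparison.
        changes = [
            line
            for line in difflib.ndiff(earlier_output.splitlines(), output.splitlines())
            if line.startswith(("+ ", "- "))
        ]
        self._seen[key] = (self._calls, output)  # changed result becomes the baseline
        return f"[TOOL_CACHE_DIFF from call #{earlier_call}: {'; '.join(changes)}]"


cache = ToolResultCache()
args = {"path": "config.yaml"}
print(cache.record("read_file", args, "host: db"))
print(cache.record("read_file", args, "host: db"))
print(cache.record("read_file", args, "host: db\nport: 5433"))
```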

Problem 4

Local inference (DS4 / DeepSeek) burns context fast

Running DS4 locally is free, but a 284B model at 26 tokens/sec hits the 128K context wall in ~15 agentic turns. Thinking tokens from DeepSeek's reasoning mode accumulate silently, eating 2,000+ tokens per turn. And when the session dies, the agent forgets everything.

Without Promptolian, turn 10 context breakdown:
  Thinking tokens (accumulated): 18,000 tok ← reasoning scratch work, useless now
  Actual conversation: 2,000 tok
  Repeated file reads: 12,000 tok ← same file read 6 times

Fix: --upstream + thinking compression + working memory

With Promptolian (--upstream http://localhost:8080):
  Thinking → compressed: 180 tok ✓ "[Thinking: using RS256 · callback URL must match]"
  Conversation: 2,000 tok ✓
  File reads → deduplicated: 60 tok ✓ [TOOL_CACHE_REF: same as call #1]
  Working memory (next session): 80 tok ✓ "Fixed: token bug · Todo: migration script"

96% compression on code search · 99% compression on logs · 100% local: no data leaves your machine
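
Adding up the turn-10 breakdown makes the overall effect concrete. The token counts below are copied from the example above; counting the 80-token working-memory note as part of the compressed context is a conservative assumption of this sketch, since that note is described as belonging to the next session.

```python
# Token counts per category, taken from the turn-10 example above.
before = {"thinking": 18_000, "conversation": 2_000, "file_reads": 12_000}
after = {"thinking": 180, "conversation": 2_000, "file_reads": 60, "working_memory": 80}


def total(breakdown: dict[str, int]) -> int:
    """Total context tokens for one breakdown."""
    return sum(breakdown.values())


def reduction(old: dict[str, int], new: dict[str, int]) -> float:
    """Fraction of context tokens removed, between 0 and 1."""
    return 1 - total(new) / total(old)


print(f"before: {total(before)} tok, after: {total(after)} tok")
print(f"overall reduction: {reduction(before, after):.3f}")
```

Under these numbers the conversation itself is untouched; all of the reduction comes from thinking tokens and repeated file reads.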

Problem 5

Agents stuck in loops waste turns and hit the context wall

An agent that cannot find a file will try again. And again. And again. Each failed attempt costs tokens, pollutes the context, and brings you closer to the context wall without making any progress. Most frameworks have no mechanism to detect or break this.

Without Promptolian:
  Turn 4: read_file("config.json") → file not found ← retried
  Turn 5: read_file("config.json") → file not found ← retried
  Turn 6: read_file("config.json") → file not found ← retried

Fix: Stuck-loop detection + ranked recovery strategies (no LLM)

With Promptolian, after 3 identical calls:
  [STUCK DETECTION: "read_file" called 3 times with identical inputs]
  Suggested strategies:
    80% list the directory to see what files actually exist
    65% check if a previous step was supposed to create this file
    55% search for a similar name (.yml, .toml, .env)

Rule-based: zero LLM cost · 7 error categories covered · Free: no API key needed

Problem 6

Over-paying for model capacity

Most agent calls don't need Claude Opus. Simple lookups and drafting tasks get routed to the same expensive model as your hardest reasoning problems. And when you want cheaper alternatives like DeepSeek, switching SDKs breaks your whole stack.

Without Promptolian:
  All calls → claude-opus-4-8 $15/M tokens ← overkill for 80% of tasks
  Switch SDK → deepseek-chat ← breaks tool calls, streaming, auth

Fix: Complexity routing + multi-provider gateway

With Promptolian:
  score: 18 → haiku $0.80/M ✓ simple Q&A
  score: 42 → sonnet $3.00/M ✓ multi-turn
  score: 88 → opus $15.00/M ✓ deep reasoning
  model: deepseek-chat → same Anthropic SDK $0.14/M ✓

21x cheaper with DeepSeek vs Opus · No SDK change: just swap the model name
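
Score-based routing reduces to a threshold lookup. In this sketch the tier names and prices come from the example above, while the 0 to 100 score range and the cut-offs at 30 and 70 are assumptions chosen only so that the three example scores land in the tiers shown; how the score itself is computed is not covered here.

```python
# (upper bound of score band, tier name, USD per million tokens)
# Cut-offs 30 and 70 are assumed; tiers and prices are from the example above.
TIERS = [
    (30, "haiku", 0.80),
    (70, "sonnet", 3.00),
    (100, "opus", 15.00),
]


def route(score: int) -> tuple[str, float]:
    """Pick the cheapest tier whose band contains the complexity score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for upper, model, price in TIERS:
        if score <= upper:
            return model, price
    raise AssertionError("unreachable: the last tier covers scores up to 100")


for score in (18, 42, 88):
    model, price = route(score)
    print(f"score {score} -> {model} (${price:.2f}/M)")
```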

Measured, not estimated

Context quality across three systems

Factory.ai 6-dimension probe scoring · 25 sessions · May 2026

System	Quality	Compression	Fact-loss rate
Promptolian ✦	4.26 / 5	22%	14.8%
Anthropic built-in	3.44 / 5	98.7%	31.2%
OpenAI built-in	3.35 / 5	99.3%	33.0%

If a context failure costs your team more than 3.5 minutes to debug, Promptolian is cheaper than Anthropic built-in in total cost.

See the cost model →

Tool schema caching: interactive

See what happens on each call

Before: raw JSON schema

{
  "name": "get_weather",
  "description": "Get current weather for a city",
  "parameters": {
    "type": "object",
    "properties": {
      "location": { "type": "string", "description": "City name or coordinates" },
      "units": { "type": "string", "enum": ["celsius","fahrenheit"] }
    },
    "required": ["location"]
  }
}

After: Promptolian DSL (69% fewer tokens)

FN get_weather: "Get current weather for a city" location:str* units:str["celsius","fahrenheit"]
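
The mapping from JSON schema to the one-line form can be sketched as follows. Only this single example of the DSL is available, so the rules here are inferred from it: `*` marks a required parameter, enum values follow the type, and parameter descriptions are dropped. The `to_dsl` function is hypothetical, and every type abbreviation other than `str` is an assumption.

```python
import json

# Only "string" -> "str" is confirmed by the example; the rest are assumed.
TYPE_NAMES = {"string": "str", "integer": "int", "number": "num", "boolean": "bool"}


def to_dsl(tool: dict) -> str:
    """Render one tool definition in the compact one-line form."""
    params = tool.get("parameters", {})
    required = set(params.get("required", []))
    parts = [f'FN {tool["name"]}: {json.dumps(tool["description"])}']
    for name, spec in params.get("properties", {}).items():
        json_type = spec.get("type", "any")
        field = f"{name}:{TYPE_NAMES.get(json_type, json_type)}"
        if "enum" in spec:
            # Compact separators keep the enum list free of spaces.
            field += json.dumps(spec["enum"], separators=(",", ":"))
        if name in required:
            field += "*"
        parts.append(field)
    return " ".join(parts)


get_weather = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name or coordinates"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}
print(to_dsl(get_weather))
```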

Before: re-sent every call

(already sent this session: re-sending full JSON again)
{ "name": "get_weather", ...600 tokens of JSON... }

After: cached reference (~90% session average)

TOOLS:[get_weather]

Pricing

Start free. Scale when you need it.

Tool deduplication and schema caching are free. KV-sandwich context compression requires an API key.

Free

Self-hosted · no key needed

Tool result dedup (34.6% savings)
Tool schema caching (~90% savings)
DS4 / local model support (--upstream)
Working memory across sessions
Thinking compression + transparency
SQLite sessions
Claude + OpenAI compatible

Get started

FEATURED

Solo

$9/mo

+ API key · cloud KV-sandwich

Everything in Free
KV-sandwich compression (4.26/5)
Session reset, no context wall
Stuck-loop detection + ranked recovery strategies
Complexity routing, Haiku to Opus
Multi-provider gateway (DeepSeek, GLM)
Task tracking + savings dashboard

Team

$49/mo

Up to 10 keys · per-project analytics

Everything in Solo
Up to 10 API keys
Per-project cost breakdown
Priority support

Full pricing details, ROI calculator → pricing.html

Smart Token Discipline for AI Agents.

One line. Works with any agent.
