Open Benchmarks

Real numbers.
No fabrication.

Every metric below was computed by running actual prompts through the engine and measuring the output. Numbers are committed to the repo.

200 prompts
5 domains
3 tiers
EN · ES · FR · DE · IT
Token counting: tiktoken cl100k_base
Engine: v4.0 · May 2026
9.2% · Standard tier · avg CR, all prompts
20.1% · Developer tier · avg CR, long prompts
69% · Tool schemas · CR, turn 1 (JSON → DSL)
92% · Tool schemas · CR, 5-turn session avg
89.9% · Fact preservation · across all tiers
Results by domain
200 English prompts (150 short + 50 long) across 5 professional domains. Engine v4 — fully deterministic, no external API. Token counts via tiktoken cl100k_base.
Domain | Standard CR | Pro CR | Developer CR | Fact preservation
💹 Finance (n=40) | 10.8% | 11.5% | 12.9% | 90%
💬 Social / Marketing (n=40) | 8.6% | 9.4% | 9.7% | 90%
💻 Programming (n=40) | 11.0% | 11.7% | 11.8% | 90%
⚖️ Legal (n=40) | 8.9% | 10.3% | 11.1% | 89%
🏥 Medical (n=40) | 6.7% | 6.8% | 7.1% | 89%

Developer tier applies domain-specific abbreviation packs (MRR, ARR, NDA, IP, SLA, BP…) + spaCy clause pruning + parenthetical removal. All processing is local — no external API calls.
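The domain-pack step can be pictured roughly as follows. This is a minimal sketch, not the engine's code: the `PACKS` table, its keyword lists, and the function names `detect_domain` and `apply_pack` are all hypothetical, and the real packs are larger than the handful of entries shown here.

```python
import re

# Hypothetical, heavily reduced packs. The keyword lists are illustrative only.
PACKS = {
    "finance": {
        "keywords": {"revenue", "quarter", "growth"},
        "abbreviations": {
            "monthly recurring revenue": "MRR",
            "annual recurring revenue": "ARR",
        },
    },
    "legal": {
        "keywords": {"contract", "agreement", "clause"},
        "abbreviations": {
            "non-disclosure agreement": "NDA",
            "intellectual property": "IP",
        },
    },
}

def detect_domain(text):
    """Pick the pack whose keywords overlap most with the prompt (None if no overlap)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & pack["keywords"]) for name, pack in PACKS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def apply_pack(text):
    """Replace long-form phrases with the detected pack's abbreviations."""
    domain = detect_domain(text)
    if domain is None:
        return text
    for phrase, abbreviation in PACKS[domain]["abbreviations"].items():
        text = re.sub(re.escape(phrase), abbreviation, text, flags=re.IGNORECASE)
    return text

print(apply_pack("Analyze the monthly recurring revenue growth for Q1 2026."))
# → Analyze the MRR growth for Q1 2026.
```

Scoring by keyword overlap keeps the step deterministic and local, which matches the constraints stated above.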

Tier Comparison
What you get at each tier
Numbers, URLs, names, and entities are always protected across all tiers.
Standard · Free
~9%
avg CR (up to 16% on long prompts)
  • Symbol substitution (§EXP, §NOT, →code…)
  • Phrase simplification
  • EN · ES · FR · DE · IT
  • Rule-based, instant, deterministic
  • No external API calls
Developer · $29/mo
~11%
avg CR on prompts · up to 92% on tool schemas
  • Everything in Pro
  • Domain packs (finance, legal, medical, code)
  • spaCy clause pruning
  • Context engine (multi-turn delta pruning)
  • Tool schema DSL compiler (69–97% per session)
  • 100% local — no external API calls
Live Examples
From the benchmark set
Real prompts from the dataset, showing original and Developer-tier output.
💹 Finance 71% smaller
Original · 41 tokens
Analyze the monthly recurring revenue growth for Q1 2026. Revenue increased from $2.4M to $2.9M, a 20.8% quarter-over-quarter gain. Project the annual recurring revenue trajectory for the full year.
Developer · 12 tokens
ANLZ MRR growth Q1 2026: $2.4M→$2.9M (+20.8% QoQ). Project ARR full year.
💻 Programming ~33% smaller
Original
You are an expert Python developer. Please review this function and return only the code with the bug fixed. Do not include any explanation.
Standard
§EXP py developer. « this FN →code BUG fixed. §NOT include any ? .
💬 Marketing ~45% smaller
Original
Summarize the key benefits of our product for a social media post. Be very concise. Return as bullet list. Important: focus on the unique value proposition.
Pro
∑ key benefits for social media post. →short. →list. !!: focus on unique value prop.
Agent Layer
Tool schema compression
JSON tool definitions are the largest token consumer in agent workloads — often larger than the user prompt itself. Promptolian compiles them to a compact function-signature DSL and caches them across turns.
69.4% · Turn 1 CR · JSON → DSL compilation
97.5% · Turn 2+ CR · session cache reference
91.9% · 5-turn session · average across all turns
JSON schema — 111 tokens
{"type":"function","function":{"name":"search_web","description":"Search the web for up-to-date information on any topic","parameters":{"type":"object","properties":{"query":{"type":"string","description":"The search query"},"num_results":{"type":"integer","default":10},"language":{"type":"string","default":"en"},"safe_search":{"type":"boolean","default":true}},"required":["query"]}}}
DSL — 34 tokens (69% smaller)
search_web(query, language='en', num_results: int=10, safe_search: bool=True)
  # Search the web for up-to-date information…

# Turn 2+:
TOOLS:[search_web,get_user,create_order,…]
# → 30 tokens (97.5% CR)
Tool | Raw JSON tokens | DSL tokens (turn 1) | CR turn 1
search_web | 111 | 34 | 69.4%
get_user | 98 | 30 | 69.4%
create_order | 133 | 33 | 75.2%
run_code | 119 | 41 | 65.5%
query_database | 139 | 40 | 71.2%
send_email | 144 | 42 | 70.8%
Total (10 tools) | 1,220 | 373 | 69.4%
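The session-level figures follow from the totals above. The sketch below reproduces the arithmetic under two assumptions that the page implies but does not spell out: the baseline resends all 1,220 raw JSON tokens on every turn, and every turn after the first sends only the 30-token cache reference. The function name `session_cr` is illustrative.

```python
RAW_JSON_TOKENS = 1220   # 10 tools as raw JSON, from the Total row
DSL_TURN1_TOKENS = 373   # same tools compiled to DSL on turn 1
CACHE_REF_TOKENS = 30    # TOOLS:[...] reference sent on turn 2+

def session_cr(turns):
    """Compression rate (%) over a whole session of `turns` turns."""
    if turns < 1:
        raise ValueError("a session needs at least one turn")
    baseline = RAW_JSON_TOKENS * turns
    sent = DSL_TURN1_TOKENS + CACHE_REF_TOKENS * (turns - 1)
    return (1 - sent / baseline) * 100

print(round(session_cr(1), 1))  # → 69.4
print(round(session_cr(5), 1))  # → 91.9
```

A single later turn taken on its own works out to (1 - 30 / 1220) × 100 ≈ 97.5%, which is the Turn 2+ figure.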

Type elision: 61% of parameters have their type annotation dropped because it is inferrable from the name alone (user_id → str, include_orders → bool, limit → int). Enum fields are rendered inline as active|inactive|suspended instead of a full JSON array.
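To make the JSON → DSL step concrete, here is a minimal sketch of such a compiler. It is not Promptolian's implementation: `compile_tool`, `render_param`, and `SHORT_TYPES` are hypothetical names, and the elision rule is deliberately simpler than the name-based inference described above (it only drops the annotation for string parameters). Putting required parameters first and sorting optional ones by name is likewise an assumption, chosen because it reproduces the search_web signature shown earlier.

```python
import json

# Assumed mapping from JSON Schema type names to short annotations.
SHORT_TYPES = {"integer": "int", "boolean": "bool", "number": "float",
               "array": "list", "object": "dict"}

def render_param(name, spec):
    """Render one parameter as name[: annotation][=default]."""
    if "enum" in spec:
        # Enum values are written inline, separated by '|'.
        annotation = "|".join(str(value) for value in spec["enum"])
    else:
        # Simplified elision: strings (and unknown types) get no annotation.
        annotation = SHORT_TYPES.get(spec.get("type", "string"), "")
    text = name + (f": {annotation}" if annotation else "")
    if "default" in spec:
        text += f"={spec['default']!r}"
    return text

def compile_tool(schema):
    """Compile an OpenAI-style tool definition to a one-line signature."""
    function = schema["function"]
    params = function.get("parameters", {})
    props = params.get("properties", {})
    required = [p for p in params.get("required", []) if p in props]
    optional = sorted(p for p in props if p not in required)
    args = ", ".join(render_param(p, props[p]) for p in required + optional)
    return f"{function['name']}({args})"

SEARCH_WEB = json.loads(
    '{"type":"function","function":{"name":"search_web",'
    '"description":"Search the web for up-to-date information on any topic",'
    '"parameters":{"type":"object","properties":{'
    '"query":{"type":"string","description":"The search query"},'
    '"num_results":{"type":"integer","default":10},'
    '"language":{"type":"string","default":"en"},'
    '"safe_search":{"type":"boolean","default":true}},'
    '"required":["query"]}}}'
)

print(compile_tool(SEARCH_WEB))
# → search_web(query, language='en', num_results: int=10, safe_search: bool=True)
```

The description comment and the session cache reference are left out of this sketch; only the signature line is generated.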

Methodology
How we measure
All measurements are reproducible. The experiment code is in the repo.

Token counting

All token counts use tiktoken's cl100k_base encoding (the OpenAI tokenizer used by GPT-4, applied here as an approximation for Claude). Older experiments used a words × 1.3 heuristic and are clearly marked.

Compression rate (CR)

(1 - compressed_tokens / original_tokens) × 100. Measured on the prompt sent to the LLM, not the response.
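As a quick check of the formula, the sketch below applies it to token counts quoted on this page. The function name `compression_rate` is illustrative. The counts are passed in directly, so the snippet has no dependencies; in practice they would come from something like `len(tiktoken.get_encoding("cl100k_base").encode(text))`.

```python
def compression_rate(original_tokens, compressed_tokens):
    """CR in percent: (1 - compressed / original) * 100."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return (1 - compressed_tokens / original_tokens) * 100

# Finance example: 41 tokens → 12 tokens
print(round(compression_rate(41, 12), 1))   # → 70.7
# search_web schema: 111 tokens → 34 tokens
print(round(compression_rate(111, 34), 1))  # → 69.4
```

A CR of 0% means no savings, and a negative CR would mean the "compressed" prompt is longer than the original.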

Fact preservation rate (FPR)

Regex extraction of numbers, URLs, named entities, and quoted strings from the original. We verify each appears in the compressed output (or its decoded form). FPR = retained / total × 100.
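A reduced version of this check might look like the following. It is a sketch under stated assumptions, not the benchmark code: the three patterns are guesses at reasonable regexes, named-entity extraction is omitted because it needs more than a regex, and the "decoded form" fallback is not handled. Retention is tested by plain substring match.

```python
import re

# Assumed patterns. The benchmark's real expressions are not shown on this page.
FACT_PATTERNS = [
    re.compile(r"https?://\S+"),      # URLs
    re.compile(r'"[^"]+"'),           # double-quoted strings
    re.compile(r"\d+(?:[.,]\d+)*"),   # integers and decimals
]

def extract_facts(text):
    """Collect the distinct fact strings found in the text."""
    facts = set()
    for pattern in FACT_PATTERNS:
        facts.update(pattern.findall(text))
    return facts

def fact_preservation_rate(original, compressed):
    """FPR in percent: retained / total * 100 (100 if there are no facts)."""
    facts = extract_facts(original)
    if not facts:
        return 100.0
    retained = sum(1 for fact in facts if fact in compressed)
    return retained / len(facts) * 100

original = ("Analyze the monthly recurring revenue growth for Q1 2026. "
            "Revenue increased from $2.4M to $2.9M, a 20.8% "
            "quarter-over-quarter gain.")
compressed = "ANLZ MRR growth Q1 2026: $2.4M→$2.9M (+20.8% QoQ)."

print(fact_preservation_rate(original, compressed))  # → 100.0
```

Substring matching is lenient (a short number such as "1" matches almost anywhere), so a stricter implementation would likely match on token boundaries.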

Developer tier (NLP pipeline)

Pro output is passed through: (1) domain pack — 60+ domain-specific abbreviations detected via keyword scoring, (2) spaCy dependency parse — non-restrictive relative clauses and low-information adverbial clauses removed, (3) parenthetical pruner — non-numeric parentheticals ≥4 words removed. Fully local, no external API calls.
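Step (3) is the easiest of the three to illustrate without an NLP model. The sketch below is one possible reading of the rule, with hypothetical names (`PARENTHETICAL`, `prune_parentheticals`): a parenthetical is dropped only if it contains no digit and has at least four whitespace-separated words. Nested parentheses are not handled.

```python
import re

# Matches a non-nested parenthetical plus any whitespace directly before it.
PARENTHETICAL = re.compile(r"\s*\(([^()]*)\)")

def prune_parentheticals(text, min_words=4):
    """Remove non-numeric parentheticals with at least `min_words` words."""
    def replace(match):
        inner = match.group(1)
        has_digit = any(ch.isdigit() for ch in inner)
        if not has_digit and len(inner.split()) >= min_words:
            return ""            # drop the aside and its leading space
        return match.group(0)    # keep numeric or short parentheticals
    return PARENTHETICAL.sub(replace, text)

print(prune_parentheticals(
    "Review the contract (which was drafted by outside counsel) before signing."))
# → Review the contract before signing.
print(prune_parentheticals("Revenue grew (up 20.8% quarter over quarter) in Q1."))
# → Revenue grew (up 20.8% quarter over quarter) in Q1.
```

Keeping any parenthetical that contains a digit is what protects figures such as "(+20.8% QoQ)" from being pruned.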

Prompt benchmark: domain_experiment_v4.py · Tool benchmark: tool_compression_experiment.py · Engine: engine_v4.py · API: POST /compress-tools