Context Compression: Quality and Cost Analysis

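This analysis shows how context quality degrades as the compression rate rises, and what that degradation costs. Promptolian quality scores were measured across 25 sessions using Factory.ai's 6-dimension probe methodology; the Anthropic and OpenAI baselines come from Factory.ai's own study.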

Figure 1

Context quality score vs. compression rate

Context quality score (0–5) as a function of compression rate, measured using Factory.ai's 6-dimension probe methodology. Circles mark the three measured operating points. The curve is an exponential decay fitted to the measurements. Background zones indicate quality thresholds: green ≥ 4.0 (safe), amber 3.5–4.0 (degraded), red < 3.5 (poor). Promptolian (22%) operates within the safe zone; both provider built-ins (~99%) fall in the poor zone.
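The zone thresholds can be checked with a few lines of Python. This is a minimal sketch, not the benchmark's own tooling: the quality scores are not listed here directly, so they are back-computed from the fact-loss rates quoted in the Figure 2 caption using fact-loss = 1 − quality/5. The function and variable names are illustrative assumptions.

```python
# Fact-loss rates as quoted in the Figure 2 caption.
FACT_LOSS = {
    "Promptolian": 0.148,
    "Anthropic built-in": 0.312,
    "OpenAI built-in": 0.330,
}


def quality_from_fact_loss(loss: float) -> float:
    """Invert fact-loss = 1 - quality/5 to recover the 0-5 quality score."""
    return round(5.0 * (1.0 - loss), 2)


def zone(score: float) -> str:
    """Map a 0-5 quality score to the Figure 1 background zone."""
    if score >= 4.0:
        return "safe"      # green
    if score >= 3.5:
        return "degraded"  # amber
    return "poor"          # red


SCORES = {name: quality_from_fact_loss(loss) for name, loss in FACT_LOSS.items()}
ZONES = {name: zone(score) for name, score in SCORES.items()}

if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: quality {SCORES[name]:.2f} ({ZONES[name]})")
```

Under these inputs Promptolian lands in the safe zone and both built-ins land just below the 3.5 threshold, which matches the caption.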

Figure 2

Monthly cost: API tokens + engineer rework time

Total monthly cost (API tokens plus engineer time spent fixing context failures) for a single developer running ~100 sessions/month. Assumptions: 50 calls/session × 8K context tokens × $3/MTok input = $1.20 context spend per session, or about $120/month uncompressed; engineer time is valued at a $100/hr loaded rate. Each system's fact-loss rate is derived directly from its quality score (Promptolian 14.8%, Anthropic 31.2%, OpenAI 33.0%), with no assumed endpoints. At zero debugging time, Anthropic built-in appears cheapest because of its 99% token savings. Above ~3.4 minutes of engineer time per failure, Promptolian is cheaper than Anthropic built-in: the quality gap costs more than the token savings recover. "No compression" is shown as a flat reference; it is viable only while the session fits within the model's context window.
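The break-even point can be reproduced with a short Python sketch of the cost model. One assumption is mine rather than the caption's: expected failures per month are taken as fact-loss rate × sessions, i.e. at most one context failure per session. That reading is not stated in the caption, but it does reproduce the ~3.4 minute figure. All names below are illustrative.

```python
SESSIONS_PER_MONTH = 100
CALLS_PER_SESSION = 50
CONTEXT_TOKENS_PER_CALL = 8_000
USD_PER_MTOK_INPUT = 3.0
ENGINEER_USD_PER_HOUR = 100.0

# Uncompressed context spend per session: 50 x 8K tokens x $3/MTok = $1.20.
SPEND_PER_SESSION = (
    CALLS_PER_SESSION * CONTEXT_TOKENS_PER_CALL * USD_PER_MTOK_INPUT / 1_000_000
)

# (compression rate, fact-loss rate) per system, as quoted in the captions.
SYSTEMS = {
    "Promptolian": (0.22, 0.148),
    "Anthropic built-in": (0.99, 0.312),
    "OpenAI built-in": (0.99, 0.330),
}


def monthly_cost(system: str, minutes_per_failure: float) -> float:
    """Token spend after compression plus engineer rework cost, in USD."""
    compression, fact_loss = SYSTEMS[system]
    tokens = SESSIONS_PER_MONTH * SPEND_PER_SESSION * (1.0 - compression)
    # ASSUMPTION: expected failures per month = fact-loss rate x sessions.
    failures = SESSIONS_PER_MONTH * fact_loss
    rework = failures * (minutes_per_failure / 60.0) * ENGINEER_USD_PER_HOUR
    return tokens + rework


def breakeven_minutes(better: str, cheaper: str) -> float:
    """Minutes per failure at which the two systems cost the same."""
    token_gap = monthly_cost(better, 0.0) - monthly_cost(cheaper, 0.0)
    failure_gap = SESSIONS_PER_MONTH * (SYSTEMS[cheaper][1] - SYSTEMS[better][1])
    return token_gap / (failure_gap * ENGINEER_USD_PER_HOUR / 60.0)


BREAKEVEN = breakeven_minutes("Promptolian", "Anthropic built-in")

if __name__ == "__main__":
    print(f"Break-even: {BREAKEVEN:.2f} minutes per failure")
```

With these inputs the token gap is $92.40/month and the failure gap is 16.4 failures/month, so the two lines cross at roughly 3.4 minutes per failure. A different failure model (for example, several failures per session) would shift that crossover.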

Quality scores: Factory.ai 6-dimension probe scoring (Accuracy, Context, Artifact, Completeness, Continuity, Instruction). Anthropic and OpenAI baselines from Factory.ai May 2026 study. Promptolian: internal benchmark, 25 sessions, same methodology. Fact-loss rate = 1 − quality/5, derived from benchmark scores. Full methodology: promptolian.com/benchmarks.html