Developer API · v1

Promptly API

Compress AI prompts programmatically. Plug Promptly into your pipeline to cut token costs before every LLM call — with 180+ compression rules, domain-specific vocabulary packs, and provable fact preservation.

Overview

The Promptly REST API exposes the same compression engine used in the browser extension. You send plain English text; you get back Promptolian-compressed text and metrics.

Base URL: https://api.promptly.so

All requests and responses use JSON. All endpoints require a Bearer token in the Authorization header.
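
For pipelines that cannot take an SDK dependency, the two requirements above (JSON bodies and a Bearer token) can be met with the Python standard library alone. This is a minimal sketch, not official client code: `build_request` is a hypothetical helper name, and the request is built but deliberately not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.promptly.so"  # base URL stated in the docs above


def build_request(path: str, api_key: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON-encode the body and declare its content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


req = build_request("/v1/compress", "YOUR_API_KEY", {"text": "Hello", "mode": "standard"})
print(req.get_method(), req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` and decode the JSON response.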

The Developer API plan ($19/mo) includes 1,000,000 tokens per day. Need more? Contact us for volume pricing.

Authentication

Pass your API key in the Authorization header as a Bearer token:

cURL:

curl https://api.promptly.so/v1/health \
  -H "Authorization: Bearer YOUR_API_KEY"

Python:

import promptly

client = promptly.Client(api_key="YOUR_API_KEY")
# or: client = promptly.Client()  (reads the PROMPTLY_API_KEY env var)

Rust:

use promptly::Client;

let client = Client::new("YOUR_API_KEY");
// or: Client::from_env()  (reads PROMPTLY_API_KEY)

C++:

#include "promptly/client.h"

Promptly::Client client("YOUR_API_KEY");
// or: Promptly::Client::from_env()

Quickstart

Compress a prompt in one call:

cURL:

curl -X POST https://api.promptly.so/v1/compress \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "You are an expert Python developer. Please write a function to sort a list and return only the code without any explanation.",
    "mode": "standard"
  }'

Python:

import promptly

client = promptly.Client(api_key="YOUR_API_KEY")

result = client.compress(
    text="You are an expert Python developer. Please write a function to sort a list and return only the code without any explanation.",
    mode="standard",
)

print(result.compressed)        # §EXP py developer. FN sort list →code ⊖explain.
print(result.compression_rate)  # 0.29  (29% of tokens saved)

Rust:

use promptly::{Client, CompressRequest};

let client = Client::new("YOUR_API_KEY");

let res = client
    .compress(CompressRequest {
        text: "You are an expert Python developer. Please write a function to sort a list.".into(),
        mode: "standard".into(),
        ..Default::default()
    })
    .await?;

println!("{}", res.compressed);
println!("CR: {:.1}%", res.compression_rate * 100.0);

C++:

#include "promptly/client.h"
#include <iostream>

Promptly::Client client("YOUR_API_KEY");

auto req = Promptly::CompressRequest{};
req.text = "You are an expert Python developer. Write a function to sort a list.";
req.mode = "standard";

auto res = client.compress(req);
std::cout << res.compressed << "\n";
std::cout << "CR: " << res.compression_rate * 100 << "%\n";

Response

{
  "compressed": "§EXP py developer. FN sort list →code ⊖explain.",
  "original_tokens": 24,
  "compressed_tokens": 17,
  "compression_rate": 0.29,
  "mode": "standard",
  "domain": null,
  "processing_ms": 3
}
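
When calling the endpoint without an SDK, it can help to load the response into a typed object. The sketch below mirrors the fields of the sample response above. `CompressResult`, `parse_result`, and `tokens_saved` are illustrative names, not part of any published SDK, and unknown fields are ignored on the assumption that the API may add fields later.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CompressResult:
    compressed: str
    original_tokens: int
    compressed_tokens: int
    compression_rate: float
    mode: str
    domain: Optional[str]
    processing_ms: int

    @property
    def tokens_saved(self) -> int:
        return self.original_tokens - self.compressed_tokens


def parse_result(raw: str) -> CompressResult:
    """Parse a /v1/compress response body, ignoring any unknown fields."""
    data = json.loads(raw)
    known = {f.name for f in fields(CompressResult)}
    return CompressResult(**{k: v for k, v in data.items() if k in known})


sample = """{
  "compressed": "§EXP py developer. FN sort list →code ⊖explain.",
  "original_tokens": 24, "compressed_tokens": 17, "compression_rate": 0.29,
  "mode": "standard", "domain": null, "processing_ms": 3
}"""
result = parse_result(sample)
print(result.tokens_saved)  # 24 - 17 = 7
```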

POST /v1/compress

Compress a single prompt. Returns the compressed text plus token metrics.

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | yes | The prompt text to compress. Max 32,000 characters. |
| mode | string | no | standard (default), telegraphic, or adaptive. Telegraphic strips articles and copulas for maximum compression. Adaptive auto-selects based on Protected Token Ratio. |
| domain | string | no | Apply a domain vocabulary pack on top of standard rules: medical, academic, legal_pro, finance, data_science, social. |
| custom_rules | array | no | Array of [pattern, replacement] string pairs. Pattern is treated as a word-boundary regex. |
| preserve_facts | bool | no | Default true. Pre-pass protects proper nouns, numbers, URLs, emails, and file paths from compression. |
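
Validating a request locally avoids spending a round trip on a 400, 413, or 422. The following is a sketch of client-side checks derived only from the field descriptions above. `build_payload` is a hypothetical helper, and the server remains the authority on what is valid.

```python
MAX_CHARS = 32_000
MODES = {"standard", "telegraphic", "adaptive"}
DOMAINS = {"medical", "academic", "legal_pro", "finance", "data_science", "social"}


def build_payload(text, mode="standard", domain=None, custom_rules=None, preserve_facts=True):
    """Return a request body for POST /v1/compress, or raise ValueError."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if domain is not None and domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")

    payload = {"text": text, "mode": mode, "preserve_facts": bool(preserve_facts)}
    if domain is not None:
        payload["domain"] = domain
    if custom_rules:
        for rule in custom_rules:
            # Each rule must be a [pattern, replacement] pair of strings.
            if len(rule) != 2 or not all(isinstance(part, str) for part in rule):
                raise ValueError(f"custom rule must be a [pattern, replacement] pair: {rule!r}")
        payload["custom_rules"] = [list(rule) for rule in custom_rules]
    return payload


payload = build_payload("Analyze this patient case.", mode="telegraphic", domain="medical",
                        custom_rules=[("patient case", "pt case")])
print(payload)
```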

Full example with all options

cURL:

curl -X POST https://api.promptly.so/v1/compress \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Analyze this patient case. The patient has hypertension and type 2 diabetes. Calculate the glomerular filtration rate and provide a comprehensive metabolic panel interpretation.",
    "mode": "telegraphic",
    "domain": "medical",
    "custom_rules": [["patient case", "pt case"]],
    "preserve_facts": true
  }'

Python:

result = client.compress(
    text="Analyze this patient case. The patient has hypertension and type 2 diabetes. "
         "Calculate the glomerular filtration rate and provide a comprehensive metabolic panel interpretation.",
    mode="telegraphic",
    domain="medical",
    custom_rules=[["patient case", "pt case"]],
    preserve_facts=True,
)

print(result.compressed)
# ANLZ pt case. pt HTN T2DM. calc GFR →long CMP interpretation.
print(f"{result.compression_rate:.0%} compression")  # 38% compression

Rust:

let res = client.compress(CompressRequest {
    text: "Analyze this patient case. The patient has hypertension and type 2 diabetes.".into(),
    mode: "telegraphic".into(),
    domain: Some("medical".into()),
    custom_rules: Some(vec![("patient case".into(), "pt case".into())]),
    preserve_facts: Some(true),
}).await?;

println!("{}", res.compressed);

C++:

auto req = Promptly::CompressRequest{};
req.text = "Analyze this patient case. The patient has hypertension and type 2 diabetes.";
req.mode = "telegraphic";
req.domain = "medical";
req.custom_rules = {{"patient case", "pt case"}};
req.preserve_facts = true;

auto res = client.compress(req);
std::cout << res.compressed << "\n";
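
The custom_rules field treats each pattern as a word-boundary regex. To preview what a rule would match before sending it, something like the sketch below can be run locally. It assumes the pattern is wrapped in `\b...\b` and that rules are applied in order. The server's exact behavior (case sensitivity, ordering relative to built-in rules) is not documented here, so treat this as an approximation. `apply_custom_rules` is a hypothetical name.

```python
import re


def apply_custom_rules(text, rules):
    """Apply [pattern, replacement] pairs in order, matching on word boundaries."""
    for pattern, replacement in rules:
        # (?:...) keeps alternations inside the pattern within the boundaries.
        text = re.sub(rf"\b(?:{pattern})\b", replacement, text)
    return text


preview = apply_custom_rules(
    "Analyze this patient case. The patient has hypertension.",
    [["patient case", "pt case"]],
)
print(preview)  # Analyze this pt case. The patient has hypertension.

# Word boundaries mean a rule for "case" leaves "casework" untouched.
print(apply_custom_rules("Review the casework for this case.", [["case", "cs"]]))
```

Because patterns are regexes, literal characters such as `.` or `(` need escaping (for example with `re.escape`) if they should match verbatim.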

POST /v1/compress/batch

Compress up to 100 prompts in a single request on the Developer plan (up to 1,000 on Volume). Each item can specify its own mode and domain.
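
Workloads larger than the per-request limit need to be split client-side. A small sketch of that chunking step, with `chunk_items` as an illustrative helper name and 100 taken from the limit stated above:

```python
from typing import Iterator, List


def chunk_items(items: List[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


prompts = [{"id": f"p{i}", "text": f"Prompt number {i}", "mode": "standard"} for i in range(250)]
batches = list(chunk_items(prompts))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch can then be sent as the `items` array of one request. Keeping the `id` field unique across batches makes it straightforward to match results back to inputs.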

cURL:

curl -X POST https://api.promptly.so/v1/compress/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      { "id": "p1", "text": "You are an expert Python developer. Write a unit test for a sorting function.", "mode": "standard" },
      { "id": "p2", "text": "Analyze the patient blood pressure and heart rate. Check for atrial fibrillation.", "mode": "telegraphic", "domain": "medical" }
    ]
  }'

Python:

results = client.compress_batch([
    {"id": "p1", "text": "You are an expert Python developer. Write a unit test for a sorting function.", "mode": "standard"},
    {"id": "p2", "text": "Analyze the patient blood pressure and heart rate.", "mode": "telegraphic", "domain": "medical"},
])

for r in results:
    print(f"{r.id}: {r.compressed}")

Rust:

let results = client.compress_batch(vec![
    BatchItem { id: "p1".into(), text: "You are an expert Python developer...".into(), mode: "standard".into(), ..Default::default() },
    BatchItem { id: "p2".into(), text: "Analyze patient blood pressure...".into(), mode: "telegraphic".into(), domain: Some("medical".into()), ..Default::default() },
]).await?;

C++:

std::vector<Promptly::BatchItem> items = {
    {"p1", "You are an expert Python developer...", "standard", ""},
    {"p2", "Analyze patient blood pressure...",  "telegraphic", "medical"},
};
auto results = client.compress_batch(items);

Batch response

{
  "results": [
    {
      "id": "p1",
      "compressed": "§EXP py developer. TEST sort FN.",
      "original_tokens": 14,
      "compressed_tokens": 9,
      "compression_rate": 0.36
    },
    {
      "id": "p2",
      "compressed": "ANLZ pt BP HR. check AFib.",
      "original_tokens": 10,
      "compressed_tokens": 6,
      "compression_rate": 0.40
    }
  ],
  "total_original_tokens": 24,
  "total_compressed_tokens": 15,
  "avg_compression_rate": 0.38,
  "processing_ms": 7
}
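
The batch totals can be recomputed from the per-item results as a sanity check. The docs do not state whether avg_compression_rate is the mean of the per-item rates or the rate over the summed tokens, so this sketch computes both. `summarize` is a hypothetical helper, and the input is the sample response above.

```python
def summarize(results):
    """Recompute batch totals from per-item token counts."""
    total_original = sum(r["original_tokens"] for r in results)
    total_compressed = sum(r["compressed_tokens"] for r in results)
    return {
        "total_original_tokens": total_original,
        "total_compressed_tokens": total_compressed,
        # Rate over the summed tokens (weights long prompts more heavily).
        "overall_rate": (total_original - total_compressed) / total_original,
        # Unweighted mean of the per-item rates.
        "mean_item_rate": sum(r["compression_rate"] for r in results) / len(results),
    }


sample_results = [
    {"id": "p1", "original_tokens": 14, "compressed_tokens": 9, "compression_rate": 0.36},
    {"id": "p2", "original_tokens": 10, "compressed_tokens": 6, "compression_rate": 0.40},
]
summary = summarize(sample_results)
print(summary["total_original_tokens"], summary["total_compressed_tokens"])  # 24 15
```

For this sample the two rates are 0.375 and 0.38, so both are close to the reported 0.38. On batches with very uneven prompt lengths they can diverge noticeably.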

GET /v1/health

Returns server status and your plan quota. No body required.

{
  "status": "ok",
  "plan": "developer",
  "tokens_today": 42310,
  "tokens_limit": 1000000,
  "reset_at": "2026-05-03T00:00:00Z"
}
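
A pipeline can poll this endpoint to decide whether to keep compressing or to pass prompts through uncompressed until the quota resets. The sketch below works on an already-decoded health response. `quota_status` is an illustrative name, and the current time is passed in explicitly so the function stays deterministic.

```python
from datetime import datetime, timezone


def quota_status(health: dict, now: datetime) -> dict:
    """Derive remaining tokens and time until reset from a /v1/health response."""
    remaining = max(health["tokens_limit"] - health["tokens_today"], 0)
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    reset_at = datetime.fromisoformat(health["reset_at"].replace("Z", "+00:00"))
    return {
        "remaining": remaining,
        "fraction_used": health["tokens_today"] / health["tokens_limit"],
        "seconds_until_reset": max((reset_at - now).total_seconds(), 0.0),
    }


health = {"status": "ok", "plan": "developer", "tokens_today": 42310,
          "tokens_limit": 1000000, "reset_at": "2026-05-03T00:00:00Z"}
status = quota_status(health, now=datetime(2026, 5, 2, 23, 0, tzinfo=timezone.utc))
print(status["remaining"])  # 1,000,000 - 42,310 = 957690
```

In production code, `now` would typically be `datetime.now(timezone.utc)`.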

Compression modes

| Mode | What it does | Typical CR |
| --- | --- | --- |
| standard | Applies all 180+ symbol substitution and grammar rules. Protects facts (numbers, names, URLs). Best balance of compression and readability. | 12–22% |
| telegraphic | Standard rules + strips articles (the/a/an), copulas (is/are/was), and filler adverbs. Maximum compression; output may read like abbreviated notes. | 17–36% |
| adaptive | Computes Protected Token Ratio (PTR = protected tokens / total). Uses telegraphic when PTR > 0.65, otherwise standard. | 12–36% |
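
The adaptive rule is simple enough to approximate locally, which can help predict which mode a prompt will get. The sketch below is not the engine's implementation. `is_protected` is a crude stand-in that flags tokens containing digits, URLs, emails, or path separators, and it ignores proper nouns entirely. Only the PTR definition and the 0.65 threshold come from the table above.

```python
def is_protected(token: str) -> bool:
    """Crude guess at whether a token is a protected fact (assumption, see lead-in)."""
    core = token.strip(".,;:!?()\"'")
    return bool(core) and (
        any(ch.isdigit() for ch in core)  # numbers, dates, versions
        or "://" in core                  # URLs
        or "@" in core                    # email addresses
        or "/" in core                    # file paths
    )


def protected_token_ratio(text: str) -> float:
    """PTR = protected tokens / total tokens, using whitespace tokenization."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(is_protected(t) for t in tokens) / len(tokens)


def choose_mode(text: str, threshold: float = 0.65) -> str:
    """Telegraphic when PTR is strictly above the threshold, otherwise standard."""
    return "telegraphic" if protected_token_ratio(text) > threshold else "standard"


print(protected_token_ratio("Email ops@example.com about /var/log/app.log on 2024-01-05"))  # 0.5
print(choose_mode("10.0.0.1 10.0.0.2 /etc/hosts admin@example.com"))  # telegraphic
```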

Domain packs

Domain packs add vocabulary rules on top of the standard set. Active when the domain field is set.

| Pack | Rules added | Examples |
| --- | --- | --- |
| medical | 39 rules | hypertension→HTN, myocardial infarction→MI, complete blood count→CBC, diagnosis→Dx |
| academic | 33 rules | randomized controlled trial→RCT, confidence interval→CI, null hypothesis→H₀, peer-reviewed→peer-rev |
| legal_pro | 25 rules | plaintiff→pltf, motion for summary judgment→MSJ, statute of limitations→SOL, deposition→depo |
| finance | 28 rules | monthly recurring revenue→MRR, compound annual growth rate→CAGR, year-over-year→YoY |
| data_science | 31 rules | convolutional neural network→CNN, exploratory data analysis→EDA, cross-validation→CV |
| social | 20 rules | click-through rate→CTR, search engine optimization→SEO, call to action→CTA |
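
Conceptually, a domain pack behaves like a phrase-to-abbreviation dictionary applied on top of the standard rules. The sketch below illustrates that idea using only the four medical examples from the table. It is not the real pack or the real matching logic. Case-insensitive matching and longest-phrase-first ordering are assumptions, and `apply_pack` is a hypothetical name.

```python
import re

# The four medical examples listed in the table above (the full pack has 39 rules).
MEDICAL_SAMPLE = {
    "hypertension": "HTN",
    "myocardial infarction": "MI",
    "complete blood count": "CBC",
    "diagnosis": "Dx",
}


def apply_pack(text: str, pack: dict) -> str:
    """Replace each phrase in pack (lowercase keys) with its abbreviation."""
    # Longest phrases first so multi-word entries win over shorter overlaps.
    phrases = sorted(pack, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: pack[m.group(1).lower()], text)


print(apply_pack("Diagnosis: hypertension; order a complete blood count.", MEDICAL_SAMPLE))
# Dx: HTN; order a CBC.
```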

Errors

All errors return a JSON body with error.code and error.message.

| Status | Code | Description |
| --- | --- | --- |
| 200 | | Success |
| 400 | invalid_request | Missing or malformed fields |
| 401 | unauthorized | Missing or invalid API key |
| 413 | text_too_long | Input exceeds 32,000 characters |
| 422 | invalid_domain | Unrecognised domain pack name |
| 429 | quota_exceeded | Daily token limit reached. Resets at midnight UTC. |
| 500 | internal_error | Something went wrong on our end |

429 example:

{
  "error": {
    "code": "quota_exceeded",
    "message": "Daily token limit of 1,000,000 reached. Resets 2026-05-03T00:00:00Z.",
    "docs": "https://promptly.so/docs#limits"
  }
}
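
Since every error shares the same error.code and error.message shape, a raw HTTP client can map responses onto a single exception type. This is a sketch with hypothetical names (`PromptlyAPIError`, `error_from_response`). Treating 429 and 500 as retryable is a design assumption, not documented API behavior.

```python
import json


class PromptlyAPIError(Exception):
    """Carries the HTTP status plus the error.code and error.message fields."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: 429 clears at the daily reset and 500 may be transient;
        # the other statuses need a changed request or a valid key.
        return self.status in (429, 500)


def error_from_response(status: int, body: str) -> PromptlyAPIError:
    """Build an exception from an error response, tolerating non-JSON bodies."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    return PromptlyAPIError(status, err.get("code", "unknown"), err.get("message", body))


body_429 = ('{"error": {"code": "quota_exceeded", "message": "Daily token limit of '
            '1,000,000 reached. Resets 2026-05-03T00:00:00Z.", '
            '"docs": "https://promptly.so/docs#limits"}}')
exc = error_from_response(429, body_429)
print(exc.code, exc.retryable)  # quota_exceeded True
```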

Rate limits

| Plan | Tokens / day | Requests / min | Batch size |
| --- | --- | --- | --- |
| Developer ($19/mo) | 1,000,000 | 120 | 100 |
| Volume (custom) | Unlimited | Custom | 1,000 |

Token counts use a word-boundary approximation: ⌈words × 1.3⌉. Both original and compressed token counts are measured so you can verify savings independently.
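
The approximation can be reproduced locally to budget quota before sending a request. The sketch below assumes "words" means whitespace-separated tokens, which may not match the server's word-boundary logic exactly, so treat the results as estimates. `estimate_tokens` and `compression_rate` are illustrative names.

```python
def estimate_tokens(text: str) -> int:
    """ceil(words * 1.3), computed with integers to avoid float rounding."""
    words = len(text.split())
    return (words * 13 + 9) // 10


def compression_rate(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens saved: (original - compressed) / original."""
    if original_tokens <= 0:
        return 0.0
    return (original_tokens - compressed_tokens) / original_tokens


print(estimate_tokens("one two three"))       # ceil(3 * 1.3) = ceil(3.9) = 4
print(round(compression_rate(24, 17), 2))     # 7 / 24 = 0.2917 -> 0.29
```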

Python SDK

Install

pip install promptly-sdk

Usage

import os

import openai
import promptly

client = promptly.Client(api_key=os.environ["PROMPTLY_API_KEY"])

# Single compress
res = client.compress(
    "You are an expert data scientist. Analyze this dataset and return the results as a JSON object.",
    mode="standard",
    domain="data_science",
)
print(res.compressed)   # §EXP data scientist. ANLZ dataset →json.

# Pipeline integration: compress before every OpenAI call
def chat(prompt: str) -> str:
    compressed = client.compress(prompt).compressed
    return openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": compressed}],
    ).choices[0].message.content
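
In a production pipeline, a compression failure (quota exhausted, network error) usually should not block the downstream LLM call. One option is to fall back to the original prompt, sketched below. `compress_or_passthrough` is a hypothetical wrapper. It assumes only that the client exposes `compress(prompt, **options)` returning an object with a `compressed` attribute, as in the usage above, and it catches broadly because the SDK's exception types are not documented here. The stub clients exist only to demonstrate both paths.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("promptly.pipeline")


def compress_or_passthrough(client, prompt: str, **options) -> str:
    """Return the compressed prompt, or the original prompt if compression fails."""
    try:
        compressed = client.compress(prompt, **options).compressed
    except Exception as exc:  # broad on purpose; see the note above
        logger.warning("Compression failed, sending original prompt: %s", exc)
        return prompt
    # Guard against an empty result as well.
    return compressed or prompt


class StubClient:
    """Stand-in for a working client (illustration only)."""

    def compress(self, prompt, **options):
        return SimpleNamespace(compressed=prompt.replace("Analyze", "ANLZ"))


class FailingClient:
    """Stand-in for a client whose call fails (illustration only)."""

    def compress(self, prompt, **options):
        raise RuntimeError("quota exceeded")


print(compress_or_passthrough(StubClient(), "Analyze this dataset"))     # ANLZ this dataset
print(compress_or_passthrough(FailingClient(), "Analyze this dataset"))  # Analyze this dataset
```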

JavaScript / Node SDK

Install

npm install promptly-sdk
# or: npx promptly compress "your prompt here"  (CLI mode)

Usage

import { Promptly } from 'promptly-sdk';

const client = new Promptly({ apiKey: process.env.PROMPTLY_API_KEY });

// Single call
const { compressed, compressionRate } = await client.compress({
  text: 'You are an expert React developer. Refactor this component to use hooks.',
  mode: 'standard',
});
console.log(compressed);        // §EXP React developer. ∆ component → hooks.
console.log(compressionRate);   // 0.31

// CLI usage:
// $ npx promptly compress "You are an expert..." --mode telegraphic
// $ npx promptly compress --file prompt.txt --domain medical

Rust SDK

Cargo.toml

[dependencies]
promptly = "0.1"
tokio    = { version = "1", features = ["full"] }

Usage

use promptly::{Client, CompressRequest, Mode};

#[tokio::main]
async fn main() -> Result<(), promptly::Error> {
    let client = Client::from_env()?;  // reads PROMPTLY_API_KEY

    let res = client.compress(CompressRequest {
        text:   "Analyze this legal contract for indemnification clauses and jurisdiction.".into(),
        mode:   Mode::Telegraphic,
        domain: Some("legal_pro".into()),
        ..Default::default()
    }).await?;

    println!("Compressed: {}", res.compressed);
    println!("Saved {:.0}% of tokens", res.compression_rate * 100.0);
    Ok(())
}

C++ SDK

The C++ SDK uses libcurl and nlohmann/json. Header-only, C++17.

Install (CMake)

include(FetchContent)
FetchContent_Declare(promptly
  GIT_REPOSITORY https://github.com/promptly-so/promptly-cpp
  GIT_TAG        v0.1.0
)
FetchContent_MakeAvailable(promptly)
target_link_libraries(my_target PRIVATE promptly::promptly)

Usage

#include "promptly/client.h"
#include <cstdlib>
#include <iostream>

int main() {
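    // std::getenv returns nullptr when the variable is unset, so check before use.
    const char* api_key = std::getenv("PROMPTLY_API_KEY");
    if (api_key == nullptr) {
        std::cerr << "PROMPTLY_API_KEY is not set\n";
        return 1;
    }
    Promptly::Client client(api_key);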

    auto req = Promptly::CompressRequest{};
    req.text   = "You are an expert at machine learning. Evaluate the model overfitting using cross-validation.";
    req.mode   = Promptly::Mode::Telegraphic;
    req.domain = "data_science";

    try {
        auto res = client.compress(req);
        std::cout << "Compressed: " << res.compressed << "\n";
        std::cout << "CR: "         << res.compression_rate * 100 << "%\n";
    } catch (const Promptly::QuotaExceeded& e) {
        std::cerr << "Quota exceeded: " << e.what() << "\n";
    }
}