Promptly API
Compress AI prompts programmatically. Plug Promptly into your pipeline to cut token costs before every LLM call — with 180+ compression rules, domain-specific vocabulary packs, and provable fact preservation.
Overview
The Promptly REST API exposes the same compression engine used in the browser extension. You send plain English text; you get back Promptolian-compressed text and metrics.
Base URL: https://api.promptly.so
All requests and responses use JSON. All endpoints require a Bearer token in the Authorization header.
Authentication
Pass your API key in the Authorization header as a Bearer token:
curl https://api.promptly.so/v1/health \
-H "Authorization: Bearer YOUR_API_KEY"
import promptly
client = promptly.Client(api_key="YOUR_API_KEY")
# or: client = promptly.Client() — reads PROMPTLY_API_KEY env var
use promptly::Client;
let client = Client::new("YOUR_API_KEY");
// or: Client::from_env() — reads PROMPTLY_API_KEY
#include "promptly/client.h"
Promptly::Client client("YOUR_API_KEY");
// or: Promptly::Client::from_env()
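If you prefer not to depend on an SDK, the same header can be attached by hand. The sketch below uses only the Python standard library to build (not send) an authenticated request. The helper name `build_request` is hypothetical, and the fallback to the `PROMPTLY_API_KEY` environment variable mirrors what the SDK comments above describe rather than any documented HTTP behaviour.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.promptly.so"  # base URL from the Overview section


def build_request(path, payload=None, api_key=None):
    """Build, but do not send, an authenticated JSON request (hypothetical helper)."""
    key = api_key or os.environ.get("PROMPTLY_API_KEY")
    if not key:
        raise ValueError("no API key: pass api_key or set PROMPTLY_API_KEY")
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method="POST" if data is not None else "GET",
    )
    # Every endpoint expects the key as a Bearer token.
    req.add_header("Authorization", f"Bearer {key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


health = build_request("/v1/health", api_key="YOUR_API_KEY")
print(health.get_method(), health.full_url)
print(health.get_header("Authorization"))
```

Passing the returned object to `urllib.request.urlopen` would perform the call; it is left out here so the sketch runs without network access.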
Quickstart
Compress a prompt in one call:
curl -X POST https://api.promptly.so/v1/compress \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "You are an expert Python developer. Please write a function to sort a list and return only the code without any explanation.",
"mode": "standard"
}'
import promptly
client = promptly.Client(api_key="YOUR_API_KEY")
result = client.compress(
text="You are an expert Python developer. Please write a function to sort a list and return only the code without any explanation.",
mode="standard",
)
print(result.compressed) # §EXP py developer. FN sort list →code ⊖explain.
print(result.compression_rate) # 0.28 (28% tokens saved)
use promptly::{Client, CompressRequest};
let client = Client::new("YOUR_API_KEY");
let res = client
.compress(CompressRequest {
text: "You are an expert Python developer. Please write a function to sort a list.".into(),
mode: "standard".into(),
..Default::default()
})
.await?;
println!("{}", res.compressed);
println!("CR: {:.1}%", res.compression_rate * 100.0);
#include "promptly/client.h"
#include <iostream>
Promptly::Client client("YOUR_API_KEY");
auto req = Promptly::CompressRequest{};
req.text = "You are an expert Python developer. Write a function to sort a list.";
req.mode = "standard";
auto res = client.compress(req);
std::cout << res.compressed << "\n";
std::cout << "CR: " << res.compression_rate * 100 << "%\n";
Response
POST /v1/compress
Compress a single prompt. Returns the compressed text plus token metrics.
Request body
| Field | Type | Description |
|---|---|---|
| text (required) | string | The prompt text to compress. Max 32,000 characters. |
| mode (optional) | string | standard (default), telegraphic, or adaptive. Telegraphic strips articles and copulas for maximum compression. Adaptive auto-selects based on Protected Token Ratio. |
| domain (optional) | string | Apply a domain vocabulary pack on top of standard rules: medical, academic, legal_pro, finance, data_science, social. |
| custom_rules (optional) | array | Array of [pattern, replacement] string pairs. Pattern is treated as a word-boundary regex. |
| preserve_facts (optional) | bool | Default true. Pre-pass protects proper nouns, numbers, URLs, emails, and file paths from compression. |
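The constraints in this table can be checked on the client before a request is sent, which avoids a round trip for inputs the server would reject. The following is a minimal sketch: `build_compress_payload` is a hypothetical helper, not part of any SDK, and the server remains the authority on what is valid.

```python
MAX_CHARS = 32_000  # documented limit for the text field
MODES = {"standard", "telegraphic", "adaptive"}
DOMAINS = {"medical", "academic", "legal_pro", "finance", "data_science", "social"}


def build_compress_payload(text, mode="standard", domain=None,
                           custom_rules=None, preserve_facts=True):
    """Validate the documented fields and return a JSON-ready dict (hypothetical helper)."""
    if not isinstance(text, str) or not text:
        raise ValueError("text is required")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if domain is not None and domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")

    payload = {"text": text, "mode": mode, "preserve_facts": bool(preserve_facts)}
    if domain is not None:
        payload["domain"] = domain
    if custom_rules:
        for rule in custom_rules:
            # Each rule must be a [pattern, replacement] pair of strings.
            if len(rule) != 2 or not all(isinstance(part, str) for part in rule):
                raise ValueError(f"invalid custom rule: {rule!r}")
        payload["custom_rules"] = [list(rule) for rule in custom_rules]
    return payload


payload = build_compress_payload(
    "Analyze this patient case.",
    mode="telegraphic",
    domain="medical",
    custom_rules=[("patient case", "pt case")],
)
print(payload)
```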
Full example with all options
curl -X POST https://api.promptly.so/v1/compress \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Analyze this patient case. The patient has hypertension and type 2 diabetes. Calculate the glomerular filtration rate and provide a comprehensive metabolic panel interpretation.",
"mode": "telegraphic",
"domain": "medical",
"custom_rules": [["patient case", "pt case"]],
"preserve_facts": true
}'
result = client.compress(
text="Analyze this patient case. The patient has hypertension and type 2 diabetes. "
"Calculate the glomerular filtration rate and provide a comprehensive metabolic panel interpretation.",
mode="telegraphic",
domain="medical",
custom_rules=[["patient case", "pt case"]],
preserve_facts=True,
)
print(result.compressed)
# ANLZ pt case. pt HTN T2DM. calc GFR →long CMP interpretation.
print(f"{result.compression_rate:.0%} compression") # 38% compression
let res = client.compress(CompressRequest {
text: "Analyze this patient case. The patient has hypertension and type 2 diabetes.".into(),
mode: "telegraphic".into(),
domain: Some("medical".into()),
custom_rules: Some(vec![("patient case".into(), "pt case".into())]),
preserve_facts: Some(true),
}).await?;
println!("{}", res.compressed);
auto req = Promptly::CompressRequest{};
req.text = "Analyze this patient case. The patient has hypertension and type 2 diabetes.";
req.mode = "telegraphic";
req.domain = "medical";
req.custom_rules = {{"patient case", "pt case"}};
req.preserve_facts = true;
auto res = client.compress(req);
std::cout << res.compressed << "\n";
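To get a feel for how a `custom_rules` pair behaves before sending it, a rule can be previewed locally. The sketch below assumes that "word-boundary regex" means the pattern is wrapped in `\b...\b` and that the replacement is inserted literally; the server's actual matching may differ, and `preview_custom_rules` is a hypothetical name.

```python
import re


def preview_custom_rules(text, rules):
    """Apply [pattern, replacement] pairs locally (assumed semantics, not the server's)."""
    for pattern, replacement in rules:
        # Assumption: the pattern only matches on word boundaries.
        bounded = re.compile(r"\b(?:" + pattern + r")\b")
        # A function replacement keeps backslashes in the replacement literal.
        text = bounded.sub(lambda _match, r=replacement: r, text)
    return text


sample = "Analyze this patient case. The patient has hypertension."
print(preview_custom_rules(sample, [["patient case", "pt case"]]))
# Analyze this pt case. The patient has hypertension.
```

Under this reading, a rule for "patient case" leaves a string such as "inpatient casebook" untouched, because neither end of the match falls on a word boundary.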
POST /v1/compress/batch
Compress up to 100 prompts in a single request (up to 1,000 on the Volume plan; see Rate limits). Each item can specify its own mode and domain.
curl -X POST https://api.promptly.so/v1/compress/batch \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"items": [
{ "id": "p1", "text": "You are an expert Python developer. Write a unit test for a sorting function.", "mode": "standard" },
{ "id": "p2", "text": "Analyze the patient blood pressure and heart rate. Check for atrial fibrillation.", "mode": "telegraphic", "domain": "medical" }
]
}'
results = client.compress_batch([
{"id": "p1", "text": "You are an expert Python developer. Write a unit test for a sorting function.", "mode": "standard"},
{"id": "p2", "text": "Analyze the patient blood pressure and heart rate.", "mode": "telegraphic", "domain": "medical"},
])
for r in results:
    print(f"{r.id}: {r.compressed}")
let results = client.compress_batch(vec![
BatchItem { id: "p1".into(), text: "You are an expert Python developer...".into(), mode: "standard".into(), ..Default::default() },
BatchItem { id: "p2".into(), text: "Analyze patient blood pressure...".into(), mode: "telegraphic".into(), domain: Some("medical".into()), ..Default::default() },
]).await?;
std::vector<Promptly::BatchItem> items = {
{"p1", "You are an expert Python developer...", "standard", ""},
{"p2", "Analyze patient blood pressure...", "telegraphic", "medical"},
};
auto results = client.compress_batch(items);
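Workloads larger than the batch limit need to be split across several requests. A minimal sketch of that split follows; `chunk_batch` is a hypothetical helper, and the default of 100 is the documented limit for the Developer plan.

```python
def chunk_batch(items, max_batch=100):
    """Split a list of batch items into lists of at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    return [items[i:i + max_batch] for i in range(0, len(items), max_batch)]


prompts = [
    {"id": f"p{i}", "text": f"Prompt number {i}", "mode": "standard"}
    for i in range(250)
]
batches = chunk_batch(prompts)
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each resulting list can then be passed to one batch call; because every item carries its own id, results from separate calls can be matched back to their prompts.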
Batch response
GET /v1/health
Returns server status and your plan quota. No body required.
Compression modes
| Mode | What it does | Typical CR |
|---|---|---|
| standard | Applies all 180+ symbol substitution and grammar rules. Protects facts (numbers, names, URLs). Best balance of compression and readability. | 12–22% |
| telegraphic | Standard rules + strips articles (the/a/an), copulas (is/are/was), and filler adverbs. Maximum compression; output may read like abbreviated notes. | 17–36% |
| adaptive | Computes Protected Token Ratio (PTR = protected tokens / total). Uses telegraphic when PTR > 0.65, otherwise standard. | 12–36% |
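The adaptive rule in the last row reduces to a single comparison. The sketch below only illustrates that decision: how the service identifies protected tokens is not described here, so the counts are taken as inputs, and treating an empty prompt as PTR 0 is an assumption.

```python
PTR_THRESHOLD = 0.65  # threshold stated in the table above


def protected_token_ratio(protected_tokens, total_tokens):
    """PTR = protected tokens / total tokens; 0.0 for an empty prompt (assumption)."""
    if total_tokens <= 0:
        return 0.0
    return protected_tokens / total_tokens


def choose_mode(protected_tokens, total_tokens):
    """Return the mode that adaptive would pick under the documented rule."""
    ptr = protected_token_ratio(protected_tokens, total_tokens)
    return "telegraphic" if ptr > PTR_THRESHOLD else "standard"


print(choose_mode(14, 20))  # PTR 0.70 -> telegraphic
print(choose_mode(5, 20))   # PTR 0.25 -> standard
```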
Domain packs
Domain packs add vocabulary rules on top of the standard set. Active when the domain field is set.
| Pack | Rules added | Examples |
|---|---|---|
| medical | 39 rules | hypertension→HTN, myocardial infarction→MI, complete blood count→CBC, diagnosis→Dx |
| academic | 33 rules | randomized controlled trial→RCT, confidence interval→CI, null hypothesis→H₀, peer-reviewed→peer-rev |
| legal_pro | 25 rules | plaintiff→pltf, motion for summary judgment→MSJ, statute of limitations→SOL, deposition→depo |
| finance | 28 rules | monthly recurring revenue→MRR, compound annual growth rate→CAGR, year-over-year→YoY |
| data_science | 31 rules | convolutional neural network→CNN, exploratory data analysis→EDA, cross-validation→CV |
| social | 20 rules | click-through rate→CTR, search engine optimization→SEO, call to action→CTA |
Errors
All errors return a JSON body with error.code and error.message.
| Status | Meaning |
|---|---|
| 200 | Success |
| 400 | invalid_request — missing or malformed fields |
| 401 | unauthorized — missing or invalid API key |
| 413 | text_too_long — input exceeds 32,000 characters |
| 422 | invalid_domain — unrecognised domain pack name |
| 429 | quota_exceeded — daily token limit reached. Resets at midnight UTC. |
| 500 | internal_error — something went wrong on our end |
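A client can turn these responses into exceptions and, for a 429, work out how long to wait for the quota reset. The sketch below assumes the error body is shaped like `{"error": {"code": ..., "message": ...}}`, which is inferred from the `error.code` and `error.message` wording above. `PromptlyAPIError`, `error_from_response`, and `seconds_until_quota_reset` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Status-to-code mapping copied from the table above.
ERROR_CODES = {
    400: "invalid_request",
    401: "unauthorized",
    413: "text_too_long",
    422: "invalid_domain",
    429: "quota_exceeded",
    500: "internal_error",
}


class PromptlyAPIError(Exception):
    """Hypothetical exception carrying the HTTP status and the error fields."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def error_from_response(status, body):
    """Build an exception from a status and a parsed JSON body (assumed shape)."""
    err = body.get("error", {}) if isinstance(body, dict) else {}
    code = err.get("code") or ERROR_CODES.get(status, "unknown")
    return PromptlyAPIError(status, code, err.get("message", ""))


def seconds_until_quota_reset(now=None):
    """Seconds until the next midnight UTC; now must be timezone-aware if given."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


err = error_from_response(
    413, {"error": {"code": "text_too_long", "message": "input exceeds 32,000 characters"}}
)
print(err)
print(seconds_until_quota_reset(datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)))  # 3600.0
```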
Rate limits
| Plan | Tokens / day | Requests / min | Batch size |
|---|---|---|---|
| Developer ($19/mo) | 1,000,000 | 120 | 100 |
| Volume (custom) | Unlimited | Custom | 1,000 |
Tokens are estimated as ⌈words × 1.3⌉. Both original and compressed token counts are measured so you can verify savings independently.
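The ⌈words × 1.3⌉ estimate is easy to reproduce when checking savings yourself. The sketch below assumes words are split on whitespace and that the compression rate is the fraction of estimated tokens saved; the service's own word splitting, and therefore its reported figures, may differ.

```python
def estimate_tokens(text):
    """Return ceil(words * 1.3), using integer arithmetic to avoid float rounding."""
    words = len(text.split())  # assumption: whitespace-delimited words
    return (words * 13 + 9) // 10


def compression_rate(original, compressed):
    """Fraction of estimated tokens saved (assumed definition); 0.0 for empty input."""
    before = estimate_tokens(original)
    if before == 0:
        return 0.0
    return 1 - estimate_tokens(compressed) / before


original = "one two three four five six seven eight nine ten"
compressed = "one two three four five"
print(estimate_tokens(original))    # 13
print(estimate_tokens(compressed))  # 7
print(f"{compression_rate(original, compressed):.0%}")  # 46%
```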
Python SDK
Install
pip install promptly-sdk
Usage
import promptly
import os
client = promptly.Client(api_key=os.environ["PROMPTLY_API_KEY"])
# Single compress
res = client.compress("You are an expert data scientist. Analyze this dataset and return the results as a JSON object.", mode="standard", domain="data_science")
print(res.compressed) # §EXP data scientist. ANLZ dataset →json.
# Pipeline integration — compress before every OpenAI call
import openai
def chat(prompt: str) -> str:
    compressed = client.compress(prompt).compressed
    return openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": compressed}],
    ).choices[0].message.content
JavaScript / Node SDK
Install
npm install promptly-sdk
# or: npx promptly compress "your prompt here" (CLI mode)
Usage
import { Promptly } from 'promptly-sdk';
const client = new Promptly({ apiKey: process.env.PROMPTLY_API_KEY });
// Single call
const { compressed, compressionRate } = await client.compress({
text: 'You are an expert React developer. Refactor this component to use hooks.',
mode: 'standard',
});
console.log(compressed); // §EXP React developer. ∆ component → hooks.
console.log(compressionRate); // 0.31
// CLI usage:
// $ npx promptly compress "You are an expert..." --mode telegraphic
// $ npx promptly compress --file prompt.txt --domain medical
Rust SDK
Cargo.toml
[dependencies]
promptly = "0.1"
tokio = { version = "1", features = ["full"] }
Usage
use promptly::{Client, CompressRequest, Mode};
#[tokio::main]
async fn main() -> Result<(), promptly::Error> {
    let client = Client::from_env()?; // reads PROMPTLY_API_KEY
    let res = client.compress(CompressRequest {
        text: "Analyze this legal contract for indemnification clauses and jurisdiction.".into(),
        mode: Mode::Telegraphic,
        domain: Some("legal_pro".into()),
        ..Default::default()
    }).await?;
    println!("Compressed: {}", res.compressed);
    println!("Saved {:.0}% of tokens", res.compression_rate * 100.0);
    Ok(())
}
C++ SDK
The C++ SDK uses libcurl and nlohmann/json. Header-only, C++17.
Install (CMake)
include(FetchContent)
FetchContent_Declare(promptly
  GIT_REPOSITORY https://github.com/promptly-so/promptly-cpp
  GIT_TAG v0.1.0
)
FetchContent_MakeAvailable(promptly)
target_link_libraries(my_target PRIVATE promptly::promptly)
Usage
#include "promptly/client.h"
#include <cstdlib>
#include <iostream>
int main() {
    // std::getenv returns nullptr when the variable is unset, so check before use.
    const char* api_key = std::getenv("PROMPTLY_API_KEY");
    if (api_key == nullptr) {
        std::cerr << "PROMPTLY_API_KEY is not set\n";
        return 1;
    }
    Promptly::Client client(api_key);
    auto req = Promptly::CompressRequest{};
    req.text = "You are an expert at machine learning. Evaluate the model overfitting using cross-validation.";
    req.mode = Promptly::Mode::Telegraphic;
    req.domain = "data_science";
    try {
        auto res = client.compress(req);
        std::cout << "Compressed: " << res.compressed << "\n";
        std::cout << "CR: " << res.compression_rate * 100 << "%\n";
    } catch (const Promptly::QuotaExceeded& e) {
        std::cerr << "Quota exceeded: " << e.what() << "\n";
    }
}