Token Saving

Reduce token usage by compressing input content automatically.

How It Works

[Diagram: a request containing {"name": "John", "age": 30} (14 input tokens) is compressed by QuilrAI to name:John|age:30 (8 input tokens) before being sent to the LLM - 43% tokens saved, same response quality.]
  1. Request Arrives - Your app sends a normal API call.
  2. Gateway Compresses - The content is transformed to use fewer tokens.
  3. Forwarded to LLM - The optimized content is sent on: same accuracy, lower cost.
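The three steps above can be sketched as a tiny gateway pipeline. This is an illustrative model only - `compress_json`, `forward_to_llm`, and `gateway_handle` are hypothetical names, and the real gateway's transform is internal to QuilrAI:

```python
import json

def compress_json(payload: str) -> str:
    """Step 2 (sketch): flatten a flat JSON object into a compact
    key:value form, mirroring the example above."""
    obj = json.loads(payload)
    return "|".join(f"{k}:{v}" for k, v in obj.items())

def forward_to_llm(prompt: str) -> str:
    """Step 3 (placeholder): stand-in for the real LLM call."""
    return f"echo:{prompt}"

def gateway_handle(request_body: str) -> str:
    """Step 1-3: receive the request, compress it, forward it."""
    compressed = compress_json(request_body)
    return forward_to_llm(compressed)
```

For example, `gateway_handle('{"name": "John", "age": 30}')` forwards `name:John|age:30` rather than the raw JSON, so the downstream model bills fewer input tokens for the same content.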

Compression Methods

Smart JSON Compression - Up to 20% savings

Converts JSON objects in LLM inputs to TOON format - ideal for tool call responses and structured data.

Before: {"name": "John", "age": 30}
After: name:John|age:30
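A minimal sketch of this kind of flattening, including nested objects via dotted keys. Note this is a simplified stand-in for a TOON-style encoding, not the actual TOON specification, and `toonish` is a hypothetical helper name:

```python
import json

def toonish(value, prefix=""):
    """Recursively flatten JSON into key:value pairs joined by '|'.
    Nested keys are dotted (e.g. user.name). Illustrative only."""
    if isinstance(value, dict):
        parts = []
        for k, v in value.items():
            key = f"{prefix}.{k}" if prefix else k
            parts.append(toonish(v, key))
        return "|".join(parts)
    return f"{prefix}:{value}"

print(toonish(json.loads('{"name": "John", "age": 30}')))
```

The compact form drops the quotes, braces, and whitespace that each cost tokens but carry no meaning for the model.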

HTML to Text

Strips HTML tags and extracts clean text - removes markup overhead from scraped pages or rich content.

Before: <p class="intro"><b>Hello</b> world</p>
After: Hello world
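A sketch of tag stripping using Python's standard-library `html.parser`; the gateway's actual extractor is not specified here, and `TextExtractor`/`html_to_text` are hypothetical names:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only text nodes, dropping all tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "".join(parser.chunks).strip()
```

Running `html_to_text('<p class="intro"><b>Hello</b> world</p>')` yields `Hello world` - the markup overhead is gone, the text content is intact.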

Markdown to Text

Removes Markdown syntax characters that consume tokens without adding meaning for the LLM.

Before: ## Hello **world**
After: Hello world
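A rough regex-based sketch of this stripping, assuming only common Markdown constructs (headings, emphasis, inline code, links); a full Markdown parser would handle more cases, and `strip_markdown` is a hypothetical name:

```python
import re

def strip_markdown(md: str) -> str:
    """Remove common Markdown syntax characters. Illustrative only;
    not a complete Markdown-to-text converter."""
    text = re.sub(r"^#{1,6}\s*", "", md, flags=re.MULTILINE)  # headings
    text = re.sub(r"(\*\*|__)(.*?)\1", r"\2", text)           # bold
    text = re.sub(r"(\*|_)(.*?)\1", r"\2", text)              # italics
    text = re.sub(r"`([^`]*)`", r"\1", text)                  # inline code
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)      # links
    return text.strip()
```

So `strip_markdown("## Hello **world**")` returns `Hello world`, matching the example above.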

Seamless and Input-Only

Compression is applied only to input tokens before they reach the LLM; responses are returned untouched. Your application code stays exactly the same - no SDK changes, no prompt rewrites, just lower costs.