Cut your Claude Code token bill by up to 70%

condense.chat is a drop-in proxy that shrinks your prompts, context and tool output before the model sees them — same model, same results, a fraction of the input tokens.

runs claude through the proxy — sign up to get access, no key swap. · ↻ replay
benchmark

Numbers, not adjectives.

avg compression
70%
typical saving across 12k real agent sessions — half saved even more.
proxy added latency
254ms
average latency the proxy adds to a request. 95% stay under 749 ms.
accuracy kept after compression
9/10
answers stay correct when the model reads only the compressed context.
claude code · before / after

Same session. A fraction of the tokens.

The same prompts, the same model, the same answer. When a loop of work closes, condense's own model rewrites it for the next request — tool calls kept as a digest, file dumps and test logs compressed into a short written summary. Not truncation: a model writes what mattered.

claude
context58% 0 tok
claude · via condense.chat
context20% 0 tok
sent upstream via condense saved answer
integrate

One line. Any stack.

Drop-in proxy for the OpenAI SDK and Claude Code — just point your existing client at api.condense.chat with an ak_ key in X-Condense-Auth-Token. Or switch to X-Condense-Function: rewrite to get the compressed request body back without an upstream call. Your model, your tools, your evals — ours just makes them cheaper.

→ zero retraining, zero tuning
→ streams transparently
proxy or rewrite via X-Condense-Function

            
use cases

Wherever context is the bill.

coding agents

Ship longer sessions.

Tool outputs, file reads, test runs — the stuff that eats your window. Condense rewrites it on the edge, every turn, so sessions don't collapse into compact-and-lose-everything.

typical saving−64%
RAG pipelines

Fit more retrieved docs.

Pack 3× the chunks into the same window without re-ranking or dropping recall. Faithfulness holds at 90 on LongMemEval even with long, citation-heavy payloads.

typical saving−70%
chat products

Cut your per-msg cost.

System prompt + history + tool schemas add up fast when you're serving millions of turns. Condense runs once per request, transparent to your SDK.

typical saving−55%
Built by engineers shipping LLM infra at
Nord Security Kiloverse nexos.ai basedcollective_

Your next turn starts with fewer tokens.

Sign up to claim a key, then drop one command into your terminal.

Sign up →

Already have an account?