Cut your Claude Code token bill by up to 70%

condense.chat is a drop-in proxy that shrinks your prompts, context and tool output before the model sees them — same model, same results, a fraction of the input tokens.

runs claude through the proxy — sign up to get access, no key swap. · ↻ replay

benchmark

Numbers, not adjectives.

avg compression

70%

typical saving across 12k real agent sessions — half saved even more.

proxy added latency

                254ms
              

average latency the proxy adds to a request. 95% stay under 749 ms.

accuracy kept after compression

                9/10
              

answers stay correct when the model reads only the compressed context.

claude code · before / after

Same session. A fraction of the tokens.

The same prompts, the same model, the same answer. When a loop of work closes, condense's own model rewrites it for the next request — tool calls kept as a digest, file dumps and test logs compressed into a short written summary. Not truncation: a model writes what mattered.

claude

context58% 0 tok

claude · via condense.chat

context20% 0 tok

sent upstream— via condense— saved— answer—

integrate

One line. Any stack.

Drop-in proxy for the OpenAI SDK and Claude Code — just point your existing client at api.condense.chat with an ak_ key in X-Condense-Auth-Token. Or switch to X-Condense-Function: rewrite to get the compressed request body back without an upstream call. Your model, your tools, your evals — ours just makes them cheaper.

→ zero retraining, zero tuning
→ streams transparently

                  →
                  proxy
                  or
                  rewrite
                  via
                  X-Condense-Function
                

                  →
                  read the docs
                

use cases

Wherever context is the bill.

coding agents

Ship longer sessions.

Tool outputs, file reads, test runs — the stuff that eats your window. Condense rewrites it on the edge, every turn, so sessions don't collapse into compact-and-lose-everything.

typical saving−64%

RAG pipelines

Fit more retrieved docs.

Pack 3× the chunks into the same window without re-ranking or dropping recall. Faithfulness holds at 90 on LongMemEval even with long, citation-heavy payloads.

typical saving−70%

chat products

Cut your per-msg cost.

System prompt + history + tool schemas add up fast when you're serving millions of turns. Condense runs once per request, transparent to your SDK.

typical saving−55%

Built by engineers shipping LLM infra at

Nord Security

Kiloverse

nexos.ai basedcollective_

Your next turn starts with fewer tokens.

Already have an account? Log in