Cut your Claude Code token bill by up to 70%
condense.chat is a drop-in proxy that shrinks your prompts, context and tool output before the model sees them — same model, same results, a fraction of the input tokens.
claude through the proxy — sign up to
get access, no key swap.
·
↻ replay
Numbers, not adjectives.
Same session. A fraction of the tokens.
The same prompts, the same model, the same answer. When a loop of work closes, condense's own model rewrites it for the next request — tool calls kept as a digest, file dumps and test logs compressed into a short written summary. Not truncation: a model writes what mattered.
One line. Any stack.
Drop-in proxy for the OpenAI SDK and Claude Code — just
point your existing client at
api.condense.chat
with an
ak_
key in
X-Condense-Auth-Token. Or switch to
X-Condense-Function: rewrite
to get the compressed request body back without an upstream
call. Your model, your tools, your evals — ours just makes
them cheaper.
proxy
or
rewrite
via
X-Condense-Function
Wherever context is the bill.
Ship longer sessions.
Tool outputs, file reads, test runs — the stuff that eats your window. Condense rewrites it on the edge, every turn, so sessions don't collapse into compact-and-lose-everything.
Fit more retrieved docs.
Pack 3× the chunks into the same window without re-ranking or dropping recall. Faithfulness holds at 90 on LongMemEval even with long, citation-heavy payloads.
Cut your per-msg cost.
System prompt + history + tool schemas add up fast when you're serving millions of turns. Condense runs once per request, transparent to your SDK.