Low latency
Under 50ms compression overhead on typical agent prompts. Your agent does not feel slower.
TokenPak — local LLM proxy
TokenPak is a local proxy that compresses your LLM context before it hits the API — fewer tokens, lower cost, same results.
Built for the parts of an agent workload you cannot renegotiate.
Under 50ms compression overhead on typical agent prompts. Your agent does not feel slower.
pip install tokenpak && tokenpak setup. Interactive wizard detects your API keys, picks a compression profile, and starts the proxy. Per-client auto-integration (tokenpak integrate) is on the roadmap.
Claude Code, Cursor, Cline, Continue, Aider, OpenAI SDK, Anthropic SDK, LiteLLM, Codex. No plugin rewrites.
No cloud component. Your credentials stay in your environment and the provider flow you already use; compression happens on your machine before the request leaves.
Three steps. No cloud, no rewrites.
Step 1
pip install tokenpak. Runs at 127.0.0.1 as a local proxy.
Step 2
tokenpak setup — interactive wizard wires your keys + starts the proxy. Every outbound request goes through Prompt Packing: TokenPak selects, reduces, and structures context into a Pak, then moves it through TIP (the TokenPak Integration Protocol) to the provider.
Step 3
Tokens saved and costs avoided are logged per-request in the Savings Ledger — attributable to cause (Prompt Packing, cache hit) and origin (proxy vs. provider).
Refreshed automatically on every release + on a daily safety-net schedule.
Release v1.5.6
Curated entry points sourced from tokenpak/docs.
starter
Install TokenPak and see savings in one command.
starter
pip install options, OS notes, troubleshooting install failures.
reference
Every tokenpak verb, flag, and exit code.
reference
Proxy-centered model; three planes; 18 subsystems at a glance.
support
Common symptoms and the fixes that work.
pip install tokenpak && tokenpak setup. No cloud component; credentials stay in your environment and provider flow.