TokenPak — local LLM proxy

Local. Context-lean. Measurable.

TokenPak is a local proxy that packs your LLM context before it reaches the API: fewer repeated tokens on the wire, more opportunities for reuse, and savings you can measure per request.


TokenPak doesn't replace your AI stack.

It adds the missing logistics layer: packing, routing, reusable context, guardrails, orchestration, and per-request records.

Six capabilities

Pack

Deterministically pack context before it goes on the wire — stop re-sending the same boilerplate, file contents, and system prompts.

Route

Send each request to the right model, with fallback rules you control.

Reuse

Recall and reuse context from local PAKs instead of rebuilding it every session.

Guard

Spend and safety guardrails run as a side-channel gatehouse across every request.

Dispatch

Coordinate scoped, multi-step and multi-agent work. Preview (alpha) — not yet in a published release.

Record

Every request leaves a local record — what was sent, reused, observed, and charged.
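The Pack capability can be illustrated with a minimal sketch of one deterministic packing step. It assumes that packing includes dropping exact repeats of context blocks within a single request, such as the same system prompt or file pasted twice. This is not TokenPak's packing algorithm. `pack_context`, `PackResult`, and the use of character counts as a stand-in for tokens are all hypothetical. Because the output depends only on the blocks and their order, the same input always packs to the same output.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class PackResult:
    blocks: List[str]  # context blocks that will actually be sent
    dropped: int  # how many duplicate blocks were removed
    chars_saved: int  # characters not re-sent (a rough stand-in for tokens)


def _digest(block: str) -> str:
    # Ignore surrounding whitespace and trailing spaces on each line,
    # so trivially different copies of the same block hash the same.
    normalized = "\n".join(line.rstrip() for line in block.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pack_context(blocks: List[str]) -> PackResult:
    """Hypothetical packer: keep the first copy of each block, drop exact repeats.

    The result depends only on the input contents and order, so packing
    is deterministic: the same request always packs the same way.
    """
    seen = set()
    kept: List[str] = []
    dropped = 0
    chars_saved = 0
    for block in blocks:
        key = _digest(block)
        if key in seen:
            dropped += 1
            chars_saved += len(block)
            continue
        seen.add(key)
        kept.append(block)
    return PackResult(blocks=kept, dropped=dropped, chars_saved=chars_saved)


if __name__ == "__main__":
    system = "You are a helpful coding assistant."
    file_a = "def add(a, b):\n    return a + b\n"
    request = [system, file_a, "Please refactor add().", file_a, system]
    result = pack_context(request)
    print(len(result.blocks), result.dropped, result.chars_saved)  # prints: 3 2 67
```

Real packing would likely work on tokens rather than characters and would also reuse context across requests (the Reuse capability). This sketch covers only exact repeats inside one request.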

How a request flows

One linear path: context is packed locally before it leaves your machine, and every request is recorded locally.

Request flow

Raw context → Packing Station → Route → Send → Provider → Record

Reuse feed: PAKs feed into the Packing Station — recall and reuse context instead of rebuilding it.

Gatehouse (Guard): spend and safety guardrails run as a side-channel overlay before Send.
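The request flow can be sketched as a single function, assuming in-process callables stand in for real providers. Every name here is hypothetical and not TokenPak's API, including `handle_request`, `RequestRecord`, the flat per-character pricing, and the `model-a`/`model-b` routes. The sketch only shows the ordering the diagram describes: pack, guard before Send, route with fallback, and one local record per request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A "provider" here is any callable that takes a prompt and returns a reply,
# raising on failure. Real providers would be HTTP calls (not shown).
Provider = Callable[[str], str]


@dataclass
class RequestRecord:
    model: Optional[str]  # model that answered; None if blocked or all routes failed
    chars_sent: int  # size of the packed prompt (rough stand-in for tokens)
    est_cost: float  # estimated cost under the hypothetical per-char pricing
    status: str  # "ok", "blocked", or "failed"
    attempts: List[str] = field(default_factory=list)  # models tried, in order


def handle_request(
    context: List[str],
    routes: List[Tuple[str, Provider]],  # primary first, then fallbacks
    price_per_char: Dict[str, float],
    budget: float,
    records: List[RequestRecord],
) -> Optional[str]:
    # 1. Packing Station: drop exact duplicate blocks, keeping first-seen order.
    prompt = "\n\n".join(dict.fromkeys(context))

    # 2. Gatehouse (Guard): check worst-case spend across all routes before Send.
    worst_case = len(prompt) * max(price_per_char[model] for model, _ in routes)
    if worst_case > budget:
        records.append(RequestRecord(None, 0, 0.0, "blocked"))
        return None

    # 3. Route + Send: try the primary model, then each fallback in order.
    attempts: List[str] = []
    for model, provider in routes:
        attempts.append(model)
        try:
            reply = provider(prompt)
        except Exception:
            continue  # this route failed; fall back to the next one
        # 4. Record: one local entry describing what was sent and charged.
        cost = len(prompt) * price_per_char[model]
        records.append(RequestRecord(model, len(prompt), cost, "ok", attempts))
        return reply

    records.append(RequestRecord(None, len(prompt), 0.0, "failed", attempts))
    return None


if __name__ == "__main__":
    def unavailable(prompt: str) -> str:
        raise ConnectionError("primary route is down")

    def backup(prompt: str) -> str:
        return f"reply to {len(prompt)} chars"

    log: List[RequestRecord] = []
    reply = handle_request(
        ["system prompt", "file.py contents", "system prompt"],
        [("model-a", unavailable), ("model-b", backup)],
        {"model-a": 0.001, "model-b": 0.002},
        budget=1.0,
        records=log,
    )
    print(reply)
    print(log[-1])
```

The guard here checks the most expensive route up front so that a fallback cannot exceed the budget after the check has passed. A real gatehouse would more likely use token counts and actual provider pricing.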

Where TokenPak fits

TokenPak overlaps with tools you may already run — and adds a layer they don't. Run them together.

How it sits alongside gateways, observability tools, and MCP

Where it fits

Your agent (Claude Code · Cursor · Cline) → TokenPak (local: pack · measure · guard · record) → Your gateway (LiteLLM / OpenRouter / …) → Model provider (Anthropic · OpenAI · Google Gemini)

Dispatch Center dashboard (preview): orchestration for multi-step and multi-agent work, layered around this flow.

Tool type	Overlaps on	TokenPak adds	Together?
Gateways / routers	routing	deterministic context packing + reusable PAKs	Run both.
Observability tools	measurement	pre-send packing + per-request context-level attribution	Run both.
MCP-based workflows	ecosystem coordination	a semantic contract (TIP) for packing, routing, cost, telemetry	Composes.


Savings

TokenPak avoids re-sending context your tools already sent. We don't publish savings numbers yet — receipt-backed measurements will land with the benchmark work. Until then, every request leaves a local record you can check yourself.

Open source & Pro

TokenPak's core is open source (Apache-2.0). Pro adds team dashboards, advanced routing, and enterprise controls.

Pro is delivered as the tokenpak-paid package from a separate package index.

Latest release

TokenPak v1.12.0 · Jul 10, 2026

### Added

- **Codex receipt-only launch mode.** `--receipt-only` (requiring `--receipt-out` and `--run-id`, mutually exclusive with `--budget` and `--install-only`) lets a launch emit accounting receipts without installing the TokenPak mechanism. Launches without a request body now produce **no-body accounting receipts**, so accounting stays truthful instead of silently dropping those events.
- **Canonical staging-to-public promotion tooling.** `scripts/promote-staging-to-public.sh` and `scripts/promotion-drift-report.sh` codify the promotion train between the staging and pu


Documentation


Try it locally.

Run pip install tokenpak && tokenpak setup, then point your AI client at the local proxy. There is no cloud component: credentials stay in your environment and travel only in your requests to your provider.

Read the docs