Cut Claude Code Token Costs 60-90% With rtk: Hands-On Guide
Stop dumping raw terminal output into Claude Code's context. rtk is a Rust CLI proxy that filters git, cargo, and npm output for 60-90% savings.
Jenny Ouyang's two-month Claude Code bill came to $1,600. She wrote up the autopsy on her Build to Launch Substack, and the diagnosis was unambiguous: the prompts were not the problem. The tool output was. "Every time Claude reads a file, runs a shell command, or calls an MCP server, the full output gets appended to context," she wrote. By message 40 of a session, she was paying for everything that came before — over and over.
If you have ever seen Claude Code helpfully run git log only to dump 800 lines of merge commits into its working memory, you already understand the shape of this problem. The fix that the open-source community has rallied around in 2026 is a 4 MB Rust binary called rtk — short for Rust Token Killer. It sits between your AI agent and your shell, intercepting noisy commands and returning compact, LLM-friendly summaries before the bytes ever reach the context window. The README claims 60–90% token reduction on common dev commands. Independent users report 70–89% in real sessions, which we'll get into below.
This guide walks through the install, the Claude Code hook setup, three real workflows where the savings show up most, and a side-by-side comparison with the other contender in this category, context-mode.
The token-bloat problem, restated
Most Claude Code users discover the cost problem late. The /cost command is technically available, but you have to remember to run it, and Anthropic's own dashboards lag the actual session by enough that the damage is usually done before you notice. Jenny Ouyang's piece is one of several recent post-mortems — KDnuggets ran a practical guide in March that opens with the same observation: "Opus costs 5x more than Sonnet per token," and most of that spend is going to context, not generation.
The Anthropic team has acknowledged this directly. Their Claude Cookbook on context engineering frames the discipline as managing three streams that accumulate during long-horizon agent work: tool results, the model's own reasoning, and user messages. The middle one — model reasoning — you can't easily compress without losing capability. The first one — tool results — you absolutely can.
That's the whole thesis behind rtk. Tool output is, in their phrasing, "re-fetchable." If Claude needs the data again, it can run the command again. Storing 2,000 tokens of git status output in the conversation history forever is pure waste.
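To get intuition for the scale of the waste, you can ballpark the token cost of any command's output with the common 4-characters-per-token heuristic. This is an illustrative sketch only — the `estimate_tokens` helper is ours, not part of rtk, and rtk's own accounting uses real tokenizer counts:

```shell
# Rough token estimate for a command's output, using the
# ~4-characters-per-token rule of thumb (illustrative only).
estimate_tokens() {
  chars=$("$@" 2>&1 | wc -c)
  echo $(( chars / 4 ))
}

# Example: a 12-character output comes out to roughly 3 tokens.
estimate_tokens printf 'hello world\n'
```

Run it against `git status` or your test suite and multiply by how many times a session repeats the command — that's the number rtk is attacking.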
What rtk actually does
The architecture is small enough to describe in two sentences. rtk is a CLI proxy: you call rtk git status instead of git status, and it runs the underlying command, parses the output, applies a domain-aware filter, and returns the compressed result. The Claude Code integration installs a PreToolUse hook that automatically rewrites Bash commands so Claude doesn't even know the rewrite happened — it just gets cleaner output.
The filtering rules are command-specific. For git status, rtk strips the verbose Git suggestion text ("(use 'git restore --staged'..."), groups files by status, and compacts the section headers. For cargo test, it removes the per-test progress lines and keeps only the summary plus failure messages. For find, it returns a token-optimized tree rather than a flat list of paths. The README documents over 100 supported commands across file operations, Git, GitHub CLI, test runners, build tools, package managers, AWS CLI, Docker, and kubectl — overhead is measured in single-digit milliseconds.
The numbers below come from the project's own benchmark, a 30-minute Claude session on a medium TypeScript/Rust project:
| Command | Raw tokens | rtk tokens | Reduction |
|---|---|---|---|
| git status | ~3,000 | ~600 | 80% |
| cargo test | ~25,000 | ~2,500 | 90% |
| ls/tree | ~2,000 | ~400 | 80% |
| Total session | ~118,000 | ~23,900 | 80% |
That's the headline. Whether you actually see numbers like this depends entirely on your workflow — more on that in the three scenarios below.
Install and Claude Code setup
The install is genuinely the entire setup. From the rtk README:
# macOS / Homebrew (recommended)
brew install rtk
# Linux / macOS one-liner
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
# From source
cargo install --git https://github.com/rtk-ai/rtk
Verify the binary is on your path:
rtk --version
# rtk 0.x.x
Now wire it into Claude Code:
rtk init -g
The -g flag installs a global PreToolUse hook into your Claude Code settings. Open ~/.claude/settings.json after running it and you'll see a new hooks entry that rewrites any bash tool invocation through rtk. From Claude's perspective, nothing has changed — it still calls git status. From your wallet's perspective, the conversation now stores 600 tokens instead of 3,000 every time it does.
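The hooks entry follows Claude Code's standard hook schema. The shape below is illustrative of that schema, not the exact JSON rtk writes — the command string in particular is a placeholder:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "<rtk's rewrite command>" }
        ]
      }
    ]
  }
}
```

If you ever want to disable the integration, deleting this entry (or re-running `rtk init` per its docs) restores plain pass-through behavior.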
If you also use Gemini CLI, Cursor, Codex, Cline, OpenCode, or Kilo Code, rtk has an --<agent> flag for each. The same Rust binary services all of them; the integration layer is just a hook configuration.
Workflow 1: Planning sessions
The first place you'll feel rtk is when Claude Code is in planning mode and exploring an unfamiliar repo. A typical opening flurry looks something like this:
Bash: git log --oneline -20
Bash: git status
Bash: ls -R src/
Bash: find . -name "*.test.ts" -not -path "*/node_modules/*"
Bash: cat package.json
In raw form, this is the worst kind of context spend. ls -R src/ on a real codebase produces hundreds of paths, most of which are irrelevant. find against a Node project, even with the obligatory node_modules exclusion, can return 200+ test files. None of it is generation-worthy detail — it's all just reconnaissance that should be summarized.
This is exactly the workflow rtk was designed for. Based on the project's documented per-command savings, a planning sequence like the one above would compress as follows:
| Step | Raw | With rtk | Savings |
|---|---|---|---|
| git log --oneline -20 | ~600 tokens | ~150 tokens | 75% |
| git status | ~2,000–3,000 tokens | ~300–600 tokens | 80% |
| ls -R src/ | ~3,000–8,000 tokens | ~600–1,500 tokens | 80% |
| find ... *.test.ts | ~1,500 tokens | ~400 tokens | 73% |
This is what AshJo described in their Medium walkthrough: the smaller the raw command, the smaller the relative savings (their git log --oneline example only shaved 2.9%), but in cumulative session terms, the planning phase is where this compounds. Esteban Estrada's write-up at codestz.dev reports a 70% overall Claude Code token reduction, mostly attributable to recon-heavy early-session activity.
Workflow 2: Large refactors
The second workflow where rtk dominates is anything involving a test runner. If you've ever asked Claude Code to "fix the failing test" on a 400-test Jest suite, you've watched it accept a 20,000-token tool result, then ask for the same thing with --verbose, then ask again after a code change. Multiply by the iteration count and you're easily in six-figure-token territory for a single bug fix.
# What you actually want Claude to see
> 3 tests failed in src/auth/session.test.ts
> - "should expire tokens after TTL" (line 47)
> - "should refuse expired refresh" (line 89)
> - "should rotate refresh on use" (line 112)
# What it actually gets without rtk
[400 lines of "✓ test passed" entries, color codes,
progress bars, timing summaries, and Jest banner]
rtk filters this aggressively. The README's cargo test example — 25,000 → 2,500 tokens — is the canonical case, and equivalent rules apply to pytest, jest, vitest, go test, and eslint. The conversation history ends up with the failure summary and the failing test names, which is approximately what you needed in the first place.
This is the workflow where independent users have reported the most dramatic numbers. The creator's own usage stats, posted on the Show HN thread, show 7,061 commands run over 15 days saving 24.6M tokens, an 83.7% average reduction. FlorianBruniaux replied with very similar numbers — 83.6% over 7,081 commands. LivioGama posted a one-week sample: 79.3%. None of these are stopwatch comparisons against a control session; they are the rtk-reported "what we filtered" totals. But the consistency across users is a useful sanity check.
A separate developer thread on Kilo Code's discussion board — titled "I saved 10M tokens (89%) on my Claude Code sessions with a CLI proxy" — reports a typical 30-minute session dropping from 150,000 tokens to ~45,000 with rtk in front of the agent.
Workflow 3: Codebase search
The third high-leverage scenario is anything where Claude has to search the codebase. grep -r, find, and rg against a non-trivial repo produce token storms — and worse, most of the matches are noise. rtk applies relevance heuristics: it groups matches by file, drops binary-looking lines, and truncates oversized matches into context-aware excerpts.
The same pattern shows up with directory listings. On a TypeScript monorepo, rtk ls returns a tree that respects .gitignore and collapses noisy directories like node_modules, dist, and .next into single summary lines. From the README's example metrics, that's an 80% reduction on commands like ls/tree.
Because Claude Code likes to verify before committing — re-running git diff between edits is its most common nervous tic — these reductions stack across a session.
Measuring with rtk gain
rtk ships its own analytics command, which is useful because guessing about token spend is what got most of us into this mess in the first place. After a few sessions:
rtk gain
# Total commands: 412
# Tokens saved: 1.84M (81.2%)
# Estimated USD saved: $5.52 (Sonnet input pricing)
rtk gain --graph
# 30-day ASCII chart of tokens-saved-per-day
rtk gain --daily
# Per-day breakdown
rtk gain --all --format json
# Machine-readable export for further analysis
The "estimated USD saved" line uses Sonnet 4.6 input pricing as the default reference. For Opus-heavy workloads, the realized savings are roughly 5x larger per token. None of this is independently audited — rtk gain is reporting what it filtered, not what your actual Claude bill came down to. Cross-check it against /cost in Claude Code or your Anthropic console to validate.
context-mode: the sandbox-based alternative
context-mode, at 13.6k stars, takes a different approach to the same problem. Instead of filtering output, it sandboxes it. Each ctx_execute call spawns an isolated subprocess, runs the command there, captures the raw output to disk, and returns only a summary to the conversation. The raw data — log files, API responses, snapshots — never leaves the sandbox, but stays addressable for follow-up queries via ctx_search (FTS5 + BM25).
The headline benchmark from the project's BENCHMARK.md: a session with 315 KB of raw output compressed to 5.4 KB visible context — a 98% reduction. Specific cases include:
- Playwright snapshots: 99% reduction
- 20 GitHub issues: 98% savings
- 500-line access logs: 100% reduction (sandboxed entirely, queryable on demand)
The install is a Claude Code plugin marketplace install:
/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode
Then /reload-plugins. context-mode registers 11 MCP tools — six sandbox primitives (ctx_execute, ctx_batch_execute, ctx_execute_file, ctx_index, ctx_search, ctx_fetch_and_index) and five meta-tools (ctx_stats, ctx_doctor, ctx_upgrade, ctx_purge, ctx_insight). It supports 14 platforms including Claude Code, Gemini CLI, VS Code Copilot, JetBrains, Cursor, OpenCode, and Codex CLI.
When to use which
The two tools are not actually competitors so much as different points on a tradeoff curve.
| Dimension | rtk | context-mode |
|---|---|---|
| Strategy | Filter at command boundary | Sandbox + index, query on demand |
| Setup | Single Rust binary, hook install | Claude Code plugin, MCP tools |
| Granularity | Per-command rules (100+ commands) | Per-execution sandbox + FTS5 search |
| Best at | Recon, tests, git, package managers | Web fetches, large logs, multi-step research |
| Reduction | 60–90% (typical 80%) | Up to 98% on log-heavy workloads |
| Trade-off | Loses output Claude might want to see | Adds an indirection layer Claude has to learn |
| Honest weak spot | Some commands need verbose output for debugging — rtk's filter rules have to be turned off case-by-case | Subprocess sandboxing changes how some interactive tools (TUIs, prompts) behave |
If your day is mostly Git, tests, and package operations, install rtk first. The hook-based integration means zero behavioral change from Claude's perspective — it's the lowest-friction win in the category. If your day involves a lot of large web fetches, log analysis, or research-style workflows where the same data gets queried multiple times, context-mode's index-and-retrieve model gets you closer to the 98% number.
You can run both. Our internal convergence tracking on May 4 noted that "rtk-ai/rtk and mksglu/context-mode both pitch 60–98% context reduction via tool-output sandboxing… expect this category to consolidate into one or two winners by Q3."
Honest limitations
Three things to know before you install:
1. Filtering is lossy by design. Sometimes Claude genuinely needs the full git log to understand a regression, or the full npm install warnings to debug a build. When that happens, you can bypass rtk for a single command by calling the underlying tool directly — but Claude will only know to do this if you tell it. The HN comment from a developer who'd been using rtk for "a few weeks" flagged exactly this friction.
2. The reported savings are rtk's own accounting. rtk gain shows what got filtered, not what your bill came down to. The relationship is correlated but not 1:1 — caching, compaction, and sub-agent isolation also affect your actual spend. Treat the percentage as an upper bound on the savings, and verify with /cost.
3. The tool surface is still moving. rtk is on a fast release cadence and new commands get added every week. If you rely on a niche test runner or build tool that isn't yet supported, you'll get pass-through behavior (rtk runs the command but doesn't filter), which is harmless but doesn't save you anything until rules are written.
The bigger picture
Token-economy tooling is the first deployable infra layer to emerge from what the industry has started calling the "context engineering" era. A year ago, every model release was about pushing context windows higher. Now the marginal win is cutting them — both because cost has caught up to capability, and because raw context size is no longer a strict capability multiplier. Anthropic's own context engineering cookbook is explicit: "If context bloat is mostly re-fetchable tool output, clearing is cheaper and lossless."
rtk operationalizes that observation at the shell. context-mode operationalizes it at the agent layer. The two together approximate what a future generation of agent harnesses will probably do natively — Claude Code, Cursor, and Codex are all clearly heading toward output-budgeting being a first-class part of the runtime rather than a third-party install. Until then, a 4 MB Rust binary with a hook is the cheapest 60–90% you can buy. If you're spending real money on Claude Code today, set up rtk before you do anything else this week.
For the cost-floor approach — replacing the model entirely with cheaper alternatives — see our earlier guide on running Claude Code with Ollama and OpenRouter. For the security implications of layering proxies and plugins into your agent stack, our piece on the LiteLLM supply-chain attack is the relevant read. The unifying theme: Claude Code is a harness, and almost every component in that harness is now optimizable independently.
The cheapest token is the one you never had to send.
About ComputeLeap Team
The ComputeLeap editorial team covers AI tools, agents, and products — helping readers discover and use artificial intelligence to work smarter.