CtxSift

Name: CtxSift
Rating: 1 (1 review)
Author: TabMate

Save tokens and extend your coding sessions

Developer Tools Open Source Hardware Free

ctxsift.dev


Command outputs and state recollection are the biggest sources of token overuse.

Agents consume raw command outputs for most tasks, yet an LLM rarely needs the entire output to
answer a question or assess a situation. Reading all of it increases token usage and slows responses.
The cost compounds in complex, multistep tasks, where rounds of context compaction force the agent to re-read and re-run
commands to get back up to speed with the latest code state. Compressing command outputs alone is not enough: it shifts the
token tax to these recollection moments.

**CtxSift** (pronounced Context Sift) is a skill that helps agents sift through the repeated noise and find the real signals needed for every task.
It compresses tool outputs and caches them so that agents can do a look-up when needing context or recollecting.

How it works

With CtxSift, your agents use two steps to keep a minimal token footprint:

1. Extract and cache only what they need from raw outputs.

2. Look up context later instead of repeatedly re-running commands or dragging raw terminal output back into the session.

That's it. Unlike other token savers, which can get heavy and confuse the agent with multiple tools, CtxSift keeps it simple and light: no multiple tools, MCP servers, or sandbox spin-up dependencies.
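
The two steps above can be sketched in a few lines of Python. This is only an illustration of the extract-then-look-up idea, not CtxSift's actual implementation or API: `extract_signal`, `OutputCache`, and the keyword filter are hypothetical names, and the keyword filter stands in for the model-based compression that CtxSift performs.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def extract_signal(raw_output: str, keywords: tuple) -> str:
    """Stand-in for model-based compression: keep only lines that mention a keyword."""
    kept = [
        line
        for line in raw_output.splitlines()
        if any(k.lower() in line.lower() for k in keywords)
    ]
    return "\n".join(kept)


@dataclass
class OutputCache:
    """Hypothetical in-memory cache of compressed outputs, keyed by command."""

    entries: dict = field(default_factory=dict)

    @staticmethod
    def _key(command: str) -> str:
        return hashlib.sha256(command.encode("utf-8")).hexdigest()

    def store(self, command: str, raw_output: str, keywords: tuple) -> str:
        # Step 1: extract what is needed and cache only that.
        compressed = extract_signal(raw_output, keywords)
        self.entries[self._key(command)] = compressed
        return compressed

    def lookup(self, command: str) -> Optional[str]:
        # Step 2: answer later questions from the cache instead of re-running.
        return self.entries.get(self._key(command))


raw = (
    "collected 3 items\n"
    "test_a PASSED\n"
    "test_b FAILED - AssertionError\n"
    "test_c PASSED\n"
)
cache = OutputCache()
cache.store("pytest -q", raw, keywords=("failed", "error"))
print(cache.lookup("pytest -q"))  # prints the single failing-test line
```

After a context compaction, the agent in this sketch would call `lookup` rather than run the test command again, so only the one relevant line re-enters the session.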

Use local models on CPU/GPU or remotely hosted LLMs for compression.

By default, CtxSift starts with a small GGUF model on local CPU. If you have CUDA available, local compression can use normal Hugging Face text-generation models instead. If you prefer hosted inference, remote compression works through LiteLLM-compatible endpoints.

Recall embeddings stay local and separate from compression, so the retrieval path remains the same whether compression is local or remote.
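
One way to picture this separation is a store that accepts any compressor but always uses the same local embedding function for recall. The sketch below is an assumption-laden illustration, not CtxSift's code: `RecallStore`, `local_compress`, `remote_compress`, and the bag-of-words `embed` are hypothetical stand-ins for the real models.

```python
import math
from collections import Counter
from typing import Callable, Optional

Compressor = Callable[[str], str]


def local_compress(text: str) -> str:
    """Stand-in for a local model: keep the first line."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def remote_compress(text: str) -> str:
    """Stand-in for a hosted model: keep the first six words."""
    return " ".join(text.split()[:6])


def embed(text: str) -> Counter:
    """Toy local embedding (bag of words); a real setup would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class RecallStore:
    """Compression is pluggable; embedding and retrieval never change."""

    def __init__(self, compressor: Compressor) -> None:
        self.compressor = compressor
        self.items: list = []  # (embedding, summary) pairs

    def add(self, raw_output: str) -> None:
        summary = self.compressor(raw_output)
        self.items.append((embed(summary), summary))

    def recall(self, query: str) -> Optional[str]:
        if not self.items:
            return None
        q = embed(query)
        return max(self.items, key=lambda item: cosine(q, item[0]))[1]


outputs = [
    "build failed in module auth\nTraceback (most recent call last) ...",
    "tests passed for module billing\n12 passed in 0.4s",
]
for compressor in (local_compress, remote_compress):
    store = RecallStore(compressor)
    for out in outputs:
        store.add(out)
    print(compressor.__name__, "->", store.recall("which module failed"))
```

Swapping `local_compress` for `remote_compress` changes what gets stored, but `recall` runs the same code either way, which is the property the paragraph above describes.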

Start with the runtime path that matches your machine and workflow.

Use local CPU for the simplest default path, local GPU when you want faster local inference, and remote provider mode when you want hosted models through a LiteLLM-compatible endpoint.

Benchmarked model comparisons live on their own page. Use the benchmark guide when you want tested CPU and GPU recommendations rather than setup instructions.

## Getting Started

### Prerequisites

- **Python ≥ 3.12** — [python.org/downloads](https://www.python.org/downloads/)
- **uv** — a fast Python package manager
  - Install: [docs.astral.sh/uv/getting-started/installation](https://docs.astral.sh/uv/getting-started/installation/)
- **C compiler**
  - Linux: [gcc](https://gcc.gnu.org/install/) or [clang](https://clang.llvm.org/get_started.html)
  - Windows: [Visual Studio](https://visualstudio.microsoft.com/downloads/) or [MinGW-w64](https://www.mingw-w64.org/downloads/)
  - macOS: [Xcode](https://developer.apple.com/xcode/)


### Install

CtxSift uses a language model to compress tool outputs. This can be a model running locally or hosted remotely.
You can choose the installation path best suited to your environment. When using local models, you can override the default
model - see the [supported models](#local-model-support) section for further details. 


> ❗ For the best experience, please see the minimum hardware requirements below.
> 
> <details>
> <summary>Requirements matrix</summary>
>
> | Compression Mode | Minimum RAM | Minimum VRAM                                       | Comments                                                   |
> |---|---|----------------------------------------------------|------------------------------------------------------------|
> | Local, no GPU | 8 GB | N/A                                                | Both embedding and compression models are loaded into RAM  |
> | Local, with GPU | 2 GB | 8 GB                                               | Both embedding and compression models get loaded into VRAM |
> | Remote, no GPU | 4 GB | N/A | Only the embedding model gets loaded into RAM              |
> | Remote, with GPU | 2 GB | 4 GB                                               | Only the embedding model gets loaded into VRAM             |
> 
> </details>
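
As a quick way to read the matrix, the sketch below encodes its four rows as data and checks a machine against them. The numbers come straight from the table; the names `REQUIREMENTS` and `meets_requirements` are hypothetical and not part of CtxSift.

```python
from typing import NamedTuple, Optional


class Requirement(NamedTuple):
    min_ram_gb: int
    min_vram_gb: Optional[int]  # None means no GPU is involved


# (compression mode, has GPU) -> minimums from the requirements matrix
REQUIREMENTS = {
    ("local", False): Requirement(min_ram_gb=8, min_vram_gb=None),
    ("local", True): Requirement(min_ram_gb=2, min_vram_gb=8),
    ("remote", False): Requirement(min_ram_gb=4, min_vram_gb=None),
    ("remote", True): Requirement(min_ram_gb=2, min_vram_gb=4),
}


def meets_requirements(mode: str, ram_gb: float, vram_gb: Optional[float] = None) -> bool:
    """Return True if the machine satisfies the matrix row for this mode.

    Pass vram_gb=None for a machine without a usable GPU.
    """
    req = REQUIREMENTS[(mode, vram_gb is not None)]
    if ram_gb < req.min_ram_gb:
        return False
    if req.min_vram_gb is not None and vram_gb < req.min_vram_gb:
        return False
    return True


print(meets_requirements("local", ram_gb=8))               # CPU-only laptop with 8 GB RAM
print(meets_requirements("local", ram_gb=4, vram_gb=8))    # small RAM, 8 GB GPU
print(meets_requirements("remote", ram_gb=2, vram_gb=2))   # GPU too small for embeddings
```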


```bash 
# Install the base package - inference runs on CPU
uv tool install ctxsift

# Install with GPU add-ons - inference with GPU acceleration
uv tool install "ctxsift[gpu]"

# Enable quantization support on GPU
uv tool install "ctxsift[gpu,quant]"

# Install with LiteLLM included - use remotely hosted models for inference
uv tool install "ctxsift[remote]"

# Install the full package
uv tool install "ctxsift[all]"
```

If `ctxsift` is not found after installation, run:

```bash frame="none"
uv tool update-shell
```

Then restart your shell and try `ctxsift` again.

### First-time setup

Run a guided setup to configure your model provider, workspace settings and **install the skill** for your favorite agent harness.

```bash frame="none"
ctxsift configure
```

### Verify and test your setup

```bash frame="none"
# Verify
ctxsift doctor

# Test compression
# printf emits real newlines; bash's built-in echo would print "\n" literally
printf 'alpha\nbeta\ngamma\n' | ctxsift compress --intent exact-lines "Return only the first line, no explanations."
```
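
If you would rather run the same smoke test from a script, the pipe above can be reproduced with Python's `subprocess` module. This is a minimal sketch: it uses only the `compress` subcommand and `--intent exact-lines` flag shown above, the helper names `build_compress_command` and `run_compress` are hypothetical, and it makes no assumption about what the model returns.

```python
import shutil
import subprocess
from typing import Optional

SAMPLE_INPUT = "alpha\nbeta\ngamma\n"
INSTRUCTION = "Return only the first line, no explanations."


def build_compress_command(instruction: str, intent: str = "exact-lines") -> list:
    """Build the argv for the compress call shown in the shell example."""
    return ["ctxsift", "compress", "--intent", intent, instruction]


def run_compress(text: str, instruction: str) -> Optional[str]:
    """Pipe text to ctxsift on stdin; return stdout, or None if ctxsift is not on PATH."""
    if shutil.which("ctxsift") is None:
        return None
    result = subprocess.run(
        build_compress_command(instruction),
        input=text,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


output = run_compress(SAMPLE_INPUT, INSTRUCTION)
if output is None:
    print("ctxsift not found on PATH; try `uv tool update-shell` and restart your shell")
else:
    print(output)
```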
