What is Minima

A recommendation engine, not a router. For each LLM call, Minima recommends the cheapest model expected to succeed, and its picks get sharper as you report outcomes.

Minima only recommends. It does not proxy your call, execute a model, rewrite prompts, cache, or compress. You ask "which model should run this?", it answers, and you run the model yourself in your own stack — so it adds zero latency to the actual LLM request.

Two ways to use Minima

Minima SDK & API

A hosted API and a typed Python client. Call recommend, run the model in your own stack, then report the outcome with feedback. Embed cost-aware routing directly in your app.

Minima CLI

minima — a cost-aware, memory-isolated terminal coding agent that routes every prompt through Minima, runs the chosen model, and closes the feedback loop for you.

How it works

Recommend

POST /v1/recommend — Minima recalls similar past task → model → outcome records, aggregates empirical success rates, and returns the cheapest model expected to clear your quality bar. Zero added latency to your actual LLM call.
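The selection step described above can be pictured with a small local sketch. This is not Minima's implementation: the Record shape, the cost_per_call table, the model names, and the quality_bar threshold are all hypothetical, and the real service also performs semantic recall, which is skipped here. It only illustrates "aggregate empirical success rates, then return the cheapest model that clears the bar".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One recalled task -> model -> outcome record (hypothetical shape)."""
    model: str
    succeeded: bool


def success_rates(records: list[Record]) -> dict[str, float]:
    """Aggregate the empirical success rate per model."""
    wins: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for r in records:
        totals[r.model] += 1
        wins[r.model] += int(r.succeeded)
    return {m: wins[m] / totals[m] for m in totals}


def cheapest_clearing_bar(
    records: list[Record],
    cost_per_call: dict[str, float],
    quality_bar: float,
) -> Optional[str]:
    """Return the cheapest model whose success rate meets the bar, or None."""
    rates = success_rates(records)
    eligible = [
        m for m, rate in rates.items()
        if rate >= quality_bar and m in cost_per_call
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda m: cost_per_call[m])


# Made-up history: small succeeds 2/3, mid 3/4, large 2/2.
records = (
    [Record("small-model", True)] * 2 + [Record("small-model", False)]
    + [Record("mid-model", True)] * 3 + [Record("mid-model", False)]
    + [Record("large-model", True)] * 2
)
cost_per_call = {"small-model": 0.001, "mid-model": 0.004, "large-model": 0.02}

pick = cheapest_clearing_bar(records, cost_per_call, quality_bar=0.7)
print(pick)  # mid-model
```

With a bar of 0.7, the small model is cheaper but fails too often, so the mid model wins. Raising the bar to 0.9 would move the pick to the large model.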

Run it yourself

Minima hands back a recommended_model and a recommendation_id. You run the model in your own stack — Minima never proxies, executes, or rewrites.

Feed back

POST /v1/feedback — report the outcome, quality score, and realized tokens. Minima writes the outcome to memory, reinforces the exact neighbors that drove the pick, and promotes durable lessons.

Improve automatically

Every feedback call makes the next recommendation sharper. The cost basis climbs from flat estimates → observed median → rescaled per-request as history accumulates.

  POST /v1/recommend  ──▶  you run the model  ──▶  POST /v1/feedback
   (recall + rank)          (your stack)            (write outcome, reinforce)
        ▲                                                    │
        └────────────── picks get sharper ──────────────────┘
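
The loop in the diagram can be sketched in plain Python with only the standard library. The two paths and the recommended_model and recommendation_id response fields come from this page; everything else is an assumption made for illustration: the BASE_URL placeholder, the request field names (task, outcome, quality_score, tokens), the shape of the run_model result, and the omission of authentication. Check the API reference for the real schema before relying on any of it.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "https://minima.example.invalid"  # placeholder, not the real host

Transport = Callable[[str, dict], dict]


def http_post(path: str, payload: dict) -> dict:
    """POST JSON and decode the JSON reply (auth headers omitted in this sketch)."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def route_and_report(
    task: str,
    run_model: Callable[[str, str], dict],
    post: Transport = http_post,
) -> dict:
    """Ask for a recommendation, run the model locally, then report the outcome."""
    rec = post("/v1/recommend", {"task": task})  # request field is assumed
    result = run_model(rec["recommended_model"], task)  # runs in your own stack
    post("/v1/feedback", {
        "recommendation_id": rec["recommendation_id"],
        "outcome": "success" if result["ok"] else "failure",  # assumed field
        "quality_score": result["quality"],                   # assumed field
        "tokens": result["tokens"],                           # assumed field
    })
    return result


# Offline demonstration with a stubbed transport and a stubbed model runner.
calls: list[tuple[str, dict]] = []


def fake_post(path: str, payload: dict) -> dict:
    calls.append((path, payload))
    if path == "/v1/recommend":
        return {"recommended_model": "small-model", "recommendation_id": "rec-123"}
    return {}


def fake_run(model: str, task: str) -> dict:
    return {"ok": True, "quality": 0.9, "tokens": 120, "model": model}


result = route_and_report("summarize this ticket", fake_run, post=fake_post)
print([path for path, _ in calls])
```

Passing the transport in as a parameter keeps the model call entirely on the caller's side, which mirrors the recommend-only design: Minima is consulted before and after the call but is never in its path.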

Key properties

Recommend-only. No proxy, no prompt rewriting, no caching. Sits beside your calls, adds zero latency to them.
Memory-driven. Backed by Mubit — semantic k-NN recall, per-entry reinforcement, drift signals, failure lessons, and lesson promotion. See How Minima uses Mubit.
Accurate cost ranking. Three cost-basis tiers (flat estimate → observed median → rescaled per-request) correctly rank reasoning-heavy models that look cheap on paper but spend output tokens thinking.
Works cold. Prior-only fallback + benchmark seeding means useful picks from day one.
Provable. GET /v1/savings reports estimated and realized savings against honest baselines, and GET /v1/calibration shows whether predicted_success matches your real outcomes.
Bring your own memory. Authenticate with your own Mubit API key — Minima reads and writes your task → model → outcome history in your Mubit instance, with namespace / user_id sub-scoping inside it.
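
To make the cost-ranking property concrete, here is a rough sketch of why the cost basis matters. The tier names follow this page; the CostInfo shape, the preference order coded in cost_basis, the prices, and the token counts are invented for illustration and are not Minima's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostInfo:
    """Per-model cost figures in USD per call (hypothetical shape)."""
    estimate: float                          # flat estimate, always available
    observed_median: Optional[float] = None  # median realized cost, once history exists
    rescaled: Optional[float] = None         # per-request rescaled cost, richest tier


def cost_basis(info: CostInfo) -> tuple[str, float]:
    """Use the most refined tier available, falling back toward the estimate."""
    if info.rescaled is not None:
        return "rescaled", info.rescaled
    if info.observed_median is not None:
        return "observed", info.observed_median
    return "estimate", info.estimate


def call_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float) -> float:
    """Cost of one call given per-million-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


models = {
    # Cheap list price, but history shows 8000 output tokens spent thinking.
    "reasoning-model": CostInfo(
        estimate=call_cost(1000, 500, in_price=0.5, out_price=2.0),
        observed_median=call_cost(1000, 8000, in_price=0.5, out_price=2.0),
    ),
    # Pricier per token, no history yet, so only the estimate is available.
    "plain-model": CostInfo(
        estimate=call_cost(1000, 500, in_price=1.0, out_price=4.0),
    ),
}

rank_on_paper = sorted(models, key=lambda m: models[m].estimate)
rank_by_basis = sorted(models, key=lambda m: cost_basis(models[m])[1])

print(rank_on_paper)  # ['reasoning-model', 'plain-model']
print(rank_by_basis)  # ['plain-model', 'reasoning-model']
```

On list prices alone the reasoning model looks like the cheaper pick. Once its observed cost is used, the ranking flips, which is the situation the tiers are meant to catch.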

💡 Tip

New to Minima? Start with the SDK Getting Started for the hosted API, or jump to the Minima CLI Overview if you want the terminal agent.