
Minima

A recommendation engine, not a router. Route every LLM call to the cheapest model that will actually succeed — then watch the picks get sharper.

Recommend

POST /v1/recommend — Minima recalls similar past task → model → outcome records, aggregates empirical success rates, and returns the cheapest model expected to clear your quality bar. Zero added latency to your actual LLM call.
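The recall, aggregate and rank steps can be pictured with a small in-memory sketch. This is not Minima's implementation. The cosine k-NN over toy 2-D vectors, the `PastRun` record, the `recommend` function, and its `k`, `quality_bar` and `cost_per_call` parameters are all hypothetical stand-ins for what Minima and Mubit actually do.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PastRun:
    """Hypothetical task -> model -> outcome record."""
    embedding: list[float]  # embedding of the past task (toy 2-D vectors below)
    model: str
    succeeded: bool


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(task_embedding, history, cost_per_call, quality_bar=0.8, k=5):
    """Cheapest model whose success rate among the k most similar past runs
    clears quality_bar, or None if no recalled model qualifies."""
    # 1. Recall: the k past runs most similar to the new task.
    neighbors = sorted(history, key=lambda r: cosine(task_embedding, r.embedding),
                       reverse=True)[:k]

    # 2. Aggregate: empirical success rate per model over those neighbors.
    wins, totals = defaultdict(int), defaultdict(int)
    for run in neighbors:
        totals[run.model] += 1
        wins[run.model] += int(run.succeeded)
    rates = {m: wins[m] / totals[m] for m in totals}

    # 3. Rank: cheapest model expected to clear the bar.
    eligible = [m for m, rate in rates.items() if rate >= quality_bar]
    return min(eligible, key=lambda m: cost_per_call[m], default=None)


history = [
    PastRun([1.0, 0.0], "small", True),
    PastRun([0.9, 0.1], "small", True),
    PastRun([0.95, 0.05], "large", True),
    PastRun([0.0, 1.0], "small", False),
]
cost = {"small": 0.001, "large": 0.02}
print(recommend([1.0, 0.0], history, cost, k=3))  # small
```

When nothing recalled clears the bar, the sketch returns None; that is roughly where a prior-only fallback would take over. A production ranker would also weight neighbors by similarity and smooth small sample counts, which this sketch does not.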

Run it yourself

Minima hands back a recommended_model and a recommendation_id. You run the model in your own stack; Minima never proxies the call, executes the model, or rewrites your prompt.

Feed back

POST /v1/feedback — report the outcome, quality score, and realized tokens. Minima writes the outcome to memory, reinforces the exact neighbors that drove the pick, and promotes durable lessons.
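The recommend, run, feed-back loop might look like this from the client side. Only the two endpoints and the `recommended_model` / `recommendation_id` response fields come from the description above. The base URL, the bearer-token header, the request field names (`task`, `quality_bar`, `outcome`, `quality_score`, `input_tokens`, `output_tokens`), and the shape of the dict returned by `call_llm` are assumptions made for illustration, so check them against the real API before use.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical base URL; substitute your Minima deployment.
BASE_URL = "https://minima.example.com"


def http_post(path: str, body: dict, api_key: str) -> dict:
    """POST a JSON body and decode the JSON reply (stdlib only)."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: your Mubit API key sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_with_minima(task: str, call_llm: Callable[[str, str], dict], api_key: str,
                    post: Callable[[str, dict, str], dict] = http_post) -> dict:
    # 1. Ask for a recommendation (request field names are hypothetical).
    rec = post("/v1/recommend", {"task": task, "quality_bar": 0.8}, api_key)
    model, rec_id = rec["recommended_model"], rec["recommendation_id"]

    # 2. Run the model in your own stack; Minima never sees this call.
    result = call_llm(model, task)

    # 3. Report the outcome against the same recommendation_id so the
    #    neighbors that drove this pick can be reinforced.
    post("/v1/feedback", {
        "recommendation_id": rec_id,
        "outcome": "success" if result["ok"] else "failure",
        "quality_score": result["quality"],
        "input_tokens": result["input_tokens"],
        "output_tokens": result["output_tokens"],
    }, api_key)
    return result
```

Passing `post` as a parameter keeps the loop testable with a fake transport and makes it easy to swap in your own HTTP client.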

Improve automatically

Every feedback call makes the next recommendation sharper. As history accumulates, the cost basis climbs from flat estimates, to the observed median cost, to a cost rescaled per request.
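One way to picture the three cost-basis tiers is a function that uses the best tier the available history supports. This is a sketch under stated assumptions: the `Prices` class, the `cost_basis` function, the sample thresholds, the flat token counts, and especially the reading of "rescaled" as a median output-to-input token ratio applied to the current request's size are all hypothetical, not Minima's documented behavior.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Prices:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens

    def cost(self, input_tokens: float, output_tokens: float) -> float:
        return (input_tokens * self.input_per_mtok
                + output_tokens * self.output_per_mtok) / 1_000_000


# Hypothetical sample counts required before trusting each tier.
MIN_OBSERVED = 3
MIN_RESCALED = 10


def cost_basis(prices, history, request_input_tokens, flat_input=1_000, flat_output=500):
    """Return (tier, estimated USD cost) for one model on one request.

    history holds (input_tokens, output_tokens) pairs realized on past calls,
    as reported through feedback.
    """
    ratios = [out / inp for inp, out in history if inp > 0]
    if len(ratios) >= MIN_RESCALED and request_input_tokens > 0:
        # Rescaled: typical output-per-input ratio applied to this request's size.
        expected_output = median(ratios) * request_input_tokens
        return "rescaled", prices.cost(request_input_tokens, expected_output)
    if len(history) >= MIN_OBSERVED:
        # Observed: median realized cost across past calls.
        return "observed", median(prices.cost(i, o) for i, o in history)
    # Estimate: flat token counts priced at list rates.
    return "estimate", prices.cost(flat_input, flat_output)
```

Tiers are tried best-first, so once a model has enough reported calls it is ranked on its own measured token usage rather than on list price alone.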

  • Recommend-only. No proxy, no prompt rewriting, no caching. It sits beside your calls and adds zero latency to them.
  • Memory-driven. Backed by Mubit — semantic k-NN recall, per-entry reinforcement, and lesson promotion.
  • Accurate cost ranking. Three cost-basis tiers (rescaled → observed → estimate) correctly rank reasoning-heavy models that look cheap on paper but spend output tokens thinking.
  • Works cold. Prior-only fallback + RouterBench seeding means useful picks from day one.
  • Bring your own memory. Authenticate with your own Mubit API key — Minima reads and writes your task → model → outcome history in your Mubit instance, with namespace / user_id sub-scoping inside it.
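A worked example shows why a reasoning-heavy model can look cheap on paper and still lose on observed cost. The two models, their per-token prices, and the 6,000-token reasoning figure are made up for illustration; only the effect (reasoning tokens billed as output) comes from the description above.

```python
# Hypothetical list prices in USD per million tokens: (input, output).
PRICES = {"reasoner": (0.50, 2.00), "plain": (1.00, 4.00)}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


# On paper: a flat 1,000 input / 500 output token estimate for both models.
paper = {m: call_cost(m, 1_000, 500) for m in PRICES}

# Observed: the reasoning model actually emits ~6,000 output tokens per call
# (hypothetical figure) because its thinking is billed as output.
observed = {"reasoner": call_cost("reasoner", 1_000, 6_000),
            "plain": call_cost("plain", 1_000, 500)}

print(min(paper, key=paper.get))        # reasoner
print(min(observed, key=observed.get))  # plain
```

On list prices the reasoning model costs about $0.0015 per call against $0.003, but once its realized output tokens are counted it costs about $0.0125, so the ranking flips.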

POST /v1/recommend ──▶ you run the model ──▶ POST /v1/feedback
 (recall + rank)         (your stack)        (write outcome, reinforce)
        ▲                                            │
        └──────────── picks get sharper ─────────────┘