Route by tier, not by guessing.
Pick a tier — cheap, balanced, or premium. InferBridge handles the rest: which provider, which model, what to do when one fails. Override per-request when you need surgical control.
Built in public · v0.2.0
Drop-in OpenAI-compatible gateway with routing, caching, and observability across OpenAI, Anthropic, Together, Sarvam, and your self-hosted models. BYOK. Zero markup.
No new SDK. No refactoring. Change your base URL and key — your prompts, streaming, parameters, and parsing stay exactly the same.
```python
# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# After — change these two lines
from openai import OpenAI
client = OpenAI(
    api_key="ib_your_key_here",
    base_url="https://inferbridge.dev/v1",
)
```

```javascript
// Before
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// After — change these two lines
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: process.env.INFERBRIDGE_API_KEY,
  baseURL: "https://inferbridge.dev/v1",
});
```

```shell
# Before
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hi"}]}'

# After — change these two lines
curl https://inferbridge.dev/v1/chat/completions \
  -H "Authorization: Bearer ib_your_key_here" \
  -d '{"model":"ib/balanced","messages":[{"role":"user","content":"Hi"}]}'
```

That’s it. No other code changes. No lock-in. Revert the two lines anytime.
Per-request logs with tokens, latency, cost, and provider. One endpoint answers “where is my money going?” for the first time. Aggregate it by mode, provider, or time range.
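The per-request fields above roll up however you need them client-side. A minimal sketch of the "where is my money going?" aggregation — the record shape (`provider`, `tokens`, `latency_ms`, `cost_usd`) is our assumption about what the stats endpoint returns, not a confirmed schema:

```python
from collections import defaultdict

# Hypothetical per-request log records shaped like the fields listed above
# (tokens, latency, cost, provider) — exact field names are assumed.
records = [
    {"provider": "openai", "tokens": 512, "latency_ms": 420, "cost_usd": 0.0008},
    {"provider": "openai", "tokens": 128, "latency_ms": 310, "cost_usd": 0.0002},
    {"provider": "together", "tokens": 900, "latency_ms": 650, "cost_usd": 0.0005},
]

def cost_by_provider(records):
    """Sum spend per provider — the 'where is my money going?' rollup."""
    totals = defaultdict(float)
    for r in records:
        totals[r["provider"]] += r["cost_usd"]
    return dict(totals)
```

The same fold works grouped by mode or time range instead of provider, or summing tokens and latency instead of cost.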
Sarvam and self-hosted endpoints ship on day one. Add `X-InferBridge-Residency: india` to route only to India-hosted infrastructure. No other gateway does this.
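With the OpenAI Python SDK, that header rides along on individual calls via the per-request `extra_headers` argument. A sketch — the helper name and the tier choice are ours, only the header itself comes from the docs above:

```python
def residency_request(prompt: str, region: str = "india") -> dict:
    """Kwargs for client.chat.completions.create(), pinned to one region.

    The OpenAI SDK forwards extra_headers verbatim, so the gateway sees
    X-InferBridge-Residency and keeps the request on India-hosted candidates.
    """
    return {
        "model": "ib/balanced",
        "messages": [{"role": "user", "content": prompt}],
        "extra_headers": {"X-InferBridge-Residency": region},
    }

# client.chat.completions.create(**residency_request("Namaste"))  # needs an ib_ key
```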
```
# Your app
client.chat.completions.create(model="ib/balanced", ...)

↓ InferBridge evaluates:
  model: "ib/cheap"    → Together (Llama 3.3 70B)
  model: "ib/balanced" → OpenAI (gpt-4o-mini)
  model: "ib/premium"  → Anthropic (claude-opus-4-7)
  residency: "india"   → Sarvam (sarvam-m)
  fallback: on 5xx/429 → next candidate in tier

↓ Your app receives: standard OpenAI response
```

Real routing logic, not marketing. Override any decision per-request.
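The decision table above reduces to a tier → candidate-list lookup with a fallback loop on 429 and 5xx. A sketch of that logic — only the first entry per tier comes from the diagram; the fallback ordering within each tier is our assumption:

```python
# First candidate per tier is from the routing diagram; the rest are assumed.
TIERS = {
    "ib/cheap": [("together", "Llama 3.3 70B")],
    "ib/balanced": [("openai", "gpt-4o-mini"), ("together", "Llama 3.3 70B")],
    "ib/premium": [("anthropic", "claude-opus-4-7"), ("openai", "gpt-4o")],
}

def route(tier, call):
    """Try candidates in order; on 429 or any 5xx, fall through to the next."""
    last = None
    for provider, model in TIERS[tier]:
        status, body = call(provider, model)
        if status == 200:
            return body
        if status == 429 or 500 <= status < 600:
            last = (provider, status)
            continue  # next candidate in tier
        raise RuntimeError(f"{provider} returned {status}")
    raise RuntimeError(f"all candidates failed; last: {last}")
```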
POST to `/v1/users` with your email. You get an API key starting with `ib_`. Shown once, stored as a SHA-256 hash.
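"Stored as a SHA-256 hash" means the server never keeps your key in recoverable form — authentication just re-hashes whatever you send and compares digests. A sketch of that check:

```python
import hashlib

def key_fingerprint(api_key: str) -> str:
    """Only this digest is stored — a leaked row can't be replayed as a key."""
    return hashlib.sha256(api_key.encode()).hexdigest()

stored = key_fingerprint("ib_your_key_here")   # written at signup; key shown once
assert key_fingerprint("ib_your_key_here") == stored   # later auth check passes
assert key_fingerprint("ib_wrong_key") != stored       # wrong key fails
```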
BYOK means you bring your existing OpenAI, Anthropic, Together, or Sarvam keys. We Fernet-encrypt them at rest and never touch your billing.
Two lines in your existing code. Any language, any framework. If it talks to OpenAI, it talks to InferBridge.
Monitor cost and latency at `/v1/stats`. Fallback and caching work automatically. Revert the two lines if anything breaks.
| Provider | Residency | Example models |
|---|---|---|
| OpenAI | Global | gpt-4o-mini, gpt-4o |
| Anthropic | Global | claude-haiku-4-5, claude-opus-4-7 |
| Together AI | Global | Llama-3.3-70B-Instruct-Turbo |
| Sarvam | India | sarvam-m |
| Self-hosted | User-declared | Any OpenAI-compatible endpoint |
Register any combination. Route across all of them.
You pay your providers directly through your own BYOK keys. InferBridge never marks up tokens.
Available now
₹0 / month
Unlimited BYOK usage. 30-day log retention. All providers, all routing modes, all observability.
Get your API key

Coming soon
₹1,499 / month
Unlimited log retention. Priority support. Custom routing rules. Team seats.
Not available yet

Coming soon
₹9,999 / month
SLA. Dedicated support. Custom residency policies. GST-compliant invoices.
Not available yet

No. We log metadata only — tokens, latency, cost, provider, request ID. Prompt and completion content are never written to logs. There’s a dedicated test in the codebase that fails if anyone tries to change this.
Revert the two lines. Your SDK talks directly to OpenAI again in seconds. No data migration. No lock-in.
BYOK by default — you pay providers directly, we take no cut. Indian providers (Sarvam, self-hosted) are first-class, not an afterthought. And we’re built for developers who want simple, debuggable routing, not an enterprise policy engine.
Streaming tool use and vision aren’t hardened yet — use the provider SDK directly for those. Embeddings aren’t exposed through InferBridge at all in v1. Text chat completions, streaming, caching, and fallback are production-ready.
Provider API keys are Fernet-encrypted at rest, decrypted only in-memory at request time. We never log content. You can delete any registered key from `/v1/keys` at any time.
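The scheme described maps onto the standard `cryptography`-library Fernet API. A sketch of the at-rest flow — where the master key lives is our assumption, not a statement about InferBridge's deployment:

```python
from cryptography.fernet import Fernet

master = Fernet.generate_key()   # gateway secret, held outside the database
f = Fernet(master)

stored = f.encrypt(b"sk-provider-key")   # what lands at rest
assert b"sk-provider-key" not in stored  # ciphertext reveals nothing

in_memory = f.decrypt(stored)            # plaintext exists only per-request
assert in_memory == b"sk-provider-key"
```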
Not publicly yet. Open-source release is planned for month 2. Email yogesh@inferbridge.dev if you want early access or need an on-prem deployment.
Get the public launch announcement, early-access pricing, and the v0.2.0 migration guide.