✦ v1.0  |  MIT License  |  OpenAI-Compatible

Every Free AI Provider.
One Unified API.

LLMux unifies 20+ free-tier providers — text, image, audio, video — behind a single OpenAI-compatible gateway. Automatic fallback, smart routing, multi-key rotation, zero lock-in.

python
from openai import OpenAI

# Just change base_url — your existing code works
client = OpenAI(
    api_key="any",
    base_url="http://localhost:3000/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b",  # or "gemini-flash", "deepseek-r1", ...
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

typescript
import OpenAI from "openai";

// Change one line — everything else stays the same
const client = new OpenAI({
  apiKey: "any",
  baseURL: "http://localhost:3000/v1",
});

// "auto" lets LLMux pick the best available provider;
// with stream: true the SDK returns an async iterable of chunks
const stream = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Write a poem" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

bash
# Text completion
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemini-flash","messages":[{"role":"user","content":"Hi!"}]}'

# Image generation
curl http://localhost:3000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a robot in a garden","model":"pollinations-flux"}'

# List all models
curl http://localhost:3000/v1/models
20+ free providers · 4 modalities (text, image, audio, video) · 5 routing strategies · $0 monthly cost on free tiers · 1 line change to migrate

Everything you need.
Nothing you don't.

LLMux handles the hard parts so you can focus on building.

🔌

OpenAI-Compatible API

Drop-in replacement for the OpenAI API. Works with every OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks OpenAI.

🔄

Automatic Fallback

3-level fallback: retry the same provider, try another with the same model, then any compatible provider. Your app never goes down.

🗝️

Multi-Key Rotation

Add multiple API keys per provider. LLMux round-robins and auto-rotates on 429 errors. Triple your effective rate limit instantly.

📊

Rate Limit Tracking

Tracks RPM, RPD, TPM, and TPD per provider. In-memory by default, Redis-backed for multi-instance deployments.
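A minimal sketch of this kind of tracker, assuming a sliding one-minute window (illustrative only, not LLMux's actual implementation; the real tracker also covers RPD/TPM/TPD and the Redis backend):

```typescript
// Sketch: sliding-window requests-per-minute limiter (hypothetical).
class RpmTracker {
  private stamps: number[] = [];
  constructor(private limit: number) {}

  // Returns true and records the request if under the per-minute limit.
  tryAcquire(now: number = Date.now()): boolean {
    const cutoff = now - 60_000;
    this.stamps = this.stamps.filter((t) => t > cutoff); // drop entries >1 min old
    if (this.stamps.length >= this.limit) return false;  // provider is saturated
    this.stamps.push(now);
    return true;
  }
}
```

The same shape extends to RPD/TPM/TPD by varying the window and the counted quantity.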

Full Streaming Support

SSE streaming proxied transparently across all text providers. Your UI gets tokens as fast as the provider sends them.

🎯

Smart Routing

5 built-in strategies: priority, round-robin, least-busy, latency-based, random-weighted. Switch in one config line.

🖼️

Multi-Modal

Text, image (FLUX, Seedream, GPT-Image), audio (TTS + STT), and video — all through the same OpenAI-compatible surface.

📡

Live Dashboard

Real-time provider health, rate limit states, and request metrics at /dashboard. Know what's healthy before you need it.

🔐

Gateway Auth

Optional shared-secret auth on /v1/* routes. Deploy securely to your team without exposing underlying provider keys.

Built differently.
Because free tiers deserve better.

Other gateways were designed for paid-API workflows. LLMux was built from day one around the constraints of free-tier providers — rate limits, multi-key pools, and zero-downtime fallback.

| Feature | ⚡ LLMux | OpenRouter | LiteLLM | Portkey OSS |
|---|---|---|---|---|
| Per-key rotation on 429 | ✅ Built-in | | | |
| Rate-limit-aware routing | ✅ Full | | ⚠ Partial | |
| Multi-key pool per provider | ✅ Unlimited | | | |
| Self-hosted (no cloud dependency) | ✅ | ❌ Cloud-only | ✅ | ⚠ Limited |
| OpenAI-compatible API | ✅ | ✅ | ✅ | ✅ |
| Free-tier providers (20+) | ✅ 20+ | ⚠ Some | | |
| Automatic 3-level fallback | ✅ 3 levels | | ⚠ Basic | |
| Multiple routing strategies | ✅ 5 built-in | | ⚠ Partial | |
| Zero database required | ✅ | N/A | ❌ Needs DB | |
| TypeScript / Node.js | ✅ | N/A | ❌ Python | |
| Live operational dashboard | ✅ Built-in | ✅ Hosted | ⚠ Paid tier | ⚠ Basic |
| Text + Image + Audio + Video | ✅ All 4 | | | |
| Cost to self-host on free tiers | $0 | $0 + markup | $0 | $0 |
🔄

vs OpenRouter

OpenRouter is a hosted cloud proxy — your traffic goes through their servers and you pay a per-token markup. LLMux runs on your hardware, rotates your own API keys, and costs zero per token on free tiers.

🐍

vs LiteLLM

LiteLLM is excellent but Python-only, requires PostgreSQL + Redis for full features, and doesn't do per-key multi-key rotation on 429s. LLMux is TypeScript, runs with zero database, and key rotation is built into every provider.

☁️

vs Portkey OSS

Portkey's core observability and load balancing features require their hosted cloud service. The OSS version has limited support for multi-key rotation and budget-aware routing. LLMux is 100% self-contained by design.

Full operational visibility.
Zero extra tools.

The built-in dashboard at /dashboard gives you real-time insight into every request, every provider, and every rate limit — no setup required.

localhost:3000/dashboard
LIVE · uptime 4h 22m
Total Requests: 2.4k · Succeeded: 2.3k (96.1% rate) · Failed: 93 (42 rate-limited) · Tokens Used: 1.8M · Providers: 14 (11 healthy) · Fallbacks: 37
Provider Health Matrix
- Groq Llama · text · 180ms · ⭐ T1
- Cerebras · text · 240ms · ⭐ T1
- Gemini Flash · text · 520ms · ⭐ T1
- Pollinations · image · 1.2s · ✦ T2
- Cloudflare AI · image · 890ms · ⭐ T1
Live Request Log
- 14:23:01 · groq · llama-3.3-70b · success · 182ms
- 14:22:59 · gemini · gemini-flash · success · 541ms
- 14:22:58 · openrouter · deepseek-r1 · fallback · 1.1s
- 14:22:55 · pollinations · flux · success · 2.4s
- 14:22:51 · groq · whisper-v3 · rate-limited
- 14:22:49 · cerebras · llama-3.3-70b · success · 231ms
🩺

Provider Health Matrix

Live status dot, RPM usage bar, average latency, and tier — for every configured provider. Spot degraded providers before they affect users.

📋

Live Request Log

Last 100 requests with provider, model, status, latency, and token count. Fallback chains visible inline. Updates every 3 seconds.

🌡️

Rate Limit Gauges

RPM and RPD usage bars for each provider. Color shifts yellow → red as limits approach. Rotation events logged in real time.

Full control.
No config files needed.

The settings panel at /settings lets you manage providers, routing, and API keys — all from a browser.

🔌

Providers Tab

See all providers at a glance. Enable/disable, edit API keys and rate limits, change routing tier, add new dynamic providers, and test connectivity — without touching YAML.

Status shows No API Key for providers that are enabled but missing credentials.

🎯

Router Tab

Switch the active routing strategy (priority, round-robin, least-busy, latency-based, random-weighted) with a single click. Changes take effect immediately — no restart needed.

🔑

Gateway Keys Tab

Create and revoke named API keys for clients accessing /v1/*. Each key has a label, creation date, and can be revoked independently. Keys are shown once on creation.

Named keys for every integration.

Issue a separate gateway key for each app, team, or integration. Revoke one without affecting the rest. The GATEWAY_API_KEY env var remains active alongside UI-managed keys.

1

Open Gateway tab in /settings

Navigate to /settings and click Gateway in the left sidebar.

2

Click + New Key

Enter a label (e.g. production-app, vscode-ext) and click Generate.

3

Copy the key — shown once only

The full key is displayed immediately after creation. Copy it now — it cannot be retrieved later.

4

Use as a Bearer token

Pass the key to any OpenAI SDK or HTTP client via Authorization: Bearer llmux-...

REST API — manage keys
# List all keys
curl http://localhost:3000/api/admin/gateway-keys \
  -H "X-Admin-Token: $ADMIN_TOKEN"

# Create a named key
curl -X POST http://localhost:3000/api/admin/gateway-keys \
  -H "X-Admin-Token: $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app"}'

# Revoke a key
curl -X DELETE http://localhost:3000/api/admin/gateway-keys/<id> \
  -H "X-Admin-Token: $ADMIN_TOKEN"
python — use gateway key
from openai import OpenAI

client = OpenAI(
    api_key="llmux-xxxxxxxxxxxxxxxxxxxx",
    base_url="http://localhost:3000/v1",
)

# All /v1/* requests now require the gateway key
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)

Test every provider,
right in your browser.

The built-in chat playground at /playground lets you interactively test providers and models without writing any code.

🎯

Pin Any Provider

Select a specific provider from the dropdown to route directly to it — bypassing the normal routing strategy. Great for comparing providers side-by-side.

🧠

Model Selection

All models for the selected provider are listed in a second dropdown. The actual API model ID is sent to the provider; the friendly alias is shown in the UI. Status bar shows the provider name and model that served the last response.

Real-Time Streaming

Toggle streaming on or off. Streaming responses show a blinking cursor while tokens arrive, then render the complete output with a smooth fade-in.

📝

Markdown Rendering

Assistant responses are fully rendered as markdown — headings, bullet lists, numbered lists, fenced code blocks with syntax highlighting, tables, and blockquotes. Code blocks get a one-click copy button on hover.

📊

Stats & Tokens

Every playground conversation is counted in the dashboard stats — Total Requests, Tokens Used, and per-provider breakdowns. Playground requests are first-class citizens.

🌙

Dark & Light Themes

Full dark/light mode toggle. Preferences are persisted in localStorage and respected across dashboard, settings, and playground pages.

localhost:3000/playground · Provider: Groq · Model: llama-3.3-70b · Stream: on

User: Explain the transformer architecture in simple terms.

The Transformer is a neural network architecture based on the concept of attention...

attention(Q, K, V) = softmax(QKᵀ/√d) · V

✦ Groq · llama-3.3-70b · 382ms · 847 tokens

20+ Free Providers.
All Modalities.

Every provider is configurable in a single YAML file. No code changes needed to add or swap providers.

Text

| Provider | Tier | Top Models | Free Limits |
|---|---|---|---|
| Groq | ⭐ 1 | GPT-OSS 120B, GPT-OSS 20B, Llama 3.3 70B, Llama 4 Scout | 30 RPM / 14.4K RPD |
| Cerebras | ⭐ 1 | Llama 3.3 70B, Qwen3-32B | 30 RPM / 1M TPD |
| Google Gemini | ⭐ 1 | Gemini 3.1 Pro Preview, Gemini 3 Flash, 2.5 Pro (1M–2M ctx) | 15 RPM / 1K RPD |
| Mistral AI | ⭐ 1 | Mistral Large, Small 4 (256K), Magistral Medium, Codestral | ~2 RPM / 500K TPM |
| OpenRouter | ✦ 2 | DeepSeek R1, DeepSeek V3, Qwen3 Coder 480B, Gemini 2.5 Flash | 20 RPM / 50 RPD |
| Cloudflare Workers AI | ✦ 2 | Llama 3.3 70B, Llama 4 Scout, DeepSeek R1 | 10K neurons/day |
| Hugging Face | ✦ 2 | Llama 4 Scout, Llama 3.3 70B, Qwen3-235B | Daily limit |
| SambaNova | ✦ 2 | Llama 4 Maverick, Llama 4 Scout, DeepSeek R1/V3 | 20 RPM / 200K TPD |
| Cohere | ✦ 2 | Command A (256K), Command A Reasoning, Command A Vision | 1K calls/month |
| DeepSeek | ✦ 2 | DeepSeek V3.2, R1 (131K ctx) | 5M free tokens |
| NVIDIA NIM | ✦ 2 | Nemotron 3 Super 120B (1M ctx), Qwen3.5 397B, Mistral Small 4 | Free endpoints |
| GitHub Models | ✦ 2 | GPT-5.4, Claude Sonnet 4.6, Llama 4 Scout, DeepSeek R1 | 15 RPM / 150 RPD |
| Pollinations AI | ✦ 2 | openai (GPT-5 Mini), claude-fast, deepseek, mistral, kimi, glm, minimax, perplexity + 30 more | Free tier + API key |
| xAI Grok | ✦ 2 | Grok-3, Grok-3-Mini, Grok-3-Fast, Grok-3-Mini-Fast (131K ctx) | Paid — credits |
| Moonshot AI (Kimi) | ✦ 2 | Kimi K2.5 (1T MoE, vision), Kimi K2 Thinking | Free credits on signup |
| Zhipu AI (GLM) | ✦ 2 | GLM-5 (744B MoE), GLM-4-Flash | Free tier available |

Image

| Provider | Tier | Models | Notes |
|---|---|---|---|
| Pollinations AI | ⭐ 1 | flux, zimage, klein, gptimage, wan-image, qwen-image, seedream5, gptimage-large, kontext | flux/zimage/klein/wan-image free |
| Cloudflare Workers AI | ⭐ 1 | FLUX.2 Klein, Flux Schnell, DreamShaper-8 | 10K neurons/day |
| Together AI | ✦ 2 | FLUX.1 Schnell Free, FLUX.1 Dev | Free endpoint |
| Hugging Face | ✦ 2 | FLUX.1 Schnell, SDXL | Rate-limited |
| fal.ai | ✦ 2 | FLUX Pro 1.1, Schnell | ~100 free credits |

Audio

| Provider | Mode | Models / Voices | Free Limits |
|---|---|---|---|
| Groq Whisper | STT | whisper-large-v3-turbo, distil-whisper-en | 20 RPM / 2K RPD |
| Groq PlayAI | TTS | PlayAI Dialog, Arabic | Rate-limited |
| ElevenLabs | TTS | Flash v2.5, Multilingual v2 | 10K chars/month |
| Deepgram | TTS + STT | Aura-2, Nova-3 | $200 free credit |
| Fish Audio | TTS | Speech 1.6, 1.5 | Daily limit |
| Pollinations AI | TTS + Music | ElevenLabs v3 (30+ voices), ACE-Step music gen | elevenlabs TTS free |
| Pollinations AI | STT | Whisper Large V3, Scribe v2 (90+ langs, diarization) | Free + API key |

Video

| Provider | Models | Notes |
|---|---|---|
| Pollinations AI | ltx-2 (LTX-2.3 free), veo (Google Veo 3.1 Fast), wan (Wan 2.6), wan-fast (Wan 2.2), seedance, seedance-pro, nova-reel | ltx-2 free; others paid pollen credits |
| Replicate | minimax/video-01, tencent/hunyuan-video | Paid credits (free trial available) |
| Hugging Face | Video diffusion models | Rate-limited free tier |

See it in action.

Copy-paste examples for every modality, in Python, TypeScript, and bash.

python — streaming
from openai import OpenAI

client = OpenAI(api_key="any", base_url="http://localhost:3000/v1")

# Streaming response
stream = client.chat.completions.create(
    model="gemini-flash",
    messages=[{"role": "user", "content": "Write a haiku about free APIs"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
typescript — auto-routing
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "any", baseURL: "http://localhost:3000/v1" });

// "auto" lets LLMux pick the best available provider
const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Explain quantum entanglement simply" }],
});
console.log(res.choices[0].message.content);
bash — reasoning model
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role":"user","content":"Prove sqrt(2) is irrational"}]
  }'
python — openai sdk
from openai import OpenAI

client = OpenAI(api_key="any", base_url="http://localhost:3000/v1")

img = client.images.generate(
    prompt="a robot painting a masterpiece, oil on canvas, dramatic lighting",
    model="pollinations-zimage",  # or "cf-flux-schnell", "together-flux-schnell"
    size="1024x1024",
)
print(img.data[0].url)
bash — curl
curl http://localhost:3000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a cyberpunk city at sunset with neon lights",
    "model": "pollinations-flux",
    "size": "1024x1024",
    "response_format": "url"
  }'
python — text to speech
from openai import OpenAI

client = OpenAI(api_key="any", base_url="http://localhost:3000/v1")

# Standard TTS (Pollinations, ElevenLabs, Deepgram, Groq PlayAI)
audio = client.audio.speech.create(
    model="tts-1",           # routes to best available TTS provider
    input="Welcome to LLMux, your unified AI gateway.",
    voice="nova",            # alloy, echo, fable, onyx, nova, shimmer + 25 more
    response_format="mp3",
)
audio.stream_to_file("welcome.mp3")

# Music generation via Pollinations elevenmusic
music = client.audio.speech.create(
    model="pollinations-music",
    input="An upbeat electronic track with synth leads",
    voice="LLMux",
)
music.stream_to_file("track.mp3")
bash — curl
curl http://localhost:3000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"tts-1","input":"Hello from LLMux!","voice":"nova"}' \
  --output speech.mp3
python — speech to text
from openai import OpenAI

client = OpenAI(api_key="any", base_url="http://localhost:3000/v1")

with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",  # Groq — fastest free STT
        file=f,
        response_format="json",
    )
print(transcript.text)

# 90+ language support via Pollinations Scribe
transcript = client.audio.transcriptions.create(
    model="pollinations-scribe",
    file=f,
    language="fr",  # ISO-639-1 language code
)
bash — curl
curl http://localhost:3000/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=whisper-large-v3-turbo"
bash — list all models
# List all available models with aliases
curl http://localhost:3000/v1/models | jq '.data[].id'

# Per-provider health + rate-limit state
curl http://localhost:3000/health/providers | jq '.'

# Gateway health
curl http://localhost:3000/health

Multiply your rate limits.
Zero code changes.

Add multiple API keys per provider. LLMux round-robins normally and auto-rotates on 429 errors — fully transparent to your client.

1

Create multiple free accounts

Sign up for 2-3 accounts at Groq, Gemini, or any provider. Each gets the full free quota.

2

Add keys to .env

Name them GROQ_API_KEY_1, GROQ_API_KEY_2, etc.

3

Use api_keys in providers.yaml

Switch from api_key to api_keys (plural) with the array of keys.

4

Automatic rotation

LLMux round-robins across keys, and instantly rotates to the next key on any 429 rate-limit error.

providers.yaml
# Before: single key
- id: groq-llama3-70b
  api_key: ${GROQ_API_KEY}

# After: 3 keys = 3× the rate limit
- id: groq-llama3-70b
  api_keys:
    - ${GROQ_API_KEY_1}   # 30 RPM
    - ${GROQ_API_KEY_2}   # 30 RPM
    - ${GROQ_API_KEY_3}   # 30 RPM
  # Result: 90 effective RPM ↑
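The rotation behavior described above can be sketched roughly like this (a hypothetical simplification, not LLMux's source; the class and method names are invented):

```typescript
// Sketch: round-robin key selection with per-key cooldowns after a 429.
class KeyPool {
  private i = 0;
  private benchedUntil = new Map<string, number>(); // key -> timestamp (ms)

  constructor(private keys: string[]) {}

  // Pick the next key in round-robin order, skipping keys benched by a 429.
  next(now: number = Date.now()): string | undefined {
    for (let n = 0; n < this.keys.length; n++) {
      const key = this.keys[this.i];
      this.i = (this.i + 1) % this.keys.length;
      if ((this.benchedUntil.get(key) ?? 0) <= now) return key;
    }
    return undefined; // every key is currently rate-limited
  }

  // On a 429 response, bench the key until its window should reset.
  markRateLimited(key: string, retryAfterMs = 60_000, now: number = Date.now()): void {
    this.benchedUntil.set(key, now + retryAfterMs);
  }
}
```

With three keys, normal traffic rotates k1 → k2 → k3; a 429 on any key simply removes it from the rotation until its cooldown expires.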

5 Strategies.
One Config Line.

Change router.default_strategy in providers.yaml to switch strategies instantly.

priority

Priority

Routes to Tier 1 first, then Tier 2, etc. Ensures the highest-priority providers are used first.

Best for: default, quality-first
round-robin

Round-Robin

Cycles through all available providers evenly, distributing load across the entire pool.

Best for: even load distribution
least-busy

Least-Busy

Always routes to the provider with the fewest in-flight requests. Minimizes queue depth.

Best for: high-concurrency workloads
latency-based

Latency-Based

Tracks rolling average latency per provider and routes to the historically fastest one.

Best for: latency-sensitive apps
random-weighted

Random-Weighted

Tier-weighted random selection. Spreads load while still preferring higher-tier providers.

Best for: A/B testing, load spreading
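In providers.yaml the switch is a single line; `router.default_strategy` is the field named above, while the surrounding layout is an assumption:

```yaml
router:
  default_strategy: latency-based  # priority | round-robin | least-busy | latency-based | random-weighted
```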

Tiers = routing priority. Not price.

Every provider gets a tier (1–4). LLMux tries Tier 1 providers first, then falls back down. There's no cost or quality implication — you can run completely free providers at Tier 1.

TIER 1 — HIGH PRIORITY
Tried first on every request. Your fastest, most reliable providers (Groq, Gemini, Cerebras).
TIER 2 — STANDARD
Tried after Tier 1 is busy or rate-limited. Good for secondary free providers.
TIER 3 — FALLBACK
Used when higher tiers are all busy or failing. Good for slower, uncapped providers.
TIER 4 — LAST RESORT
Emergency fallback only. For experimental or rarely-available providers.

Automatic 3-Level Fallback

When a provider fails, LLMux silently tries the next option. Your client sees a successful response — nothing else.

1. Primary Provider (Tier 1, best match): retried up to max_retries on 429 / timeout
2. Alt Provider, Same Model: another provider serving the same model
3. Any Compatible Provider: any provider of the same modality
✓ Response: the client gets an answer
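The three levels can be sketched as a loop (an illustrative simplification; the Provider shape and function names are invented, not LLMux's actual router):

```typescript
// Sketch of the 3-level fallback: retry, then same-model providers, then any compatible one.
type Provider = {
  id: string;
  models: string[];
  call: (model: string) => Promise<string>;
};

async function withFallback(
  providers: Provider[],
  model: string,
  maxRetries = 2,
): Promise<string> {
  // Level 2 ordering: providers serving the same model come first;
  // level 3: any remaining compatible provider.
  const sameModel = providers.filter((p) => p.models.includes(model));
  const rest = providers.filter((p) => !sameModel.includes(p));
  for (const p of [...sameModel, ...rest]) {
    // Level 1: retry the same provider up to maxRetries times.
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await p.call(model);
      } catch {
        // 429 / timeout: retry, then fall through to the next provider
      }
    }
  }
  throw new Error("all providers exhausted");
}
```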

Deploy anywhere.
In minutes.

LLMux is a standard Node.js app. Deploy it anywhere you can run containers or Node.

🐳 Docker

The recommended way. Full control, Redis included via Compose.

docker build -t llmux-gateway .
docker run -p 3000:3000 \
  --env-file .env \
  -v $(pwd)/config:/app/config \
  llmux-gateway
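For the "Redis included via Compose" setup, a docker-compose.yml along these lines would match the run command above (service names and Redis wiring are assumptions, not shipped config):

```yaml
# docker-compose.yml (hypothetical sketch)
services:
  llmux:
    build: .
    ports: ["3000:3000"]
    env_file: .env
    volumes: ["./config:/app/config"]
    depends_on: [redis]
  redis:
    image: redis:7-alpine
```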

🚂 Railway

One-click deploy. Set API keys as env vars, done.

# railway.toml already configured
railway up

# Set env vars in Railway dashboard

▲ Vercel Edge

Global edge deployment. Sub-20ms latency from any region.

# vercel.json already configured
vercel deploy --prod

🖥️ Self-hosted VPS

Full control. Use with PM2 or systemd for production stability.

pnpm build
pm2 start dist/index.js \
  --name LLMux-gateway

Add any provider
in 4 steps.

LLMux's plugin architecture makes adding new providers simple. If it has an API, it can be a provider.

1

Create the provider class

src/providers/text/myprovider.ts
import { BaseProvider } from "../base.js";

export class MyProvider extends BaseProvider {
  async chatCompletion(req, ctx) {
    const model = this.resolveModel(req.model);
    const res = await this.postJSON(
      `${this.config.baseUrl}/chat/completions`,
      { ...req, model }, ctx
    );
    if (req.stream) return this.proxyStream(res, ctx);
    const data = await res.json();
    // recordTokens updates rate-limit tracker AND dashboard stats
    await this.recordTokens(ctx, data.usage?.total_tokens ?? 0);
    return new Response(JSON.stringify(data), {
      headers: { "Content-Type": "application/json" }
    });
  }
}
2

Register in registry.ts

src/providers/registry.ts
import { MyProvider } from "./text/myprovider.js";

const PROVIDER_FACTORIES = {
  // existing providers...
  "my-provider-id": MyProvider,  // must match id in providers.yaml
};
3

Add to providers.yaml

config/providers.yaml
- id: my-provider-id
  name: My Provider
  modality: text
  tier: 2
  enabled: true
  requires_auth: true
  api_keys:                    # multi-key support built-in
    - ${MY_PROVIDER_KEY_1}
    - ${MY_PROVIDER_KEY_2}
  base_url: https://api.myprovider.com/v1
  adapter: openai
  models:
    - id: my-model-large
      alias: my-large
      context_window: 128000
  limits:
    rpm: 60
  concurrency: 5
  timeout: 30000
  max_retries: 2
4

Add API key to .env and restart

.env
MY_PROVIDER_KEY_1=your-key-here
MY_PROVIDER_KEY_2=optional-second-key

That's it. LLMux auto-discovers the provider on startup. No restarts needed in dev mode (hot reload via pnpm dev).

Ready to unify your AI stack?

Clone, configure, deploy. Three commands to a production AI gateway that never goes down.

🚀 3-COMMAND QUICKSTART
git clone https://github.com/shaik-shahansha/llmux
cd llmux && pnpm install
cp config/providers.yaml.example config/providers.yaml
# add your API keys...
pnpm dev