LLMux unifies 20+ free-tier providers — text, image, audio, video — behind a single OpenAI-compatible gateway. Automatic fallback, smart routing, multi-key rotation, zero lock-in.
```python
from openai import OpenAI

# Just change base_url — your existing code works
client = OpenAI(
    api_key="any",
    base_url="http://localhost:3000/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b",  # or "gemini-flash", "deepseek-r1", ...
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
```typescript
import OpenAI from "openai";

// Change one line — everything else stays the same
const client = new OpenAI({
  apiKey: "any",
  baseURL: "http://localhost:3000/v1",
});

// "auto" lets LLMux pick the best available provider
const stream = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Write a poem" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```
```bash
# Text completion
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemini-flash","messages":[{"role":"user","content":"Hi!"}]}'

# Image generation
curl http://localhost:3000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a robot in a garden","model":"pollinations-flux"}'

# List all models
curl http://localhost:3000/v1/models
```
LLMux handles the hard parts so you can focus on building.
Drop-in replacement for the OpenAI API. Works with every OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks OpenAI.
3-level fallback: retry the same provider, try another with the same model, then any compatible provider. Your app never goes down.
Add multiple API keys per provider. LLMux round-robins across them and auto-rotates on 429 errors — three keys means triple the effective rate limit.
Tracks RPM, RPD, TPM, and TPD per provider. In-memory by default, Redis-backed for multi-instance deployments.
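Conceptually, per-provider tracking like this is a rolling-window counter. A minimal sketch (illustrative only — class and method names here are hypothetical, not LLMux internals):

```python
import time
from collections import deque

class WindowCounter:
    """Counts events (requests or tokens) in a rolling window, e.g. 60s for RPM."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.events = deque()  # (timestamp, amount) pairs

    def _prune(self, now):
        # Drop events that have aged out of the window
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()

    def try_consume(self, amount=1, now=None):
        now = time.monotonic() if now is None else now
        self._prune(now)
        used = sum(a for _, a in self.events)
        if used + amount > self.limit:
            return False  # provider at its limit; router should pick another
        self.events.append((now, amount))
        return True

# e.g. a 30 RPM free-tier provider
rpm = WindowCounter(limit=30, window_s=60)
```

The same counter works for RPD (86400s window), TPM, and TPD by changing `limit`, `window_s`, and `amount` (tokens instead of requests).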
SSE streaming proxied transparently across all text providers. Your UI gets tokens as fast as the provider sends them.
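For reference, OpenAI-style streams are Server-Sent Events: each frame is a `data: {...}` JSON line, terminated by `data: [DONE]`. LLMux proxies these frames verbatim; a minimal parser sketch, if you ever need to consume them without an SDK:

```python
import json

def iter_sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

frames = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(frames)))  # → Hello
```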
5 built-in strategies: priority, round-robin, least-busy, latency-based, random-weighted. Switch in one config line.
Text, image (FLUX, Seedream, GPT-Image), audio (TTS + STT), and video — all through the same OpenAI-compatible surface.
Real-time provider health, rate limit states, and request metrics at /dashboard. Know what's healthy before you need it.
Optional shared-secret auth on /v1/* routes. Deploy securely to your team without exposing underlying provider keys.
Other gateways were designed for paid-API workflows. LLMux was built from day one around the constraints of free-tier providers — rate limits, multi-key pools, and zero-downtime fallback.
| Feature | ⚡ LLMux | OpenRouter | LiteLLM | Portkey OSS |
|---|---|---|---|---|
| Per-key rotation on 429 | ✅ Built-in | ❌ | ❌ | ❌ |
| Rate-limit-aware routing | ✅ Full | ❌ | ⚠ Partial | ❌ |
| Multi-key pool per provider | ✅ Unlimited | ❌ | ❌ | ❌ |
| Self-hosted (no cloud dependency) | ✅ | ❌ Cloud-only | ✅ | ⚠ Limited |
| OpenAI-compatible API | ✅ | ✅ | ✅ | ✅ |
| Free-tier providers (20+) | ✅ 20+ | ⚠ Some | ✅ | ✅ |
| Automatic 3-level fallback | ✅ 3 levels | ⚠ Basic | ✅ | ✅ |
| Multiple routing strategies | ✅ 5 built-in | ❌ | ✅ | ⚠ Partial |
| Zero database required | ✅ | N/A | ❌ Needs DB | ✅ |
| TypeScript / Node.js | ✅ | N/A | ❌ Python | ✅ |
| Live operational dashboard | ✅ Built-in | ✅ Hosted | ⚠ Paid tier | ⚠ Basic |
| Text + Image + Audio + Video | ✅ All 4 | ✅ | ✅ | ✅ |
| Cost to self-host on free tiers | $0 | $0 + markup | $0 | $0 |
OpenRouter is a hosted cloud proxy — your traffic goes through their servers and you pay a per-token markup. LLMux runs on your hardware, rotates your own API keys, and costs zero per token on free tiers.
LiteLLM is excellent but Python-only, requires PostgreSQL + Redis for full features, and doesn't do per-key multi-key rotation on 429s. LLMux is TypeScript, runs with zero database, and key rotation is built into every provider.
Portkey's core observability and load balancing features require their hosted cloud service. The OSS version has limited support for multi-key rotation and budget-aware routing. LLMux is 100% self-contained by design.
The built-in dashboard at /dashboard gives you real-time insight into every request, every provider, and every rate limit — no setup required.
Live status dot, RPM usage bar, average latency, and tier — for every configured provider. Spot degraded providers before they affect users.
Last 100 requests with provider, model, status, latency, and token count. Fallback chains visible inline. Updates every 3 seconds.
RPM and RPD usage bars for each provider. Color shifts yellow → red as limits approach. Rotation events logged in real time.
The settings panel at /settings lets you manage providers, routing, and API keys — all from a browser.
See all providers at a glance. Enable/disable, edit API keys and rate limits, change routing tier, add new dynamic providers, and test connectivity — without touching YAML.
Status shows No API Key for providers that are enabled but missing credentials.
Switch the active routing strategy (priority, round-robin, least-busy, latency-based, random-weighted) with a single click. Changes take effect immediately — no restart needed.
Create and revoke named API keys for clients accessing /v1/*. Each key has a label, creation date, and can be revoked independently. Keys are shown once on creation.
Issue a separate gateway key for each app, team, or integration. Revoke one without affecting the rest. The GATEWAY_API_KEY env var remains active alongside UI-managed keys.
Navigate to /settings and click Gateway in the left sidebar.
Enter a label (e.g. production-app, vscode-ext) and click Generate.
The full key is displayed immediately after creation. Copy it now — it cannot be retrieved later.
Pass the key to any OpenAI SDK or HTTP client via Authorization: Bearer llmux-...
```bash
# List all keys
curl http://localhost:3000/api/admin/gateway-keys \
  -H "X-Admin-Token: $ADMIN_TOKEN"

# Create a named key
curl -X POST http://localhost:3000/api/admin/gateway-keys \
  -H "X-Admin-Token: $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app"}'

# Revoke a key
curl -X DELETE http://localhost:3000/api/admin/gateway-keys/<id> \
  -H "X-Admin-Token: $ADMIN_TOKEN"
```
```python
from openai import OpenAI

client = OpenAI(
    api_key="llmux-xxxxxxxxxxxxxxxxxxxx",
    base_url="http://localhost:3000/v1",
)

# All /v1/* requests now require the gateway key
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
The built-in chat playground at /playground lets you interactively test providers and models without writing any code.
Select a specific provider from the dropdown to route directly to it — bypassing the normal routing strategy. Great for comparing providers side-by-side.
All models for the selected provider are listed in a second dropdown. The actual API model ID is sent to the provider; the friendly alias is shown in the UI. Status bar shows the provider name and model that served the last response.
Toggle streaming on or off. Streaming responses show a blinking cursor while tokens arrive, then render the complete output with a smooth fade-in.
Assistant responses are fully rendered as markdown — headings, bullet lists, numbered lists, fenced code blocks with syntax highlighting, tables, and blockquotes. Code blocks get a one-click copy button on hover.
Every playground conversation is counted in the dashboard stats — Total Requests, Tokens Used, and per-provider breakdowns. Playground requests are first-class citizens.
Full dark/light mode toggle. Preferences are persisted in localStorage and respected across dashboard, settings, and playground pages.
Every provider is configurable in a single YAML file. No code changes needed to add or swap providers.
| Provider | Tier | Top Models | Free Limits |
|---|---|---|---|
| Groq | ⭐ 1 | GPT-OSS 120B, GPT-OSS 20B, Llama 3.3 70B, Llama 4 Scout | 30 RPM / 14.4K RPD |
| Cerebras | ⭐ 1 | Llama 3.3 70B, Qwen3-32B | 30 RPM / 1M TPD |
| Google Gemini | ⭐ 1 | Gemini 3.1 Pro Preview, Gemini 3 Flash, 2.5 Pro (1M–2M ctx) | 15 RPM / 1K RPD |
| Mistral AI | ⭐ 1 | Mistral Large, Small 4 (256K), Magistral Medium, Codestral | ~2 RPM / 500K TPM |
| OpenRouter | ✦ 2 | DeepSeek R1, DeepSeek V3, Qwen3 Coder 480B, Gemini 2.5 Flash | 20 RPM / 50 RPD |
| Cloudflare Workers AI | ✦ 2 | Llama 3.3 70B, Llama 4 Scout, DeepSeek R1 | 10K neurons/day |
| Hugging Face | ✦ 2 | Llama 4 Scout, Llama 3.3 70B, Qwen3-235B | Daily limit |
| SambaNova | ✦ 2 | Llama 4 Maverick, Llama 4 Scout, DeepSeek R1/V3 | 20 RPM / 200K TPD |
| Cohere | ✦ 2 | Command A (256K), Command A Reasoning, Command A Vision | 1K calls/month |
| DeepSeek | ✦ 2 | DeepSeek V3.2, R1 (131K ctx) | 5M free tokens |
| NVIDIA NIM | ✦ 2 | Nemotron 3 Super 120B (1M ctx), Qwen3.5 397B, Mistral Small 4 | Free endpoints |
| GitHub Models | ✦ 2 | GPT-5.4, Claude Sonnet 4.6, Llama 4 Scout, DeepSeek R1 | 15 RPM / 150 RPD |
| Pollinations AI | ✦ 2 | openai (GPT-5 Mini), claude-fast, deepseek, mistral, kimi, glm, minimax, perplexity + 30 more | Free tier + API key |
| xAI Grok | ✦ 2 | Grok-3, Grok-3-Mini, Grok-3-Fast, Grok-3-Mini-Fast (131K ctx) | Paid — credits |
| Moonshot AI (Kimi) | ✦ 2 | Kimi K2.5 (1T MoE, vision), Kimi K2 Thinking | Free credits on signup |
| Zhipu AI (GLM) | ✦ 2 | GLM-5 (744B MoE), GLM-4-Flash | Free tier available |
| Provider | Tier | Models | Notes |
|---|---|---|---|
| Pollinations AI | ⭐ 1 | flux, zimage, klein, gptimage, wan-image, qwen-image, seedream5, gptimage-large, kontext | flux/zimage/klein/wan-image free |
| Cloudflare Workers AI | ⭐ 1 | FLUX.2 Klein, Flux Schnell, DreamShaper-8 | 10K neurons/day |
| Together AI | ✦ 2 | FLUX.1 Schnell Free, FLUX.1 Dev | Free endpoint |
| Hugging Face | ✦ 2 | FLUX.1 Schnell, SDXL | Rate-limited |
| fal.ai | ✦ 2 | FLUX Pro 1.1, Schnell | ~100 free credits |
| Provider | Mode | Models / Voices | Free Limits |
|---|---|---|---|
| Groq Whisper | STT | whisper-large-v3-turbo, distil-whisper-en | 20 RPM / 2K RPD |
| Groq PlayAI | TTS | PlayAI Dialog, Arabic | Rate-limited |
| ElevenLabs | TTS | Flash v2.5, Multilingual v2 | 10K chars/month |
| Deepgram | TTS + STT | Aura-2, Nova-3 | $200 free credit |
| Fish Audio | TTS | Speech 1.6, 1.5 | Daily limit |
| Pollinations AI | TTS + Music | ElevenLabs v3 (30+ voices), ACE-Step music gen | elevenlabs TTS free |
| Pollinations AI | STT | Whisper Large V3, Scribe v2 (90+ langs, diarization) | Free + API key |
| Provider | Models | Notes |
|---|---|---|
| Pollinations AI | ltx-2 (LTX-2.3 free), veo (Google Veo 3.1 Fast), wan (Wan 2.6), wan-fast (Wan 2.2), seedance, seedance-pro, nova-reel | ltx-2 free; others paid pollen credits |
| Replicate | minimax/video-01, tencent/hunyuan-video | Paid credits (free trial available) |
| Hugging Face | Video diffusion models | Rate-limited free tier |
Copy-paste examples for every modality, in Python, TypeScript, and bash.
```python
from openai import OpenAI

client = OpenAI(api_key="any", base_url="http://localhost:3000/v1")

# Streaming response
stream = client.chat.completions.create(
    model="gemini-flash",
    messages=[{"role": "user", "content": "Write a haiku about free APIs"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
```typescript
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "any", baseURL: "http://localhost:3000/v1" });

// "auto" lets LLMux pick the best available provider
const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Explain quantum entanglement simply" }],
});
console.log(res.choices[0].message.content);
```
```bash
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role":"user","content":"Prove sqrt(2) is irrational"}]
  }'
```
```python
from openai import OpenAI

client = OpenAI(api_key="any", base_url="http://localhost:3000/v1")

img = client.images.generate(
    prompt="a robot painting a masterpiece, oil on canvas, dramatic lighting",
    model="pollinations-zimage",  # or "cf-flux-schnell", "together-flux-schnell"
    size="1024x1024",
)
print(img.data[0].url)
```
```bash
curl http://localhost:3000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a cyberpunk city at sunset with neon lights",
    "model": "pollinations-flux",
    "size": "1024x1024",
    "response_format": "url"
  }'
```
```python
from openai import OpenAI

client = OpenAI(api_key="any", base_url="http://localhost:3000/v1")

# Standard TTS (Pollinations, ElevenLabs, Deepgram, Groq PlayAI)
audio = client.audio.speech.create(
    model="tts-1",  # routes to best available TTS provider
    input="Welcome to LLMux, your unified AI gateway.",
    voice="nova",  # alloy, echo, fable, onyx, nova, shimmer + 25 more
    response_format="mp3",
)
audio.stream_to_file("welcome.mp3")

# Music generation via Pollinations elevenmusic
music = client.audio.speech.create(
    model="pollinations-music",
    input="An upbeat electronic track with synth leads",
    voice="alloy",
)
music.stream_to_file("track.mp3")
```
```bash
curl http://localhost:3000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"tts-1","input":"Hello from LLMux!","voice":"nova"}' \
  --output speech.mp3
```
```python
from openai import OpenAI

client = OpenAI(api_key="any", base_url="http://localhost:3000/v1")

with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",  # Groq — fastest free STT
        file=f,
        response_format="json",
    )
print(transcript.text)

# 90+ language support via Pollinations Scribe
with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="pollinations-scribe",
        file=f,
        language="fr",  # ISO-639-1 language code
    )
print(transcript.text)
```
```bash
curl http://localhost:3000/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=whisper-large-v3-turbo"
```
```bash
# List all available models with aliases
curl http://localhost:3000/v1/models | jq '.data[].id'

# Per-provider health + rate-limit state
curl http://localhost:3000/health/providers | jq '.'

# Gateway health
curl http://localhost:3000/health
```
Add multiple API keys per provider. LLMux round-robins normally and auto-rotates on 429 errors — fully transparent to your client.
Sign up for 2-3 accounts at Groq, Gemini, or any provider. Each gets the full free quota.
Name them GROQ_API_KEY_1, GROQ_API_KEY_2, etc.
In `providers.yaml`, switch from `api_key` to `api_keys` (plural) and supply an array of keys.
LLMux round-robins across keys, and instantly rotates to the next key on any 429 rate-limit error.
```yaml
# Before: single key
- id: groq-llama3-70b
  api_key: ${GROQ_API_KEY}

# After: 3 keys = 3× the rate limit
- id: groq-llama3-70b
  api_keys:
    - ${GROQ_API_KEY_1}  # 30 RPM
    - ${GROQ_API_KEY_2}  # 30 RPM
    - ${GROQ_API_KEY_3}  # 30 RPM

# Result: 90 effective RPM ↑
```
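The rotation behavior itself can be sketched in a few lines (a hypothetical helper for illustration, not LLMux source — the real gateway does this per HTTP request):

```python
import itertools

class KeyPool:
    """Round-robin over a key pool; move to the next key when one hits a 429."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.cycle = itertools.cycle(self.keys)

    def request(self, send):
        """`send(key)` performs the HTTP call and returns a status code.

        Tries each key at most once per request; rotates past rate-limited keys.
        """
        for _ in range(len(self.keys)):
            key = next(self.cycle)
            status = send(key)
            if status != 429:
                return key, status
        raise RuntimeError("all keys rate-limited; fall back to another provider")

pool = KeyPool(["GROQ_API_KEY_1", "GROQ_API_KEY_2", "GROQ_API_KEY_3"])
```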
Change router.default_strategy in providers.yaml to switch strategies instantly.
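For example (field names as used elsewhere in this README — check your `providers.yaml` for the exact schema):

```yaml
router:
  default_strategy: least-busy  # priority | round-robin | least-busy | latency-based | random-weighted
```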
Routes to Tier 1 first, then Tier 2, etc. Ensures the highest-priority providers are used first.
Cycles through all available providers evenly, distributing load across the entire pool.
Always routes to the provider with the fewest in-flight requests. Minimizes queue depth.
Tracks rolling average latency per provider and routes to the historically fastest one.
Tier-weighted random selection. Spreads load while still preferring higher-tier providers.
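Two of the strategies above, sketched as selection functions (illustrative pseudologic over hypothetical provider records, not LLMux source):

```python
import random

def pick_least_busy(providers):
    """least-busy: the provider with the fewest in-flight requests wins."""
    return min(providers, key=lambda p: p["in_flight"])

def pick_random_weighted(providers, rng=random.random):
    """random-weighted: lower tier number gets a proportionally larger share."""
    weights = [1.0 / p["tier"] for p in providers]
    r = rng() * sum(weights)
    for p, w in zip(providers, weights):
        r -= w
        if r <= 0:
            return p
    return providers[-1]

providers = [
    {"id": "groq", "tier": 1, "in_flight": 4},
    {"id": "cerebras", "tier": 1, "in_flight": 1},
    {"id": "openrouter", "tier": 2, "in_flight": 0},
]
print(pick_least_busy(providers)["id"])  # → openrouter
```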
Every provider gets a tier (1–4). LLMux tries Tier 1 providers first, then falls back down. There's no cost or quality implication — you can run completely free providers at Tier 1.
When a provider fails, LLMux silently tries the next option. Your client sees a successful response — nothing else.
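The 3-level chain described above can be sketched as follows (helper names and record shapes are assumptions for illustration — the real routing lives inside LLMux):

```python
def complete_with_fallback(request, providers, same_provider_retries=1):
    """
    Level 1: retry the same provider.
    Level 2: another provider serving the same model.
    Level 3: any compatible provider.
    """
    same_model = [p for p in providers if request["model"] in p["models"]]
    others = [p for p in providers if p not in same_model]
    errors = []
    for provider in same_model + others:
        for attempt in range(1 + same_provider_retries):  # Level 1 retries
            try:
                return provider["call"](request)  # provider-specific HTTP call
            except RuntimeError as exc:
                errors.append((provider["id"], attempt, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

The client only ever sees the final successful response (or a single error after every option is exhausted).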
LLMux is a standard Node.js app. Deploy it anywhere you can run containers or Node.
The recommended way. Full control, Redis included via Compose.
```bash
docker build -t llmux-gateway .
docker run -p 3000:3000 \
  --env-file .env \
  -v $(pwd)/config:/app/config \
  llmux-gateway
```
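A minimal Compose sketch with the bundled Redis (service names, ports, and env file are assumptions — adapt to the repo's actual `docker-compose.yml`):

```yaml
services:
  llmux:
    build: .
    ports: ["3000:3000"]
    env_file: .env
    volumes:
      - ./config:/app/config
    depends_on: [redis]
  redis:
    image: redis:7-alpine
```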
One-click deploy. Set API keys as env vars, done.
```bash
# railway.toml already configured
railway up
# Set env vars in Railway dashboard
```
Global edge deployment. Sub-20ms latency from any region.
```bash
# vercel.json already configured
vercel deploy --prod
```
Full control. Use with PM2 or systemd for production stability.
```bash
pnpm build
pm2 start dist/index.js \
  --name LLMux-gateway
```
LLMux's plugin architecture makes adding new providers simple. If it has an API, it can be a provider.
```typescript
import { BaseProvider } from "../base.js";

export class MyProvider extends BaseProvider {
  async chatCompletion(req, ctx) {
    const model = this.resolveModel(req.model);
    const res = await this.postJSON(
      `${this.config.baseUrl}/chat/completions`,
      { ...req, model },
      ctx
    );
    if (req.stream) return this.proxyStream(res, ctx);
    const data = await res.json();
    // recordTokens updates rate-limit tracker AND dashboard stats
    await this.recordTokens(ctx, data.usage?.total_tokens ?? 0);
    return new Response(JSON.stringify(data), {
      headers: { "Content-Type": "application/json" },
    });
  }
}
```
```typescript
import { MyProvider } from "./text/myprovider.js";

const PROVIDER_FACTORIES = {
  // existing providers...
  "my-provider-id": MyProvider, // must match id in providers.yaml
};
```
```yaml
- id: my-provider-id
  name: My Provider
  modality: text
  tier: 2
  enabled: true
  requires_auth: true
  api_keys:              # multi-key support built-in
    - ${MY_PROVIDER_KEY_1}
    - ${MY_PROVIDER_KEY_2}
  base_url: https://api.myprovider.com/v1
  adapter: openai
  models:
    - id: my-model-large
      alias: my-large
      context_window: 128000
  limits:
    rpm: 60
    concurrency: 5
  timeout: 30000
  max_retries: 2
```
```bash
MY_PROVIDER_KEY_1=your-key-here
MY_PROVIDER_KEY_2=optional-second-key
```
Restart the gateway (`pnpm dev`) and your new provider is live.
Clone, configure, deploy. Three commands to a production AI gateway that never goes down.
```bash
git clone https://github.com/shaik-shahansha/llmux
cd llmux && pnpm install
cp config/providers.yaml.example config/providers.yaml
# add your API keys...
pnpm dev
```