Best Free LLM APIs for Chat

398 free models available for chat.

For general conversation, look for low latency, strong instruction following, and a helpful personality. Gemini 2.5 Flash offers one of the largest free context windows (1M tokens) along with multimodal support. Llama 3.3 70B via Groq delivers the fastest tokens-per-second. Qwen3.5 models on NVIDIA NIM balance quality and speed.

What to Look for in a Chat Model

Chat models are the most common type of LLM, but they vary significantly in conversational quality:

  • Latency / tokens per second: Real-time conversation needs fast responses. Groq's LPU hardware delivers the fastest inference (Llama 3.3 70B hits 100+ tok/s). NVIDIA NIM and OpenRouter are slower but offer more model variety.
  • Context window: Long conversations or document Q&A need a large context window. Gemini 2.5 Flash (1M-token context) can hold an entire book in context. Most chat models offer 32K–128K tokens, which comfortably handles typical back-and-forth conversations.
  • Instruction following: A good chat model stays on-topic, follows system prompts, and avoids hallucinating. Llama 3.3 70B and Qwen3 are known for strong instruction adherence.
  • Multilingual support: If you chat in languages other than English, check which languages the model was trained on. Qwen3 has strong Chinese/English bilingual performance. Gemini supports 30+ languages, while Llama covers a smaller set of major European and Asian languages.
  • Multimodal input: Want to share images or audio in chat? Gemini 2.5 Flash accepts text, image, audio, and video. Most chat models are text-only.
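To make the context-window point concrete, here is a minimal sketch of trimming a chat history so it fits a model's window. It is an illustration under stated assumptions, not any provider's method: the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer, and `estimate_tokens` and `trim_history` are hypothetical helper names.

```python
from typing import Dict, List

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary by model; treat this as a coarse estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ceiling of len(text) / CHARS_PER_TOKEN."""
    if not text:
        return 0
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def trim_history(
    messages: List[Dict[str, str]],
    context_window: int,
    reserve_for_output: int,
) -> List[Dict[str, str]]:
    """Keep system messages plus the newest turns that fit the budget."""
    budget = context_window - reserve_for_output
    if budget <= 0:
        raise ValueError("reserve_for_output must be smaller than context_window")

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Dict[str, str]] = []
    # Walk from newest to oldest and stop at the first turn that does not fit.
    for message in reversed(rest):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    kept.reverse()  # restore chronological order
    return system + kept
```

With a 1M-token window the trimming step rarely triggers. With a 32K window and long pasted documents it triggers quickly, which is why the context column matters for document chat.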

How to Choose a Free Chat Model

Match the model to your chat use case:

  • Casual conversation / chatbot? → Prioritize latency and personality. Llama 3.3 70B via Groq (fastest) or Gemini 2.5 Flash via Google AI Studio (most capable).
  • Long-form Q&A / document chat? → Maximize context window. Gemini 2.5 Flash (1M) or Qwen3.5 122B (262K via NVIDIA NIM).
  • Multilingual chat? → Qwen3.5 excels in Chinese-English. Gemini supports 30+ languages. Llama covers major European and Asian languages.
  • Roleplay / creative conversation? → Look for models with strong creative writing. Llama 3.3 70B and Mistral models tend to have more varied output styles.
  • Customer support bot? → Instruction following and safety are critical. Gemini and Qwen3 are well-aligned. Avoid unmoderated open models unless you add guardrails.
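The matching rules above can be sketched as a small filter over a model catalog. This is only an illustration: the three entries are taken from rows of the table on this page (context sizes rounded as listed there), while `ChatModel`, `CATALOG`, and `shortlist` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ChatModel:
    provider: str
    name: str
    context_tokens: int  # rounded value from the table, e.g. 1.0M -> 1_000_000
    modalities: FrozenSet[str]


# Three rows copied from the table on this page, for illustration only.
CATALOG: List[ChatModel] = [
    ChatModel("Google Gemini", "Gemini 2.5 Flash", 1_000_000,
              frozenset({"text", "image", "audio", "video", "pdf"})),
    ChatModel("Groq", "llama-3.3-70b-versatile", 131_000,
              frozenset({"text"})),
    ChatModel("NVIDIA NIM", "qwen/qwen3.5-122b-a10b", 262_000,
              frozenset({"text", "image", "video", "audio"})),
]


def shortlist(
    catalog: List[ChatModel],
    min_context: int = 0,
    needs: FrozenSet[str] = frozenset({"text"}),
) -> List[ChatModel]:
    """Return models meeting the context and modality needs, largest context first."""
    matches = [
        m for m in catalog
        if m.context_tokens >= min_context and needs <= m.modalities
    ]
    return sorted(matches, key=lambda m: m.context_tokens, reverse=True)


if __name__ == "__main__":
    # Document chat: require at least 200K tokens of context.
    for model in shortlist(CATALOG, min_context=200_000):
        print(model.provider, model.name, model.context_tokens)
```

Latency and personality are not captured here because the table does not list them. For those, a quick side-by-side test with your own prompts is more reliable than any filter.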

All Free Chat Models

Provider Model Context Max Output Modality Rate Limit Released
OpenRouter Cohere: North Mini Code (free) 256K 64K textcode 200 req/day (free tier) Jun 9, 2026 Details
OpenRouter Nex AGI: Nex-N2-Pro 262K 262K textimage 200 req/day (free tier) Jun 2, 2026 Details
OpenRouter NVIDIA: Nemotron 3.5 Content Safety (free) 128K 8K textimagereasoning 200 req/day (free tier) Jun 4, 2026 Details
OpenRouter NVIDIA: Nemotron 3 Ultra (free) 1.0M 66K textreasoning 200 req/day (free tier) Jun 4, 2026 Details
OpenRouter MiniMax: MiniMax M3 1.0M 512K textimagevideoreasoning 200 req/day (free tier) Jun 1, 2026 Details
OpenRouter inclusionAI: Ring-2.6-1T 262K 66K text 200 req/day (free tier) May 8, 2026 Details
OpenRouter NVIDIA: Nemotron 3 Nano Omni (free) 256K 66K textimageaudiovideoreasoning 200 req/day (free tier) Apr 28, 2026 Details
OpenRouter Poolside: Laguna XS.2 (free) 262K 33K text 200 req/day (free tier) Apr 28, 2026 Details
OpenRouter Poolside: Laguna M.1 (free) 262K 33K text 200 req/day (free tier) Apr 28, 2026 Details
OpenRouter DeepSeek: DeepSeek V4 Flash 1.0M 66K textreasoning 200 req/day (free tier) Apr 24, 2026 Details
OpenRouter MoonshotAI: Kimi K2.6 262K 262K textimagevideoreasoning 200 req/day (free tier) Apr 20, 2026 Details
OpenRouter Z.ai: GLM 5.1 203K 8K text 200 req/day (free tier) Apr 7, 2026 Details
OpenRouter Google: Gemma 4 26B A4B (free) 262K 33K textimagereasoning 200 req/day (free tier) Apr 2, 2026 Details
OpenRouter Google: Gemma 4 31B (free) 262K 8K textimagereasoning 200 req/day (free tier) Apr 2, 2026 Details
OpenRouter Arcee AI: Trinity Large Thinking 262K 80K textreasoning 200 req/day (free tier) Apr 1, 2026 Details
OpenRouter Google: Lyria 3 Pro Preview 1.0M 66K textimage 200 req/day (free tier) Mar 30, 2026 Details
OpenRouter Google: Lyria 3 Clip Preview 1.0M 66K textimage 200 req/day (free tier) Mar 30, 2026 Details
OpenRouter NVIDIA: Nemotron 3 Super (free) 1.0M 262K textreasoning 200 req/day (free tier) Mar 11, 2026 Details
OpenRouter MiniMax: MiniMax M2.5 205K 197K textreasoning 200 req/day (free tier) Feb 12, 2026 Details
OpenRouter Free Models Router 200K 8K textimage 200 req/day (free tier) Feb 1, 2026 Details
OpenRouter LiquidAI: LFM2.5-1.2B-Thinking (free) 33K 8K textreasoning 200 req/day (free tier) Jan 20, 2026 Details
OpenRouter LiquidAI: LFM2.5-1.2B-Instruct (free) 33K 8K text 200 req/day (free tier) Jan 5, 2026 Details
OpenRouter NVIDIA: Nemotron 3 Nano 30B A3B (free) 256K 8K textreasoning 200 req/day (free tier) Dec 14, 2025 Details
OpenRouter OpenAI: gpt-oss-safeguard-20b 131K 66K text 200 req/day (free tier) Oct 29, 2025 Details
OpenRouter NVIDIA: Nemotron Nano 12B 2 VL (free) 128K 128K textimagevideoreasoning 200 req/day (free tier) Oct 28, 2025 Details
OpenRouter Qwen: Qwen3 Next 80B A3B Instruct (free) 262K 8K text 200 req/day (free tier) Sep 11, 2025 Details
OpenRouter NVIDIA: Nemotron Nano 9B V2 (free) 128K 8K textreasoning 200 req/day (free tier) Sep 5, 2025 Details
OpenRouter OpenAI: gpt-oss-120b (free) 131K 131K textreasoning 200 req/day (free tier) Aug 5, 2025 Details
OpenRouter OpenAI: gpt-oss-20b (free) 131K 33K text 200 req/day (free tier) Aug 5, 2025 Details
OpenRouter Z.ai: GLM 4.5 Air 131K 98K textreasoning 200 req/day (free tier) Jul 28, 2025 Details
OpenRouter Qwen: Qwen3 Coder 480B A35B (free) 1.0M 262K textcode 200 req/day (free tier) Jul 23, 2025 Details
OpenRouter Venice: Uncensored (free) 33K 8K text 200 req/day (free tier) Jul 9, 2025 Details
OpenRouter Meta: Llama 3.3 70B Instruct (free) 131K 8K text 200 req/day (free tier) Dec 6, 2024 Details
OpenRouter Meta: Llama 3.2 3B Instruct (free) 131K 8K text 200 req/day (free tier) Sep 25, 2024 Details
OpenRouter Nous: Hermes 3 405B Instruct (free) 131K 8K text 200 req/day (free tier) Aug 16, 2024 Details
Chutes.ai Llama 3.1 70B 131K 8K text Community-powered, no hard cap Jul 23, 2024 Details
Glhf.chat Llama 3.1 70B 131K 8K text Unlimited for free models Jul 23, 2024 Details
Glhf.chat Mixtral 8x7B 33K 0 text Unlimited for free models Details
Grok (xAI) Grok-2 131K 0 text $25/month free credits, resets monthly Dec 12, 2024 Details
Grok (xAI) Grok-2 Mini 131K 0 text $25/month free credits, resets monthly Details
Groq Moonshot Kimi K2 131K 0 text See provider page Sep 5, 2025 Details
Groq Moonshot Kimi K2 0905 131K 0 text See provider page Sep 5, 2025 Details
Groq GPT-OSS 120B 131K 0 text See provider page Aug 5, 2025 Details
Groq GPT-OSS 20B 131K 0 text See provider page Aug 5, 2025 Details
GitHub Models Mistral Large (24.11) 131K 16K textimage See provider page Feb 26, 2024 Details
GitHub Models AI21 Jamba 1.5 Large 256K 0 text See provider page Details
Cerebras Llama 3.1 70B 131K 8K text See provider page Jul 23, 2024 Details
Mistral AI Mistral 7B 33K 0 text See provider page Details
Mistral AI Mixtral 8x7B 33K 0 text See provider page Details
Cloudflare Workers AI Mistral 7B 33K 0 text See provider page Details
Cloudflare Workers AI Qwen 1.5 7B 33K 0 text See provider page Details
Agnes AI agnes-1.5-flash 256K 64K textvision 30 RPM Details
Agnes AI agnes-2.0-flash 256K 64K textvision 30 RPM Details
Aion Labs Aion 2.5 128K 32K text 15 RPM, 20K TPD Details
Aion Labs Aion 2.0 128K 32K text 15 RPM, 20K TPD Feb 23, 2026 Details
Aion Labs Aion-RP 1.0 (8B) 32K 8K text 15 RPM, 20K TPD Details
Cohere Command A+ (218B) 128K 4K text 20 RPM Details
Cohere Command A (111B) 256K 4K text 20 RPM Details
Cohere Command R+ 128K 4K text 20 RPM Details
Cohere Command R7B 128K 4K text 20 RPM Details
Google Gemini Gemini 3.5 Flash 1.0M 64K textimagevideoaudiopdfreasoning 15 RPM, 1,500 RPD May 19, 2026 Details
Google Gemini Gemini 3.1 Flash-Lite 1.0M 65K textimagevideoaudiopdfreasoning 30 RPM, 1,500 RPD Mar 3, 2026 Details
Google Gemini Gemini 2.5 Flash 1.0M 65K textimageaudiovideopdfreasoning 15 RPM, 1,500 RPD May 20, 2025 Details
Google Gemini Gemini 2.5 Pro 2.0M 65K textimageaudiovideopdfreasoning 5 RPM, 50 RPD Jun 5, 2025 Details
Mistral AI Mistral Medium 3.5 (128B) 256K 256K text ~1 RPS, 500K TPM Details
Mistral AI Mistral Small 4 256K 256K text ~1 RPS, 500K TPM Mar 16, 2026 Details
Mistral AI Mistral Large 3 256K 256K text ~1 RPS, 500K TPM Dec 2, 2025 Details
Mistral AI Mistral Nemo (12B) 128K 128K text ~1 RPS, 500K TPM Jul 1, 2024 Details
Mistral AI Codestral 256K 256K textcode ~1 RPS, 500K TPM Details
Mistral AI Pixtral Large 128K 128K textimage ~1 RPS, 500K TPM Nov 18, 2024 Details
Z AI (Zhipu AI) GLM-4.7-Flash 200K 128K textreasoning 1 concurrent request Jan 19, 2026 Details
Z AI (Zhipu AI) GLM-4.6V-Flash 128K 4K textimagevideoreasoning 1 concurrent request Dec 8, 2025 Details
Cerebras gpt-oss-120b 128K 8K textreasoning 30 RPM, 14,400 RPD, 1M TPD Aug 5, 2025 Details
Cerebras zai-glm-4.7 128K 8K textreasoning 10 RPM, 100 RPD, 1M TPD Dec 22, 2025 Details
Cloudflare Workers AI @cf/meta/llama-3.3-70b-instruct-fp8-fast 131K 131K text 10K neurons/day (shared) Dec 6, 2024 Details
Cloudflare Workers AI @cf/meta/llama-4-scout-17b-16e-instruct 10.0M 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/openai/gpt-oss-120b 128K 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/moonshotai/kimi-k2.7-code 262K 131K textcodeimagevideoreasoning 10K neurons/day (shared) Jun 12, 2026 Details
Cloudflare Workers AI @cf/google/gemma-4-26b-a4b-it 256K 131K textimagereasoning 10K neurons/day (shared) Apr 2, 2026 Details
Cloudflare Workers AI @cf/zhipuai/glm-4.7-flash 131K 131K textreasoning 10K neurons/day (shared) Jan 19, 2026 Details
Cloudflare Workers AI @cf/mistralai/mistral-small-3.1-24b-instruct 128K 131K text 10K neurons/day (shared) Mar 17, 2025 Details
Cloudflare Workers AI @cf/deepseek-ai/deepseek-r1-distill-qwen-32b 32K 131K textreasoning 10K neurons/day (shared) Jan 20, 2025 Details
GitHub Models gpt-5 200K 32K textimagereasoning 10 RPM, 50 RPD Aug 7, 2025 Details
GitHub Models gpt-4.1 1.0M 32K textimagepdf 10 RPM, 50 RPD Apr 14, 2025 Details
GitHub Models gpt-4.1-mini 1.0M 32K textimagepdf 15 RPM, 150 RPD Apr 14, 2025 Details
GitHub Models gpt-4o 128K 16K textimagepdf 10 RPM, 50 RPD May 13, 2024 Details
GitHub Models o4-mini 200K 100K textimagereasoning 10 RPM, 50 RPD Apr 16, 2025 Details
GitHub Models Llama-4-Scout-17B-16E 512K 4K textimage 15 RPM, 150 RPD Apr 5, 2025 Details
GitHub Models Llama-4-Maverick-17B-128E 256K 4K textimage 10 RPM, 50 RPD Apr 5, 2025 Details
GitHub Models Meta-Llama-3.3-70B 131K 4K text 15 RPM, 150 RPD Dec 6, 2024 Details
GitHub Models DeepSeek-R1 64K 8K textreasoning 15 RPM, 150 RPD May 28, 2025 Details
GitHub Models Mistral-Small-3.1 128K 4K text 15 RPM, 150 RPD Mar 17, 2025 Details
Groq llama-3.3-70b-versatile 131K 32K text 30 RPM, 1,000 RPD Dec 6, 2024 Details
Groq llama-3.1-8b-instant 131K 131K text 30 RPM, 1,000 RPD Jul 23, 2024 Details
Groq llama-4-scout-17b-16e-instruct 131K 8K textimage 30 RPM, 1,000 RPD Apr 5, 2025 Details
Groq qwen3-32b 131K 131K textreasoning 30 RPM, 1,000 RPD Apr 28, 2025 Details
Hugging Face Meta-Llama-3.1-8B-Instruct 128K 4K text Credit-metered Jul 23, 2024 Details
Hugging Face Mistral-7B-Instruct-v0.3 32K 4K text Credit-metered Details
Hugging Face Mixtral-8x7B-Instruct-v0.1 32K 4K text Credit-metered Details
Hugging Face Phi-3.5-mini-instruct 128K 4K text Credit-metered Details
Hugging Face Qwen2.5-7B-Instruct 131K 4K text Credit-metered Oct 16, 2024 Details
Kilo Code x-ai/grok-code-fast-1:free 256K 131K textcode ~200 req/hr Aug 28, 2025 Details
Kilo Code minimax/minimax-m2.5:free 196K 8K textreasoning ~200 req/hr Feb 12, 2026 Details
Kilo Code bytedance-seed/dola-seed-2.0-pro:free 131K 131K text ~200 req/hr Details
Kilo Code nvidia/nemotron-3-super-120b-a12b:free 262K 32K textreasoning ~200 req/hr Mar 11, 2026 Details
Kilo Code arcee-ai/trinity-large-thinking:free 131K 131K textreasoning ~200 req/hr Apr 1, 2026 Details
LLM7.io deepseek-r1-0528 131K 131K textreasoning 30 RPM (120 with token) May 28, 2025 Details
LLM7.io deepseek-v3-0324 131K 131K text 30 RPM (120 with token) Mar 25, 2025 Details
LLM7.io gemini-2.5-flash-lite 131K 131K textimageaudiovideopdfreasoning 30 RPM (120 with token) Jun 17, 2025 Details
LLM7.io gpt-4o-mini 131K 131K textimagepdf 30 RPM (120 with token) Jul 18, 2024 Details
LLM7.io mistral-small-3.1-24b 32K 131K text 30 RPM (120 with token) Mar 17, 2025 Details
LLM7.io qwen2.5-coder-32b 131K 131K textcode 30 RPM (120 with token) Nov 11, 2024 Details
ModelScope Qwen/Qwen3.5-35B-A3B 131K 131K textimagevideoaudioreasoning 2,000 RPD total; <=500 RPD/model (dynamic) Feb 24, 2026 Details
ModelScope Qwen/Qwen3.5-27B 131K 131K textimagevideoaudioreasoning 2,000 RPD total; <=500 RPD/model (dynamic) Feb 24, 2026 Details
Ollama Cloud gpt-oss:120b-cloud 128K 131K textreasoning Session/weekly limits (unpublished) Aug 5, 2025 Details
Ollama Cloud deepseek-v3.1:671b-cloud 128K 131K text Session/weekly limits (unpublished) Details
Ollama Cloud qwen3-coder:480b-cloud 128K 131K textcode Session/weekly limits (unpublished) Details
Ollama Cloud kimi-k2:1t-cloud 262K 131K text Session/weekly limits (unpublished) Details
Ollama Cloud glm-4.6:cloud 128K 131K textreasoning Session/weekly limits (unpublished) Sep 30, 2025 Details
Ollama Cloud deepseek-r1:cloud 128K 131K textreasoning Session/weekly limits (unpublished) Jan 20, 2025 Details
OVHcloud AI Endpoints Qwen3.5-397B-A17B 131K 32K textimagevideoaudioreasoning 2 RPM (anonymous) Feb 16, 2026 Details
OVHcloud AI Endpoints gpt-oss-20b 128K 8K text 2 RPM (anonymous) Aug 5, 2025 Details
OVHcloud AI Endpoints Meta-Llama-3_3-70B-Instruct 131K 4K text 2 RPM (anonymous) Dec 6, 2024 Details
OVHcloud AI Endpoints Llama-3.1-8B-Instruct 131K 4K text 2 RPM (anonymous) Jul 23, 2024 Details
OVHcloud AI Endpoints Qwen3.6-27B 131K 32K textimagevideoaudioreasoning 2 RPM (anonymous) Apr 22, 2026 Details
OVHcloud AI Endpoints Qwen3.5-9B 131K 8K textreasoning 2 RPM (anonymous) Mar 2, 2026 Details
OVHcloud AI Endpoints Qwen3-Coder-30B-A3B-Instruct 262K 32K textcode 2 RPM (anonymous) Jul 31, 2025 Details
OVHcloud AI Endpoints Qwen2.5-VL-72B-Instruct 128K 8K textimage 2 RPM (anonymous) Sep 1, 2024 Details
OVHcloud AI Endpoints Mistral-Small-3.2-24B-Instruct 128K 4K text 2 RPM (anonymous) Jun 20, 2025 Details
OVHcloud AI Endpoints Mistral-Nemo-Instruct-2407 128K 4K text 2 RPM (anonymous) Jul 1, 2024 Details
SambaNova DeepSeek-V3.1 128K 8K text 20 RPM, 20 RPD, 200K TPD Aug 21, 2025 Details
SambaNova DeepSeek-V3.2 (Preview) 128K 8K text 20 RPM, 20 RPD, 200K TPD Details
SambaNova MiniMax-M2.7 128K 8K textreasoning 20 RPM, 20 RPD, 200K TPD Mar 18, 2026 Details
SambaNova gemma-4-31B-it (Preview) 128K 8K textimagereasoning 20 RPM, 20 RPD, 200K TPD Apr 2, 2026 Details
SiliconFlow deepseek-ai/DeepSeek-R1-Distill-Qwen-7B 131K 131K textreasoning 30 RPM, 60K TPM Details
SiliconFlow Abbreviation 131K 8K text See provider page Details
NVIDIA NIM 01-ai/yi-large 131K 8K text Up to 40 RPM Details
NVIDIA NIM adept/fuyu-8b 131K 8K text Up to 40 RPM Details
NVIDIA NIM ai21labs/jamba-1.5-large-instruct 131K 8K text Up to 40 RPM Aug 22, 2024 Details
NVIDIA NIM aisingapore/sea-lion-7b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM baai/bge-m3 131K 8K text Up to 40 RPM Details
NVIDIA NIM bigcode/starcoder2-15b 131K 8K text Up to 40 RPM Details
NVIDIA NIM databricks/dbrx-instruct 131K 8K text Up to 40 RPM Mar 27, 2024 Details
NVIDIA NIM deepseek-ai/deepseek-coder-6.7b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM deepseek-ai/deepseek-v4-flash 1.0M 66K textreasoning Up to 40 RPM Apr 24, 2026 Details
NVIDIA NIM deepseek-ai/deepseek-v4-pro 1.0M 384K textreasoning Up to 40 RPM Apr 24, 2026 Details
NVIDIA NIM google/codegemma-1.1-7b 131K 8K text Up to 40 RPM Details
NVIDIA NIM google/codegemma-7b 131K 8K text Up to 40 RPM Details
NVIDIA NIM google/deplot 131K 8K text Up to 40 RPM Details
NVIDIA NIM google/gemma-2b 131K 8K text Up to 40 RPM Details
NVIDIA NIM google/recurrentgemma-2b 131K 8K text Up to 40 RPM Details
NVIDIA NIM ibm/granite-3.0-3b-a800m-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM ibm/granite-3.0-8b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM ibm/granite-34b-code-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM ibm/granite-8b-code-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM meta/codellama-70b 131K 8K text Up to 40 RPM Details
NVIDIA NIM meta/llama-3.1-70b-instruct 131K 16K text Up to 40 RPM Jul 23, 2024 Details
NVIDIA NIM meta/llama-3.2-11b-vision-instruct 131K 16K textimage Up to 40 RPM Sep 25, 2024 Details
NVIDIA NIM meta/llama-3.2-1b-instruct 131K 60K text Up to 40 RPM Sep 25, 2024 Details
NVIDIA NIM meta/llama-3.2-3b-instruct 131K 8K text Up to 40 RPM Sep 25, 2024 Details
NVIDIA NIM meta/llama-guard-4-12b 164K 16K textimage Up to 40 RPM Apr 30, 2025 Details
NVIDIA NIM meta/llama2-70b 131K 8K text Up to 40 RPM Jul 18, 2023 Details
NVIDIA NIM microsoft/kosmos-2 131K 8K text Up to 40 RPM Details
NVIDIA NIM microsoft/phi-3-vision-128k-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM microsoft/phi-3.5-moe-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM microsoft/phi-4-multimodal-instruct 131K 8K text Up to 40 RPM Feb 26, 2025 Details
NVIDIA NIM minimaxai/minimax-m2.7 205K 197K textreasoning Up to 40 RPM Mar 18, 2026 Details
NVIDIA NIM minimaxai/minimax-m3 1.0M 512K textimagevideoreasoning Up to 40 RPM Jun 1, 2026 Details
NVIDIA NIM mistralai/codestral-22b-instruct-v0.1 131K 8K text Up to 40 RPM Details
NVIDIA NIM mistralai/mistral-7b-instruct-v0.3 131K 8K text Up to 40 RPM Details
NVIDIA NIM mistralai/mistral-large-2-instruct 131K 8K text Up to 40 RPM Nov 18, 2024 Details
NVIDIA NIM mistralai/mixtral-8x22b-v0.1 131K 8K text Up to 40 RPM Details
NVIDIA NIM moonshotai/kimi-k2.6 262K 262K textimagevideoreasoning Up to 40 RPM Apr 20, 2026 Details
NVIDIA NIM nv-mistralai/mistral-nemo-12b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/cosmos-reason2-8b 131K 8K textreasoning Up to 40 RPM Details
NVIDIA NIM nvidia/embed-qa-4 131K 8K embedding Up to 40 RPM Details
NVIDIA NIM nvidia/llama-3.1-nemotron-51b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/llama-3.1-nemotron-70b-instruct 131K 8K text Up to 40 RPM Oct 15, 2024 Details
NVIDIA NIM nvidia/llama-3.1-nemotron-ultra-253b-v1 131K 8K textreasoning Up to 40 RPM Apr 7, 2025 Details
NVIDIA NIM nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1 131K 8K embeddingrerank Up to 40 RPM Details
NVIDIA NIM nvidia/llama-3.2-nv-embedqa-1b-v1 131K 8K embedding Up to 40 RPM Details
NVIDIA NIM nvidia/llama-3.3-nemotron-super-49b-v1.5 131K 16K textreasoning Up to 40 RPM Jul 25, 2025 Details
NVIDIA NIM nvidia/llama-nemotron-embed-1b-v2 131K 8K embeddingtextimage Up to 40 RPM Feb 10, 2026 Details
NVIDIA NIM nvidia/llama-nemotron-embed-vl-1b-v2 131K 8K embeddingtextimage Up to 40 RPM Feb 10, 2026 Details
NVIDIA NIM nvidia/llama3-chatqa-1.5-70b 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/mistral-nemo-minitron-8b-8k-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/nemoretriever-parse 131K 8K rerank Up to 40 RPM Details
NVIDIA NIM nvidia/nemotron-3.5-content-safety 128K 8K textimagereasoning Up to 40 RPM Jun 4, 2026 Details
NVIDIA NIM nvidia/nemotron-4-340b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/nemotron-4-340b-reward 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/nemotron-nano-3-30b-a3b 131K 8K textreasoning Up to 40 RPM Dec 15, 2025 Details
NVIDIA NIM nvidia/nemotron-parse 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/neva-22b 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/nv-embed-v1 131K 8K embedding Up to 40 RPM Details
NVIDIA NIM nvidia/nv-embedcode-7b-v1 131K 8K embedding Up to 40 RPM Details
NVIDIA NIM nvidia/nv-embedqa-e5-v5 131K 8K embedding Up to 40 RPM Details
NVIDIA NIM nvidia/nv-embedqa-mistral-7b-v2 131K 8K embedding Up to 40 RPM Details
NVIDIA NIM nvidia/nvclip 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/riva-translate-4b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/vila 131K 8K text Up to 40 RPM Details
NVIDIA NIM qwen/qwen3.5-122b-a10b 262K 262K textimagevideoaudioreasoning Up to 40 RPM Feb 24, 2026 Details
NVIDIA NIM qwen/qwen3.5-397b-a17b 256K 8K textimagevideoaudioreasoning Up to 40 RPM Feb 16, 2026 Details
NVIDIA NIM snowflake/arctic-embed-l 131K 8K embedding Up to 40 RPM Details
NVIDIA NIM stepfun-ai/step-3.5-flash 262K 66K textreasoning Up to 40 RPM Feb 2, 2026 Details
NVIDIA NIM stepfun-ai/step-3.7-flash 256K 256K textimagereasoning Up to 40 RPM May 29, 2026 Details
NVIDIA NIM writer/palmyra-creative-122b 131K 8K text Up to 40 RPM Details
NVIDIA NIM writer/palmyra-fin-70b-32k 131K 8K text Up to 40 RPM Details
NVIDIA NIM writer/palmyra-med-70b 131K 8K text Up to 40 RPM Details
NVIDIA NIM writer/palmyra-med-70b-32k 131K 8K text Up to 40 RPM Details
NVIDIA NIM z-ai/glm-5.1 203K 8K text Up to 40 RPM Apr 7, 2026 Details
NVIDIA NIM zyphra/zamba2-7b-instruct 131K 8K text Up to 40 RPM Details
AI21 Labs Jamba Large 1.7 256K 4K text 200 RPM, 10 RPS Aug 8, 2025 Details
AI21 Labs Jamba Mini 2 256K 4K text 200 RPM, 10 RPS Details
Aion Labs aion-1.0 131K 32K text Daily token allowance Feb 4, 2025 Details
Aion Labs aion-1.0-mini 131K 32K text Daily token allowance Feb 4, 2025 Details
Alibaba Cloud Model Studio Qwen3-Max 128K 32K text Tiered by region Sep 23, 2025 Details
Alibaba Cloud Model Studio Qwen3-Plus 1.0M 32K text Tiered by region Details
Alibaba Cloud Model Studio Qwen3-VL-Plus 128K 8K textimage Tiered by region Details
Alibaba Cloud Model Studio Qwen3-Coder-Plus 256K 8K textcode Tiered by region Sep 23, 2025 Details
Alibaba Cloud Model Studio QwQ-Plus 131K 32K text Tiered by region Details
Cohere Embed 4 131K 131K textembedding 2,000 inputs/min Details
Cohere Rerank 3.5 131K 131K textrerank 10 RPM Details
DeepSeek deepseek-chat (V3.2) 128K 8K text Dynamic Dec 1, 2025 Details
DeepSeek deepseek-reasoner (R1) 128K 8K textreasoning Dynamic Details
Google Gemini Gemini 3 Flash (Preview) 1.0M 65K text Preview limits Details
Mistral AI Mistral Medium 3 128K 128K text ~1 RPS, 500K TPM May 7, 2025 Details
xAI grok-4.3 1.0M 32K text Credit-based Apr 30, 2026 Details
xAI grok-4.1-fast 2.0M 32K text Credit-based Nov 19, 2025 Details
xAI grok-3-mini 131K 8K text Credit-based Details
Z AI (Zhipu AI) GLM-4.5-Flash 128K 8K text 1 concurrent request Details
Cerebras llama-3.3-70b 128K 8K text 30 RPM, 14,400 RPD, 1M TPD Dec 6, 2024 Details
Cerebras qwen-3-235b-a22b-instruct-2507 131K 8K text 30 RPM, 14,400 RPD, 1M TPD Apr 28, 2025 Details
Cerebras qwen-3-32b 131K 8K text 30 RPM, 14,400 RPD, 1M TPD Apr 28, 2025 Details
Cloudflare Workers AI @cf/meta/llama-3.1-8b-instruct-fp8-fast 131K 131K text 10K neurons/day (shared) Jul 23, 2024 Details
Cloudflare Workers AI @cf/meta/llama-3.2-11b-vision-instruct 131K 131K textimage 10K neurons/day (shared) Sep 25, 2024 Details
Cloudflare Workers AI @cf/moonshotai/kimi-k2.5 256K 131K text 10K neurons/day (shared) Details
Groq llama-4-maverick-17b-128e-instruct 131K 8K text 15 RPM, 500 RPD Details
Groq kimi-k2-instruct 262K 262K text 30 RPM, 14,400 RPD Sep 5, 2025 Details
Groq deepseek-r1-distill-70b 131K 8K textreasoning 30 RPM, 14,400 RPD Details
Groq whisper-large-v3 131K 131K text 20 RPM, 2,000 RPD Details
Groq whisper-large-v3-turbo 131K 131K text 20 RPM, 2,000 RPD Details
ModelScope Qwen/Qwen-Image 131K 131K text 2,000 RPD total; model/AIGC-specific caps Details
Nebius Qwen3-235B-A22B 128K 32K text Tier-based Apr 28, 2025 Details
Nscale Llama-3.3-70B-Instruct 128K 8K text Fair-use Dec 6, 2024 Details
Nscale DeepSeek-R1-Distill-Llama-70B 128K 32K textreasoning Fair-use Jan 20, 2025 Details
OVHcloud AI Endpoints Qwen3Guard-Gen-8B 32K 4K text 2 RPM (anonymous) Details
OVHcloud AI Endpoints Qwen3Guard-Gen-0.6B 32K 4K text 2 RPM (anonymous) Details
SiliconFlow deepseek-ai/DeepSeek-OCR 131K 8K text 30 RPM, 60K TPM Details
OpenRouter Baidu Qianfan: CoBuddy 131K 65K textcode 200 req/day (free tier) Details
OpenRouter NVIDIA: Llama Nemotron Embed VL 1B V2 (free) 131K 8K textimageembedding 200 req/day (free tier) Feb 25, 2026 Details
OpenRouter NVIDIA: Llama Nemotron Rerank VL 1B V2 (free) 10K 8K textimagererank 200 req/day (free tier) Jun 9, 2026 Details
OpenCode Zen big-pickle N/A N/A Details
OpenCode Zen DeepSeek V4 Flash 1.0M 384K reasoning Details
OpenCode Zen MiMo-V2.5 1.0M 131K visionaudioreasoning Details
OpenCode Zen Nemotron 3 Ultra 550B A55B 1.0M 128K reasoning Details
OpenCode Zen North Mini Code 256K 64K reasoning Details
ModelScope deepseek-ai/DeepSeek-V3.2 8K 4K Dec 1, 2025 Details
LLM7.io Codestral (latest) 256K 4K text Details
ModelScope deepseek-ai/DeepSeek-V4-Flash 8K 4K Apr 24, 2026 Details
Cloudflare Workers AI @cf/openai/gpt-oss-120b 8K 4K Details
ModelScope deepseek-ai/DeepSeek-V4-Pro 8K 4K Apr 24, 2026 Details
Cloudflare Workers AI @cf/baai/bge-m3 8K 4K Details
Google Gemini Gemini 2.5 Flash 1.0M 66K visionaudioreasoning Jun 17, 2025 Details
LLM7.io devstral-small-2:24b 8K 4K Details
Cloudflare Workers AI @cf/google/gemma-2b-it-lora 8K 4K Details
ModelScope LLM-Research/c4ai-command-r-plus-08-2024 8K 4K Details
ModelScope LLM-Research/Llama-4-Maverick-17B-128E-Instruct 8K 4K Details
Cloudflare Workers AI @cf/meta/llama-3.2-3b-instruct 8K 4K Sep 25, 2024 Details
ModelScope MedAIBase/AntAngelMed 8K 4K Details
Cloudflare Workers AI @cf/meta/llama-guard-3-8b 8K 4K Details
ModelScope meituan-longcat/LongCat-Flash-Lite 8K 4K Details
Cloudflare Workers AI @cf/qwen/qwen3-embedding-0.6b 8K 4K embedding Details
ModelScope MiniMax/MiniMax-M1-80k 8K 4K Details
ModelScope MiniMax-M2.5-highspeed 205K 131K reasoning Details
Cloudflare Workers AI @cf/mistral/mistral-7b-instruct-v0.2-lora 8K 4K Details
ModelScope MiniMax-M3 512K 128K visionreasoning Details
ModelScope mistralai/Ministral-8B-Instruct-2410 8K 4K Details
ModelScope mistralai/Mistral-Large-Instruct-2407 8K 4K Nov 19, 2024 Details
ModelScope mistralai/Mistral-Small-Instruct-2409 8K 4K Details
ModelScope Kimi K2.5 262K 262K visionreasoning Details
ModelScope MusePublic/Qwen-Image-Edit 8K 4K Details
ModelScope opencompass/CompassJudger-1-32B-Instruct 8K 4K Details
ModelScope OpenGVLab/InternVL3_5-241B-A28B 8K 4K Details
ModelScope PaddlePaddle/ERNIE-4.5-0.3B-PT 8K 4K Details
ModelScope PaddlePaddle/ERNIE-4.5-21B-A3B-PT 8K 4K Details
ModelScope PaddlePaddle/ERNIE-4.5-300B-A47B-PT 8K 4K Details
ModelScope PaddlePaddle/ERNIE-4.5-VL-28B-A3B-PT 8K 4K Details
ModelScope Qwen/Qwen-Image-Edit 8K 4K Details
ModelScope Qwen/Qwen3-14B 8K 4K Apr 28, 2025 Details
ModelScope Qwen/Qwen3-235B-A22B 8K 4K Apr 28, 2025 Details
Cloudflare Workers AI @cf/moonshotai/kimi-k2.7-code 8K 4K Details
ModelScope Qwen/Qwen3-235B-A22B-Instruct-2507 8K 4K Jul 21, 2025 Details
Cloudflare Workers AI @cf/pfnet/plamo-embedding-1b 8K 4K embedding Details
ModelScope Qwen/Qwen3-235B-A22B-Thinking-2507 8K 4K reasoning Jul 25, 2025 Details
Cloudflare Workers AI @cf/deepseek-ai/deepseek-r1-distill-qwen-32b 8K 4K reasoning Jan 29, 2025 Details
ModelScope Qwen/Qwen3-30B-A3B 8K 4K Apr 28, 2025 Details
ModelScope Qwen/Qwen3-30B-A3B-Thinking-2507 8K 4K reasoning Aug 28, 2025 Details
ModelScope Qwen/Qwen3-32B 8K 4K Apr 28, 2025 Details
ModelScope Qwen/Qwen3-4B 8K 4K Details
Cloudflare Workers AI @cf/meta/llama-3.1-8b-instruct-fp8 8K 4K Jul 23, 2024 Details
ModelScope Qwen/Qwen3-8B 8K 4K Apr 28, 2025 Details
Cloudflare Workers AI @cf/meta/llama-3.2-1b-instruct 8K 4K Sep 25, 2024 Details
ModelScope Qwen/Qwen3-Coder-30B-A3B-Instruct 8K 4K Jul 31, 2025 Details
Cloudflare Workers AI @cf/moonshotai/kimi-k2.6 8K 4K Details
ModelScope Qwen/Qwen3-Next-80B-A3B-Instruct 8K 4K Sep 11, 2025 Details
Cloudflare Workers AI @cf/zai-org/glm-4.7-flash 8K 4K Details
ModelScope Qwen/Qwen3-Next-80B-A3B-Thinking 8K 4K reasoning Sep 11, 2025 Details
ModelScope Qwen/Qwen3-VL-235B-A22B-Instruct 8K 4K Sep 23, 2025 Details
ModelScope Qwen/Qwen3-VL-8B-Instruct 8K 4K Oct 14, 2025 Details
Cloudflare Workers AI @cf/meta-llama/llama-2-7b-chat-hf-lora 8K 4K Details
Cloudflare Workers AI @cf/meta/llama-3.3-70b-instruct-fp8-fast 8K 4K Dec 6, 2024 Details
ModelScope Qwen/Qwen3-VL-8B-Thinking 8K 4K reasoning Oct 14, 2025 Details
Cloudflare Workers AI @cf/ibm-granite/granite-4.0-h-micro 8K 4K Details
ModelScope Qwen/Qwen3.5-122B-A10B 8K 4K Feb 25, 2026 Details
ModelScope Qwen/Qwen3.5-27B 8K 4K Feb 25, 2026 Details
Z AI (Zhipu AI) GLM-4.5-Air 131K 98K reasoning Jul 25, 2025 Details
ModelScope Qwen/Qwen3.5-35B-A3B 8K 4K Feb 25, 2026 Details
Cloudflare Workers AI @cf/baai/bge-small-en-v1.5 8K 4K Details
Cloudflare Workers AI @cf/qwen/qwen2.5-coder-32b-instruct 8K 4K Nov 11, 2024 Details
ModelScope Qwen/Qwen3.5-397B-A17B 8K 4K Feb 16, 2026 Details
ModelScope Shanghai_AI_Laboratory/Intern-S1 8K 4K Details
ModelScope Shanghai_AI_Laboratory/Intern-S1-mini 8K 4K Details
ModelScope Shanghai_AI_Laboratory/Intern-S2-Preview 8K 4K Details
ModelScope stepfun-ai/Step-3.5-Flash 8K 4K Details
ModelScope stepfun-ai/Step-3.7-Flash 8K 4K Details
ModelScope XGenerationLab/XiYanSQL-QwenCoder-32B-2412 8K 4K Details
ModelScope XGenerationLab/XiYanSQL-QwenCoder-32B-2504 8K 4K Details
ModelScope GLM-4.7-FlashX 200K 131K reasoning Details
ModelScope GLM-5.1 200K 131K reasoning Details
ModelScope GLM-5.2 1.0M 131K reasoning Details
Cloudflare Workers AI @cf/zai-org/glm-5.2 8K 4K Details
Cloudflare Workers AI @cf/nvidia/nemotron-3-120b-a12b 8K 4K Details
Cloudflare Workers AI @cf/baai/bge-base-en-v1.5 8K 4K Details
Cloudflare Workers AI @cf/aisingapore/gemma-sea-lion-v4-27b-it 8K 4K Details
Cloudflare Workers AI @cf/qwen/qwen3-30b-a3b-fp8 8K 4K Apr 28, 2025 Details
Cloudflare Workers AI @cf/google/gemma-7b-it-lora 8K 4K Details
Cloudflare Workers AI @cf/google/gemma-4-26b-a4b-it 8K 4K Apr 3, 2026 Details
Cloudflare Workers AI @cf/mistralai/mistral-small-3.1-24b-instruct 8K 4K Mar 17, 2025 Details
Cloudflare Workers AI @cf/openai/gpt-oss-20b 8K 4K Details
Cloudflare Workers AI @cf/google/embeddinggemma-300m 8K 4K embedding Details
Cloudflare Workers AI @cf/meta/llama-4-scout-17b-16e-instruct 8K 4K Details
Cloudflare Workers AI @cf/qwen/qwq-32b 8K 4K Details
Cloudflare Workers AI @cf/baai/bge-large-en-v1.5 8K 4K Details
NVIDIA NIM abacusai/dracarys-llama-3.1-70b-instruct 8K 4K Details
NVIDIA NIM bytedance/seed-oss-36b-instruct 8K 4K Details
NVIDIA NIM google/diffusiongemma-26b-a4b-it 8K 4K Details
NVIDIA NIM google/gemma-2-2b-it 8K 4K Details
NVIDIA NIM google/gemma-3n-e2b-it 8K 4K Details
OpenRouter Gemma 4 31B IT 262K 33K visionreasoning 200 req/day (free tier) Apr 2, 2026 Details
NVIDIA NIM meta/llama-3.1-8b-instruct 8K 4K Jul 23, 2024 Details
NVIDIA NIM meta/llama-3.2-90b-vision-instruct 8K 4K Details
NVIDIA NIM Llama-3.3-70B-Instruct 128K 4K text Dec 6, 2024 Details
NVIDIA NIM meta/llama-4-maverick-17b-128e-instruct 8K 4K Details
NVIDIA NIM microsoft/phi-4-mini-instruct 8K 4K Details
NVIDIA NIM mistralai/ministral-14b-instruct-2512 8K 4K Dec 2, 2025 Details
NVIDIA NIM mistralai/mistral-large-3-675b-instruct-2512 8K 4K Details
NVIDIA NIM mistralai/mistral-medium-3.5-128b 8K 4K Details
NVIDIA NIM mistralai/mistral-nemotron 8K 4K Details
NVIDIA NIM mistralai/mistral-small-4-119b-2603 8K 4K Details
NVIDIA NIM mistralai/mixtral-8x7b-instruct-v0.1 8K 4K Details
NVIDIA NIM nvidia/gliner-pii 8K 4K Details
NVIDIA NIM nvidia/ising-calibration-1-35b-a3b 8K 4K Details
NVIDIA NIM nvidia/llama-3.1-nemoguard-8b-content-safety 8K 4K Details
NVIDIA NIM nvidia/llama-3.1-nemoguard-8b-topic-control 8K 4K Details
NVIDIA NIM nvidia/llama-3.1-nemotron-nano-vl-8b-v1 8K 4K Details
NVIDIA NIM Llama 3.1 Nemotron Safety Guard 8B v3 128K 4K text Details
NVIDIA NIM Llama 3.3 Nemotron Super 49B v1 131K 131K reasoning Details
NVIDIA NIM Nemotron 3 Content Safety 128K 4K text Details
OpenRouter Nemotron 3 Nano 30B A3B 262K 262K reasoning 200 req/day (free tier) Dec 14, 2025 Details
NVIDIA NIM Nemotron 3 Nano Omni 30B A3B Reasoning 256K 66K visionaudioreasoning Apr 28, 2026 Details
OpenRouter Nemotron 3 Super 120B A12B 262K 262K reasoning 200 req/day (free tier) Mar 11, 2026 Details
OpenRouter Nemotron 3 Ultra 550B A55B 1.0M 128K reasoning 200 req/day (free tier) Jun 4, 2026 Details
NVIDIA NIM Nemotron Content Safety Reasoning 4B 128K 4K reasoning Details
NVIDIA NIM Nemotron Mini 4B Instruct 128K 8K text Details
NVIDIA NIM Nemotron Nano 12B v2 VL 128K 128K visionreasoning Oct 28, 2025 Details
NVIDIA NIM nvidia/nvidia-nemotron-nano-9b-v2 8K 4K Details
NVIDIA NIM nvidia/riva-translate-4b-instruct-v1.1 8K 4K Details
OpenRouter GPT OSS 120B 131K 33K reasoning 200 req/day (free tier) Details
OpenRouter openai/gpt-oss-20b 8K 4K 200 req/day (free tier) Details
OpenRouter qwen/qwen3-next-80b-a3b-instruct 8K 4K 200 req/day (free tier) Sep 11, 2025 Details
NVIDIA NIM sarvamai/sarvam-m 8K 4K Details
NVIDIA NIM stockmark/stockmark-2-100b-instruct 8K 4K Details
NVIDIA NIM upstage/solar-10.7b-instruct 8K 4K Details
Google Gemini Gemma 4 26B A4B IT 262K 33K visionreasoning Apr 3, 2026 Details
Google Gemini Gemma 4 31B IT 262K 33K visionreasoning Apr 2, 2026 Details
Google Gemini Gemini Flash-Lite Latest 1.0M 66K visionaudioreasoning Details
Google Gemini Gemini 2.5 Flash-Lite 1.0M 66K visionaudioreasoning Jul 22, 2025 Details
OpenRouter poolside/laguna-xs.2 8K 4K 200 req/day (free tier) Details
OpenRouter poolside/laguna-m.1 8K 4K 200 req/day (free tier) Details
OpenRouter Gemma 4 26B A4B IT 262K 33K visionreasoning 200 req/day (free tier) Apr 3, 2026 Details
Google Gemini Gemini 3.1 Flash Lite 1.0M 66K visionaudioreasoning May 7, 2026 Details
OpenRouter qwen/qwen3-coder 8K 4K 200 req/day (free tier) Jul 23, 2025 Details
OpenRouter meta-llama/llama-3.3-70b-instruct 8K 4K 200 req/day (free tier) Dec 6, 2024 Details
OpenRouter meta-llama/llama-3.2-3b-instruct 8K 4K 200 req/day (free tier) Sep 25, 2024 Details
OpenRouter nousresearch/hermes-3-llama-3.1-405b 8K 4K 200 req/day (free tier) Details
Google Gemini gemini-robotics-er-1.6-preview 8K 4K Details
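
The Rate Limit column mixes per-minute (RPM) and per-day (RPD) caps, and the daily cap is often the tighter one for sustained use. The sketch below works through that arithmetic for limits quoted in the table, such as "30 RPM, 1,000 RPD". The function names are hypothetical, and the even-spacing model is a simplifying assumption; providers may enforce limits differently, for example with rolling windows or token-based caps.

```python
from typing import Optional

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


def min_interval_seconds(rpm: Optional[int] = None, rpd: Optional[int] = None) -> float:
    """Even spacing between requests that stays within both limits all day."""
    intervals = []
    if rpm is not None:
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        intervals.append(SECONDS_PER_MINUTE / rpm)
    if rpd is not None:
        if rpd <= 0:
            raise ValueError("rpd must be positive")
        intervals.append(SECONDS_PER_DAY / rpd)
    if not intervals:
        raise ValueError("provide at least one of rpm or rpd")
    # The stricter limit is the one that demands the longer wait.
    return max(intervals)


def minutes_until_daily_cap(rpm: int, rpd: int) -> float:
    """How long the daily quota lasts if you send at the full per-minute rate."""
    if rpm <= 0 or rpd <= 0:
        raise ValueError("rpm and rpd must be positive")
    return rpd / rpm


if __name__ == "__main__":
    # "30 RPM, 1,000 RPD": bursts can go every 2 s, but all-day use needs ~86 s spacing.
    print(min_interval_seconds(rpm=30))
    print(min_interval_seconds(rpm=30, rpd=1000))
    # "15 RPM, 1,500 RPD": the daily quota lasts 100 minutes at full speed.
    print(minutes_until_daily_cap(15, 1500))
```

For an interactive chatbot the per-minute figure governs responsiveness. For a bot that runs all day, divide the daily cap by your expected traffic before committing to a provider.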