Cerebras — Free LLM API
4 free models available — no credit card required. Get API key →
Ultra-fast inference on Cerebras WSE chips — 1M tokens/day.
Cerebras Cloud offers free API access to Llama and GPT-OSS models running on the Cerebras Wafer-Scale Engine, one of the fastest AI accelerators available. The free tier provides 1 million tokens/day and 14,400 requests/day per model with no credit card required. Context window is limited to 8K on the free tier.
- Ultra-fast inference on WSE chips
- 1M tokens/day free
- No credit card required
- Llama 3.1 8B + GPT-OSS 120B available
API Compatibility: OpenAI SDK-compatible (Chat Completions)
All Free Cerebras Models — Context Windows & Rate Limits
| Model | Context | Max Output | Modality | Rate Limit | Status | |
|---|---|---|---|---|---|---|
| llama3.1-8b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Details | ||
| gpt-oss-120b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Details | ||
| qwen-3-235b-a22b-instruct-2507 | 131K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Details | ||
| zai-glm-4.7 | 128K | 8K | 10 RPM, 100 RPD, 1M TPD | Details |
Frequently Asked Questions about Cerebras Free API
Is Cerebras free to use?
Cerebras offers a permanently free tier with 4 available models. No credit card is required to get started — just sign up and generate an API key.
What models does Cerebras offer for free?
Cerebras provides 4 free models covering chat, coding use cases. Supported modalities include text. Browse the full list above with context windows and rate limits.
How do I use Cerebras with Claude Code or Cursor?
Click "Details" on any model above to get one-click configuration snippets for Claude Code (cc), Cursor, Codex, and more.
All Cerebras models listed here use an OpenAI-compatible endpoint, so any tool that accepts a custom baseURL will work.