Cerebras — Free LLM API

4 free models available — no credit card required. Get API key →

Ultra-fast inference on Cerebras WSE chips — 1M tokens/day.

Cerebras Cloud offers free API access to Llama and GPT-OSS models running on the Cerebras Wafer-Scale Engine, one of the fastest AI accelerators available. The free tier provides 1 million tokens/day and 14,400 requests/day per model with no credit card required. Context window is limited to 8K on the free tier.

Ultra-fast inference on WSE chips
1M tokens/day free
No credit card required
Llama 3.1 8B + GPT-OSS 120B available

API Compatibility: OpenAI SDK-compatible (Chat Completions)

All Free Cerebras Models — Context Windows & Rate Limits

Model	Context	Max Output	Modality	Rate Limit
llama3.1-8b	128K	8K	text	30 RPM, 14,400 RPD, 1M TPD	Details
gpt-oss-120b	128K	8K	text	30 RPM, 14,400 RPD, 1M TPD	Details
qwen-3-235b-a22b-instruct-2507	131K	8K	text	30 RPM, 14,400 RPD, 1M TPD	Details
zai-glm-4.7	128K	8K	text	10 RPM, 100 RPD, 1M TPD	Details

Frequently Asked Questions about Cerebras Free API

Is Cerebras free to use?

Cerebras offers a permanently free tier with 4 available models. No credit card is required to get started — just sign up and generate an API key.

What models does Cerebras offer for free?

Cerebras provides 4 free models covering chat, coding use cases. Supported modalities include text. Browse the full list above with context windows and rate limits.

How do I use Cerebras with Claude Code or Cursor?

Click "Details" on any model above to get one-click configuration snippets for Claude Code (cc), Cursor, Codex, and more. All Cerebras models listed here use an OpenAI-compatible endpoint, so any tool that accepts a custom baseURL will work.