Cerebras — Free LLM API

4 free models available — no credit card required. Get API key →

Ultra-fast inference on Cerebras WSE chips — 1M tokens/day.

Cerebras Cloud offers free API access to Llama and GPT-OSS models running on the Cerebras Wafer-Scale Engine, one of the fastest AI accelerators available. The free tier provides 1 million tokens/day and 14,400 requests/day per model with no credit card required. Context window is limited to 8K on the free tier.

  • Ultra-fast inference on WSE chips
  • 1M tokens/day free
  • No credit card required
  • Llama 3.1 8B + GPT-OSS 120B available

API Compatibility: OpenAI SDK-compatible (Chat Completions)

All Free Cerebras Models — Context Windows & Rate Limits

Model Context Max Output Modality Rate Limit Status
llama3.1-8b 128K 8K text 30 RPM, 14,400 RPD, 1M TPD Details
gpt-oss-120b 128K 8K text 30 RPM, 14,400 RPD, 1M TPD Details
qwen-3-235b-a22b-instruct-2507 131K 8K text 30 RPM, 14,400 RPD, 1M TPD Details
zai-glm-4.7 128K 8K text 10 RPM, 100 RPD, 1M TPD Details

Frequently Asked Questions about Cerebras Free API

Is Cerebras free to use?

Cerebras offers a permanently free tier with 4 available models. No credit card is required to get started — just sign up and generate an API key.

What models does Cerebras offer for free?

Cerebras provides 4 free models covering chat, coding use cases. Supported modalities include text. Browse the full list above with context windows and rate limits.

How do I use Cerebras with Claude Code or Cursor?

Click "Details" on any model above to get one-click configuration snippets for Claude Code (cc), Cursor, Codex, and more. All Cerebras models listed here use an OpenAI-compatible endpoint, so any tool that accepts a custom baseURL will work.