nscale logo How to Get a Free nscale API Key (2026)

0 free models available — credit card may be required. Get your nscale API key → Test free models →

nscale FreeLLM Score

🔹 34/100 Niche Provider — Consider for easy signup How we score →
🎁 Generosity 65 🌍 Access 75 📚 Breadth 0 ⚡ Reliability 30 🔌 Compat 15 🧠 Quality 20

All Free nscale Models — Context Windows & Rate Limits

Model Context Max Output Modality Rate Limit Released Status

What is nscale?

EU-hosted Llama 3.3 70B + Qwen3-Coder + DeepSeek-R1 — fair-use limits.

Nscale provides free API access to Llama-3.3-70B-Instruct, Qwen3-Coder-30B-A3B-Instruct, and DeepSeek-R1-Distill-Llama-70B models hosted on European infrastructure. The free tier uses fair-use rate limiting (no hard RPM/RPD — throttles if needed). OpenAI-compatible endpoint with 128K-256K context windows. No credit card required.

  • Llama 3.3 70B + Qwen3-Coder + DeepSeek-R1
  • Fair-use rates — no hard caps
  • Up to 256K context (Qwen3-Coder)
  • EU-hosted, OpenAI-compatible

API Compatibility: OpenAI SDK-compatible (Chat Completions)

How to Get a nscale API Key

  1. 1
    Sign up at console.nscale.com Email registration. No credit card.
  2. 2
    Go to API Keys
  3. 3
    Generate an API key
  4. 4
    Choose a model Llama-3.3-70B for general use. Qwen3-Coder for code. DeepSeek-R1 for reasoning.
  5. 5
    Configure OpenAI client Base URL: https://inference.api.nscale.com/v1. Fair-use rate limits.

nscale Free Tier Limits & Pricing

Credit Card Required
Free Tier Permanently free
Context Range InfinityM – -Infinity
Total Models 0 free
API Compatibility OpenAI SDK-compatible (Chat Completions)

nscale API Setup Tutorial & Tools

nscale is fully compatible with popular AI coding assistants like Cursor, Claude Code, and more. To see step-by-step API configuration instructions for your favorite tool, please visit our Global Configuration Guide →

Use Cases

What nscale's free models are best for, based on aggregated model capabilities:

Limitations & Caveats

  • Fair-use limits — unpredictable during high demand
  • Small provider — limited track record
  • EU-only latency advantage

Frequently Asked Questions

What does "fair-use" rate limiting mean on Nscale?

Instead of fixed RPM/RPD numbers, Nscale throttles when your usage is significantly above average. During normal conditions, you can make many requests. During peak demand, heavy users may be slowed down to ensure fair access for all.

Is Nscale's Qwen3-Coder different from other providers?

Qwen3-Coder-30B-A3B-Instruct is a MoE coding model — 30B total, 3B active per token. Nscale offers it with a 256K context window, which is wider than most providers giving you more code context per request.

How does Nscale compare to Nebius or OVHcloud?

All three are EU-hosted. Nscale has the best model variety (coding + reasoning + general). Nebius has the largest model (Qwen3 235B). OVHcloud has the most model variety and an anonymous tier. Choose based on your specific model needs.

See our FAQ for common questions about free LLM APIs