Nyra v3.2 — now with 256k context

Reasoning models built for builders.

A fast, predictable inference API for production. Stream tokens in under 80ms, run agents at scale, and ship without babysitting your stack.

Start building free Read the docs →

~ chat.js

// One call. Streaming, tools, and JSON mode included.
const res = await nyra.chat({
  model: "nyra-3.2-flash",
  messages: [{ role: "user", content: "Plan my launch week." }],
  stream: true
});

Platform

Everything you need, nothing you don't.

Opinionated primitives for shipping AI features. Designed for the 80% of teams who want to focus on product, not plumbing.

⚡

Sub-100ms latency

Edge-routed inference across 14 regions. Your users feel the speed, your bill doesn't.

🧠

Frontier reasoning

Nyra-3.2 scores 88.4 on MMLU-Pro and ranks top-3 on SWE-bench Verified.

🔌

Tools & structured I/O

Native function calling, strict JSON schemas, and streaming tool deltas out of the box.

🛡️

Private by default

Zero data retention on the API. SOC 2 Type II, HIPAA-ready, and EU residency available.

📈

Observability built in

Traces, evals, and per-prompt cost analytics — without bolting on a third-party stack.

🧩

Drop-in compatible

OpenAI-compatible endpoints. Swap a base URL and keep your existing client code.

By the numbers

Trusted by teams shipping daily.

12B+

Tokens served / day

78ms

Median time-to-first-token

99.98%

Trailing 90-day uptime

4,200+

Active developer teams

Start free. Scale when you're ready.

$10 in credits on signup. No card required. Pay-as-you-go from $0.20 / M tokens.

Create your account →