High-Performance LLM Inference at Low Cost

Serve models like Llama 4 and DeepSeek 3.1 with our high-performance GPU platform. Our Inference APIs make it simple to scale your projects and pay only for what you need.

Qwen3.5-122B-A10B-FP8

$0.25 in | $1.50 out | 262.14K Context

Qwen3.5 is a mixture-of-experts (MoE) language model with 122B total parameters and 10B active. Excels at reasoning, coding, and multilingual tasks.

Get API Key

Quick Start

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://inference.cloudrift.ai/v1",
  apiKey: "YOUR_RIFT_API_KEY",
});

const completion = await openai.chat.completions.create({
  model: "llama4:maverick",
  messages: [
    {
      role: "user",
      content: "What is the meaning of life?"
    }
  ],
  stream: true,
});

for await (const chunk of completion) {
  // Some chunks (for example, the final one) carry no content; write an empty string instead.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
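
If you need the full reply as a single string rather than printing it piece by piece, the streaming loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the platform's API: `collectStream` and the `StreamChunk` interface are hypothetical names introduced here, and `StreamChunk` models only the fields the loop reads, assuming the chunk shape used in the Quick Start.

```typescript
// Hypothetical minimal shape of a streamed chunk: only the fields read below.
interface StreamChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Hypothetical helper: concatenates the content deltas of a stream into one string.
// Chunks with no choices or no content contribute an empty string.
async function collectStream(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```

Assuming the `completion` object from the Quick Start is async-iterable over chunks of this shape, `const reply = await collectStream(completion);` would return the whole response once the stream ends.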

Models

All Available Models

Cost-effective access to high-performance models, with no queues and no GPUs to reserve. Just straightforward model options you can build on.

Qwen3.5-122B-A10B-FP8

$0.25 in | $1.50 out | 262.14K Context

Qwen3.5 is a mixture-of-experts (MoE) language model with 122B total parameters and 10B active. Excels at reasoning, coding, and multilingual tasks.
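
To get a rough feel for what a request might cost at the listed rates, the arithmetic can be sketched as below. This is an illustration only: the page does not state the pricing unit, so the sketch assumes the prices are USD per 1 million tokens, and `estimateCostUsd`, `Pricing`, and `QWEN_PRICING` are hypothetical names rather than part of any API.

```typescript
// ASSUMPTION: the listed prices are USD per 1 million tokens (the unit is not stated on the page).
interface Pricing {
  inputPerMillion: number;  // USD per 1M input tokens (assumed unit)
  outputPerMillion: number; // USD per 1M output tokens (assumed unit)
}

// Listed rates for Qwen3.5-122B-A10B-FP8: $0.25 in, $1.50 out.
const QWEN_PRICING: Pricing = { inputPerMillion: 0.25, outputPerMillion: 1.5 };

// Hypothetical helper: estimated cost in USD for one request.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: Pricing = QWEN_PRICING,
): number {
  const inputCost = (inputTokens / 1_000_000) * pricing.inputPerMillion;
  const outputCost = (outputTokens / 1_000_000) * pricing.outputPerMillion;
  return inputCost + outputCost;
}
```

Under that assumption, a request with 2,000 input tokens and 500 output tokens would come to `estimateCostUsd(2000, 500)`, a small fraction of a cent.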

Get API Key
Get in touch

Ready to get started?

Get in touch with our team to discuss your requirements and find the right solution for your infrastructure.