High-Performance LLM Inference at Low Cost
Serve models like Llama 4 and DeepSeek 3.1 with our high-performance GPU platform. Our Inference APIs make it simple to scale your projects and pay only for what you need.
Qwen3.6-35B-A3B-FP8
$0.15 in | $1.00 out | 262.14K Context
Qwen3.6 35B mixture-of-experts (3B active) at FP8 - fits a single RTX PRO 6000 Blackwell with 262K context.
Get API Key
Quick Start
import OpenAI from "openai";

// Point the OpenAI client at the inference endpoint.
const openai = new OpenAI({
  baseURL: "https://inference.cloudrift.ai/v1",
  apiKey: "YOUR_RIFT_API_KEY",
});

// Request a streamed chat completion.
const completion = await openai.chat.completions.create({
  model: "llama4:maverick",
  messages: [
    {
      role: "user",
      content: "What is the meaning of life?",
    },
  ],
  stream: true,
});

// Print tokens as they arrive; some chunks carry no content, so fall back to "".
for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Models
All Available Models
Cost-effective access to high-performance models — no queues, no GPUs to reserve. Just straightforward model options you can build on.
Qwen3.6-35B-A3B-FP8
$0.15 in | $1.00 out | 262.14K Context
Qwen3.6 35B mixture-of-experts (3B active) at FP8 - fits a single RTX PRO 6000 Blackwell with 262K context.
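To get a feel for what the listed prices mean for a workload, here is a minimal sketch of a cost estimate. It assumes the "$0.15 in | $1.00 out" figures are US dollars per one million tokens, which the listing does not state explicitly; the `Pricing` type and `estimateCostUsd` helper are hypothetical names used for illustration and are not part of any API.

```typescript
// Assumption: listed prices are USD per 1,000,000 tokens (not stated on the page).
interface Pricing {
  inputPerMillion: number;  // USD per 1M input (prompt) tokens
  outputPerMillion: number; // USD per 1M output (completion) tokens
}

// Prices copied from the Qwen3.6-35B-A3B-FP8 listing.
const QWEN_PRICING: Pricing = { inputPerMillion: 0.15, outputPerMillion: 1.0 };

// Hypothetical helper: estimate the cost of a request or batch in USD.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: Pricing,
): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = inputTokens * pricing.inputPerMillion;
  const outputCost = outputTokens * pricing.outputPerMillion;
  return (inputCost + outputCost) / 1_000_000;
}

// Example: 200,000 prompt tokens and 10,000 generated tokens.
const estimate = estimateCostUsd(200_000, 10_000, QWEN_PRICING);
console.log(`Estimated cost: $${estimate.toFixed(4)}`);
```

Under that per-million assumption, output tokens dominate the bill for generation-heavy workloads, so capping response length is usually the first lever to pull when budgeting.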
Get API Key
Get in touch
Ready to get started?
Get in touch with our team to discuss your requirements and find the right solution for your infrastructure.