Add Multimodal Search to Your Apps with Cloud‑Hosted GPUs
Generate embeddings for images and text, build a vector index, and serve blazing‑fast search without maintaining on‑prem hardware.


Why Use CloudRift for Multimodal Search?
Embedding and reranking workloads need low-latency, GPU-accelerated infrastructure. CloudRift lets you choose: run your own stack on rented GPUs, or deploy via managed inference endpoints. Scale workers as traffic grows — your stack, your pace.
- ✓ Image + text embeddings
- ✓ VM or container flexibility
- ✓ No local installs
- ✓ Scale to multi‑GPU
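The core loop behind multimodal search is simple: embed your images and text, index the vectors, and rank by similarity at query time. A minimal sketch of that retrieval step, using made-up three-dimensional embeddings and an in-memory cosine-similarity search in place of a real vector index (in practice the vectors would come from an embedding model served on CloudRift):

```python
import math

# Hypothetical document embeddings; real ones come from an embedding model
# (e.g. served via an OpenAI-compatible /v1/embeddings endpoint).
DOCS = {
    "red sports car": [0.9, 0.1, 0.0],
    "bowl of fruit": [0.1, 0.8, 0.2],
    "city at night": [0.2, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

# A query embedding that happens to sit close to "red sports car"
print(search([0.85, 0.15, 0.05], k=1))  # -> ['red sports car']
```

At scale you would swap the linear scan for an approximate-nearest-neighbor index, but the ranking logic stays the same.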
How It Works
Run on Rented GPUs
Spin up your own stack using RTX 4090, 5090, or 6000 Pro. Full control over models, containers, and workflows.
Use Managed Endpoints
Skip infra setup with OpenAI-compatible endpoints for embeddings and reranking. Fast to launch, easy to scale.
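Because the endpoints are OpenAI-compatible, any client that speaks the standard `/v1/embeddings` request shape works. A stdlib-only sketch of building such a request; the base URL, API key, and model name are placeholders for your own deployment:

```python
import json
import urllib.request

BASE_URL = "https://your-endpoint.example.com/v1"  # placeholder
API_KEY = "YOUR_API_KEY"  # placeholder

def build_embeddings_request(texts, model="my-embedding-model"):
    """Build a standard OpenAI-style /v1/embeddings POST request."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embeddings_request(["a photo of a red sports car"])
# Sending it requires a live endpoint and a valid key:
# with urllib.request.urlopen(req) as resp:
#     vectors = [d["embedding"] for d in json.load(resp)["data"]]
```

The same pattern applies to reranking: only the path and payload fields change, not the client setup.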
Mix and Match
Start with managed inference, switch to self-hosted when needed, or run both in parallel. CloudRift gives you options.
What Our Partners Say
From customers scaling embeddings on CloudRift.

Aamir Shakir
Co-founder @ Mixedbread
We're using CloudRift at Mixedbread to run the inference for our state-of-the-art embedding & machine learning models. The service is amazing: it is extremely stable, the GPUs are affordable & provision fast, and the specs around the GPUs, such as CPU, memory, and network, are the best. We really enjoy the personal & fast support.