How to run Oobabooga WebUI on a rented GPU

By Heiko Heilig · May 23, 2025
Tags: open-source-ai, gpu-rental, cloud-computing, llm

I Ran a Local LLM in the Cloud (and Didn’t Break Anything)

There’s this moment that keeps happening to me now that I’ve started working in AI and cloud tech. I’ll be reading up on something, maybe messing with a model or following some tutorial, and suddenly I stumble into a whole new tool, ecosystem, or weird rabbit hole I didn’t even know existed.

That’s exactly how I found Oobabooga WebUI.

I was researching alternatives to ChatGPT — not because I’m mad at it, but because I wanted to actually see what the models were doing under the hood. And then, in one of our company repos, there’s a tutorial on “Oobabooga” (and it’s written all seriously and matter-of-factly — like that name isn’t completely unhinged).

Turns out, it’s an incredibly useful (and completely open-source) way to run your own large language models in your browser.

So obviously, I had to try it. And since I didn’t want to turn my laptop into a space heater, I rented a GPU instead.

Here’s how that went — and how you can try it, too.

Why Use Oobabooga Instead of ChatGPT?

Don’t get me wrong — ChatGPT is great. It’s fast, polished, and easy to use. But it’s also kind of a black box. You don’t control the model. You can’t see how anything works. You can give it some general instructions that apply to all your chats and you can use plugins, but still, the inner machinations of its mind remain an enigma.

Oobabooga gives you the opposite experience. It’s a browser-based WebUI that lets you run local LLMs on your own terms. You can:

  • Load any model from Hugging Face
  • Fine-tune it on your own data
  • Customize how generation works (sampling, temperature, prompts, etc.)
  • Host your own OpenAI-compatible API
  • And best of all: explore outputs with no filters, no usage caps, and no third-party limitations
Screenshot of the website huggingface.co, showing its dashboard.
Hugging Face is one of the biggest AI communities out there. Here, you will find plenty of models to try out.

If you’re building something serious — or you just want to understand what’s actually happening when a model “thinks” — this might be a good way to do it.

Why Not Just Run It Locally?

You can. But… you probably shouldn’t.
Unless you’re rocking a top-tier GPU, running larger models locally turns into a nightmare fast.

Try loading anything beyond 7B parameters and your machine starts wheezing: memory fills up and latency goes through the roof. And as a result, you spend more time tweaking your setup than actually using the model.
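A quick back-of-the-envelope shows why: 7B parameters at 16-bit precision is about 14 GB of weights alone (7 billion × 2 bytes), before you count the KV cache or activations. That already outstrips the VRAM of most consumer GPUs unless you quantize.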

That’s why I used a rented GPU from a GPU marketplace. You rent a high-end GPU (RTX 4090, RTX 5090) for as long as you need (you pay by the minute instead of a monthly fee) — and run Oobabooga like you would locally, minus the hardware drama.

Here’s how to get up and running using CloudRift.

Step 1: Rent a GPU

  1. Install the Rift CLI (instructions here)
  2. Go to cloudrift.ai
  3. Create an Account and add credits (10 USD is more than enough to get started)
  4. Create a new Instance in Container Mode
  5. Pick your GPU (RTX 4090 should be good)
  6. Under Select Software, choose No Container
  7. Deploy your instance

Once your machine is live, run rift cluster info in your terminal to grab your IP, or check the CloudRift dashboard where your instance summary is shown.

Step 2: Launch Oobabooga via Docker

Open your terminal and run this:

rift docker run -p 7860 -e EXTRA_LAUNCH_ARGS="--listen --verbose" -it --name oobabooga atinoda/text-generation-webui:default-nvidia

This:

  • Pulls the latest Oobabooga image
  • Exposes it on port 7860
  • Starts it in interactive mode so you can see logs and debug stuff if needed
Screenshot of the CLI, as commands are entered to show the cluster configuration of the rented GPUs.
This is what you’ll see in the console as your Oobabooga image is being pulled. At the top, you can see the ‘rift cluster info’ command, which lists all the rented GPUs.
Tip: Want to keep it running after closing your terminal? Press Ctrl+P, then Ctrl+Q to safely detach.

If you want to specify which executor to use (in case you’ve rented multiple GPUs), just add -x <executor-name>.
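For example (the executor name below is just a placeholder; yours will show up in rift cluster info):

rift docker run -x <executor-name> -p 7860 -e EXTRA_LAUNCH_ARGS="--listen --verbose" -it --name oobabooga atinoda/text-generation-webui:default-nvidia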

Step 3: Open the Interface

In your browser, go to:

http://<your-server-ip>:7860

If you’re not sure what your IP is, just run:

rift cluster info

It’ll be right there. You can also check it in your CloudRift Workspace:

Screenshot of the CloudRift WebUI, showing the summary of the rented workspace and highlighting the IP address.
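If you’d rather confirm from the terminal that the UI is reachable (assuming the default port 7860 from the command above), a quick check:

curl -s -o /dev/null -w "%{http_code}\n" http://<your-server-ip>:7860

A 200 means the WebUI is up.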

If everything worked out, you’ll see this in your browser:

A screenshot of the user interface of Oobabooga WebUI in its pristine state.

Step 4: Load a Model

Once you’re in the UI, go to the Model tab, and in the model loader on the right, paste in something like:

microsoft/Phi-3-mini-128k-instruct

(It’s lightweight, fast, and great for experimenting.)

Click Download, wait for it to finish, and then hit Reload. The model should now appear in your dropdown. Load it, and you’re good to go.

A screenshot of the Model page of the Oobabooga WebUI, highlighting the fields needed to progress with the tutorial.

You can switch models later, load LoRAs or adapters, or point it to a custom fine-tuned checkpoint if you have one.
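If you prefer doing that from a terminal instead of the UI, here’s a minimal Python sketch using the huggingface_hub package (the local_dir below is an assumption; point it at wherever your container keeps its models folder):

# Sketch: pre-download a model outside the WebUI.
# The target directory is a guess; match your container's models directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/Phi-3-mini-128k-instruct",
    local_dir="models/microsoft_Phi-3-mini-128k-instruct",
)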

Step 5: Start Building Stuff

You’re now running a real model, on real infrastructure, under your full control.

Head to the Chat tab to start prompting, or use the Notebook tab for more structured input/output testing.

You’ll notice the speed is solid, the interface is flexible, and the whole thing just works — which, frankly, still feels like magic when it comes to local AI setups.
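Since Oobabooga can also serve an OpenAI-compatible API, here’s a minimal sketch of querying it from Python. This assumes you relaunch the container with --api added to EXTRA_LAUNCH_ARGS and port 5000 published (both are assumptions on top of the command from Step 2):

# Sketch: query Oobabooga's OpenAI-compatible endpoint with plain requests.
# Assumes the container was launched with --api and port 5000 exposed.
import requests

BASE_URL = "http://<your-server-ip>:5000/v1"  # replace with your instance IP

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Explain LoRA in one sentence."}],
        "max_tokens": 120,
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])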

Final Thoughts

The whole process took me maybe 15 minutes when I did it for the first time. And now I have a zero-restriction, totally customizable, OpenAI-compatible language model running in my browser — without cooking my laptop or paying for a cloud subscription I don’t need.

Screenshot of Oobabooga WebUI as it generates a response to the user’s query.

Oobabooga is weirdly named, but seriously capable.
And running it on a rented GPU just removes any hardware anxiety.