How to run Oobabooga WebUI on a rented GPU
I Ran a Local LLM in the Cloud (and Didn’t Break Anything)
There’s this moment that keeps happening to me now that I’ve started working in AI and cloud tech. I’ll be reading up on something, maybe messing with a model or following some tutorial, and suddenly I stumble into a whole new tool, ecosystem, or weird rabbit hole I didn’t even know existed.
That’s exactly how I found Oobabooga WebUI.
I was researching alternatives to ChatGPT, not because I’m mad at it, but because I wanted to actually see what the models were doing under the hood. And then, in one of our company repos, I found a tutorial on “Oobabooga” (written all seriously and matter-of-factly, like that name isn’t completely unhinged).
Turns out, it’s an incredibly useful (and completely open-source) way to run your own large language models in your browser.
So obviously, I had to try it. And since I didn’t want to turn my laptop into a space heater, I rented a GPU instead.
Here’s how that went — and how you can try it, too.
Why Use Oobabooga Instead of ChatGPT?
Don’t get me wrong — ChatGPT is great. It’s fast, polished, and easy to use. But it’s also kind of a black box. You don’t control the model. You can’t see how anything works. You can give it some general instructions that apply to all your chats and you can use plugins, but still, the inner machinations of its mind remain an enigma.
Oobabooga gives you the opposite experience. It’s a browser-based WebUI that lets you run local LLMs on your own terms. You can:
- Load any model from Hugging Face
- Fine-tune it on your own data
- Customize how generation works (sampling, temperature, prompts, etc.)
- Host your own OpenAI-compatible API
- And best of all: explore outputs with no filters, no usage caps, and no third-party limitations

If you’re building something serious — or you just want to understand what’s actually happening when a model “thinks” — this might be a good way to do it.
Why Not Just Run It Locally?
You can. But… you probably shouldn’t.
Unless you’re rocking a top-tier GPU, running larger models locally turns into a nightmare fast.
Try loading anything beyond 7B parameters and your machine starts wheezing: memory fills up and latency goes through the roof (a 7B model in 16-bit precision already needs roughly 14 GB of VRAM before you even count context). You end up spending more time tweaking your setup than actually using the model.
That’s why I used a rented GPU from a GPU marketplace. You rent a high-end GPU (RTX 4090, RTX 5090) for as long as you need, paying by the minute instead of a monthly fee, and run Oobabooga just like you would locally, minus the hardware drama.
Here’s how to get up and running using CloudRift.
Step 1: Rent a GPU
- Install the Rift CLI (instructions here)
- Go to cloudrift.ai
- Create an account and add credits ($10 is more than enough to get started)
- Create a new Instance in Container Mode
- Pick your GPU (RTX 4090 should be good)
- Under Select Software, choose No Container
- Deploy your instance
- Once your machine is live, run rift cluster info in your terminal to grab your IP, or check the CloudRift dashboard where your instance summary is shown.
Step 2: Launch Oobabooga via Docker
Open your terminal and run this:
rift docker run -p 7860 -e EXTRA_LAUNCH_ARGS="--listen --verbose" -it --name oobabooga atinoda/text-generation-webui:default-nvidia
This:
- Pulls the latest Oobabooga image
- Exposes it on port 7860
- Starts it in interactive mode so you can see logs and debug stuff if needed
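One caveat worth flagging: anything you download lives inside the container, so it disappears if the container is removed. If you want models to persist, a volume mount should do it. Here’s a sketch, assuming the rift wrapper forwards Docker’s -v flag and that /app/models is the models directory in this image (the atinoda image layout has changed across versions, so check its README for your tag):

rift docker run -p 7860 -v oobabooga-models:/app/models -e EXTRA_LAUNCH_ARGS="--listen --verbose" -it --name oobabooga atinoda/text-generation-webui:default-nvidia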

Tip: Want to keep it running after closing your terminal? Press Ctrl+P, then Ctrl+Q to safely detach.
If you want to specify which executor to use (in case you’ve rented multiple GPUs), just add -x <executor-name>.
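And if you’ve detached and want back in, the standard Docker subcommands should do the trick. This assumes the rift docker wrapper forwards them to Docker unchanged (worth verifying in the Rift CLI docs):

rift docker attach oobabooga
rift docker logs -f oobabooga

The first reattaches your terminal to the running container; the second just follows the logs without attaching.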
Step 3: Open the Interface
In your browser, go to:
http://<your-server-ip>:7860
If you’re not sure what your IP is, just run:
rift cluster info
It’ll be right there. You can also check it in your CloudRift Workspace.

If everything worked out, the Oobabooga interface will load in your browser.

Step 4: Load a Model
Once you’re in the UI, go to the Model tab and, in the download field on the right, paste in something like:
microsoft/Phi-3-mini-128k-instruct
(It’s lightweight, fast, and great for experimenting.)
Click Download, wait for it to finish, and then hit Reload. The model should now appear in your dropdown. Load it, and you’re good to go.

You can switch models later, load LoRAs or adapters, or point it to a custom fine-tuned checkpoint if you have one.
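If you’d rather not click through the UI on every relaunch, text-generation-webui also accepts a --model flag at startup. Here’s a sketch, assuming the model was already downloaded in the step above (downloads typically replace the slash in the Hugging Face ID with an underscore, so check the model dropdown for the exact folder name):

rift docker run -p 7860 -e EXTRA_LAUNCH_ARGS="--listen --model microsoft_Phi-3-mini-128k-instruct" -it --name oobabooga atinoda/text-generation-webui:default-nvidia

(You’d need to remove or rename the old container first, since the oobabooga name will still be taken.)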
Step 5: Start Building Stuff
You’re now running a real model, on real infrastructure, under your full control.
Head to the Chat tab to start prompting, or use the Notebook tab for more structured input/output testing.
You’ll notice the speed is solid, the interface is flexible, and the whole thing just works — which, frankly, still feels like magic when it comes to local AI setups.
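One more thing before you build: the OpenAI-compatible API mentioned earlier isn’t enabled by the launch command in Step 2. Assuming you relaunch with --api added to EXTRA_LAUNCH_ARGS and publish port 5000 (the API extension’s default) alongside 7860, a minimal request would look something like this:

curl http://<your-server-ip>:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain LoRA adapters in one paragraph."}], "temperature": 0.7}'

Any OpenAI client library should work the same way if you point its base URL at http://<your-server-ip>:5000/v1 (the API key can be any placeholder).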
Final Thoughts
The whole process took me maybe 15 minutes the first time. And now I have a zero-restriction, totally customizable, OpenAI-compatible language model running in my browser, without cooking my laptop or paying for a cloud subscription I don’t need.

Oobabooga is weirdly named, but seriously capable.
And running it on a rented GPU just removes any hardware anxiety.