AI & ML Engineer

LLM | Inference | Python | GPU | GenAI

Remote | Full-time or Internship

About the Role

We are looking for an AI & ML Engineer to help push the boundaries of LLM inference. You will optimize model performance, explore emerging GenAI algorithms, and improve our GPU rental platform for ML engineers. You'll also help tell our story through technical content and benchmarks.

What You Will Do

  • Optimize LLM inference pipelines for speed, cost, and scalability
  • Explore and implement GenAI techniques like speculative decoding, RAG, and quantization
  • Collaborate with systems engineers on GPU scheduling and memory optimization
  • Improve the developer experience for ML users on our GPU platform
  • Write technical articles, benchmarks, and guides for the ML community

What We Are Looking For

  • Hands-on experience with LLM inference frameworks (e.g., vLLM, FasterTransformer, DeepSpeed)
  • Strong understanding of GPU performance tuning, CUDA, and mixed precision
  • Proficiency in Python and ML tooling (PyTorch, Hugging Face Transformers, etc.)
  • Excellent written communication and the ability to explain technical concepts clearly
  • Bonus: Contributions to open source or published ML research/content

This position is currently filled. We'll reopen applications in the future.