Actively Hiring
We're actively reviewing applications for this position. Apply now to join our team!
AI & ML Engineer
LLM Inference | Python | GPU | GenAI
Remote | Full-time or Internship
About the Role
We are looking for an AI & ML Engineer to help push the boundaries of LLM inference. You will optimize model performance, explore emerging GenAI algorithms, and improve our GPU rental platform for ML engineers. You will also help tell our story through technical content and benchmarks.
What You Will Do
- Optimize LLM inference pipelines for speed, cost, and scalability
- Explore and implement GenAI techniques such as speculative decoding, RAG, and quantization
- Collaborate with systems engineers on GPU scheduling and memory optimization
- Improve the developer experience for ML users on our GPU platform
- Write technical articles, benchmarks, and guides for the ML community
What We Are Looking For
- Hands-on experience with LLM inference frameworks (e.g., vLLM, FasterTransformer, DeepSpeed)
- Strong understanding of GPU performance tuning, CUDA, and mixed precision
- Proficiency in Python and ML tooling (PyTorch, Hugging Face Transformers, etc.)
- Excellent written communication and the ability to explain technical concepts clearly
- Bonus: contributions to open-source projects or published ML research/content
Get an Advantage in the Process
We actively interview hackathon participants and bug bounty contributors. Show us your skills in action by competing in our AI Hackathon or by solving critical issues in our bug bounty program.