AI & ML Engineer

LLM | Inference | Python | GPU | GenAI

Remote | Full-time or Internship

About the Role

We are looking for an AI & ML Engineer to help push the boundaries of LLM inference. You will optimize model performance, explore emerging GenAI algorithms, and improve our GPU rental platform for ML engineers. You'll also help tell our story through technical content and benchmarks.

What You Will Do

  • Optimize LLM inference pipelines for speed, cost, and scalability
  • Explore and implement GenAI techniques like speculative decoding, RAG, and quantization
  • Collaborate with systems engineers on GPU scheduling and memory optimization
  • Improve the developer experience for ML users on our GPU platform
  • Write technical articles, benchmarks, and guides for the ML community

What We Are Looking For

  • Hands-on experience with LLM inference frameworks (e.g., vLLM, FasterTransformer, DeepSpeed)
  • Strong understanding of GPU performance tuning, CUDA, and mixed precision
  • Proficiency in Python and ML tooling (PyTorch, Hugging Face Transformers, etc.)
  • Excellent written communication and the ability to explain technical concepts clearly
  • Bonus: Contributions to open source or published ML research/content

This position is currently filled. We'll reopen applications in the future.