Machine Learning Engineer

Bespoke Labs

Remote
Full Time
Paid
  • Responsibilities

    About Us

    We are AI researchers and builders who understand how to curate data and RL environments that truly improve models. We curated OpenThoughts, one of the best open reasoning datasets, and have trained SOTA models such as Bespoke-MiniCheck and Bespoke-MiniChart.

    We have embarked on a journey to build environments: entire digital worlds that can be used to push the frontier of agents.

    What You'll Be Working On

    You will work directly with our research team on RL environment and task creation for agent training. This means designing observation spaces, action spaces, reward signals, and success criteria for new environments — and building the infrastructure that makes world-scale RL training possible. This is a high-ownership role; you will be building novel systems, not maintaining legacy ones.

  • Qualifications

    Must-Have Skills

    3+ years of ML engineering experience — model training, fine-tuning, or post-training pipelines in research or production

    Strong Python and deep learning proficiency (PyTorch preferred; familiar with training loops, optimizers, mixed precision)

    Hands-on experience with LLM post-training — SFT, RLHF, PPO, DPO, or reward model training — and understanding of how training data quality affects model behavior

    Familiarity with RL frameworks (Gymnasium, dm_env) and the ability to design or modify reward functions for agent training objectives

    Experience running experiments at scale on cloud or HPC (AWS, GCP, SLURM, or Ray)

    Solid understanding of evaluation methodology — held-out sets, benchmark design, avoiding train/eval contamination

  • Desired skills

    Good to Have

    Experience with multi-turn or agentic LLM systems (tool use, function calling, agent loops)

    Familiarity with preference data collection and annotation pipelines

    Prior work on RL-from-human-feedback or model-based RL at scale

    Contributions to open-source training frameworks (e.g., trlX, OpenRLHF, verl)

    Experience reading and implementing methods from recent ML papers quickly

  • Compensation
    Compensation for core projects typically ranges from USD $20 to $60 per hour. Final rates depend on experience, skills evaluation, location, project scope, and overall requirements.
  • Industry
    Information Technology and Services
  • About Us

    Bespoke Labs is a Mountain View-based, Series A lab focused on AI research and data curation for agents. We work with frontier AI labs and Fortune 500 companies to advance the capabilities of AI agents.