Data Engineer

Bespoke Labs

Remote

Full Time

Paid

Responsibilities
About Us

We are AI researchers and builders who understand how to curate data and RL environments that truly improve models. We curated OpenThoughts, one of the best open reasoning datasets, and have trained SOTA models such as Bespoke-MiniCheck and Bespoke-MiniChart.

We are on a journey to build environments: entire digital worlds that can be used to push the frontier of agents.

What You'll Be Working On

You will work directly with our research team on RL environment and task creation for agent training. This means designing observation spaces, action spaces, reward signals, and success criteria for new environments — and building the infrastructure that makes world-scale RL training possible. This is a high-ownership role; you will be building novel systems, not maintaining legacy ones.
Qualifications
Must-Have Skills

3+ years of data engineering experience — pipelines, ETL, data modeling in production or research settings

Strong Python proficiency (NumPy, pandas, Parquet, and HDF5 are daily tools)

Familiarity with at least one RL framework (Gymnasium / OpenAI Gym, dm_env, or equivalent) and working knowledge of RL environment structure — observation/action spaces, reward signals, episode logic

Experience with data versioning and experiment tracking (DVC, MLflow, W&B, or similar)

Comfortable with Docker and cloud infrastructure (AWS or GCP)

Solid grasp of ML storage formats: Parquet, HDF5, JSON Lines
Desired skills
Good to Have

Experience building or wrapping custom Gymnasium environments from scratch

Familiarity with RLHF or preference data pipelines

Exposure to distributed training infrastructure (Ray, SLURM)

Contributions to open-source ML tooling or research infrastructure

Understanding of reward shaping, sparse vs. dense rewards, and episode termination logic
Compensation
Hourly compensation for core projects typically ranges between USD $25 and $60 per hour. Final rates are determined based on experience, skills evaluation, location, project scope, and overall requirements.
Industry
Information Technology and Services
About Us
Bespoke Labs is a Mountain View-based, Series A lab focused on AI research and data curation for agents. We work with frontier AI labs and Fortune 500 companies to advance the capabilities of AI agents.