Senior AI/ML Engineer
FULL DESCRIPTION
[Employer hidden — view at passion-project.co.uk] is the #1 health & fitness app worldwide, on a mission to build a better future for female health. Backed by a $200M investment led by General Atlantic, we became the first product of our kind to reach a $1B valuation in 2024, and we’re not slowing down.
The job
We are looking for a Senior Software Engineer with deep expertise in AI/ML infrastructure to join our AI Platform team and help build the GenAI platform that powers every AI feature at [Employer hidden].
You will bridge core infrastructure, data engineering, and LLM development to deliver production-grade medical safety judges, fine-tuning pipelines, evaluation frameworks, and real-time personalisation. The team operates 60+ LLM-based evaluation judges, develops proprietary fine-tuned health models, and maintains active partnerships with Databricks, Google, OpenAI, Anthropic, and AWS.
What you’ll do
- LLM Judge Ecosystem: build and scale Judge-as-a-Service, prompt registries, calibration pipelines, and evaluation orchestration using MLflow 3.x
- Fine-Tuning and Serving: develop LoRA/SFT/preference optimisation pipelines for health-domain models (Llama, Gemma, MedGemma) and manage model serving at scale on Databricks
- Data and Evaluation Pipelines: build synthetic Q&A generation pipelines, golden test sets, reward functions, and Delta table schemas in Unity Catalog for reliable, reproducible evaluation data
- Infrastructure: maintain Terraform-managed AWS infrastructure (EKS, S3, IAM), Databricks AI Gateway, and CI/CD pipelines (GitHub Actions) with evaluation gates and progressive rollout
- Cross-Functional Impact: collaborate with Product, Security, Analytics, and Medical teams, develop internal SDKs and APIs consumed by 5+ teams, and engage directly with technology partners on pre-release capabilities
Experience and skills
Must have:
- Engineering maturity: 7+ years of software engineering, 4+ years focused on ML/AI platforms
- LLM experience: recent hands-on work with at least one of: fine-tuning, prompt engineering, LLM evaluation, or model serving
- Technical stack: strong Python across production services and data pipelines, data engineering fundamentals (Spark, Delta tables, Parquet)
- Platform and infrastructure: Databricks (MLflow, Unity Catalog, Model Serving), AWS (EKS/Kubernetes, IAM), Terraform, GitHub Actions
- Cross-domain flexibility: comfort working across ML, data engineering, and infrastructure. You don’t need to be an expert in all three, but you contribute wherever the team needs it
Nice to have:
- LLM evaluation frameworks (judges, graders, calibration methodology) or fine-tuning techniques (LoRA, RLHF/DPO, model distillation)
- ML data engineering: synthetic data generation, evaluation dataset design, annotation pipelines
- Healthcare, regulated industry, or safety-critical AI systems experience
- Prompt optimisation frameworks (DSPy or similar), feature stores (Tecton)