Senior Platform Engineer (Infrastructure)

🔒 Confidential Employer
Posted 21 March 2026
LOCATION: London
TYPE: Full-time
LEVEL: Mid-Senior level
CATEGORY: Technology
This employer holds a UK Home Office sponsor licence; sponsorship for this specific role is at the employer’s discretion.

SKILLS

AWS EKS Terraform Python Go Linux Observability Scalability Automation

FULL DESCRIPTION

Engineering – Infrastructure / Full-time / On-site

Role overview

[Employer hidden] is entering its next phase of global scale, and the Platform team plays a vital role in making that growth fast, safe, and sustainable.

As a Senior Platform Engineer, you will take end-to-end ownership of how our platform and services are built, deployed, and operated. Your focus is clear: remove friction for engineers, reduce operational risk, and ensure the platform scales alongside the business.

Your mission is to ensure engineers always have the right tools, patterns, and infrastructure so they can deliver faster, safer, and with confidence.

We care deeply about customer impact, even when the work is internal — platform improvements ultimately exist to help [Employer hidden] deliver better experiences to customers.

To foster a collaborative environment that thrives on face-to-face interaction and teamwork, all [Employer hidden] employees work from our dog-friendly office 4 days per week, with the flexibility to work remotely every Wednesday. London office address: The Bower, The Tower, 207 Old St, London EC1V 9NR

Our Tech Stack:

  • Docker
  • AWS EKS
  • Kafka / AutoMQ for asynchronous messaging
  • Elixir & Ruby for core services
  • gRPC for inter-service communication
  • GraphQL for API ingress
  • Next.js / TypeScript for frontend
  • PostgreSQL (RDS) for persistent storage
  • GitHub Actions (primary CI), with Jenkins & Argo Workflows for legacy pipelines
  • Redis for key/value storage
  • Terraform for infrastructure as code
  • Python and Go for our Platform tools
  • Datadog and Sentry for observability and incident response

What you will be doing:

Platform Architecture & Scale

  • Define and evolve infrastructure architecture to support multi-region deployments
  • Design systems for resilience, scalability, and operational simplicity
  • Explore and apply AI-assisted approaches to reduce operational toil, improve reliability, and support platform decision-making where it creates clear value

Observability & Operations

  • Extend monitoring and observability capabilities across services
  • Make operational data easy to access, understand, and act on
  • Build tooling for safe deployments, fast rollbacks, and reduced operational toil, applying software engineering practices
  • Experiment with AI-assisted detection, triage, or automation to improve signal quality and reduce manual effort

Ownership & Leadership

  • Scope, lead, and deliver platform initiatives autonomously
  • Drive cross-team projects with clear outcomes and accountability

Enablement & Knowledge Sharing

  • Raise the bar through documentation, runbooks, and internal knowledge sharing
  • Help teams learn, align, and operate more effectively together

We’d love to hear from you if you:

  • Have 5+ years of experience building, operating, and troubleshooting production-grade systems, with a strong focus on reliability, observability, and scalability
  • Bring hands-on experience with AI/ML or LLM-based tools in a platform or operational context (e.g. automation, observability, developer experience, or incident response)
  • Demonstrate strong programming and automation skills in Python, Go, or Bash, and use Infrastructure as Code (Terraform) to build repeatable, low-risk systems
  • Have experience with AWS (EKS, RDS, S3, CloudFront), or equivalent platforms on GCP or Azure
  • Are comfortable working close to the system with solid Linux and networking fundamentals (TCP/IP, DNS, firewalls, load balancing, VPNs)
  • Have practical experience designing or improving observability (metrics, logs, traces) using tools such as Datadog, Grafana, ELK, Sentry, and OpsGenie
  • Take ownership of complex technical problems, form clear opinions backed by data, and drive solutions through implementation and collaboration
  • Work effectively across teams, communicate technical concepts clearly, and apply systems thinking to make platforms easier to use, safer to operate, and simpler to scale
  • Have experience with Claude Code

Interview Process

  • Screen Call - Video call with a member of the Talent Team - 45-60 minutes
  • 1st Stage - Technical Video call/In-person interview with our Engineers - 90 minutes
  • 2nd Stage - Technical Video call/In-person interview with Engineers - Up to 2 hours
  • Final Stage - Video call/In-person meeting with the CTO - 60 minutes

*We aim to finalise the entire interview process and deliver feedback within 4 weeks.*

Every job application received is reviewed manually by our Talent Team. While we strive to assess applications within 7 days, the sheer volume of interest may occasionally extend this timeframe.

Inclusive workforce

At [Employer hidden], we are creating a culture where individuals of all backgrounds feel comfortable.

We want all [Employer hidden] people to feel included and truly empowered to contribute fully to our vision and goals. Everyone who applies will receive fair consideration for employment.

We do not discriminate based on race, colour, religion, sex, sexual orientation, age, marital status, gender identity, national origin, disability, or any other characteristic that is legally protected in the location where the candidate is applying.

If you have any accessibility requirements that would make you more comfortable during the interview process and/or once you join, please let us know so that we can support you.
