Senior Platform Engineer (Infrastructure)

🔒 Confidential Employer

Posted 21 March 2026

LOCATION

London

TYPE

Full-time

LEVEL

Mid-Senior level

SKILLS

AWS, EKS, Terraform, Python, Go, Linux, Observability, Scalability, Automation

FULL DESCRIPTION

Engineering – Infrastructure / Full-time / On-site

Role overview

[Employer hidden] is entering its next phase of global scale, and the Platform team plays a vital role in making that growth fast, safe, and sustainable.

As a Senior Platform Engineer, you will take end-to-end ownership of how our platform and services are built, deployed, and operated. Your focus is clear: remove friction for engineers, reduce operational risk, and ensure the platform scales alongside the business.

Your mission is to ensure engineers always have the right tools, patterns, and infrastructure so they can deliver faster, safer, and with confidence.

We care deeply about customer impact, even when the work is internal — platform improvements ultimately exist to help [Employer hidden] deliver better experiences to customers.

To foster a collaborative environment that thrives on face-to-face interactions and teamwork, all [Employer hidden] employees work from our dog-friendly office 4 days per week, with the flexibility to work remotely every Wednesday.

London office address: The Bower, The Tower, 207 Old St, London EC1V 9NR

Our Tech Stack:

Docker
AWS EKS
Kafka / AutoMQ for asynchronous messaging
Elixir & Ruby for core services
gRPC for inter-service communication
GraphQL for API ingress
Next.js / TypeScript for frontend
PostgreSQL (RDS) for persistent storage
GitHub Actions (primary CI), with Jenkins & Argo Workflows for legacy pipelines
Redis for key/value storage
Terraform for infrastructure as code
Python and Go for our Platform tools
Datadog and Sentry for observability and incident response

What you will be doing:

Platform Architecture & Scale

Define and evolve infrastructure architecture to support multi-region deployments
Design systems for resilience, scalability, and operational simplicity
Explore and apply AI-assisted approaches to reduce operational toil, improve reliability, and support platform decision-making where it creates clear value

Observability & Operations

Extend monitoring and observability capabilities across services
Make operational data easy to access, understand, and act on
Build tooling for safe deployments, fast rollbacks, and reduced operational toil, applying software engineering practices
Experiment with AI-assisted detection, triage, or automation to improve signal quality and reduce manual effort

Ownership & Leadership

Scope, lead, and deliver platform initiatives autonomously
Drive cross-team projects with clear outcomes and accountability

Enablement & Knowledge Sharing

Raise the bar through documentation, runbooks, and internal knowledge sharing
Help teams learn, align, and operate more effectively together

We’d love to hear from you if you:

Have 5+ years of experience building, operating, and troubleshooting production-grade systems, with a strong focus on reliability, observability, and scalability
Bring hands-on experience with AI/ML or LLM-based tools in a platform or operational context (e.g. automation, observability, developer experience, or incident response)
Demonstrate strong programming and automation skills in Python, Go, or Bash, and use Infrastructure as Code (Terraform) to build repeatable, low-risk systems
Have experience with AWS (EKS, RDS, S3, CloudFront), or equivalent platforms on GCP or Azure
Are comfortable working close to the system with solid Linux and networking fundamentals (TCP/IP, DNS, firewalls, load balancing, VPNs)
Have practical experience designing or improving observability (metrics, logs, traces) using tools such as Datadog, Grafana, ELK, Sentry, and OpsGenie
Take ownership of complex technical problems, form clear opinions backed by data, and drive solutions through implementation and collaboration
Work effectively across teams, communicate technical concepts clearly, and apply systems thinking to make platforms easier to use, safer to operate, and simpler to scale
Have experience with Claude Code

Interview Process

Screening Call - Video call with a member of the Talent Team - 45-60 minutes
1st Stage - Technical video call or in-person interview with our Engineers - 90 minutes
2nd Stage - Technical video call or in-person interview with our Engineers - Up to 2 hours
Final Stage - Video call or in-person meeting with the CTO - 60 minutes

*We aim to finalise the entire interview process and deliver feedback within 4 weeks.*

- Every job application received is reviewed manually by our talent team. While we strive to assess applications within 7 days, the sheer volume of talented individuals expressing interest may occasionally extend this timeframe.

Inclusive workforce

At [Employer hidden], we are creating a culture where individuals of all backgrounds feel comfortable.

We want all [Employer hidden] people to feel included and truly empowered to contribute fully to our vision and goals. Everyone who applies will receive fair consideration for employment.

We do not discriminate based on race, colour, religion, sex, sexual orientation, age, marital status, gender identity, national origin, disability, or any other characteristic that is legally protected in the location where the candidate is applying.

If you have any accessibility requirements that would make you more comfortable during the interview process and/or once you join, please let us know so that we can support you.