Site Reliability Engineer - Private Cloud Compute

Confidential Employer

Posted 6 May 2026

LOCATION

London

TYPE

Full-time

LEVEL

Mid-Senior level

SKILLS

Kubernetes, Python, Prometheus, Docker, Linux, Distributed Systems, Automation, Troubleshooting

FULL DESCRIPTION

A confidential employer is hiring a Site Reliability Engineer for Private Cloud Compute in London, England. Full-time, on-site. Posted: 29 Apr 2026. Role Number: 200660427-2114.

Description

We're looking for a hardworking and passionate Site Reliability Engineer to join the team. You are an accomplished builder and problem-solver, eager to tackle challenging technical problems. You have a deep understanding of SRE principles and the expertise required to operate services at our scale with a high degree of operational excellence. In this role you will directly shape how we build and run our services on a global scale. You will bring the technical skills to dive deep into complex systems while also understanding and contributing to higher-level business and product goals. We seek high-quality engineers with a diverse set of experiences and skill sets. Our customers count on us to provide extraordinary availability, scalability, and security for our services. If you'd like to positively influence the experience of millions of customers through your technical contributions, this is the job for you.

Responsibilities

Deploy, support and monitor new and existing services, platforms, and application stacks.
Use scale testing to measure, tune, and optimize system performance.
Enhance, architect, author, and deliver software to improve the availability, scalability, and security of our internet services.
Build and run systems, infrastructure and applications through automation.
Participate in periodic on-call duties.

Minimum Qualifications

Strong sense of ownership, customer service, and integrity proven through clear communication.
BS in Computer Science or a related field, or equivalent work experience.
4+ years of experience managing and scaling distributed systems in a public, private, or hybrid cloud environment.
Strong experience deploying, supporting, and monitoring new and existing services, platforms, and application stacks.
Excellent troubleshooting and problem-solving skills.
Experience with scale testing, disaster recovery, and capacity planning.
Passion for replacing repetitive manual processes with automation and refining that automation through iteration.
Proven ability to write programs in a high-level programming language such as Java, Go, Python, or Perl.
Inclination toward efficient programming, with an emphasis on improving code through complexity analysis.
Experience with Kubernetes, Nginx, Envoy, Prometheus, and/or Docker.

Preferred Qualifications

Understanding of standard networking protocols and concepts such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load-balancing strategies.
Understanding of the Linux operating system, including the kernel, memory, processes, threads, static and shared libraries, IPC, and signals.
Experience managing large numbers of diverse systems with configuration management tools such as Puppet, Chef, Ansible, or Salt.

At our company, we're not all the same. And that's our greatest strength. We draw on the differences in who we are, what we've experienced, and how we think. Because to create products that serve everyone, we believe in including everyone. Therefore, we are committed to treating all applicants fairly and equally. As a registered Disability Confident employer, we will work with applicants to make any reasonable accommodations. We will consider for employment all qualified applicants with criminal backgrounds in a manner consistent with applicable law.