Test Environment Manager

🔒 Confidential Employer
Posted 22 March 2026
LOCATION
London
TYPE
Full-time
LEVEL
Mid-Senior level
CATEGORY
IT Services
This employer holds a UK Home Office sponsor license — sponsorship for this specific role is at the employer’s discretion

SKILLS

Prometheus Grafana Splunk Jenkins Terraform Ansible

FULL DESCRIPTION

Job Information

- Date Opened: 04/12/2025
- Job Type: Permanent
- Industry: IT Services
- Work Experience: 5+ years
- City: London
- Province: City of London
- Country: United Kingdom
- Postal Code: EC1A

About Us

We provide end-to-end IT solutions and services including Applications services, Data & Analytics services, AI/ML Technologies and Professional services in the UK and EU market.

Job Description

Experience Required: 15+ Years

Role Overview

The Test Environment Manager (TEM) is a senior, engineering-focused role responsible for transforming and managing the entire non-production environment landscape. This role ensures that test environments are reliable, scalable, automated, observable, and aligned with modern SDLC and DevOps practices. The TEM drives technical excellence, an SRE-inspired culture, and continuous improvement across development, QA, and operations teams.

Key Operational Responsibilities

Environment Automation & Lifecycle Management

- Design and implement Infrastructure as Code (IaC) to fully automate provisioning, configuration, and teardown of test environments.
- Integrate environment automation seamlessly into CI/CD pipelines to enable on-demand, self-service environment delivery.

Reliability & Observability

- Define and maintain Service Level Objectives (SLOs) and key Service Level Indicators (SLIs), such as environment availability, provisioning time, and stability metrics.
- Monitor environment health using observability tools (Prometheus, Grafana, Splunk, etc.) and proactively identify and resolve performance issues or bottlenecks.

Incident & Problem Management

- Lead incident response for environment-related issues, driving quick resolution and facilitating blameless post-mortems.
- Implement permanent fixes based on root cause analysis and reduce repeat incidents.

Automation & Toil Reduction

- Identify repetitive, manual environment tasks and eliminate them through automation, improving engineering efficiency and reducing operational burden.

Strategic & Cultural Responsibilities

Continuous Improvement

- Analyze environment performance data, incident trends, and post-mortem outcomes to drive ongoing enhancements and innovation.

Reliability Management

- Apply an “error budget” framework to balance velocity and reliability across teams.
- Shift priorities between stability improvements and feature delivery based on reliability KPIs.

Culture & Collaboration

- Promote a culture of shared ownership, blameless problem-solving, and strong cross-team collaboration among development, QA, DevOps, and SRE teams.

Capacity Planning & Scalability

- Forecast environment capacity requirements based on usage trends, test cycles, and upcoming projects.
- Ensure infrastructure elasticity and scalability to meet future demands.

Test Data Management Integration

- Partner with Test Data Management teams to ensure test data is consistent, compliant, refreshed automatically, and aligned with environment provisioning needs.

Technical Skills & Experience

- Monitoring & Observability: Expertise with Prometheus, Grafana, Splunk, ELK/EFK, or similar platforms.
- CI/CD & Automation Tools: Strong experience with Jenkins, GitLab CI, GitHub Actions, and infrastructure-as-code and configuration management tools (Terraform, Ansible, etc.).
- Cloud & Container Platforms: Deep understanding of cloud infrastructure (AWS preferred), Kubernetes, Docker, and serverless technologies.
- Scripting & Programming: Proficiency in Python, Bash, or similar scripting languages for automation and environment tooling.
- Systems & Networking: Strong knowledge of Linux systems, networking concepts, DNS, load balancing, and database operations.

Soft Skills & Leadership Qualities

- Leadership & Influence: Ability to drive SRE and environment best practices across multiple teams and technical domains.
- Analytical Problem-Solving: Strong debugging, troubleshooting, and decision-making skills under time-sensitive conditions.
- Communication Excellence: Clear and effective communication with technical and non-technical stakeholders.
- Adaptability & Proactiveness: Ability to stay ahead of evolving technologies, tools, and environment architectures.

Summary

This role is ideal for a seasoned engineering leader who brings strong technical depth, an SRE mindset, automation-first thinking, and the ability to shape and modernize complex test environment landscapes.
