Test Environment Manager
Job Information
- Date Opened: 04/12/2025
- Job Type: Permanent
- Industry: IT Services
- Work Experience: 5+ years
- City: London
- Province: City of London
- Country: United Kingdom
- Postal Code: EC1A
About Us
We provide end-to-end IT solutions and services, including Application services, Data & Analytics services, AI/ML technologies, and Professional services, in the UK and EU markets.
Job Description
Experience Required: 15+ Years
Role Overview
The Test Environment Manager (TEM) is a senior, engineering-focused role responsible for transforming and managing the entire non-production environment landscape. This role ensures that test environments are reliable, scalable, automated, observable, and aligned with modern SDLC and DevOps practices. The TEM drives technical excellence, an SRE-inspired culture, and continuous improvement across development, QA, and operations teams.
Key Operational Responsibilities
Environment Automation & Lifecycle Management
- Design and implement Infrastructure as Code (IaC) to fully automate provisioning, configuration, and teardown of test environments.
- Integrate environment automation seamlessly into CI/CD pipelines to enable on-demand, self-service environment delivery.
Reliability & Observability
- Define and maintain Service Level Objectives (SLOs) and key Service Level Indicators (SLIs), such as environment availability, provisioning time, and stability metrics.
- Monitor environment health using observability tools (Prometheus, Grafana, Splunk, etc.) and proactively identify and resolve performance issues or bottlenecks.
Incident & Problem Management
- Lead incident response for environment-related issues, driving quick resolution and facilitating blameless post-mortems.
- Implement permanent fixes based on root cause analysis and reduce repeat incidents.
Automation & Toil Reduction
- Identify repetitive, manual environment tasks and eliminate them through automation, improving engineering efficiency and reducing operational burden.
Strategic & Cultural Responsibilities
Continuous Improvement
- Analyze environment performance data, incident trends, and post-mortem outcomes to drive ongoing enhancements and innovation.
Reliability Management
- Apply an “error budget” framework to balance velocity and reliability across teams.
- Shift priorities between stability improvements and feature delivery based on reliability KPIs.
Culture & Collaboration
- Promote a culture of shared ownership, blameless problem-solving, and strong cross-team collaboration among development, QA, DevOps, and SRE teams.
Capacity Planning & Scalability
- Forecast environment capacity requirements based on usage trends, test cycles, and upcoming projects.
- Ensure infrastructure elasticity and scalability to meet future demands.
Test Data Management Integration
- Partner with Test Data Management teams to ensure test data is consistent, compliant, refreshed automatically, and aligned with environment provisioning needs.
Technical Skills & Experience
- Monitoring & Observability: Expertise with Prometheus, Grafana, Splunk, ELK/EFK, or similar platforms.
- CI/CD & Automation Tools: Strong experience with Jenkins, GitLab CI, GitHub Actions, and IaC and configuration management tools (Terraform, Ansible, etc.).
- Cloud & Container Platforms: Deep understanding of cloud infrastructure (AWS preferred), Kubernetes, Docker, and serverless technologies.
- Scripting & Programming: Proficiency in Python, Bash, or similar scripting languages for automation and environment tooling.
- Systems & Networking: Strong knowledge of Linux systems, networking concepts, DNS, load balancing, and database operations.
Soft Skills & Leadership Qualities
- Leadership & Influence: Ability to drive SRE and environment best practices across multiple teams and technical domains.
- Analytical Problem-Solving: Strong debugging, troubleshooting, and decision-making skills under time-sensitive conditions.
- Communication Excellence: Clear and effective communication with technical and non-technical stakeholders.
- Adaptability & Proactiveness: Ability to stay ahead of evolving technologies, tools, and environment architectures.
Summary
This role is ideal for a seasoned engineering leader who brings strong technical depth, an SRE mindset, automation-first thinking, and the ability to shape and modernize complex test environment landscapes.