Site Reliability Engineering Manager
Company: [Employer hidden — view at passion-project.co.uk] Solutions
Location: Edinburgh (Remote)
Job Type: Full-time
Experience Level: Mid-Senior level
Salary: Not provided
About the Role
We are seeking a talented and proactive individual who can contribute to our team’s success and foster a spirit of collaboration within the company. The ideal candidate will be a hands-on expert in the full observability stack, with deep experience using metrics, logs, and distributed traces to proactively ensure an optimal end-user experience and system health. Candidates must demonstrate an advanced ability to lead the diagnosis of complex, multi-service production incidents and to translate deep technical analysis into actionable insights for engineering and product teams. Excellent crisis management skills and a proven ability to remain calm and decisive under pressure during production outages are essential.
- Deep understanding of Observability and Application Performance Management
- Deep understanding of cloud architecture, microservices, pipelines/workflows
- Proficiency in at least one higher-order programming language
- Expertise with cloud software delivery pipelines, including GitHub
- Ability to independently and collaboratively solve problems in a dynamic, fast-paced environment
- Strong communication skills, capable of conveying technical concepts to a diverse audience
- Proven track record of communicating with senior customer and internal stakeholders
If you’d like to join our team but feel that you don’t quite meet all of the preferred skills, we’d still love to hear why you think you’d be a great addition to our team.
Desirable
- Bachelor's or Master's degree in Computer Science, Engineering or a related subject
- Experience with microservices architecture
- Previous working experience in an Agile environment (Scrum)
What the job involves
The Evidence and Devices Engineering team is responsible for [Employer hidden] Solutions’ body cameras, vehicle cameras and evidence management software, which are used in public safety and enterprise applications worldwide. We are a thriving and growing company in search of a Site Reliability Engineering (SRE) Manager to join our evidence management team on a remote basis. This full-time position offers the opportunity to play a significant role in the development and growth of our software engineering projects. You will lead a squad of talented SREs responsible for the uptime, performance, and scalability of our core platform. This is a pivotal player-coach role that bridges high-level strategy with hands-on execution. You will be central to defining and implementing our next-generation observability strategy. At the same time, you will be the primary advocate for your team, dedicated to their career growth, their well-being, and a positive team culture.
Reliability Architecture:
Lead the design and implementation of our next-generation observability strategy. Analyze the end-user experience, system performance, availability, and uptime.
Observability:
Drive the strategy for monitoring, logging, and tracing.
Code & Automation:
Actively contribute to the codebase. If a task is done manually more than twice, you will lead the initiative to automate it.
Incident Command:
Serve as a primary incident commander for high-severity outages. Lead post-mortems (RCAs) to ensure we learn from every failure.