Site Reliability Engineer

Confidential Employer

Posted 8 May 2026

LOCATION

EMEA (Remote)

TYPE

Full-time

LEVEL

Mid-Senior level

SKILLS

AWS (EKS, networking, IAM)
Kubernetes
Helm
Terraform
Go (or Python, Bash)
Monitoring and observability tooling
Distributed systems
Incident response

FULL DESCRIPTION

Title: Site Reliability Engineer

Company: [Employer hidden]

Location: EMEA (Remote)

Type: Full-time

Experience: Mid-Senior level

Who is [Employer hidden]?

[Employer hidden] is a pan-African fintech that connects the local payment channels used by the mass market to both local and international merchants. We operate in a highly regulated, partner-driven environment and are in scale-up mode, with ambitious growth plans across multiple African markets. Through our payments API we already facilitate over 4 million transactions a day across 20 countries in Sub-Saharan Africa. At [Employer hidden], an entrepreneurial spirit is coupled with a modern and professional working culture. We work as a remote team, with team members in Europe, Africa and Asia.

What is the role?

As a Site Reliability Engineer at [Employer hidden], you will own the reliability and scalability of a high-throughput payments platform operating across fragmented and often unreliable external systems. This is not a traditional DevOps role. You will be responsible for ensuring that payments either succeed or fail cleanly, never leaving the system in an inconsistent state, while maintaining performance and resilience at scale.

Responsibilities

Own the reliability, availability, and performance of the production payments platform
Participate in on-call rotations to ensure system availability
Define, implement, and continuously improve SLOs, SLAs, and alerting
Design systems for failure (graceful degradation, retries, idempotency, backoff strategies)
Lead incident response end-to-end, including postmortems and preventative improvements
Improve system observability across metrics, logging, and distributed tracing
Build and maintain scalable infrastructure using infrastructure as code
Automate operational workflows to reduce manual intervention and increase system resilience
Collaborate closely with engineering and product teams to ensure reliability is built into system design
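
To give a flavour of what "designing for failure" can involve, here is a minimal sketch of retries with capped exponential backoff, jitter, and a reused idempotency key. It is an illustration only and does not describe our actual codebase: `TransientError`, `backoff_delays`, `send_with_retries`, and the `send(payload, idempotency_key)` callable are all hypothetical names, and the delay values are arbitrary assumptions.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry (e.g. timeouts)."""


def backoff_delays(retries, base=0.5, cap=8.0, rng=random.random):
    """Yield one delay per retry: exponential growth, capped, with full jitter."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def send_with_retries(send, payload, max_attempts=4, sleep=None):
    """Call send(payload, idempotency_key) until it succeeds or attempts run out.

    The same idempotency key is reused on every attempt, so a partner that
    honours the key can deduplicate a retried request instead of charging twice.
    """
    sleep = sleep or time.sleep
    key = str(uuid.uuid4())  # generated once per logical payment, not per attempt
    delays = backoff_delays(max_attempts - 1)
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, key)
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure cleanly
            sleep(next(delays))
```

Reusing one key across attempts is what lets a retried payment "succeed or fail cleanly" rather than being applied twice, provided the receiving side actually deduplicates on that key.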

Requirements

5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles
Strong experience operating production systems with real uptime and reliability requirements
Experience with distributed systems and understanding of failure modes at scale
Deep knowledge of AWS (e.g. EKS, networking, IAM, scaling patterns, observability)
Strong production-grade experience with Kubernetes and Helm
Experience with Terraform or similar infrastructure-as-code tools
Proficiency in at least one programming language (e.g. Go, Python, Bash)
Experience with monitoring and observability tooling (metrics, logs)
Strong problem-solving skills and a proactive, ownership-driven mindset
Excellent written and verbal communication skills in English

Nice to have: Experience in payments, fintech, or other high-availability transaction systems.

What success looks like

Systems remain stable under sudden traffic spikes and partial infrastructure failures
Incidents are resolved quickly with clear root cause and follow-up improvements
Strong observability provides clear insight into system behaviour
SLOs are well-defined, measurable, and consistently met
Payments either succeed or fail cleanly — never leaving inconsistent states
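
As a rough illustration of what a "measurable" SLO means, the sketch below turns a success-rate target into an error budget. The 4 million transactions a day comes from the figure quoted earlier in this posting; the 99.9% target and the function names are assumptions made purely for the example, not our actual SLOs.

```python
def error_budget(total_requests, slo_target):
    """Number of requests that may fail in the window without breaching the SLO."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return total_requests * (1 - slo_target)


def budget_remaining(total_requests, failed_requests, slo_target):
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    return 1 - failed_requests / error_budget(total_requests, slo_target)


if __name__ == "__main__":
    daily_transactions = 4_000_000  # figure quoted in this posting
    target = 0.999                  # assumed SLO target, for illustration only
    print(round(error_budget(daily_transactions, target)))
    print(round(budget_remaining(daily_transactions, 1_000, target), 2))
```

Under these assumed numbers the daily budget is roughly 4,000 failed transactions, which is the kind of concrete threshold that alerting and incident follow-ups can be tied to.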

Why [Employer hidden]?

Help improve financial access in Africa
Be part of an amazing team that shapes the company’s culture as a great place to be
An ambitious, talented, and diverse team that always has your back
We grow fast, and you will grow fast with us
Competitive remuneration
35 days of paid leave per year (inclusive of public holidays) and more.