Site Reliability Engineer
Title: Site Reliability Engineer
Company: [Employer hidden — sign up to reveal]
Location: EMEA (Remote)
Type: Full-time
Experience: Mid-Senior level
Who is [Employer hidden — sign up to reveal]?
[Employer hidden — sign up to reveal] is a pan-African fintech that connects local payment channels used by the mass market to both local and international merchants. We operate in a highly regulated, partner-driven environment and are in scale-up mode, with ambitious growth plans across multiple African markets. Through our payments API we already facilitate over 4 million transactions a day across 20 countries in Sub-Saharan Africa. At [Employer hidden — sign up to reveal], an entrepreneurial spirit is coupled with a modern and professional working culture. We work as a remote team, with members in Europe, Africa and Asia.
What is the role?
As a Site Reliability Engineer at [Employer hidden — sign up to reveal], you will own the reliability and scalability of a high-throughput payments platform operating across fragmented and often unreliable external systems. This is not a traditional DevOps role. You will be responsible for ensuring that payments either succeed or fail cleanly, never leaving the system in an inconsistent state, while maintaining performance and resilience at scale.
Responsibilities
- Own the reliability, availability, and performance of the production payments platform
- Participate in on-call rotations to ensure system availability
- Define, implement, and continuously improve SLOs, SLAs, and alerting
- Design systems for failure (graceful degradation, retries, idempotency, backoff strategies)
- Lead incident response end-to-end, including postmortems and preventative improvements
- Improve system observability across metrics, logging, and distributed tracing
- Build and maintain scalable infrastructure using infrastructure as code
- Automate operational workflows to reduce manual intervention and increase system resilience
- Collaborate closely with engineering and product teams to ensure reliability is built into system design
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles
- Strong experience operating production systems with real uptime and reliability requirements
- Experience with distributed systems and understanding of failure modes at scale
- Deep knowledge of AWS (e.g. EKS, networking, IAM, scaling patterns, observability)
- Strong production-grade experience with Kubernetes and Helm
- Experience with Terraform or similar infrastructure-as-code tools
- Proficiency in at least one programming language (e.g. Go, Python, Bash)
- Experience with monitoring and observability tooling (metrics, logs)
- Strong problem-solving skills and a proactive, ownership-driven mindset
- Excellent written and verbal communication skills in English
Nice to have: Experience in payments, fintech, or other high-availability transaction systems.
What success looks like
- Systems remain stable under sudden traffic spikes and partial infrastructure failures
- Incidents are resolved quickly with clear root cause and follow-up improvements
- Strong observability provides clear insight into system behaviour
- SLOs are well-defined, measurable, and consistently met
- Payments either succeed or fail cleanly — never leaving inconsistent states
Why [Employer hidden — sign up to reveal]?
- Help improve financial access in Africa
- Be part of an amazing team that shapes the company’s culture as a great place to be
- An ambitious, talented, and diverse team that always has your back
- We grow fast, and you will grow fast with us
- Competitive remuneration
- 35 days of paid leave per year (inclusive of public holidays) and more.