Site Reliability Technical Lead
The Site Reliability Technical Lead focuses on incident management: leading Root Cause Analysis (RCA) when issues occur and helping to optimize the incident response process and framework.
What the role involves
- Incident Management: Lead Root Cause Analysis (RCA) when issues occur and contribute to optimizing the incident response process and framework.
- Automation: Drive automation initiatives across the team to reduce operational toil and improve system efficiency.
- Planning and Delivery: Lead technical estimation and feasibility assessments, ensuring plans are realistic and aligned with team capacity. Contribute to structured release planning.
Skills and requirements
- Platform Scale: Proven experience operating platforms that serve a high volume of requests (around 1,000 requests per second).
Candidate fit
- Strong automation discipline, troubleshooting and documentation skills, and confidence with operational reliability.
Additional role context
- Our MIS and school management tools are already making a difference in over 7,000 schools and trusts.
- Cloud Systems: Extensive expertise with AWS and distributed cloud architectures.
- System Design: Expert understanding of distributed systems, microservices, and resilience patterns.
Known job details
- Pay: £80,000 - £90,000
- Employer: Arbor Education
- Location: GB