Peaple Talent has partnered with a global digital consultancy that is currently recruiting for a Site Reliability Engineer based in its central London offices. You will work within a great team of engineers and be responsible for safeguarding production environments, performing all functions of an SRE team: architecture design (or redesign), automation, observability tooling, defining and monitoring SLOs, production support, and incident management.
Responsibilities:
- Balancing feature development velocity and reliability with well-defined SLOs.
- Running the production environment by monitoring availability and taking a holistic view of system health.
- Driving the incident management process and supporting a blameless post-mortem culture.
- Partnering with development teams to improve services via rigorous testing and release procedures.
- Participating in system design consulting, platform management, and capacity planning.
- Creating sustainable systems and services through automation and uplifts.
Qualifications:
- A degree in Computer Science or related technical field involving coding and/or systems engineering.
- Proficiency in one or more of the following: Go, Python, C, C++, Java, Perl, Ruby or shell scripting.
- Experience with algorithms, data structures, and software design, and/or experience with UNIX operating system internals and/or networking.
- Excellent communication skills. You’ll be able to act as a bridge between internal teams and external stakeholders.
- Excellent problem-solving skills.
Preferred Qualifications:
- Experience with distributed systems design, maintenance, and troubleshooting.
- Hands-on experience with debugging and optimizing code, as well as automation.
- Strong interpersonal skills, drive, and ownership.
- Coding skills beyond simple scripts.