Site Reliability Engineer

Full time
Location: Leeds
Job offered by: Prometheus Group
Prometheus Group is a leading global provider of comprehensive and intuitive enterprise asset management (EAM) software solutions that work within ERP systems and span the full work management life cycle for maintenance and operations. Our straightforward functionality, graphical visualization, and simple processes enable customers to increase productivity, ensure safety, reduce costs, and improve reporting. Prometheus Group offers excellent opportunities to advance and excel in your career, as we work with the largest companies in the world.

Job Summary

The site reliability engineer is responsible for ensuring the availability and performance of the Prometheus Group hosted customer sites. The role also covers managing all of the underlying infrastructure, including Kubernetes cluster upgrades, infrastructure decommissioning, incident management, and root cause analysis and remediation.

Responsibilities

Work as part of a response team to resolve reported issues.
Proactively identify problems and gaps in the deployed applications and infrastructure, and develop measures to prevent disruption.
Develop and deliver tools that continuously enhance monitoring capabilities.
Assist in the roll-out and deployment of new product features and installations to support our rapid iteration and constant growth.
Identify ways to resolve common issues by developing and deploying automation that responds to common human interactions.
Work closely with development and DevOps teams to ensure that platforms are designed with operability and observability in mind.
Function well in a fast-paced, rapidly changing environment.

Required Qualifications

Bachelor’s degree in computer science, information technology, software engineering, or a related field.
3+ years of working experience as a software developer, AWS cloud engineer, or AWS infrastructure engineer.
3+ years of hands-on experience managing Kubernetes clusters and Docker containers.
3+ years of hands-on experience managing and troubleshooting Linux servers.
2-3 years of automation experience with Terraform, Python, or Ansible.
2+ years of experience managing and troubleshooting MS SQL and PostgreSQL database instances.
Strong critical thinking skills.
Strong troubleshooting experience involving Kubernetes clusters, Docker containers, and Linux.
Demonstrable experience working with remote monitoring and logging tools, including but not limited to Dynatrace, Grafana, and Pingdom.

Preferred Qualifications

Strong interpersonal communication skills (including listening, speaking, and writing) and the ability to work well in a diverse, team-focused environment with other SREs, developers, IT operations staff, and engineers.
Ability to work well in high-pressure situations.
Knowledge of data structures, relational and non-relational databases, networking, Linux internals, filesystems, web architecture, and related topics.
Certified Kubernetes Administrator (CKA) or a related certification is a plus.

Benefits Overview

Gym Kickback Incentive (up to £25 per month)
