We are at a pivotal point in our tech progression. We're looking to grow the technical estate, replace existing systems with new ones, and scale and develop the platform. It's an exciting time to join the team.
As a Lead Infrastructure Engineer you will be part of the Partnerize Technical Operations team, which works with the business, development, and IT functions. You will be working within a team of Infrastructure Engineers responsible for designing, building and implementing solutions for the platform and providing the required training to support the solution. You will also be responsible for ensuring all issues/problems are addressed in a timely manner by the team.
You should have a keen interest in problem-solving accompanied by experience in Linux systems, an understanding of distributed highly available systems, advanced database management knowledge (MySQL, Postgres) in replicated environments, and experience of taking code to production, including building the environment. The candidate should also have experience in Debian packaging, as well as experience with data streaming and queuing technologies (Kafka, RabbitMQ).
We’re looking for a Lead Infrastructure Engineer with a desire to learn. For us, it's more about the person and the character than the role. We need people who will get excited about taking us to the next stage of our evolution, helping us with their specific skills and experience but learning new ones along the way. You will need to be based within commutable distance of the Newcastle Office.
As a Lead Infrastructure Engineer at Partnerize, You Will:
Deliver coaching sessions to the team/individuals.
Scope the work coming into the Infrastructure Solutions team and delegate to the team members appropriately.
Provide primary operational support and engineering for multiple large, distributed software applications.
Produce detailed documentation on the system design with visual representation to support.
Measure and optimise the performance of newly implemented systems, with an eye toward pushing our capabilities forward.
Build software and systems to manage platform infrastructure and applications.
Improve reliability, quality, and time-to-market of our suite of software solutions.
Partner with development teams to improve services through rigorous testing and release procedures.
Participate in system design consulting, platform management, and capacity planning.
Work closely with development and tech teams ensuring technical issues and projects are correctly managed.
Deliver large technical projects.
Act as an escalation point for other support teams within TechOps.
Be responsible for continuous improvement, continuous delivery and continuous integration.
Plan, prioritise, and estimate tasks using Jira.
Mentor and guide junior team members, fostering a culture of continuous learning.
Participate in the On-Call Rotation.
Essential Knowledge, Skills and Experience
Ideally 3-5 years of experience in a Site Reliability Engineer (SRE) or DevOps Engineer role.
Proficiency with containerization and orchestration tools (e.g., Kubernetes, Nomad, Helm, Docker).
Expertise in deploying and managing services, with a solid grasp of best practices.
Strong Python scripting skills and experience with configuration management tools (Ansible, YAML, Helm Charts, Terraform).
Advanced experience with installing, configuring, and monitoring Linux systems in physical data centres.
Understanding of modern event-driven microservice architectures using systems like Kafka.
Adapts to changing work environments, work priorities and organisational needs.
Enjoys working in a fast-paced environment.
Demonstrated ability to mentor and coach team members.
Ability to prioritise workload including taking the lead on incidents and problem management.
Experience in assessing work coming into the team and knowing when to delegate appropriately.
Ability to represent TechOps to the wider business and liaise with other departments.
Advanced experience of managing and maintaining databases (MySQL, PostgreSQL, Redis).
Proficient with CI/CD pipelines, version control systems and configuration management (Ansible, Docker, Git etc.).
Experience in designing, building and maintaining on-prem private cloud solutions and external cloud providers, with a track record of maintaining a highly available network/system.
Ability to troubleshoot, diagnose and solve issues independently, spotting problems, performance bottlenecks and areas for improvement.
Proven ability to create and maintain cohesive documentation as learning and experience is gained.
Experience as part of a team maintaining a live production infrastructure, with a demonstrated record of actively contributing to team results.
Ability to prioritise workload including occasional incidents and RCAs.
Working experience with ITIL practices.
Desirable Knowledge, Skills and Experience
Openness to learn the required technologies.
An interest in development, new technologies and innovation.
Supporting development teams in the refactoring of technical debt.
Experience with monitoring systems (Zabbix, Prometheus).
Experience with Jira and Confluence.
Load balancers (HAProxy).
Nginx or web server technologies.
Gluster or storage technologies.
Elasticsearch technologies - especially Kibana.
Experience with Apache Kafka and Druid.
At Partnerize, we recognise it is unrealistic for a candidate to fulfil 100% of the criteria in this job description. We encourage you to apply if you feel you meet the majority of the requirements above. We know that skills evolve over time, so if you have a keen appetite to learn and evolve alongside us, come join our team!
Core Skills:
Kubernetes, Docker, Helm
Other Skills:
Python, Ansible, Kafka
Seniority:
Lead