Job Description
We are looking for an expert Automation Tools Engineer to join our diverse team of cloud and infrastructure automation engineers. In this role, you will design, implement, and maintain tools and services for automating the deployment and configuration of engineering infrastructure and platforms. You’ll work with software engineering, hardware engineering, and IT teams to build and maintain robust systems that support our technology initiatives and solutions!
Responsibilities:
Design, implement, and run automation platforms such as Gerrit, CloudBees, HashiCorp Vault, GitLab, Jenkins, Ansible, and Terraform Enterprise, which are used to automate the provisioning and configuration of engineering services.
Build and manage monitoring tools and platforms such as Prometheus, Grafana, Azure Monitor, AWS CloudWatch, Dynatrace/Datadog, and similar tools that form our AIOps stack.
Develop and maintain automation scripts (Python, Bash, etc.) and tools (GitLab, HashiCorp Terraform, HashiCorp Vault, etc.) to streamline and improve infrastructure deployment, monitoring, and management processes using Infrastructure as Code (IaC).
Analyse system performance and implement changes that improve cost efficiency and user experience.
Participate in on-call rotations to ensure 24/7 system availability.
Maintain detailed documentation (HLDs and LLDs) of infrastructure, processes, and procedures to facilitate learning and operational continuity.
Adopt a continuous learning mentality to stay updated with industry trends and new technologies to improve operational performance.
Required Skills and Experience:
Experience in deploying, maintaining, and integrating automation tools such as Gerrit, GitLab, Jenkins/CloudBees, HashiCorp Vault, Ansible, and Terraform Enterprise.
Experience working with public cloud platforms (AWS, Azure, or GCP), containerisation technologies (Docker, Kubernetes, Rancher, Fleet, CloudBees, etc.), and monitoring solutions (Prometheus, Grafana, OpenTelemetry, etc.).
Proficiency in monitoring tools and platforms such as Prometheus, Grafana, AWS CloudWatch, Azure Monitor, Datadog, Dynatrace, etc.
Skilled in Linux/Windows OS administration and scripting/programming (Bash, Python, and Go).
Excellent analytical and problem-solving abilities with a proactive approach to identifying and resolving issues.
Nice to Have Skills:
Familiarity with, or experience working in, a hardware or software engineering organization.
Experience running large distributed systems environments in the cloud and in on-premises data centers.
Familiarity with ITIL practices and incident management frameworks.