Description and Requirements
This role is open for the Edinburgh, Scotland location only. Candidates must be based there, as the position requires working from the office at least three days per week (3:2 hybrid policy).
The Lenovo AI Technology Center (LATC)—Lenovo’s global AI Center of Excellence—is driving our transformation into an AI-first organization. We are assembling a world-class team of researchers, engineers, and innovators to position Lenovo and its customers at the forefront of the generational shift toward AI.

Lenovo is one of the world’s leading computing companies, delivering products across the entire technology spectrum, spanning wearables, smartphones (Motorola), laptops (ThinkPad, Yoga), PCs, workstations, servers, and services/solutions. This unmatched breadth gives us a unique canvas for AI innovation, including the ability to rapidly deploy cutting-edge foundation models and to enable flexible, hybrid-cloud, and agentic computing across our full product portfolio. To this end, we are building the next wave of AI core technologies and platforms that leverage and evolve with the fast-moving AI ecosystem, including novel model and agentic orchestration and collaboration across mobile, edge, and cloud resources.

This space is evolving fast, and so are we. If you’re ready to shape AI at a truly global scale, with products that touch every corner of life and work, there’s no better time to join us.
Summary
Lenovo is seeking a highly skilled AI Infrastructure Engineer / AI Operations Engineer to join our growing team. This critical role will focus on designing, building, and maintaining the infrastructure and tools necessary for efficient AI model development, deployment, and operation. Your expertise will enable our data scientists and engineers to focus on high-priority tasks while ensuring the seamless operation of AI models in production. If you are passionate about making Smarter Technology For All, come help us realize our Hybrid AI vision!
Responsibilities:
AI Platform Engineering & Operations
- Design, deploy, and maintain scalable Kubernetes/OpenShift-based AI and ML platforms, supporting diverse AI/ML and cloud-native workloads.
- Implement and manage GitOps-driven platform configuration using ArgoCD and Helm.
- Administer Linux systems, including package management, user/group management, file system navigation, shell scripting (Bash), and system configuration (systemd, networking).

MLOps & Model Lifecycle Management
- Build and automate ML pipelines using Kubeflow Pipelines, Tekton, and Python SDKs.
- Support deployment and serving of AI/ML models using KServe, Knative, and NVIDIA Triton (where applicable).
- Integrate model registry, workflow automation, and end-to-end ML lifecycle tooling.

Automation, Observability & Reliability
- Develop automation using Python, Ansible, Terraform, and CI/CD pipelines.
- Implement monitoring and alerting with Prometheus, Grafana, and Alertmanager for AI workloads and platform health.
- Optimise the AI platform for performance, reliability, and scalability.

Cloud & Infrastructure Integration
- Deploy and operate hybrid/multi-cloud Kubernetes environments across AWS, GCP, and on-prem infrastructure.
- Implement identity, RBAC, and enterprise security integrations (Azure AD, LDAP).

Collaboration & Customer Success
- Work across AI engineering, DevOps, data science, and platform teams to ensure smooth operation and feature delivery.
- Provide technical guidance to stakeholders and support customer deployments in production environments.

Required Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 8+ years of DevOps / cloud-native engineering experience, with a major focus on Kubernetes and containerised workloads.
- Deep expertise in Kubernetes/OpenShift administration, including cluster configuration, operators, networking, and security.
- Strong experience with GitOps, including ArgoCD and Helm.
- Hands-on experience with MLOps tooling, for example KServe, Kubeflow, Tekton, Knative, and ML pipeline automation.
- Proficiency in Python, Bash scripting, and automation frameworks (Ansible, Terraform).
- Experience with cloud platforms including AWS/GCP/Azure.
- Strong observability experience with Prometheus and Grafana or similar.
- Excellent problem-solving, communication, and stakeholder engagement skills.

Bonus Points
- Experience with the Red Hat OpenShift AI ecosystem.
- Knowledge of model serving patterns (Triton ensemble models, OCI artifact-based LLM serving).
- Certifications such as CKA, CKS, GCP ACE, AWS SAA, or Red Hat OpenShift specialisations.
- Experience with data engineering or AI/ML workflow orchestration.
- Deployment and management of scaled CI/CD monorepo patterns.
- Deployment and management of click-to-deploy Internal Developer Portals such as Backstage.

What we offer:
- Opportunities for career advancement and personal development
- Access to a diverse range of training programs
- Performance-based rewards that celebrate your achievements
- Flexibility with a hybrid work model (3:2) that blends home and office life
- Electric car salary sacrifice scheme
- Life insurance

#LATC