ML Infrastructure Engineer

Full time
Location: London
Job offered by: Millennium
Category: IT & Technology
This role is a member of the AI/ML Infrastructure Engineering team and is dedicated to implementing and supporting AI/ML infrastructure solutions in cloud and on-premise environments. The role works directly with infrastructure teams and may interface with data scientists, machine learning engineers, application developers, and quantitative analysts, functioning as both a solutions architect and a professional services engineer.

This is a hands-on developer role. Candidates ideally have experience deploying and supporting their own production-ready AI/ML models in cloud environments, as well as automating the build and management of a broad range of cloud infrastructure using tools like Terraform. Candidates should be familiar with developing unit and functional tests, have experience designing and implementing CI/CD tools with infrastructure-as-code pipelines, and have knowledge of Linux systems administration, containerization, networking, security, automated configuration and state management, cross-system orchestration, logging, metrics, monitoring, and alerting.

Principal Responsibilities:
- Architect, develop and maintain internal AI/ML infrastructure components, frameworks, and offerings
- Architect, develop and maintain AI/ML solutions for customers in cloud environments
- Help customers architect, develop and maintain their own AI/ML solutions in cloud environments
- Implement CI/CD pipelines which include application tests, security tests, and gates
- Implement availability, security, performance monitoring, and alerting of AI/ML solutions
- Automate data resiliency and replication for AI/ML models
- Manage multiple environments and promote code between them
- Automate systems configuration and orchestration using tools such as Terraform, Chef, Ansible, or Salt
- Automate creation of machine images and containers

Required Qualifications/Skills:
- 6+ years of experience designing and supporting production cloud environments
- Experience consulting with customers to develop AI/ML solutions
- Experience developing collaboratively, including infrastructure as code, preferably in Python
- Systems engineering knowledge, including understanding of Linux, security, and networking
- Cloud templating tools such as Terraform
- Experience with AI/ML frameworks (e.g., TensorFlow, PyTorch)
- Experience with distributed computing tools (e.g., Ray, Dask)
- Experience with model serving tools (e.g., vLLM, KFServing)
- Experience with building, monitoring, and alerting on logs and metrics
- Cloud Networking including connectivity, routing, DNS, VPCs, proxies, and load balancers
- Cloud Security including IAM, Certificate Management, and Key Management
- Excellent written and verbal communication skills
- Excellent troubleshooting and analytical skills
- Self-starter able to execute independently, on a deadline, and under pressure

