AI/ML DevOps Engineer

Full time
Location: London
Job offered by: Millennium
Category: IT & Technology

Our Infrastructure AI and Data Engineering Team is responsible for providing the foundational firm-wide AI Enablement platform. We are transitioning this platform onto K8s and we are seeking an experienced DevOps Engineer to lead this effort. The ideal candidate will help drive our cloud-native infrastructure initiatives and lead the implementation of DevOps best practices across our organization.

This is a unique opportunity to not only join one of the leading hedge funds in the world, but to provide leadership on the core AI Enablement platform which is used by every aspect of the business on a daily basis.

Key Responsibilities:
- Design and implement high-availability solutions for critical AI infrastructure
- Partner with AI/ML teams to optimize platform performance and scalability
- Drive architectural decisions for the next generation of the AI platform
- Lead the development and maintenance of CI/CD pipelines using tools like Jenkins or GitHub Actions
- Architect and implement Infrastructure as Code (IaC) solutions using Terraform or similar tools
- Optimize container orchestration platforms (Kubernetes) and microservices architecture
- Improve and maintain monitoring, alerting and incident response systems (Datadog, OpsGenie)
- Lead incident response and participate in on-call rotation
- Mentor junior team members and contribute to technical documentation
- Collaborate with development team to improve deployment processes and system reliability

Required Qualifications:
- 5+ years of experience in DevOps, Site Reliability Engineering, or similar roles
- Strong experience with cloud platforms (AWS/GCP/Azure)
- Expert knowledge of containerization (Docker) and orchestration (Kubernetes and Helm)
- Proficiency in Infrastructure as Code and configuration management tools
- Experience with high-performance, low-latency systems
- Track record of successfully delivering large-scale infrastructure projects
- Experience with CI/CD tools and methodologies
- Deep understanding of networking, security, and system architecture
- Excellent troubleshooting and analytical skills
- Strong communication skills to collaborate with various stakeholders

Preferred Qualifications:
- Experience in financial services or hedge fund environment
- Experience with Python (FastAPI)
- Knowledge of machine learning operations (MLOps)
- Experience with data processing frameworks and big data technologies
- Experience with MultiCloud and/or On-Prem Kubernetes
- Experience running CUDA-enabled accelerated workloads

