HPC Platform Engineer

Full time
Location: London
Category: IT & Technology
The firm is developing a cutting-edge high-performance computing (HPC) platform to support our portfolio managers, developers, quantitative analysts, and data scientists, enabling seamless scaling of compute capabilities both on-premise and in the cloud.

We seek a senior, hands-on engineer who is customer-focused and an advocate for customer-driven solutions. The ideal candidate will have a strong understanding of physical and cloud-based infrastructure, experience in automating infrastructure, and proficiency in service and infrastructure lifecycle management. They will engage with teams to understand their requirements, drive development for our HPC platforms, and collaborate with other teams on integration.

The candidate should also have expertise in Linux systems administration, container orchestration, networking, security, and infrastructure-as-code. Experience integrating HPC with storage and data platforms, and testing and optimizing that integration, is also essential.

Principal Responsibilities

Collaborate within a customer-focused team to design, develop, test, and deploy HPC infrastructure in alignment with business needs.
Foster strong relationships with quantitative, software engineering, and data science teams to ensure the HPC platforms effectively meet their requirements.
Engage with business units to promote understanding and drive adoption of our HPC offerings.

Qualifications / Desired Skills

Deep understanding of Linux operating systems, with substantial practical experience in performance tuning, specifically for HPC workloads.
Experience consulting with business units on the execution of HPC workloads.
Experience with HPC cluster schedulers, such as Slurm, Grid Engine, MOAB, and PBS.
Experience with dynamic scaling, partitioning, and resource management within HPC environments.
Experience with, and a strong understanding of, containers and container orchestration, including Kubernetes and container runtimes.
Experience contributing to a shared code base, including infrastructure as code.
Experience with configuration management and automation tools, such as Chef, Ansible, Salt, and Packer.
Experience building monitoring and alerting on logs and metrics.
Excellent written and verbal communication skills.
Excellent troubleshooting and analytical skills.
Self-starter able to execute independently, on a deadline, and under pressure.

