Design and Deploy Infrastructure: Develop and maintain scalable, high-performance cloud-based infrastructure for ML workloads and serving ML APIs or client endpoints.
Cloud Platforms: Deploy, manage, and optimize cloud-based infrastructure (AWS, Azure, GCP). Set up ML nodes for local development and distributed training workloads, and maintain compatibility between the two.
System Management: Install, configure, and monitor servers.
Storage Management: Optimize the various types of shared and local storage that hold big data for ML workloads.
Containerization and Orchestration: Manage and scale containerized applications using tools such as Docker, Kubernetes, and Terraform.
Collaboration: Work closely with the rest of the technical team to ensure smooth orchestration of the ML and production workloads.
Incident Response: Respond to cloud / production incidents, perform analysis, and implement solutions to prevent recurrence.
Key Qualifications
3 years of professional experience in a cloud-related role, preferably ML-related.
Proficiency in writing scripts (Bash, PowerShell, Python, …) to automate tasks.
Proficiency in cloud platforms (e.g., AWS, GCP, Azure).
Proficiency in containerization (e.g., Docker, Kubernetes).
Proficiency in cloud orchestration.
Preferred Qualifications
Familiarity with Python (Jupyter) and ML frameworks (PyTorch).
Familiarity with cloud monitoring tools (e.g., Prometheus, Grafana).
Familiarity with cloud-based database systems (Amazon RDS, Aurora, Redshift, Google Cloud SQL, Spanner, …) and data-visualisation tools (Amazon QuickSight, Apache Superset).
Familiarity with CI/CD tools (e.g., CircleCI).
At SpAItial, we are committed to creating a diverse and inclusive workplace. We welcome applications from people of all backgrounds, experiences, and perspectives. We are an equal opportunity employer and ensure all candidates are treated fairly throughout the recruitment process.