We are looking for a motivated MLOps Engineer to join our team, working remotely from Canada (Western time zones only: Pacific or Mountain). As an MLOps Engineer, you will bridge the gap between data science and operations, ensuring seamless integration, deployment, and management of machine learning models in production environments. Your mission will be to automate, scale, and monitor the entire ML lifecycle, leveraging your expertise in cloud infrastructure, DevOps practices, and scripting to deliver efficient, reliable, and secure data-driven solutions that support business innovation.
Key Responsibilities
Architect, provision, and automate infrastructure on both hyperscaler CSPs and NCPs for AI/ML workloads.
Build, optimize, and maintain end-to-end machine learning pipelines (CI/CD/CT) for continuous integration, delivery, and training in high-throughput, GPU-driven environments.
Advance Infrastructure as Code (IaC) methods with tools such as Terraform, Ansible, and proprietary SDKs/APIs.
Manage the deployment and orchestration of large-scale clusters, GPU scheduling, VM automation, and data, storage, and networking across multi-cloud landscapes.
Containerize, serve, and monitor ML models using Docker, Kubernetes (including Helm and advanced GPU scheduling), and Slurm.
Implement comprehensive monitoring (OTEL, DCGM), model/data drift detection, and operational analytics tailored to high-performance compute platforms.
Ensure robust security, compliance (SOC 2), identity management, and audit readiness in mixed cloud environments.
Collaborate across engineering, AI research, and operations, producing clear technical documentation and operational runbooks.
Main Requirements
6+ years of infrastructure, cloud, or MLOps experience, with at least 1 year on NCP platforms (e.g., CoreWeave, Nebius, Lambda Labs, Yotta).
Expertise in CSPs (AWS, Azure, GCP) and NCPs (specialized GPU/AI clouds).
Strong proficiency in IaC (Terraform, Ansible, Pulumi) and DevOps principles.
Deep hands-on experience orchestrating and monitoring GPU-accelerated workloads and large-scale Slurm- or Kubernetes-based environments.
Strong Go/Python (or comparable scripting language) and solid Linux/Unix administration.
Proven experience with ML pipelines and model deployment in heterogeneous or multi-cloud AI setups.
Excellent teamwork, stakeholder management, and communication for cross-disciplinary project delivery.
Preferred Skills
Familiarity with GPU-as-a-Service, job orchestration, MLflow/W&B, and advanced monitoring (OTEL, ELK, LGTM, DCGM).
Industry certifications in major clouds (AWS/GCP/Azure).
Experience supporting enterprise-grade business continuity, disaster recovery, and compliance in mixed cloud environments.
Why choose us
An international community bringing together more than 110 different nationalities
An environment where trust is central: 70% of our leaders started their careers at the entry level
A strong training system with our internal Academy and more than 250 modules available
A dynamic work environment that frequently comes together for internal events (afterworks, team-building activities, etc.)
Amaris Consulting promotes equal opportunities. We are committed to bringing together people from diverse backgrounds and creating an inclusive work environment. In this regard, we welcome applications from all qualified individuals, regardless of sex, sexual orientation, race, ethnicity, beliefs, age, marital status, disability, or other characteristics.