Job Overview
Reporting directly to the Head of Software Infrastructure, you will design, build, and maintain the infrastructure that powers our machine learning and AI workloads. You’ll play a critical role in enabling our ML engineers to train, deploy, and monitor models at scale, while ensuring our infrastructure is secure, cost-effective, and optimized for performance.
This is a hands-on role where you’ll contribute to both the evolution of our ML infrastructure and the implementation of new AI capabilities, directly shaping how Monarch leverages machine learning to deliver exceptional customer experiences.
What You’ll Do:
- Maintain, improve, and scale cloud infrastructure that supports both traditional applications and ML/AI workloads.
- Partner with ML engineers to design and deploy specialized resources for model training, inference, and data pipelines.
- Implement automated infrastructure solutions using Terraform/OpenTofu to accelerate environment provisioning and resource management.
- Introduce and integrate modern AI infrastructure capabilities, including vector databases, model observability tools, and GPU/accelerator workloads.
- Provide technical guidance on ML workload architecture, security, and performance optimization, aligning with Monarch’s culture of continuous improvement.
A Partnership with AI Engineering:
- You Own: The core cloud infrastructure (IaC), networking, secrets management, Kubernetes/GPU orchestration, and shared platform services.
- AI Eng Owns: The LLM runtime, retrieval architecture (vector stores, indexing), evaluation frameworks, safety guardrails, prompt/model versioning, AI observability, and cost/latency optimization.
- Together You Own: SLAs/SLOs, rollout strategies, incident response protocols, and capacity planning for all AI services.
What You’ll Bring:
- 4+ years of professional experience with cloud infrastructure (AWS or GCP preferred).
- 2+ years of professional experience deploying and managing ML workloads in the cloud.
- Proficiency in Python for automation, scripting, and tooling.
- Advanced hands-on experience with Infrastructure-as-Code tools (Terraform or OpenTofu) in production environments.
- Strong problem-solving skills, the ability to work autonomously, and a collaborative mindset.
- Experience in cloud networking and security best practices for data-intensive workloads.
- Clear verbal and written communication, cross-functional collaboration, analytical thinking, the ability to manage multiple priorities, and a self-motivated, proactive approach.