Principal ML Engineer
The Opportunity
As a Principal ML Engineer, you will lead the technical architecture and engineering
strategy for integrating sophisticated AI into high-stakes Healthcare Information Systems
(HIS). We are looking for a seasoned builder who prioritizes reliability, system
performance, and automated scalability over hype.
While many focus on the 'science' of modeling, your mission is the engineering of the
ecosystem. You will architect the robust MLOps pipelines and cloud infrastructure
required to move models from experimental notebooks into mission-critical clinical
environments. You are the bridge between raw data and resilient, production-grade AI
services.
Key Responsibilities
1. MLOps & System Architecture
• Production Lifecycle: Lead the design and implementation of end-to-end ML
lifecycles, focusing on automated CI/CD pipelines, model versioning (MLflow/DVC),
and reproducible experimentation.
• Inference at Scale: Architect high-performance serving layers for both LLMs and
classical models, ensuring low latency and high availability in a secure healthcare
cloud environment.
• Agentic Orchestration: Build the underlying infrastructure for agent-based
reasoning systems, ensuring these 'Agentic' workflows are traceable, auditable,
and integrated into existing HIS.
2. Data Engineering & Infrastructure
• Data Reliability: Design robust data pipelines (ETL/ELT) to process healthcare-specific
formats (FHIR, HL7, DICOM) into high-quality features for real-time and
batch inference.
• Cloud Infrastructure: Manage and optimize cloud-native infrastructure
(AWS/Azure/GCP) using Infrastructure as Code (Terraform/Pulumi) to support heavy
compute workloads.
• System Integrity: Implement comprehensive monitoring and observability
frameworks to detect data drift, model decay, and system bottlenecks before they
impact clinical outcomes.
3. Technical Leadership & Governance
• Engineering Authority: Serve as the lead architect for the ML platform, ensuring all
systems are HIPAA/HITRUST compliant and follow 'security-by-design' principles.
• Operational Excellence: Establish rigorous standards for code quality,
containerization (Docker/Kubernetes), and system documentation across the
engineering organization.
• Strategic Mentorship: Elevate the team by fostering a culture of 'ML as
Engineering,' guiding junior engineers in building maintainable, modular, and
scalable software.
Candidate Profile
Education & Experience:
• Academic Background: Master’s or PhD in Computer Science, Software
Engineering, or a related technical field.
• Proven Track Record: 10+ years of experience in software engineering, with at least
6 years dedicated to deploying and maintaining large-scale ML systems in
production (not just research or POCs).
Core Technical Stack:
• MLOps & Cloud: Expert-level experience with Cloud Providers (AWS/GCP/Azure)
and orchestration tools (Kubernetes, Kubeflow, or Airflow).
• Engineering & Programming: Expert-level Python and Java/Go (or similar). Deep
proficiency in backend frameworks and system design patterns.
• Data Engineering: Strong experience with Spark, Snowflake/Databricks, and
building scalable feature stores.
• Applied AI: Hands-on experience deploying Generative AI (LLMs) and Agentic
frameworks (LangChain/LangGraph) within a containerized microservices
architecture.
The 'Principal' Edge (Preferred):
• Hardware Optimization: Experience with GPU optimization, quantization, or
specialized serving frameworks (vLLM, TGI).
• Security & Compliance: Deep understanding of cybersecurity best practices within
regulated industries (Healthcare, Finance, or Defense).
• Distributed Systems: Proven ability to design systems that handle massive
concurrency and distributed data processing.