Principal ML Engineer

The Opportunity

As a Principal ML Engineer, you will lead the technical architecture and engineering

strategy for integrating sophisticated AI into high-stakes Healthcare Information Systems

(HIS). We are looking for a seasoned builder who prioritizes reliability, system

performance, and automated scalability over hype.

While many focus on the 'science' of modeling, your mission is the engineering of the

ecosystem. You will architect the robust MLOps pipelines and cloud infrastructure

required to move models from experimental notebooks into mission-critical clinical

environments. You are the bridge between raw data and resilient, production-grade AI

services.

Key Responsibilities

1. MLOps & System Architecture

• Production Lifecycle: Lead the design and implementation of end-to-end ML

lifecycles, focusing on automated CI/CD pipelines, model versioning (MLflow/DVC),

and reproducible experimentation.

• Inference at Scale: Architect high-performance serving layers for both LLMs and

classical models, ensuring low latency and high availability in a secure healthcare

cloud environment.

• Agentic Orchestration: Build the underlying infrastructure for agent-based

reasoning systems, ensuring these 'Agentic' workflows are traceable, auditable,

and integrated into existing HIS.

2. Data Engineering & Infrastructure

• Data Reliability: Design robust data pipelines (ETL/ELT) to process healthcare-specific

formats (FHIR, HL7, DICOM) into high-quality features for real-time and

batch inference.

• Hybrid Infrastructure: Manage and optimize cloud-native infrastructure

(AWS/Azure/GCP) using Infrastructure as Code (Terraform/Pulumi) to support heavy

compute workloads.

• System Integrity: Implement comprehensive monitoring and observability

frameworks to detect data drift, model decay, and system bottlenecks before they

impact clinical outcomes.

3. Technical Leadership & Governance

• Engineering Authority: Serve as the lead architect for the ML platform, ensuring all

systems are HIPAA/HITRUST compliant and follow 'security-by-design' principles.

• Operational Excellence: Establish rigorous standards for code quality,

containerization (Docker/Kubernetes), and system documentation across the

engineering organization.

• Strategic Mentorship: Elevate the team by fostering a culture of 'ML as

Engineering,' guiding junior engineers in building maintainable, modular, and

scalable software.

Candidate Profile

Education & Experience:

• Academic Background: Master’s or PhD in Computer Science, Software

Engineering, or a related technical field.

• Proven Track Record: 10+ years of experience in software engineering, with at least

6 years dedicated to deploying and maintaining large-scale ML systems in

production (not just research or POCs).

Core Technical Stack:

• MLOps & Cloud: Expert-level experience with cloud providers (AWS/GCP/Azure)

and orchestration tools (Kubernetes, Kubeflow, or Airflow).

• Engineering & Programming: Expert-level Python and Java/Go (or similar). Deep

proficiency in backend frameworks and system design patterns.

• Data Engineering: Strong experience with Spark, Snowflake/Databricks, and

building scalable feature stores.

• Applied AI: Hands-on experience deploying Generative AI (LLMs) and Agentic

frameworks (LangChain/LangGraph) within a containerized microservices

architecture.

The 'Principal' Edge (Preferred):

• Hardware Optimization: Experience with GPU optimization, quantization, or

specialized serving frameworks (vLLM, TGI).

• Security & Compliance: Deep understanding of cybersecurity best practices within

regulated industries (Healthcare, Finance, or Defense).

• Distributed Systems: Proven ability to design systems that handle massive

concurrency and distributed data processing.