MLOps Engineer (Machine Learning Operations Engineer)

Unreal Gigs

San Francisco, California


Job Details

Full-time


Full Job Description

Are you passionate about bringing the best of machine learning and DevOps together to create reliable, scalable, and efficient AI systems? Do you thrive on automating machine learning pipelines, deploying models at scale, and ensuring that AI solutions deliver value in production environments? If you’re excited about optimizing the entire machine learning lifecycle—from development to deployment and beyond—then our client has the perfect opportunity for you. We’re looking for an MLOps Engineer (aka The AI Infrastructure Maestro) to design, automate, and manage robust machine learning pipelines that power the next generation of AI-driven products.

As an MLOps Engineer with our client, you’ll be the glue that connects data scientists, machine learning engineers, and operations teams, ensuring that machine learning models are efficiently deployed, monitored, and maintained. You’ll lead efforts to create scalable infrastructure for AI, automate workflows, and develop tools that enable continuous integration and continuous delivery (CI/CD) of ML models.

Key Responsibilities:

  1. Build and Automate ML Pipelines:
    • Design, develop, and manage automated ML pipelines for data ingestion, preprocessing, model training, and deployment. You’ll implement CI/CD pipelines for ML models using tools like Kubernetes, Docker, Jenkins, or GitLab CI.
  2. Model Deployment and Monitoring:
    • Deploy machine learning models into production environments and ensure they perform efficiently at scale. You’ll set up monitoring systems to track model performance, detect drift, and retrain models when necessary.
  3. Optimize Model Training and Scalability:
    • Work with machine learning engineers to optimize model training processes, leveraging distributed computing and parallelism. You’ll implement solutions to scale model training and deployment using cloud platforms (AWS, GCP, Azure) and container orchestration tools.
  4. Collaboration with Data Scientists and Engineers:
    • Collaborate closely with data scientists and machine learning engineers to understand their needs, improve workflows, and integrate model artifacts into production environments. You’ll ensure smooth transitions from development to production.
  5. Infrastructure as Code and Automation:
    • Automate the provisioning, scaling, and maintenance of AI infrastructure using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation. You’ll ensure infrastructure is resilient, scalable, and cost-efficient.
  6. Monitoring, Logging, and Security:
    • Implement robust monitoring and logging systems to track the health of models in production. You’ll ensure models and data meet compliance, security, and governance standards, keeping systems secure while maintaining performance.
  7. Performance Optimization and Troubleshooting:
    • Identify bottlenecks in the ML workflow and propose optimizations to improve efficiency and reduce costs. You’ll troubleshoot issues related to model deployment, infrastructure performance, and data integration.

Requirements

Required Skills:

  • MLOps and DevOps Expertise: Strong experience with machine learning operations (MLOps) and DevOps practices, including CI/CD pipelines, containerization (Docker, Kubernetes), and infrastructure automation. You can efficiently deploy, monitor, and maintain machine learning models at scale.
  • Cloud Platforms and Infrastructure: Expertise in cloud platforms such as AWS, GCP, or Azure, with experience in building scalable infrastructure for machine learning workloads. You’re comfortable working with services like S3, EC2, SageMaker, Dataflow, or equivalent cloud ML services.
  • Programming and Automation Tools: Proficiency in scripting and programming languages like Python and Bash. You have hands-on experience with automation tools (Ansible, Terraform, Jenkins) and machine learning libraries like TensorFlow, PyTorch, or scikit-learn.
  • Data and Model Monitoring: Strong experience with monitoring tools like Prometheus, Grafana, or ELK Stack to track data pipelines and model performance in production environments.
  • Collaboration and Communication: Excellent communication skills, with the ability to work closely with data scientists, software engineers, and IT teams to streamline ML workflows and resolve deployment challenges.

Educational Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Machine Learning, or a related field. Equivalent experience in machine learning operations or DevOps is highly valued.
  • Certifications or additional coursework in cloud computing, MLOps, or DevOps (e.g., AWS Certified DevOps Engineer, Google Cloud Professional Machine Learning Engineer) are a plus.

Experience Requirements:

  • 3+ years of experience in MLOps, DevOps, or cloud infrastructure management, with hands-on experience deploying machine learning models into production.
  • Proven track record of automating ML pipelines, building scalable infrastructure for AI models, and optimizing model performance in production environments.
  • Experience working with machine learning teams and a strong understanding of the full ML lifecycle, from development to deployment.

Benefits

  • Health and Wellness: Comprehensive medical, dental, and vision insurance plans with low co-pays and premiums.
  • Paid Time Off: Competitive vacation, sick leave, and 20 paid holidays per year.
  • Work-Life Balance: Flexible work schedules and telecommuting options.
  • Professional Development: Opportunities for training, certification reimbursement, and career advancement programs.
  • Wellness Programs: Access to wellness programs, including gym memberships, health screenings, and mental health resources.
  • Life and Disability Insurance: Life insurance and short-term/long-term disability coverage.
  • Employee Assistance Program (EAP): Confidential counseling and support services for personal and professional challenges.
  • Tuition Reimbursement: Financial assistance for continuing education and professional development.
  • Community Engagement: Opportunities to participate in community service and volunteer activities.
  • Recognition Programs: Employee recognition programs to celebrate achievements and milestones.
