Software Engineer, AI Training and Infrastructure
Skild AI
Pittsburgh, Pennsylvania
Full Job Description
Company Overview:
Skild AI is a startup focused on creating large-scale foundation models for robotics with the goal of developing general-purpose robotic intelligence. We view data-driven machine learning methods as the key to unlocking general-purpose capabilities for the widespread deployment of robots to perform economically useful tasks within society. Our team consists of individuals with varying levels of experience and backgrounds, from new graduates to domain experts. Relevant industry experience is important, but ultimately less so than your demonstrated abilities and attitude. We are looking for passionate individuals who are eager to explore uncharted waters and contribute to our innovative projects. Both generalists and specialists within specific sub-fields of robotics (planning, SLAM, manipulation, computer vision, etc.) are encouraged to apply.
Position Overview:
We are looking for a Software Engineer to work at the forefront of developing and optimizing the software infrastructure and tools necessary for training cutting-edge AI models. You will focus on building robust, scalable, and efficient training pipelines and frameworks that support the entire machine learning lifecycle, from data preparation to model deployment. You will collaborate with researchers and machine learning engineers to ensure seamless integration and operation of training systems, pushing the boundaries of what AI can achieve in real-world robotics applications. You will explore new ways to efficiently make use of many types of data in our training pipeline.
Responsibilities:
- Develop and maintain robust, scalable, distributed training pipelines and frameworks for large-scale AI models, covering data preprocessing, training orchestration, and model evaluation.
- Optimize training processes for performance and resource utilization, ensuring scalability and reliability.
- Collaborate with researchers and machine learning engineers to integrate state-of-the-art algorithms and techniques into training pipelines.
- Monitor and analyze training runs, identifying bottlenecks and proposing solutions to improve efficiency and performance.
- Ensure the robustness and reliability of the training infrastructure, including automated testing and continuous integration.
Preferred Qualifications:
- BS, MS or higher degree in Computer Science, Robotics, Engineering or a related field, or equivalent practical experience.
- Proficiency in Python, C++, or a similar language, and in at least one deep learning library such as PyTorch, TensorFlow, or JAX.
- Strong background in distributed computing, parallel processing, large-scale dataset handling, and data preprocessing.
- Deep understanding of state-of-the-art machine learning techniques and models.
- Experience with cloud-based training environments (AWS, Google Cloud, Azure).
- Experience in developing and maintaining software tooling and infrastructure for machine learning.
- Deep understanding of, and practical experience with, software engineering principles, including algorithms, data structures, and system design.
- Experience with continuous integration and automated testing frameworks.