Site Reliability Engineer
SchooLinks
Job Details
Full-time
Full Job Description
Are you passionate about ensuring the smooth running of software systems? Do you enjoy working in a fast-paced environment where you can make a real impact? If so, SchooLinks is looking for a talented Site Reliability Engineer to join our team!
At SchooLinks, we are a leading provider of college and career readiness solutions for today's students. Our modern software platform helps students explore career options, discover colleges, and navigate the college application process. We are dedicated to empowering students to make informed decisions about their future and achieve their academic and career goals.
As a Site Reliability Engineer, you will play a crucial role in ensuring the reliability, scalability, and performance of our software systems. You will work closely with our development team to design, implement, and maintain the infrastructure necessary to support our platform. This is a unique opportunity to contribute to a cutting-edge software product and make a real impact on the lives of students.
This is a US-based role (Central and Eastern time zones).
Key Tech: AWS, Django (Python), React (JavaScript), MySQL, Redis, Docker
Responsibilities
- Maintain and actively monitor our infrastructure to ensure the high availability of our platform.
- Troubleshoot and resolve production issues, ensuring minimal downtime and impact on system performance.
- Develop and maintain monitoring systems to ensure the health and performance of our applications.
- Improve our system architecture to reduce costs while balancing security and performance.
- Design and track metrics for platform uptime.
- Increase the observability of our system by capturing relevant metrics and logs.
- Implement and maintain intrusion detection, automated remediation, and patch management systems.
- Improve our CI/CD systems to speed up unit test execution and deployments while following proper change and release management processes.
- Work on our SOC2 and compliance initiatives.
- Manage backup and disaster recovery mechanisms to protect the integrity of our data.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Site Reliability Engineer or in a similar role.
- Strong knowledge of cloud infrastructure and experience with AWS.
- Experience with Infrastructure as Code tools such as Terraform.
- Experience with containerization technologies such as Docker.
- Familiarity with configuration management tools (e.g., Ansible, Chef).
- Experience designing multi-tenant database solutions and designing for failover, fault tolerance, and disaster recovery.
- Experience with, and knowledge of, SOC2 controls and compliance requirements.
- Excellent troubleshooting and problem-solving skills.
- Strong communication and collaboration skills.
Benefits
- Competitive salary
- Part of a remote-friendly company
- Flexible working hours and healthy asynchronous working practices
- Long-term employment with consideration for yearly promotions and raises
- For US-based candidates:
  - Full health benefits (healthcare, vision, dental, ClassPass, etc.)
  - Company 401(k) program with up to 1% matching