Site Reliability Engineering Manager
Two95 International Inc.
Sacramento, California
Job Details
Full-time
Full Job Description
Position – Site Reliability Engineering Manager
Location – Sacramento, CA
Type – Full-time
Salary – Market rate
ESSENTIAL JOB FUNCTIONS AND BASIC DUTIES:
The Site Reliability Engineering Manager (SREM) will ensure that reliability measures are incorporated into strategic IT plans and that expectations are clearly defined. The SREM will also be responsible for working with business and IT stakeholders to balance real-world risks with business drivers such as speed, agility, flexibility and performance. The SREM's job is composed of a broad range of activities in support of IT program initiatives, including:
- Strategic support
- Reliability liaison
- Architecture/engineering support
- Operational support
Strategic Support
- Work with the Senior Director, Service Delivery to develop a reliability program and projects that address identified risks and platform reliability, automation, and scale requirements.
- Manage the process of gathering, analyzing and assessing the current and future reliability landscape, as well as providing the Service Delivery Senior Director with a realistic overview of risks in the enterprise environment.
- Work with the Service Delivery Senior Director to develop budget projections based on short- and long-term goals and objectives.
- Monitor and report on reliability standards, as well as the enforcement of policies within the IT department.
- Propose changes to existing policies and procedures to ensure operating efficiency and regulatory compliance.
- Manage a staff of reliability engineering professionals, hire and train new staff, conduct performance reviews, and provide leadership and coaching, including technical and personal development programs for team members.
Requirements
Reliability Liaison
- Assist resource owners and IT staff in understanding and responding to reliability concerns. Provide reliability communication, awareness and training for audiences ranging from senior leaders to field staff. Work as a liaison with vendors and the legal and purchasing departments to establish mutually acceptable contracts and service-level agreements. Manage production issues and incidents and participate in problem and change management forums.
- Work with various stakeholders to identify information asset owners to classify data and systems as part of a reliability framework implementation. Serve as an active and consistent participant in the systems reliability governance process.
- Work with the Service Delivery Senior Director and other IT and business stakeholders to define metrics and reporting strategies that effectively communicate successes and progress of the reliability program. Provide support and guidance for legal and regulatory compliance efforts, including audit support.
Architecture/Engineering Support
- Consult with other IT staff and direct reports on the reliability team to ensure that reliability is factored into the evaluation, selection, installation and configuration of hardware, applications and software. Recommend and coordinate the implementation of technical controls to support and enforce defined reliability practices and policies.
- Research, evaluate, design, test, recommend or plan the implementation of new or updated reliability hardware or software, and analyze its impact on the existing environment; provide technical and managerial expertise for the administration of reliability tools. Work with the enterprise architecture team to ensure that there is a convergence of business, technical and reliability requirements; liaise with IT management to align existing technical installed base and skills with future architectural requirements.
- Develop a strong working relationship with the reliability engineering team reporting to this position to develop and implement controls and configurations aligned with reliability policies and legal, regulatory and audit requirements.
Operational Support
- Coordinate, measure and report on the technical aspects of reliability engineering management. Manage outsourced vendors that provide reliability functions to ensure compliance with contracted service-level agreements. Manage and coordinate operational components of incident management, including detection, response and reporting. Maintain a knowledge base comprising a technical reference library, reliability trends and practices, and laws and regulations.
- Manage the day-to-day activities of reliability management, identify risk tolerances, recommend treatment plans and communicate information about residual risk. Manage reliability projects and provide expert guidance on reliability matters for other IT projects. Ensure audit trails, system logs and other monitoring data sources are reviewed periodically and are in compliance with policies and audit requirements.
- Design, coordinate and oversee reliability testing procedures to verify the reliability of systems, networks and applications, and manage the remediation of identified risks.
- Perform other duties as directed.
EDUCATION AND EXPERIENCE:
Bachelor’s or Master’s degree in Reliability Engineering, Computer Science, Information Systems, or a related discipline, plus a minimum of seven years of IT experience, five years of which must be in a reliability engineering role, and at least two years in a supervisory capacity, or an equivalent combination of education and experience.
Benefits
Note: If interested, please send your updated resume along with your salary requirement, your contact details, and a suitable time when we can reach you. If you know anyone among your contacts who would be a good match for this job, we would appreciate it if you could forward this posting to them with a copy to us.
We look forward to hearing from you at the earliest!