Job Details

Lead, Site Reliability Engineer


Date Opened: 09/09/2022

Job Number: 220001FU

Job Description

What This Position is All About

The Site Reliability Engineering Lead assists in planning, monitoring, and controlling the day-to-day operations and delivery of the Site Reliability Engineering teams. The role helps manage team productivity and works to ensure the optimal health of The Bay eCommerce and CRM platforms by overseeing platform performance, resilience, and stability. This role also participates actively in all aspects of Site Reliability Engineering, including technical vision, telemetry and observation decisions, automation strategy, solution delivery, and platform incident and problem management. This is a leadership role with both technical and people leadership responsibilities. As such, it participates in short- and long-term systems planning, as well as team and organizational planning.

Who You Are:

  • Bachelor’s Degree in Computer Science or equivalent
  • Azure/AWS, Microsoft, and Red Hat certifications, and knowledge of ITIL/MOF practices
  • Highly experienced with monitoring, logging, and telemetry tools such as New Relic, Splunk, ELK, Nagios, SolarWinds, Prometheus, AWS CloudWatch, Datadog, etc.
  • Advanced understanding of networking, content delivery networks (CDNs, e.g., Akamai, Cloudflare), and cloud platforms.
  • Hands-on experience monitoring streaming platform technologies such as Apache Kafka.
  • Highly experienced with automation tools such as (but not limited to) Jenkins, Chef, Terraform, Ansible, etc.
  • Expert in architecting, creating, and supporting automation (PowerShell, Python, Ruby, AWK, sed, etc.) that runs health checks and provides self-healing capabilities for the platforms.
  • Advanced experience with the following platforms and tools:
      • Cloud: Microsoft Azure and AWS
      • Networking fundamentals: TCP/IP, DNS, WINS, DHCP, etc.
      • Collaboration and change management tools: Jira, ServiceNow, Cherwell, etc.
      • Databases: Oracle, MS SQL, Teradata, DB2, etc.
  • 8+ years of experience working in global organizations, with the ability to communicate effectively with executives, leaders, and individual contributors across the organization.
  • 5+ years of SRE experience working on telemetry, observation, self-healing solutions, and platform automation.


Job Qualifications

Thank you for your interest in HBC. We look forward to reviewing your application.


HBC provides equal employment opportunities (EEO) to all employees and applicants for employment.