Senior Site Reliability Engineer
Job Description
We are looking for a highly motivated and experienced Site Reliability Engineer to join our team. The ideal candidate will have a strong understanding of Site Reliability Engineering principles and practices, as well as experience with a variety of programming languages and technologies.
The responsibilities of this role include:
- Designing, building, and maintaining the infrastructure used by all ML services
- Working cross-functionally with various platform teams, ML teams, and product partners to build the next generation of high-availability ML services in the cloud
- Building and maintaining observability and test tooling, including logging, monitoring, distributed tracing, alerting, and offline test tools
- Practicing continuous learning and agile delivery to stay informed and focused on our deliverables
- Supporting GKE services and maintenance, including software upgrades, performance tuning, and GKE cluster tuning and optimization
- Building GKE tooling and automating deployments
The ideal candidate will have the following skills and qualifications:
- Bachelor's degree in computer science or a related field
- 5+ years of experience in Site Reliability Engineering
- Strong understanding of Site Reliability Engineering principles and practices
- Experience with a variety of programming languages and technologies, including Java, Scala, Python, Go, and Kubernetes
- Excellent problem-solving and debugging skills
- Ability to work independently and as part of a team
- Experience with search technologies such as Lucene/Solr or Elasticsearch is a plus
- Experience supporting ML services is a plus
- Experience with Unix/Linux operating systems and the networking stack (e.g., TCP/IP, routing, network topologies and hardware, SDN) is a plus
- Experience with Grafana is a plus
If you are a highly motivated and experienced Site Reliability Engineer who is looking for a challenging and rewarding opportunity, we encourage you to apply.
Note: We have removed all mentions of the client's name from this job description to protect their confidentiality.
This is a remote, work-from-home position. This role is to be filled outside the state of Colorado.