Sr. Site Reliability Engineer
Sky Solutions LLC
Bentonville, Arkansas, United States
Job type: Full-time
Job industry: I.T. & Communications
Sky Solutions is looking for a passionate and experienced Site Reliability Engineer (SRE) to join the team, focusing on architectural design and devising solutions to improve service reliability, including service health monitoring and response capabilities, automation of deployment, configuration, recovery, and more.
- Engineer solutions that protect service health and prevent customer impact proactively.
- Define service level objectives (SLOs) and service level indicators (SLIs) to represent and measure service quality.
- Identify and implement solutions to reduce incident mitigation time, including telemetry generation, diagnostic tools, and automated recovery options.
- Design and maintain production monitoring systems.
- Write code to instrument and monitor the health and performance of workflow services and processes to continually improve customer experience.
- Define and champion Continuous Integration (CI)/Continuous Deployment (CD) and service (regression and scale) test automation.
- Apply availability, performance, and scalability expertise to make improvements and ensure services continue to grow according to expectations.
- Automate common, repeatable tasks at large scale to streamline operational procedures.
- Leverage orchestration management (Zookeeper, Mesos) and configuration management (Chef, Puppet, Ansible).
- Introduce and maintain continuity and recoverability capabilities.
- Engage in live site incident response efforts to drive mitigation and resolution.
- BS degree in Computer Science or related technical field involving coding, DevOps or equivalent practical experience.
- 3+ years production level experience with distributed applications at scale in public cloud (AWS and/or Azure)
- Experience in one (and preferably more) of the following languages: C, C++, Java, Python, Go, Perl, or Ruby.
- Experience implementing service health monitoring, dependency mapping, and data integrity validations. Kafka and/or Cassandra cluster monitoring strongly desired.
- Ability to debug and optimize code as well as automate routine tasks
- Build & Deployment: Red and Green deployment
- Experience with containerization technologies: Docker and Kubernetes
- Experience designing and implementing build and release pipelines for continuous delivery with automated pre- and post-deployment validation
- Strong debugging skills and a methodical approach to complex problem solving
- Excellent verbal and written communication skills, and high attention to detail
Sky Solutions is a strategic consulting, staffing and technology services company headquartered in the Washington D.C. Metro Area. We deploy the optimal resources, expertise and technologies to help organizations improve their business performance. Sky Solutions is committed to creating innovative, flexible solutions for government and commercial clients. As a Small Disadvantaged Business (SDB) with Woman and Minority Owned certifications, we provide key staff to employers to meet their business-critical needs while promoting diversity and equal opportunity in employment.
Provided by Dice