SRE at Goldman Sachs
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
At Goldman Sachs, SRE is responsible for the availability and reliability of the firm’s most critical platform services and ensures that they meet the requirements of our internal and external users.
Discover more
Read the post "Observability at Scale" on our Developer blog to learn how we bring a reliability foundation to our engineering ecosystem.
Join the team
Join the SRE team at Goldman Sachs and engineer the future of finance. Apply to our open positions below:
- Site Reliability Engineer - Marquee
- Site Reliability Engineer - NY
- Site Reliability Engineer - SLC
- Site Reliability Engineer - Toronto
At Goldman Sachs, we look for engineers who are motivated to collaborate with our businesses to build and run sustainable production systems that can evolve and adapt to changes in our fast-paced, global business environment. As an SRE, you will:
- Balance feature development velocity and reliability, with well-defined SLOs.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Drive the incident management process and support a blameless post-mortem culture.
- Partner with development teams to improve services via rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and uplifts.
Stay in touch
Connect with the SRE team at sre-hiring@gs.com about current and future career opportunities. We look forward to seeing you join SRE at Goldman Sachs.