What works for Google, what works for Facebook, and what works for Netflix may not be the right thing for the rest of us. Putting too much weight behind the opinions of a few large organizations can bite you. The same is true of charting a path forward based on the experience of a few individuals, without being aware of the broader landscape. This is why I'm a huge fan of studies that are more broadly based, like Accelerate and the accompanying yearly State of DevOps reports. Do what works in your context, but stay informed of what is working well for others, to make better and better choices as you go.
One particular area that has been getting more refined is Site Reliability Engineering (SRE). Three great books I read over the last year provide a peek into some experiences and experiments.
This trio of books is a treasure trove of ideas, techniques, practices, and organizational approaches for improving both how value is delivered to production and how the teams around that delivery are organized:
- Site Reliability Engineering: How Google Runs Production Systems
- The Site Reliability Engineering Workbook: Practical Ways to Implement SRE
- Seeking SRE: Conversations About Running Production Systems at Scale
SRE is all about applying a software development mindset to infrastructure and operations.
I particularly enjoyed 'Seeking SRE', which is a series of essays. Each chapter stands on its own, and several draw on years of history and experience reports from well-known companies.