Directing traffic: Demystifying internet-scale load balancing
Common techniques used to balance network traffic come with advantages and trade-offs.

What breaks our systems: A taxonomy of black swans
Find and fix outlier events that create issues before they trigger severe production problems.

What MMORPGs can teach us about leveling up a heroic developer team
The team-building skills that make winning gaming guilds also produce successful work teams.

How Instagram is scaling its infrastructure across the ocean
Scaling up your infrastructure is especially challenging when the next data center is on another continent.

What is an SRE and how does it relate to DevOps?
The SRE role is common in large enterprises, but smaller businesses need it, too.

We already have nice things, and other reasons not to write in-house ops tools
Let's look at the pitfalls of writing in-house ops tools, the circumstances that justify it, and how to do it better.