The Value of a Meteor-Ready Plan for Disaster Resilience

4 years ago thenewstack.io

Summary: This is a summary of an article originally published by The New Stack.

In site reliability engineering (SRE), we believe that some failure is inevitable. Complex systems that receive regular updates will eventually experience incidents you can't anticipate.

Today, we’ll talk about a third type of readiness: having backup systems and redundancies to quickly restore function when things go very wrong.

No matter what goes wrong, you just switch to the backup system for a bit, and then everything is fine, right?

Consider this story (https://www.blameless.com/incident-response/on-call-team-faces-worst-case-sunday-scaries), shared with me by an engineer, about a nightmare outage caused by a total wipe of their databases.
