Curated articles, resources, tips and trends from the DevOps World.
Summary: This is a summary of an article originally published by The New Stack.
https://www.blameless.com/ In site reliability engineering (SRE), we believe that some failure is inevitable. Complex systems receiving updates will eventually experience incidents that you can’t anticipate.
Today, we’ll talk about a third type of readiness: having backup systems and redundancies to quickly restore function when things go very wrong.
No matter what goes wrong, you just switch to the backup system for a bit, and then everything is fine, right?
Consider this story (https://www.blameless.com/incident-response/on-call-team-faces-worst-case-sunday-scaries), shared with me by an engineer, about a nightmare outage caused by a total wipe of their databases.