But even more difficult was figuring out what was going on under the hood, and how to prevent it from happening again.
This is what brought us to think about troubleshooting in the context of three pillars. I’m going to dive into how we envision these three pillars, and how they helped us work out what is needed to properly troubleshoot the real-world Kubernetes stacks that are the hallmark of complex, distributed systems.
To understand what actually happened in the system to trigger this failure, developers start by analyzing recent changes to the system and asking which of them could have caused it.
We then turn to the fancy metrics, dashboards, and data that we created for just this moment, to work out what is going wrong based on tangible data sources.