Distributed Systems Observability

Network infrastructure is in the midst of a paradigm shift. As systems become more distributed, methods for building and operating them are rapidly evolving, and that makes visibility into our services and infrastructure more important than ever.

In this practical e-book, author Cindy Sridharan examines new monitoring tools that, while promising, bring their own set of technical and organizational challenges.

Platforms such as Kubernetes have solved several problems that traditional monitoring tools used to flag, but partial, implicit, and “soft” failure modes have risen along with the overall complexity of the system.

This e-book provides an honest overview of monitoring challenges and trade-offs to help you choose the best observability strategy for your distributed system.

36 pages, ebook

Published May 1, 2018

About the author

Cindy Sridharan

1 book, 8 followers

Ratings & Reviews

Community Reviews

5 stars: 10 (10%)
4 stars: 38 (41%)
3 stars: 35 (38%)
2 stars: 6 (6%)
1 star: 2 (2%)
Srinivasan Nanduri
462 reviews, 10 followers
July 15, 2024
A good read to get into the subject of Observability in the Cloud Native world. Here are the key notes:

- **Observability Defined**: The ability to understand the internal state of a system by examining its outputs.
- **Importance**: Critical for managing complex distributed systems, ensuring reliability, performance, and user satisfaction.

### Chapter 1: The Three Pillars of Observability

1. **Logs**:
   - Unstructured or semi-structured data.
   - Useful for understanding discrete events and debugging.
2. **Metrics**:
   - Numeric data measured over intervals.
   - Essential for performance monitoring and alerting.
3. **Traces**:
   - Show the path and duration of requests.
   - Crucial for understanding dependencies and performance bottlenecks.
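To make the three pillars concrete, here is a minimal Python sketch (an illustration, not code from the book) of one request emitting all three signals. The function `handle_request`, the metric name `http_requests_total`, and the in-memory span list are hypothetical stand-ins; a real service would use a logging library, a metrics client, and a tracer that exports spans to a backend such as Jaeger or Zipkin.

```python
import json
import time
import uuid
from collections import Counter

# Metrics: one counter per (metric name, label value); cheap to aggregate and alert on.
request_counter: Counter = Counter()

# Traces: finished spans, kept in memory here instead of being exported to a backend.
finished_spans: list = []


def handle_request(path: str) -> str:
    """Handle one (pretend) request and emit a log line, a metric, and a span."""
    trace_id = uuid.uuid4().hex  # identifies this request across services
    start = time.monotonic()
    # ... real work would happen here ...
    duration_ms = (time.monotonic() - start) * 1000

    # Trace: the path and duration of this unit of work.
    finished_spans.append(
        {"trace_id": trace_id, "name": f"GET {path}", "duration_ms": duration_ms}
    )
    # Metric: increment a numeric counter labelled by path.
    request_counter[("http_requests_total", path)] += 1
    # Log: a discrete, structured event; returned rather than printed so callers can inspect it.
    return json.dumps({"event": "request_handled", "path": path, "trace_id": trace_id})


print(handle_request("/checkout"))
```

The shared `trace_id` in the log line and the span is what later makes it possible to jump from one signal to the other.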

### Chapter 2: Instrumentation for Observability

- **Best Practices**:
  - Instrumenting code to capture relevant data.
  - Using open standards like OpenTelemetry.
  - Ensuring low overhead and minimal performance impact.
- **Tools**:
  - Prometheus for metrics.
  - Fluentd or Logstash for logs.
  - Jaeger or Zipkin for traces.
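As a sketch of low-overhead instrumentation, the hypothetical `timed` decorator below records call latency into a fixed set of histogram buckets using only the standard library. The bucket bounds and the `lookup_user` function are assumptions for illustration. In practice a client library would do this and expose the data for scraping; Prometheus's `prometheus_client`, for instance, lets a `Histogram` be used as a decorator via `.time()`.

```python
import bisect
import functools
import time

# Upper bounds in seconds for each latency bucket (arbitrary, for illustration).
BUCKETS = (0.005, 0.01, 0.05, 0.1, 0.5, 1.0, float("inf"))


class LatencyHistogram:
    """Per-bucket (non-cumulative) counts of observed latencies."""

    def __init__(self, bounds=BUCKETS):
        self.bounds = bounds
        self.counts = [0] * len(bounds)
        self.total_seconds = 0.0

    def observe(self, seconds: float) -> None:
        # bisect_left returns the first bucket whose upper bound is >= seconds.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total_seconds += seconds


def timed(hist: LatencyHistogram):
    """Decorator that records each call's latency, including calls that raise."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                hist.observe(time.perf_counter() - start)
        return wrapper
    return decorator


lookup_latency = LatencyHistogram()


@timed(lookup_latency)
def lookup_user(user_id: int) -> dict:
    return {"id": user_id}  # stand-in for a real database call


lookup_user(42)
print(lookup_latency.counts)
```

Recording is a timer read, a binary search over a handful of bounds, and two additions, which keeps the per-call cost small.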

### Chapter 3: Building an Observability Pipeline

- **Data Collection**: Agents and libraries gather telemetry data.
- **Data Storage**: Efficient, scalable storage solutions.
- **Data Analysis**: Tools and platforms to query and visualize data.
- **Alerting**: Automated alerts based on predefined thresholds and anomalies.
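The collection, storage, and analysis stages can be sketched as a toy in-memory pipeline, assuming a batching collector and a single append-only store. The class names `Collector` and `TimeSeriesStore`, the batch size, and the `queue_depth` metric are invented for illustration and stand in for real agents and a real time-series database; batching is just one common way to keep collection overhead low.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Sample = Tuple[str, float, float]  # (metric name, unix timestamp, value)


@dataclass
class TimeSeriesStore:
    """Storage stage: append-only series keyed by metric name, held in memory."""
    series: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)

    def write_batch(self, batch: List[Sample]) -> None:
        for name, ts, value in batch:
            self.series.setdefault(name, []).append((ts, value))

    def query_avg(self, name: str, start: float, end: float) -> float:
        """Analysis stage: mean of points with start <= ts < end (0.0 if none)."""
        values = [v for ts, v in self.series.get(name, []) if start <= ts < end]
        return sum(values) / len(values) if values else 0.0


@dataclass
class Collector:
    """Collection stage: buffers samples and ships them to storage in batches."""
    store: TimeSeriesStore
    batch_size: int = 3
    buffer: List[Sample] = field(default_factory=list)

    def record(self, name: str, ts: float, value: float) -> None:
        self.buffer.append((name, ts, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.store.write_batch(self.buffer)
            self.buffer = []


store = TimeSeriesStore()
collector = Collector(store)
for ts, depth in [(0, 2), (10, 4), (20, 9), (30, 5)]:
    collector.record("queue_depth", ts, depth)

# Three samples were flushed; the fourth is still buffered.
print(store.query_avg("queue_depth", 0, 30))
```

An alerting stage would then run queries like `query_avg` on a schedule and compare the results against rules.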

### Chapter 4: Monitoring and Alerting

- **Alerting Strategies**:
  - Threshold-based alerts for known conditions.
  - Anomaly detection for unusual patterns.
- **Alert Fatigue**: Importance of tuning alerts to reduce noise.
- **Dashboards**: Real-time visualization of key metrics and health indicators.
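Below is a rough sketch of the two alerting strategies, plus one common noise-reduction trick: firing a threshold alert only after a sustained breach, similar in spirit to the `for` clause in Prometheus alerting rules. The function names, the three-sigma default, and the sample numbers are assumptions for illustration, and a simple z-score is only one of many anomaly-detection techniques.

```python
import statistics
from typing import Sequence


def threshold_alert(values: Sequence[float], limit: float, for_samples: int) -> bool:
    """Fire only if each of the last `for_samples` values exceeds `limit`.

    Requiring a sustained breach ignores one-off spikes, a simple way to
    cut down on noisy alerts.
    """
    if for_samples < 1:
        raise ValueError("for_samples must be at least 1")
    recent = values[-for_samples:]
    return len(recent) == for_samples and all(v > limit for v in recent)


def anomaly_alert(history: Sequence[float], latest: float, z_limit: float = 3.0) -> bool:
    """Flag `latest` if it is more than `z_limit` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # too little data to estimate the spread
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_limit


error_rate = [0.01, 0.09, 0.01, 0.08, 0.09, 0.10]
print(threshold_alert(error_rate, limit=0.05, for_samples=3))  # True: last three samples breach

latency_ms = [100, 102, 98, 101, 99]
print(anomaly_alert(latency_ms, 140))  # True: far outside the usual spread
```

The isolated 0.09 spike early in `error_rate` would not fire on its own, because the breach has to persist for three consecutive samples.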

### Chapter 5: Correlating Data

- **Cross-Referencing Logs, Metrics, and Traces**: Provides a comprehensive view of system health.
- **Contextual Data**: Enriching telemetry with metadata for better analysis.
- **Root Cause Analysis**: Identifying and addressing underlying issues quickly.
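A common way to cross-reference logs with traces is to stamp every log line with the current trace ID. This minimal sketch assumes Python's standard `logging` and `contextvars` modules; the logger name `checkout` and the function `handle_checkout` are hypothetical, and in a real system the trace ID would come from the tracing library rather than from `uuid4`.

```python
import contextvars
import logging
import uuid

# Trace ID of the request currently being handled; each thread and asyncio task sees its own value.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attaches the active trace ID to every log record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def handle_checkout(order_id: str) -> str:
    """Hypothetical request handler; returns the trace ID it used."""
    token = current_trace_id.set(uuid.uuid4().hex)
    try:
        logger.info("charging card for order %s", order_id)
        return current_trace_id.get()
    finally:
        current_trace_id.reset(token)  # restore the previous context


handle_checkout("A-1001")
```

With the trace ID in every log line, a search for one ID returns both the log events and the matching spans for that request.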

### Chapter 6: Observability in Practice

- **Case Studies**: Real-world examples of observability implementation.
- **Challenges**: Common pitfalls such as data silos, lack of standardization, and high cardinality issues.
- **Best Practices**:
  - Centralized observability platforms.
  - Regular audits and updates to observability strategies.
  - Cross-team collaboration for comprehensive coverage.
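To illustrate why high cardinality is a pitfall for metrics, this sketch computes the worst-case number of time series a single metric can produce: one per combination of label values. The label names, example values, and the 10,000-series budget are assumptions chosen for illustration.

```python
from math import prod


def series_count(label_values: dict) -> int:
    """Worst-case number of time series for one metric: one per combination of label values."""
    return prod(len(values) for values in label_values.values())


def check_cardinality(label_values: dict, budget: int = 10_000) -> int:
    """Return the series count, or raise if it exceeds the (assumed) budget."""
    n = series_count(label_values)
    if n > budget:
        raise ValueError(f"{n} potential series exceeds the budget of {budget}")
    return n


bounded = {"method": ["GET", "POST"], "status": ["2xx", "4xx", "5xx"], "region": ["us", "eu", "ap"]}
print(check_cardinality(bounded))  # 18 (2 * 3 * 3)

with_user = {**bounded, "user_id": range(100_000)}  # one label value per user
print(series_count(with_user))  # 1800000
```

Adding a per-user label multiplies the 18 series by the number of users, which is one reason identifiers like user IDs are often kept out of metric labels.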

### Conclusion

- **Evolving Needs**: Observability must adapt to changing architectures and technologies.
- **Future Trends**: Increasing role of AI/ML in enhancing observability, predictive analytics, and automated remediation.

### Key Takeaways

- **Holistic Approach**: Observability requires integrating logs, metrics, and traces.
- **Proactive Monitoring**: Continuous monitoring and alerting to preemptively address issues.
- **Collaboration**: Effective observability involves coordination across development, operations, and business teams.
- **Scalability**: Solutions must be scalable to handle growing data volumes and complexity.

### Recommended Tools and Technologies

- **Metrics**: Prometheus, Grafana
- **Logs**: Elasticsearch, Fluentd, Logstash
- **Traces**: Jaeger, Zipkin
- **Standards**: OpenTelemetry
Erkan Erol
104 reviews, 15 followers
October 6, 2018
Very informative at an introductory level.

My Clippings
---
Monitoring of yore might have been the preserve of operations engineers, but
observability isn’t purely an operational concern. This is a book authored by a
software engineer, and the target audience is primarily other software developers,
not solely operations engineers or site reliability engineers (SREs).
---
complex systems fail in complex ways
---
Indeed, tracing is most successfully deployed in organizations that use a core set
of languages and frameworks uniformly across the company
---
The goal of an Observability team is not to collect logs, metrics, or traces. It is to
build a culture of engineering based on facts and feedback, and then spread that
culture within the broader organization.
Ivano
5 reviews, 1 follower
August 22, 2018
Short and to the point. The author introduces the problems faced in designing and operating distributed systems, outlining what is considered a good approach.
The book briefly references modern tools of the trade and highlights the fact that observability can only be achieved by the whole team during all phases of the software lifecycle, from design to rollout.
This is a solid introduction to the topic; don't expect to find a detailed analysis of all the steps needed to reach the goal set out by the title.
13 reviews
April 7, 2019
Short and pretty straightforward. Observability is not just about monitoring, testing, and event logs; it is more about finding the things that aren't seen by the tools. It's like iterating over and over, improving the product based on the business and tech requirements. For some businesses, monitoring and alerting are enough, but for others, finding the needle in a haystack could be the issue. As she said, choose your observability targets and improve the product.
Marcin Jałbrzykowski
20 reviews, 9 followers
January 6, 2020
This book is a brief, high-level introduction to the concept of Observability in distributed systems. All three pillars of Observability are discussed. I think it is a good starting point before more advanced books about Observability, like "The Site Reliability Workbook: Practical Ways to Implement SRE".
Michael Koltsov
109 reviews, 69 followers
June 24, 2020
The best illustration for this book would be one of the Captain Obvious memes. Thanks to the book, I found a few tools I'd never heard of before, but that's it. Nothing new, nothing particularly interesting. It'd be better if this book were simply compressed into a series of blog posts. In reality, the author retells most of the book's core concepts in a Medium post.
Allisonperkel
832 reviews, 37 followers
November 20, 2021
This was a really wonderful write-up on how to think about views into your system and how to think about the process of building and testing your system as it comes together. As a means to get your thoughts flowing, this is a perfect quick read.
Damon
6 reviews
January 7, 2023
A good primer for observability that gets surprisingly dense. It’s a bit heavy on mentions of specific tools that are unfamiliar to me, but it still improved my understanding of the basic concepts of observability.
David Carpinteiro
91 reviews
November 1, 2024
Presents clean, concise ideas and techniques for properly observing and debugging a distributed system.
The book is small but quite clear.
I would say the only thing lacking is examples, as it's quite theoretical.
ingrid
37 reviews, 30 followers
September 2, 2018
It’s a good book, but it feels more like a high-level introduction than a guide. I would’ve loved more examples and perhaps some lessons learned.
2 reviews, 1 follower
February 11, 2019
Clear and concise guide to the author's perspective on observability and how to begin the journey.
Dave Cremins
82 reviews
December 4, 2020
Good introduction, but short on technical details about how to instrument large distributed systems.
kevin
108 reviews, 6 followers
December 20, 2020
Good short read on logging, metrics, and traces. Should just read it :-)
Carter
597 reviews
December 27, 2021
This book is a short read; it seems relatively well informed, though from such a short work it is unclear what the basis of this line of reasoning is. Recommended.
10 reviews, 2 followers
February 7, 2022
Short, but a very nice introduction. See also the excellent "Testing in prod" blog post series.
10 reviews
February 17, 2022
Awesome insights and philosophical discussion in 70% of the book, but only the last chapter is really about observability. It could have had more on good practices and case studies.
Mr. Musale
64 reviews, 10 followers
December 29, 2020
A very informative book. I highly recommend it for anyone interested in distributed systems who doesn't know where to start.