As you can see in the diagram above, Stackdriver Trace lets you examine the complete code path and determine the root cause of the high-latency call.
Examining application output using Stackdriver Logging
The final telemetry component that Istio provides is the ability to direct logs to Stackdriver Logging. By themselves, logs are useful for examining application status or debugging individual functions and processes. And with Istio’s telemetry components sending metrics, trace data, and logging output to Stackdriver, you can tie all of your application’s events together. Istio’s Stackdriver integration allows you to quickly navigate between monitoring dashboards, request traces, and application logs. Taken together, this information gives you a more complete picture of what your app is doing at all times, which is especially useful when an incident or policy violation occurs.
Stackdriver Logging comes full circle with Stackdriver Monitoring by letting you create metrics based on structured log messages. That means you can create specific log-based metrics, then add them to your monitoring dashboards right alongside your other application monitoring metrics. Stackdriver Logging also integrates with other parts of Google Cloud: specifically, it can automatically export logs to Cloud Storage for retention or to BigQuery for follow-on ad hoc analysis. It also integrates with Cloud Pub/Sub, where each log entry is exported as an individual Pub/Sub message that can then be analyzed in real time using Cloud Dataflow or Cloud Dataproc.
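To make the idea of a log-based metric concrete, here is a minimal sketch in plain Python. It does not call any Stackdriver API; it only imitates what a counter-style log-based metric does conceptually, which is to count the structured log entries that match a filter in each time bucket. The entry fields (`timestamp`, `severity`, `httpStatus`) and the function names are assumptions made up for this illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical structured log entries; the field names are illustrative only
# and are not taken from any real Stackdriver log schema.
LOG_ENTRIES = [
    {"timestamp": "2019-01-01T12:00:05Z", "severity": "INFO", "httpStatus": 200},
    {"timestamp": "2019-01-01T12:00:40Z", "severity": "ERROR", "httpStatus": 503},
    {"timestamp": "2019-01-01T12:01:10Z", "severity": "ERROR", "httpStatus": 500},
    {"timestamp": "2019-01-01T12:01:30Z", "severity": "INFO", "httpStatus": 200},
    {"timestamp": "2019-01-01T12:01:55Z", "severity": "ERROR", "httpStatus": 503},
]


def is_server_error(entry):
    """Filter: keep only entries whose HTTP status is 5xx."""
    return entry["httpStatus"] >= 500


def count_matching_per_minute(entries, predicate):
    """Count the entries that satisfy `predicate`, grouped by minute."""
    counts = Counter()
    for entry in entries:
        if predicate(entry):
            ts = datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
            # Truncate the timestamp to the minute to form the bucket key.
            counts[ts.strftime("%Y-%m-%dT%H:%M")] += 1
    return dict(counts)


error_counts = count_matching_per_minute(LOG_ENTRIES, is_server_error)
print(error_counts)
```

Each bucket in `error_counts` corresponds to one data point in the resulting time series, which is the kind of series you would then chart on a dashboard next to your other metrics.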
Coming soon: SLOs and service monitoring using Stackdriver
So far we’ve reviewed the various mechanisms Stackdriver provides to assess your application’s SLIs. Now available for early access, Stackdriver will also provide native support for setting SLOs against your specific service metrics. That means you will be able to set specific SLO targets for the metrics you care about, and Stackdriver will automatically generate SLI graphs and track your target compliance over time. If any part of your workload violates your SLOs, you will be alerted immediately so you can take action.
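The arithmetic behind tracking compliance against an SLO target can be sketched in a few lines of Python. This is not how Stackdriver implements it; it is a minimal illustration that assumes a request-based SLI (good requests divided by total requests), and the function names are hypothetical.

```python
def sli(good_events, total_events):
    """Request-based SLI: the fraction of events that were good."""
    if total_events == 0:
        return 1.0  # Assumption: with no traffic, treat the SLI as fully met.
    return good_events / total_events


def meets_slo(good_events, total_events, slo_target):
    """True when the measured SLI is at or above the SLO target."""
    return sli(good_events, total_events) >= slo_target


def error_budget_remaining(good_events, total_events, slo_target):
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # No budget at all: any bad event overspends it.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


# Example: 10,000 requests, 5 of them failed, with a 99.9% availability target.
print(meets_slo(9995, 10000, 0.999))
print(error_budget_remaining(9995, 10000, 0.999))
```

In this example the target allows roughly 10 failed requests out of 10,000, so 5 failures leave about half of the error budget. An alert on an SLO violation corresponds to `meets_slo` turning false or the remaining budget dropping below zero.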
SRE isn’t about tools; it’s a lifestyle
Think of SRE as a set of practices, and not as a specific set of tools or processes. It’s a principled approach to managing software reliability and availability, through the constant awareness of key metrics (SLIs) and how those metrics are measured against your own targets (SLOs)—which you might use to provide guarantees to your customers (via SLAs). When you combine the power of Istio and Stackdriver and apply it to your own Kubernetes-based workloads, you end up with an in-depth view of your services and the ability to diagnose and debug problems before they become outages.
As you can see, Istio provides a number of telemetry features for your deployments. Combined with its deep Stackdriver integration, those features let you develop and implement your own SRE practices.
We’ve barely scratched the surface of defining SRE and these terms, so we recommend taking a look at “SRE Fundamentals: SLIs, SLAs, and SLOs” as well as “SLOs, SLIs, SLAs, oh my – CRE life lessons” for more background.
To try out the Istio and Stackdriver integration features we discussed here, check out the tutorial. In our next post in the Service Mesh era series, we’ll take a deep dive into Istio from an IT perspective and talk about some practical operator scenarios, like maintenance, upgrades, and debugging Istio itself.