This blog post is aimed at Go developers looking to improve their services’ observability. It skips the basics and jumps straight to advanced topics, such as asynchronous structured logging, metrics with exemplars, tracing with TraceQL, aggregating pprof profiles and continuous profiling, microbenchmarks and basic statistics with benchstat, blackbox performance tests, and basic PID controllers for determining a system’s maximum load. We’ll also briefly touch on current research in the observability space, including active causal profiling and passive critical section detection.
The Three Pillars of Observability: Logs, Metrics, Traces
If you’re reading this, you likely don’t need a refresher on the basics of observability. Let’s dive into the non-obvious stuff and focus on making it as easy as possible to move between the three main observability surfaces. We’ll also discuss how to add tracing to the mix so that pprof data can be linked to traces and back.
If you’re instead looking for a short and straightforward introduction to monitoring basics and ways to introduce basic observability into your service quickly, “Distributed Systems Observability” by Cindy Sridharan is a great place to start.
Structured Logging
Logging can become a bottleneck if you’re not using a zero-allocation logging library. If you haven’t already, consider using zap or zerolog – both are great choices.
| Library | Time | Memory | Allocations |
|---|---|---|---|
| zerolog | 767 ns/op | 552 B/op | 6 allocs/op |
| zap | 848 ns/op | 704 B/op | 2 allocs/op |
| go-kit | 3614 ns/op | 2895 B/op | 66 allocs/op |
| logrus | 5661 ns/op | 6092 B/op | 78 allocs/op |
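The allocations column is where the gap between libraries is widest. As a rough, stdlib-only illustration of the idea (not code from zap or zerolog), the sketch below renders the same logfmt-style line two ways: with `fmt.Sprintf`, which returns a fresh string per event, and by appending into a reused byte buffer. It then counts heap allocations per call with `testing.AllocsPerRun`. The helpers `formatLine` and `appendLine` are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// formatLine renders a logfmt-style line with fmt.Sprintf, which
// allocates a new string on every call.
func formatLine(msg string, status int) string {
	return fmt.Sprintf("level=info msg=%q status=%d\n", msg, status)
}

// appendLine renders the same line by appending to buf. When buf already
// has enough spare capacity, no heap allocation is needed.
func appendLine(buf []byte, msg string, status int) []byte {
	buf = append(buf, "level=info msg="...)
	buf = strconv.AppendQuote(buf, msg)
	buf = append(buf, " status="...)
	buf = strconv.AppendInt(buf, int64(status), 10)
	return append(buf, '\n')
}

func main() {
	buf := make([]byte, 0, 256) // reused across events, like a pooled buffer
	viaSprintf := testing.AllocsPerRun(1000, func() {
		_ = formatLine("request served", 200)
	})
	viaAppend := testing.AllocsPerRun(1000, func() {
		buf = appendLine(buf[:0], "request served", 200)
	})
	fmt.Printf("Sprintf: %.0f allocs/op, append: %.0f allocs/op\n", viaSprintf, viaAppend)
}
```

Inside a regular test suite you would measure the same thing with `go test -bench . -benchmem`, which reports ns/op, B/op and allocs/op columns like the ones in the table.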
Go also has an ongoing proposal to add structured logging to the standard library: slog. Be sure to check it out and provide feedback on the proposal!
Structured logging is essential for extracting data from logs. Adopting a JSON or logfmt format simplifies ad-hoc troubleshooting and allows for quick and dirty graphs and alerts while you work on proper metrics. Most logging libraries also ship ready-to-use integrations for gRPC/HTTP clients and servers and for common database clients, which greatly simplifies adopting them in existing codebases.
If text-based formats prove too inefficient, you can switch to a binary encoding. For instance, zerolog supports the binary CBOR format, and Envoy uses protobufs for its structured access logs.
In some cases, logs themselves can become performance bottlenecks. You don’t want your service to get stuck because Docker can’t pull events out of the stderr pipe fast enough when you enable DEBUG logs.
One solution is to sample your logs:
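```go
import (
	"time"
	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"
)

// Up to 5 DEBUG events per second pass; after the burst, 1 in 100 does.
sampled := log.Sample(zerolog.LevelSampler{
	DebugSampler: &zerolog.BurstSampler{
		Burst:       5,
		Period:      1 * time.Second,
		NextSampler: &zerolog.BasicSampler{N: 100},
	},
})
```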
Alternatively, you can make log emission fully asynchronous so that it never blocks the caller.