The "logs and a Grafana dashboard" era is over. The modern observability stack is denser, more correlated, and more useful.
Logs were the first observability primitive. Then metrics became standard. Then distributed tracing arrived. The three-pillar model is mature, but the connection between the pillars is where the latest gains live.
The three pillars
- Logs: structured records of discrete events. Best for exact-cause root analysis.
- Metrics: numerical aggregates over time. Best for SLOs, alerting, and trend analysis.
- Traces: end-to-end request paths across services. Best for understanding distributed system behavior.
Exemplars: the connecting tissue
An exemplar is a metric data point that includes a pointer to a representative trace. When your latency histogram shows a p99 spike, clicking the spike opens an actual trace from a request that experienced it. This is the feature that turns dashboards from "interesting" to "actionable" in incident response.
OpenTelemetry: the standard
OpenTelemetry (OTel) is now the default instrumentation library across languages. Replace your vendor-specific Datadog, New Relic, or proprietary SDK with OTel; you can route the same data to any backend (Datadog, Honeycomb, Grafana Cloud, Jaeger). Vendor lock-in is no longer the price of observability.
What to instrument
- HTTP/gRPC servers (auto-instrument via OTel SDK).
- Database clients (auto-instrument).
- Outbound HTTP calls to dependencies.
- Background job queues — frame each job as a span.
- Critical business spans (e.g., "process_payment", "send_invoice") with attributes (customer_id, amount).
What to alert on
Alert on SLO burn rate, not on individual error counts. A multi-window, multi-burn-rate alerting policy from Google's SRE workbook is the gold standard. It catches both fast burns (immediate page) and slow burns (something is degraded) without false positives.
What we install
For most clients: OpenTelemetry SDK + Collector deployed as a sidecar or daemonset, routing to Grafana Cloud for storage and visualization. Total monthly cost for a mid-size workload: under $500. The signal-to-noise ratio improvement is dramatic.