Big Picture: Why bother with SLOs?

2022-12-12 Big picture SLO Reliability

Service Level Objectives (SLOs) are meant to be a common language between the user, the product manager and the engineer. In effect, SLOs (and ultimately SLAs, Service Level Agreements) are the customer-centric expressions of SLIs (Service Level Indicators).
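To make the SLI/SLO relationship concrete, here is a minimal Python sketch. It assumes the common "good events over valid events" style of SLI. The `SLO` class, the helper functions and the build-start objective are hypothetical illustrations, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """An objective set on an SLI over some time window (illustrative only)."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of valid events must be "good"


def sli_ratio(good_events: int, valid_events: int) -> float:
    """An SLI expressed as the share of valid events that were good."""
    if valid_events == 0:
        return 1.0  # no traffic in the window: treat the objective as met
    return good_events / valid_events


def meets_slo(slo: SLO, good_events: int, valid_events: int) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli_ratio(good_events, valid_events) >= slo.target


# Hypothetical objective and counts, for illustration only.
build_starts = SLO(name="builds start within 30s", target=0.999)
print(sli_ratio(99_950, 100_000))                # 0.9995
print(meets_slo(build_starts, 99_950, 100_000))  # True
```

The SLI is the raw "inside the box" measurement; the SLO is the customer-facing promise we compare it against.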

You can think of SLIs as a series of curated ‘inside the box’ metrics.

‘Curated’ because, whilst our instrumentation tools expose many metrics, most only prove useful in the later stages of an investigation. Many are hard to reason about in isolation, which makes them poor candidates for communicating a tangible view of the system’s status.

We use SLIs to understand our systems from our own perspective, but they are indelibly tied to the SLO/SLA that captures customer behaviour. The SLO/SLA hides the complexity of the implementation from the customer whilst remaining useful to us operationally.

SLOs can also give us a way to internally enforce visible, explicit contracts (guarantees) with the teams behind other services, both upstream and downstream. These in turn inform top-level SLOs, which show how individual components affect the system as a whole.
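One simple way to see how component SLOs feed a top-level SLO is to compose them along a request path. The sketch below assumes the components are called in series and fail independently. Under that assumption the path's availability is the product of the components' availabilities; real systems with retries, caching or redundancy behave differently. The component names and figures are hypothetical.

```python
import math
from typing import Sequence


def serial_availability(component_slos: Sequence[float]) -> float:
    """Estimate the availability of a request path that needs every component.

    Assumes the components are called in series and fail independently,
    so the path only succeeds when all of them succeed.
    """
    return math.prod(component_slos)


# Hypothetical per-component availability SLOs on one user-facing path.
components = {"api": 0.999, "scheduler": 0.999, "artifact-store": 0.995}
path = serial_availability(list(components.values()))
print(f"{path:.4f}")  # 0.9930
```

The composed figure (roughly 99.3%) sits below even the weakest component, which is why a top-level SLO has to be set with its dependencies' SLOs in view.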

Customer satisfaction as the North Star

A good SLO maps to user happiness and gives us visibility into it. Because we are a mission-critical part of the development lifecycle for many companies, we tend to receive less slack than others would. The concept of a reliability “budget” helps us determine when too much velocity, or not enough investment in the platform’s stability, is causing too much disruption to our customers’ experience.
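The reliability budget can be made concrete as an error budget. With a 99.9% target, the remaining 0.1% of valid events is the amount of failure the service can absorb in a window before the SLO is missed. The sketch below assumes an event-based SLI; the function names and the thresholds in `release_policy` are hypothetical placeholders for whatever policy a team actually agrees on.

```python
def error_budget(slo_target: float, valid_events: int) -> float:
    """Number of bad events the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("target must be between 0 and 1 (exclusive)")
    return (1.0 - slo_target) * valid_events


def budget_remaining(slo_target: float, valid_events: int, bad_events: int) -> float:
    """Fraction of the budget left unspent; negative once it is overspent."""
    budget = error_budget(slo_target, valid_events)
    if budget == 0:  # no valid events in the window
        return 1.0 if bad_events == 0 else float("-inf")
    return (budget - bad_events) / budget


def release_policy(remaining: float) -> str:
    """Hypothetical policy: the thresholds are placeholders, not recommendations."""
    if remaining <= 0:
        return "freeze risky releases; prioritise reliability work"
    if remaining < 0.25:
        return "slow down; review risky changes"
    return "ship as normal"


# Hypothetical window: 99.9% target, 1,000,000 valid events, 400 bad ones.
left = budget_remaining(0.999, 1_000_000, 400)
print(round(left, 2))        # 0.6
print(release_policy(left))  # ship as normal
```

Tying release decisions to the remaining budget is what turns the SLO from a dashboard number into a shared rule for trading velocity against stability.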