Implementing Azure DevOps Solutions
上QQ阅读APP看书,第一时间看更新

Mean time to recovery

A third metric is the mean time to recovery. How long does it take you to restore service in case of a (partial) outage? In the past, companies focused on reducing the mean time between failures. This used to be the mean indicator of the stability of a product. However, this metric encourages limiting the number of changes going to production. The unwanted consequence often is that outages, though maybe rare, last long and are hard to fix.

Measuring the mean time to recovery shifts the attention to how quickly you can remediate an outage. If you can fix outages quickly, you achieve the same, namely, minimizing the amount of downtime without sacrificing the rate of change. The goal is to minimize the time to recovery.