This root cause analysis (RCA) concerns a material change in the recorded Time to First Byte (TTFB) for users. It is triggered when your median TTFB breaks through the real-time expected bounds of normal operation (the three-sigma limit). This is a significant change in infrastructure response and should be investigated further. The three-sigma limit is a statistical bound set three standard deviations from the mean; in a stable process, nearly all data fall within it.
Network issues often present as a specific source or destination exhibiting symptoms (e.g. high Time to First Byte for users in Australia). Global events, in which servers are slow to respond to users across multiple networks, often point to infrastructure under load (cache engines, database servers, or general infrastructure fatigue).
Three-sigma limits are used to set the upper and lower control limits in statistical quality control charts. Control charts are also known as Shewhart charts, named after Walter A. Shewhart, an American physicist, engineer and statistician (1891–1967). Control charts are based on the theory that even in perfectly designed processes, a certain amount of variability in output measurements is inherent.
Control charts determine whether the variation in a process is controlled or uncontrolled. Variations in process quality due to random causes are said to be in-control; out-of-control processes include both random and special causes of variation. Control charts are intended to determine the presence of special causes.
To measure variations, statisticians and analysts use a metric known as the standard deviation, also called sigma. Sigma is a statistical measurement of variability, showing how much variation exists from a statistical average. In a general sense, three-sigma moves in a stable metric such as TTFB should occur less than 0.3% of the time (that is, they are rare events).
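As a rough illustration of the calculation described above, the following Python sketch derives three-sigma limits from a window of TTFB readings. The function name `three_sigma_limits` and the sample values are hypothetical, and the use of the population standard deviation is an assumption; this is not necessarily how Edgemesh computes its real-time bounds.

```python
from statistics import mean, pstdev


def three_sigma_limits(samples):
    """Return (lower, upper) limits three standard deviations from the mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate variability")
    mu = mean(samples)
    # Assumption: population standard deviation over the baseline window.
    sigma = pstdev(samples)
    return mu - 3 * sigma, mu + 3 * sigma


# Hypothetical baseline of median TTFB readings, in milliseconds.
baseline_ms = [200, 210, 190, 205, 195, 200, 210, 190]
lower, upper = three_sigma_limits(baseline_ms)
print(f"lower={lower:.1f} ms, upper={upper:.1f} ms")
```

For this made-up baseline the mean is 200 ms and sigma is 7.5 ms, so a new median reading above 222.5 ms would fall outside the upper limit.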
This RCA is triggered if the latest median TTFB is above the expected upper limit of TTFB AND the latest TTFB is above 490 ms AND more than 10 users were impacted.
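The three conditions above can be expressed as a single predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes the "latest TTFB" in the second condition refers to the same latest median value used in the first.

```python
def ttfb_rca_triggered(latest_median_ttfb_ms, expected_upper_limit_ms,
                       impacted_users, min_ttfb_ms=490, min_users=10):
    """Return True only when all three trigger conditions hold."""
    return (
        latest_median_ttfb_ms > expected_upper_limit_ms  # outside expected bounds
        and latest_median_ttfb_ms > min_ttfb_ms          # slow in absolute terms
        and impacted_users > min_users                   # enough users affected
    )


# Hypothetical readings: 650 ms median against a 520 ms upper limit, 42 users.
print(ttfb_rca_triggered(650, 520, 42))
```

Combining a relative check (the upper limit) with an absolute floor (490 ms) and a minimum user count means that a statistically unusual but still fast response, or a spike seen by only a few users, does not raise the RCA.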
Your webserver has experienced an abnormal increase in Time to First Byte, and the median Time to First Byte is now high.
If you run the infrastructure for this site, we suggest you look at server metrics (CPU, memory, and network stats). If you are using a shared hosting provider (Kinsta, WordPress), a platform (Shopify®, Magento® Cloud, etc.), or a Content Delivery Service (Cloudflare®, Akamai®, etc.), we suggest reaching out to their support team with the attached data included in the Edgemesh Root Cause Analysis payload. Since this RCA is specifically tied to a material change in your site’s response time, you may be in the early stages of an incident.