Service errors and warnings

Nexthink continuously analyzes the state of services and reports potential errors and warnings.

Connection-based services

Applies to platforms | Windows | macOS |

Metric | Type | Description |
Failed connections | Device-level error | A device is marked in error state if it fails to connect to the destination for 60 seconds. |
Application crashes | Device-level error | A device is marked in error state if the binary used to connect to the service experiences an application crash. |
Network response time | Entity-level warning | All active devices in an entity are marked in warning state if the average network response time for the entity is 3 times greater than the automatically computed baseline for the previous 7 days. |

Web-based services

Applies to platforms | Windows | macOS |

Metric | Type | Description |
Application crashes | Device-level error | A device is marked in error state if the binary used to connect to the service experiences an application crash. |
Failed HTTP request (5xx) | Entity-level error | All active devices in an entity are marked in error state if the total number of failed HTTP requests with status 5xx is 3 times greater than the automatically computed baseline for the previous 7 days. |
Failed HTTP request (4xx) | Entity-level warning | All active devices in an entity are marked in warning state if the total number of failed HTTP requests with status 4xx is 3 times greater than the automatically computed baseline for the previous 7 days. |
Redirected HTTP request (3xx) | Entity-level warning | All active devices in an entity are marked in warning state if the total number of redirected HTTP requests with status 3xx is 3 times greater than the automatically computed baseline for the previous 7 days. |
Web request duration | Entity-level warning | All active devices in an entity are marked in warning state if the average web request duration for the entity is 3 times greater than the automatically computed baseline for the previous 7 days. |

Computation of averages and detection of outliers

Metrics described in the tables above as Entity-level errors and warnings are computed for a set of devices rather than for individual devices. The goal is to reduce false positives on metrics that are subject to a high degree of variation. For instance, a device might experience a long network response time during a few connections, but this usually does not mean that the service is compromised for that device. Computing such metrics at the entity (or location) level gives a more accurate representation of the actual quality of service.

Every metric computed at the Entity-level has a minimum limit for issuing warnings. This is an absolute value below which service quality is considered acceptable, even when the baseline is violated. For example, if the baseline for a metric is very low because the service has performed extremely well in the past, the computed average for the period can be 3 times higher than the baseline and still not trigger a warning, as long as it does not exceed the minimum limit.
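As a rough illustration of how the baseline comparison and the minimum limit interact, the sketch below uses a hypothetical `should_warn` function with made-up values. The function name, the strict `>` comparisons, and the example limit of 50 ms are assumptions for illustration only; the actual limits and internal logic used by Nexthink are not documented here.

```python
def should_warn(average: float, baseline: float, minimum_limit: float,
                factor: float = 3.0) -> bool:
    """Return True when a warning would be raised under the assumed rule.

    Assumption: a warning needs BOTH conditions to hold:
      1. the average exceeds `factor` times the baseline, and
      2. the average exceeds the absolute minimum limit.
    """
    exceeds_baseline = average > factor * baseline
    exceeds_minimum = average > minimum_limit
    return exceeds_baseline and exceeds_minimum


# Very low baseline (5 ms): 20 ms is more than 3x the baseline,
# but it stays under the assumed 50 ms minimum limit, so no warning.
print(should_warn(average=20.0, baseline=5.0, minimum_limit=50.0))   # False

# 200 ms exceeds both 3x the baseline and the minimum limit.
print(should_warn(average=200.0, baseline=5.0, minimum_limit=50.0))  # True
```

The first call shows the case described above: the baseline is violated, yet the minimum limit suppresses the warning.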

In addition, an algorithm detects and eliminates outliers. If a limited number of devices cause the mean value to exceed the error or warning level, the algorithm removes them from the computation of the baseline. The maximum number of devices that the algorithm can consider as outliers depends on the total number of devices in the Entity:

  • 10% of the devices, if the total number of devices in the Entity is less than 100.

  • 10 devices, if the total number of devices in the Entity is greater than or equal to 100.
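The cap on outliers can be expressed as a small function. This is a minimal sketch: `max_outliers` is a hypothetical name, and rounding the 10% figure down to a whole number of devices is an assumption, since the rounding behavior is not specified.

```python
def max_outliers(total_devices: int) -> int:
    """Maximum number of devices that may be treated as outliers in an entity."""
    if total_devices < 0:
        raise ValueError("total_devices must be non-negative")
    if total_devices < 100:
        # 10% of the devices; rounding down is an assumption.
        return total_devices // 10
    # Entities with 100 or more devices are capped at 10 outliers.
    return 10


for n in (50, 99, 100, 5000):
    print(n, max_outliers(n))
```

Under this rule the cap grows with the entity size up to 100 devices and then stays fixed at 10, so in large entities only a small fraction of devices can be excluded.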

