Custom health checks
Customize how the health of your deployments is determined and handled.
Why use custom health checks?
- Control traffic and restarts by configuring failure thresholds to suit your needs.
- Define replica health with custom logic (e.g. fail after a certain number of 500s or a specific CUDA error).
By default, health checks run every 10 seconds to verify that each replica of your deployment is running successfully and can receive requests. If a health check fails for an extended period, one or both of the following actions may occur:
- Traffic is immediately stopped from reaching the failing replica.
- The failing replica is restarted.
The thresholds for each of these actions are configurable.
Custom health checks can be implemented in two ways:
- Configuring thresholds for when health check failures should stop traffic to or restart a replica.
- Writing custom health check logic to define how replica health is determined.
Configuring health checks
Parameters
You can customize the behavior of health checks on your deployments by setting the following parameters:
- stop_traffic_threshold_seconds: The duration that health checks must continuously fail before traffic to the failing replica is stopped. Must be between 30 and 1800 seconds, inclusive.
- restart_check_delay_seconds: How long to wait before running health checks. Must be between 0 and 1800 seconds, inclusive.
- restart_threshold_seconds: The duration that health checks must continuously fail before triggering a restart of the failing replica. Must be between 30 and 1800 seconds, inclusive.
The sum of restart_check_delay_seconds and restart_threshold_seconds must not exceed 1800 seconds.
Model and custom server deployments
Configure health checks in your config.yaml.
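As a sketch, the three parameters above might be set in config.yaml like this. The nesting under a runtime.health_checks section is an assumption; check the current config reference for the exact key placement:

```yaml
runtime:
  health_checks:
    # Seconds to wait after startup before health checks begin counting.
    restart_check_delay_seconds: 120
    # Continuous failure duration before the replica is restarted.
    restart_threshold_seconds: 300
    # Continuous failure duration before traffic to the replica is stopped.
    stop_traffic_threshold_seconds: 120
```

Note that restart_check_delay_seconds + restart_threshold_seconds (here 120 + 300 = 420) stays under the 1800-second cap described above.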
You can also specify custom health check endpoints for custom servers. See here for more details.
Chains
Use remote_config to configure health checks for your chainlet classes.
Writing custom health checks
You can write custom health checks in both model deployments and chain deployments.
Custom health checks in models
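As a minimal sketch for a model deployment, custom health logic lives on the model class. The assumption here is that the platform calls an is_healthy method on each check and treats a False return as a failure; the readiness flag is illustrative:

```python
class Model:
    """Truss-style model with a custom health check (sketch).

    Assumption: the health check loop (every 10 seconds by default)
    calls is_healthy() and counts a False return as a failure.
    """

    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Load weights here; until this completes, report unhealthy.
        self._model = object()  # placeholder for real model weights

    def is_healthy(self) -> bool:
        # Healthy only once load() has finished.
        return self._model is not None

    def predict(self, model_input):
        return {"echo": model_input}
```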
Custom health checks in chains
Health checks can be customized for each chainlet in your chain.
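A sketch of per-chainlet health state, assuming chainlets expose the same is_healthy hook as models. The class below omits the real truss_chains.ChainletBase base class so the sketch stays self-contained; the CUDA-error scenario mirrors the example mentioned at the top of this page:

```python
class TranscriptionChainlet:
    """Chainlet-style class with its own health signal (sketch).

    In a real chain this would subclass truss_chains.ChainletBase;
    that dependency is omitted here for illustration.
    """

    def __init__(self):
        self._gpu_ok = True

    def is_healthy(self) -> bool:
        # Each chainlet reports health independently, so one failing
        # chainlet can be restarted without touching the others.
        return self._gpu_ok

    def run_remote(self, audio_chunk: bytes) -> str:
        try:
            return self._transcribe(audio_chunk)
        except RuntimeError:
            # e.g. a CUDA error: mark this chainlet unhealthy so the
            # platform restarts it on the next failed checks.
            self._gpu_ok = False
            raise

    def _transcribe(self, audio_chunk: bytes) -> str:
        # Placeholder for real model inference.
        return "transcript"
```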
Health checks in action
Identifying 5xx errors
You might create a custom health check to identify 5xx errors like the following:
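One way to sketch such a check: count consecutive 5xx responses and report unhealthy past a threshold. The is_healthy hook is assumed as above, and the threshold of 3 and the _call_downstream helper are purely illustrative:

```python
class Model:
    """Model whose health check fails after repeated 5xx errors (sketch)."""

    def __init__(self, **kwargs):
        self._consecutive_5xx = 0

    def load(self):
        pass

    def is_healthy(self) -> bool:
        # Fails after 3 consecutive 5xx responses; any success resets it.
        return self._consecutive_5xx < 3

    def predict(self, model_input):
        status, body = self._call_downstream(model_input)
        if 500 <= status < 600:
            self._consecutive_5xx += 1
        else:
            self._consecutive_5xx = 0
        return body

    def _call_downstream(self, model_input):
        # Placeholder for a real call that can return 5xx statuses.
        return 200, {"ok": True}
```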
When a custom health check fails, a failure message appears in your deployment logs; a separate log line indicates when a replica is restarted due to health check failures.
FAQs
Is there a rule of thumb for configuring thresholds for stopping traffic and restarting?
It depends on your health check implementation. If your health check relies on conditions that only change during inference (e.g., _is_healthy is set in predict), restarting before stopping traffic is generally better, as it allows recovery without disrupting traffic.
Stopping traffic first may be preferable if a failing replica is actively degrading performance or causing inference errors, as it prevents the failing replica from affecting the overall deployment while allowing time for debugging or recovery.
When should I configure restart_check_delay_seconds?
Configure restart_check_delay_seconds to allow replicas sufficient time to initialize after deployment or a restart. This delay helps reduce unnecessary restarts, particularly for services with longer startup times.
Why am I seeing two health check failure logs in my logs?
These refer to two separate health checks we run every 10 seconds:
- One to determine when to stop traffic to a replica.
- The other to determine when to restart a replica.
Does stopping traffic or restarting replicas affect autoscaling?
Yes, both can impact autoscaling. If traffic stops or replicas restart, the remaining replicas handle more load. If the load exceeds the concurrency target during the autoscaling window, additional replicas are spun up. Similarly, when traffic stabilizes, excess replicas are scaled down after the scale down delay. See here for more details on autoscaling.
How does billing get affected?
You are billed for the uptime of your deployment. This includes the time a replica is running, even if it is failing health checks, until it scales down.
Will failing health checks cause my deployment to stay up forever?
No. If your deployment is configured with a scale down delay and the minimum number of replicas is set to 0, the replicas will scale down once the model is no longer receiving traffic for the duration of the scale down delay. This applies even if the replicas are failing health checks. See here for more details on autoscaling.