Customize the health of your deployments.
stop_traffic_threshold_seconds must be between 30 and 1800 seconds, inclusive.
restart_check_delay_seconds must be between 0 and 1800 seconds, inclusive.
restart_threshold_seconds must be between 30 and 1800 seconds, inclusive.
The sum of restart_check_delay_seconds and restart_threshold_seconds must not exceed 1800 seconds.
Configure health checks in config.yaml.
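The limits above can be checked before deploying. The following is a minimal sketch, not part of any official tooling: the HealthCheckSettings class and the validate function are hypothetical names introduced for illustration, and the last check assumes that the "must not exceed 1800 seconds" rule applies to the sum of restart_check_delay_seconds and restart_threshold_seconds.

```python
from dataclasses import dataclass

MAX_SECONDS = 1800  # upper bound shared by all three settings


@dataclass(frozen=True)
class HealthCheckSettings:
    """Hypothetical container for the three documented settings."""

    restart_check_delay_seconds: int
    restart_threshold_seconds: int
    stop_traffic_threshold_seconds: int


def validate(settings: HealthCheckSettings) -> list[str]:
    """Return a list of violated constraints (empty when the settings are valid)."""
    errors = []
    if not 30 <= settings.stop_traffic_threshold_seconds <= MAX_SECONDS:
        errors.append("stop_traffic_threshold_seconds must be between 30 and 1800")
    if not 0 <= settings.restart_check_delay_seconds <= MAX_SECONDS:
        errors.append("restart_check_delay_seconds must be between 0 and 1800")
    if not 30 <= settings.restart_threshold_seconds <= MAX_SECONDS:
        errors.append("restart_threshold_seconds must be between 30 and 1800")
    # Assumption: the combined limit applies to the sum of the two restart settings.
    combined = settings.restart_check_delay_seconds + settings.restart_threshold_seconds
    if combined > MAX_SECONDS:
        errors.append(
            "restart_check_delay_seconds + restart_threshold_seconds must not exceed 1800"
        )
    return errors
```

For example, a delay of 1000 and a restart threshold of 1000 each pass their individual range checks but fail the combined limit, so validate returns one error.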
Use remote_config to configure health checks for your chainlet classes.
If your health check depends on state that changes only during inference (for example, _is_healthy is set in predict), restarting before stopping traffic is generally better, as it allows recovery without disrupting traffic.
Stopping traffic first may be preferable if a failing replica is actively degrading performance or causing inference errors, as it prevents the failing replica from affecting the overall deployment while allowing time for debugging or recovery.
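To make the first case concrete, here is a minimal sketch of a model whose health flag changes only inside predict. The class layout, the is_healthy method name, and the _run_inference helper are assumptions for illustration; only _is_healthy and predict come from the text above.

```python
class Model:
    """Sketch of a model whose health flag only changes during inference."""

    def __init__(self):
        self._is_healthy = True

    def is_healthy(self) -> bool:
        # Assumed health check hook: reports the flag maintained by predict.
        return self._is_healthy

    def predict(self, model_input: dict) -> dict:
        try:
            return {"output": self._run_inference(model_input)}
        except RuntimeError:
            # The flag is only ever changed here, during inference.
            self._is_healthy = False
            raise

    def _run_inference(self, model_input: dict) -> str:
        # Hypothetical stand-in for real inference; fails on request for the demo.
        if model_input.get("simulate_failure"):
            raise RuntimeError("simulated unrecoverable inference error")
        return model_input.get("prompt", "").upper()
```

In this sketch nothing resets the flag once it is False, so stopping traffic alone would leave the replica unhealthy indefinitely. A restart recreates the model and clears the flag, which is why restarting first suits this pattern.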
When should I configure restart_check_delay_seconds?
Configure restart_check_delay_seconds to allow replicas sufficient time to initialize after deployment or a restart. This delay helps reduce unnecessary restarts, particularly for services with longer startup times.
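One possible way to pick a value is to derive it from observed startup times. The sketch below is an illustration, not a documented procedure: the function name, the headroom factor, and the idea of capping the result so that the delay plus restart_threshold_seconds stays within 1800 seconds are all assumptions.

```python
import math

MAX_SECONDS = 1800  # documented upper bound


def suggest_restart_check_delay(
    startup_seconds: list[float],
    restart_threshold_seconds: int,
    headroom: float = 1.25,
) -> int:
    """Suggest a delay covering the slowest observed startup plus some headroom."""
    if not startup_seconds:
        raise ValueError("at least one startup measurement is required")
    candidate = math.ceil(max(startup_seconds) * headroom)
    # Assumption: delay + restart threshold together must stay within 1800 seconds.
    upper = MAX_SECONDS - restart_threshold_seconds
    return max(0, min(candidate, upper))
```

With measured startups of 90, 100, and 120 seconds and a restart threshold of 300 seconds, this suggests a delay of 150 seconds.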