Baseten runs health checks every 10 seconds on each replica of your deployment. When a health check fails long enough to cross a configured threshold, Baseten takes action: stopping traffic to the replica, restarting it, or both. You can customize health checks in two ways: by tuning the health check configuration parameters, and by adding custom health check logic to your model or chainlet.

Health probes

Baseten uses three Kubernetes health probes: startup, readiness, and liveness. Each serves a different purpose in the replica lifecycle.
Diagram showing how startup, readiness, and liveness health probes work in Baseten

Startup probe

The startup probe confirms your model has finished initializing. For Truss models, initialization is complete when load() finishes and the optional is_healthy() check passes. For custom servers, the readiness endpoint must return a successful response. All readiness and liveness probes are delayed until the startup probe succeeds.

The startup phase runs for 30 minutes by default. Extend it with startup_threshold_seconds up to 50 minutes (3000 seconds) for models that need more time to load.

The startup probe uses the same endpoint as the readiness probe. You can’t configure a separate startup endpoint.

Readiness probe

The readiness probe determines whether a replica can accept traffic. When it fails, Kubernetes stops routing requests to the replica but doesn’t restart it. Configure the failure window with stop_traffic_threshold_seconds.

Liveness probe

The liveness probe determines whether a replica is still functioning. When it fails, Kubernetes restarts the replica to recover from deadlocks or hung processes. Configure the failure window with restart_threshold_seconds.
For most models, using the same endpoint (like /health) for both readiness and liveness probes is sufficient. The difference is the action taken: readiness controls traffic routing, liveness controls container lifecycle.
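As an illustration of the shared-endpoint pattern, here is a minimal stdlib HTTP server with a single /health route that both probes could target. This is a generic sketch, not Baseten-specific code; the healthy flag stands in for whatever condition your server tracks:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Flag your inference code can flip when the server becomes unhealthy.
    healthy = True

    def do_GET(self):
        if self.path == "/health":
            # 200 while healthy; 503 tells both probes the replica is failing.
            self.send_response(200 if HealthHandler.healthy else 503)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep request logging quiet


# Bind to an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/health"
```

Because the endpoint only reports status, the same route works for readiness and liveness; the differing consequences (traffic routing versus restart) come from the probe configuration, not the endpoint.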

Health check configuration

Parameters

Customize health checks by setting these parameters:
startup_threshold_seconds
integer
default: 1800
How long the startup phase runs before marking the replica as unhealthy. During this phase, readiness and liveness probes don’t run. startup_threshold_seconds must be between 10 and 3000 seconds, inclusive. Defaults to 30 minutes (1800 seconds).
stop_traffic_threshold_seconds
integer
default: 1800
How long health checks must continuously fail before Baseten stops traffic to the failing replica. stop_traffic_threshold_seconds must be between 10 and 3000 seconds, inclusive. Defaults to 30 minutes (1800 seconds).
restart_threshold_seconds
integer
default: 1800
How long health checks must continuously fail before Baseten restarts the failing replica. restart_threshold_seconds must be between 10 and 3000 seconds, inclusive. Defaults to 30 minutes (1800 seconds).
restart_check_delay_seconds
integer
deprecated
How long to wait before running health checks. Must be between 0 and 3000 seconds, inclusive.
restart_check_delay_seconds is deprecated. Use startup_threshold_seconds instead. The startup probe delays all health checks until your model passes its first readiness check, preventing unnecessary restarts during initialization.
The combined value of restart_check_delay_seconds and restart_threshold_seconds can’t exceed 3000 seconds.
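The constraints above can be captured in a small validation helper. This is an illustrative sketch of the documented rules, not part of the Truss library:

```python
def validate_health_checks(
    startup_threshold_seconds=1800,
    stop_traffic_threshold_seconds=1800,
    restart_threshold_seconds=1800,
    restart_check_delay_seconds=None,
):
    """Validate health check thresholds against the documented limits."""
    thresholds = {
        "startup_threshold_seconds": startup_threshold_seconds,
        "stop_traffic_threshold_seconds": stop_traffic_threshold_seconds,
        "restart_threshold_seconds": restart_threshold_seconds,
    }
    # Each active threshold must be between 10 and 3000 seconds, inclusive.
    for name, value in thresholds.items():
        if not 10 <= value <= 3000:
            raise ValueError(f"{name} must be between 10 and 3000 seconds")
    if restart_check_delay_seconds is not None:  # deprecated parameter
        if not 0 <= restart_check_delay_seconds <= 3000:
            raise ValueError(
                "restart_check_delay_seconds must be between 0 and 3000 seconds"
            )
        # Combined delay and restart threshold can't exceed 3000 seconds.
        if restart_check_delay_seconds + restart_threshold_seconds > 3000:
            raise ValueError(
                "restart_check_delay_seconds + restart_threshold_seconds "
                "can't exceed 3000 seconds"
            )
    return True
```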

Model and custom server deployments

Configure health checks in your config.yaml.
config.yaml
runtime:
  health_checks:
    startup_threshold_seconds: 2400
    restart_threshold_seconds: 600
    stop_traffic_threshold_seconds: 300
You can also specify custom health check endpoints for custom servers. See custom server configuration for details.
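For illustration, a custom server exposing its own health route might point both probes at it. The field names below are assumptions based on the Truss docker_server schema, so confirm them against the custom server configuration docs:

```yaml
docker_server:
  start_command: python app.py        # your server's entrypoint (assumed)
  server_port: 8000
  predict_endpoint: /predict
  readiness_endpoint: /health         # also used by the startup probe
  liveness_endpoint: /health          # the same endpoint can serve liveness
```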

Chains

Use remote_config to configure health checks for your chainlet classes.
chain.py
import truss_chains as chains
from truss.base import truss_config


class CustomHealthChecks(chains.ChainletBase):
    remote_config = chains.RemoteConfig(
        options=chains.ChainletOptions(
            health_checks=truss_config.HealthChecks(
                startup_threshold_seconds=2400,
                restart_threshold_seconds=600,
                stop_traffic_threshold_seconds=300,
            )
        )
    )

Custom health check logic

You can write custom health checks in both model deployments and chain deployments.
Custom health checks aren’t supported in development deployments.

Custom health checks in models

model.py
class Model:
    def is_healthy(self) -> bool:
        # Add custom health check logic for your model here
        return True

Custom health checks in chains

Health checks can be customized for each chainlet in your chain.
chain.py
@chains.mark_entrypoint
class CustomHealthChecks(chains.ChainletBase):
    def is_healthy(self) -> bool:
        # Add custom health check logic for your chainlet here
        return True

Health checks in action

5xx error detection

Create a custom health check to identify 5xx errors:
model.py
class Model:
    def __init__(self):
        ...
        self._is_healthy = True

    def load(self):
        # Perform load
        # Your custom health check won't run until after load completes
        ...

    def is_healthy(self):
        return self._is_healthy

    def predict(self, input):
        try:
            # Perform inference
            ...
        except Some5xxError:
            self._is_healthy = False
            raise
A custom health check failure produces this log:
Example health check failure log line
Jan 27 10:36:03pm md2pg Health check failed.
A restart from health check failure produces this log:
Example restart log line
Jan 27 12:02:47pm zgbmb Model terminated unexpectedly. Exit code: 0, reason: Completed, restart count: 1
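The flag-flipping pattern above can be exercised standalone. ServerError and the "fail" input key below are stand-ins for your real error type and inference logic:

```python
class ServerError(Exception):
    """Stand-in for whatever 5xx-style error your inference code raises."""


class Model:
    def __init__(self):
        self._is_healthy = True

    def is_healthy(self):
        return self._is_healthy

    def predict(self, input):
        try:
            if input.get("fail"):
                raise ServerError("simulated 5xx during inference")
            return {"ok": True}
        except ServerError:
            # Flip the flag so the next health check reports unhealthy.
            self._is_healthy = False
            raise
```

Note that because the flag only flips inside predict, the replica reports healthy until a request actually fails, which is why the thresholds discussed in the FAQ below matter.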

FAQs

Is there a rule of thumb for configuring thresholds for stopping traffic and restarting?

It depends on your health check implementation. If your health check relies on conditions that only change during inference (for example, _is_healthy is set in predict), restarting before stopping traffic is generally better, as it allows recovery without disrupting traffic. Stopping traffic first may be preferable if a failing replica is actively degrading performance or causing inference errors, as it prevents the failing replica from affecting the overall deployment while allowing time for debugging or recovery.
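For example, to restart a hung replica quickly while leaving traffic routed for longer, set the restart threshold below the stop-traffic threshold (the values here are illustrative):

```yaml
runtime:
  health_checks:
    restart_threshold_seconds: 120        # restart after 2 minutes of failures
    stop_traffic_threshold_seconds: 600   # stop traffic only after 10 minutes
```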

When should I configure startup_threshold_seconds?

The default startup phase of 30 minutes is sufficient for most models. Configure startup_threshold_seconds if your model takes longer than 30 minutes to load weights or initialize. The maximum is 50 minutes (3000 seconds).
restart_check_delay_seconds is deprecated. If you’re currently using it, switch to startup_threshold_seconds, which delays health checks until your model is ready.

Why am I seeing two health check failure entries in my logs?

These refer to two separate health checks we run every 10 seconds:
  • One to determine when to stop traffic to a replica.
  • The other to determine when to restart a replica.

Does stopping traffic or restarting replicas affect autoscaling?

Yes, both can impact autoscaling. If traffic stops or replicas restart, the remaining replicas handle more load. If the load exceeds the concurrency target during the autoscaling window, additional replicas are spun up. Similarly, when traffic stabilizes, excess replicas are scaled down after the scale down delay. See how autoscaling works for details.

How do health checks affect billing?

You’re billed for the uptime of your deployment. This includes the time a replica is running, even if it’s failing health checks, until it scales down.

Will failing health checks cause my deployment to stay up forever?

No. If your deployment is configured with a scale down delay and the minimum number of replicas is set to 0, the replicas will scale down once the model is no longer receiving traffic for the duration of the scale down delay. This applies even if the replicas are failing health checks. See scale to zero for details.

What happens when my deployment is loading?

While your deployment is loading, your custom health check doesn’t run. Once load() completes, Baseten starts using your custom is_healthy() health check.