Sticky sessions give users some control over which replica serves their requests. To set them up:

  1. First, sticky sessions must be enabled for the user. To request this, ask your account owner.
  2. To use sticky sessions, users must:
     • Generate a unique session ID.
     • Pass that session ID in the request header field x-baseten-session-id:
       headers["x-baseten-session-id"] = session_id
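
As a minimal sketch of step 2, the snippet below generates a session ID and attaches it to a set of request headers. Only the header name x-baseten-session-id comes from the steps above. The helper names and the use of a UUID as the session ID are assumptions made for illustration; any value that is unique per session should work.

```python
import uuid

# Header name taken from the steps above.
SESSION_HEADER = "x-baseten-session-id"


def new_session_id() -> str:
    """Return a unique session ID (a random UUID is one reasonable choice)."""
    return str(uuid.uuid4())


def with_session(headers: dict, session_id: str) -> dict:
    """Return a copy of `headers` with the sticky-session header added."""
    out = dict(headers)
    out[SESSION_HEADER] = session_id
    return out


# Generate the ID once, then reuse it on every request in the same session.
session_id = new_session_id()
headers = with_session({"Content-Type": "application/json"}, session_id)
```

Reuse the same session_id for every request that should land on the same replica. Generating a new ID per request defeats the purpose.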

How Sticky Sessions Work

Sticky sessions use a ring hash: the session ID you generate is hashed to a value that maps to one of the active replicas in your pool. As long as the set of active replicas stays the same, requests with the same session ID map to the same replica.
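
To make the idea concrete, here is a toy ring hash in Python. It is a sketch of the general technique, not the actual routing implementation: the hash function (MD5), the number of virtual points per replica, and the replica names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Any stable hash works for this sketch; MD5 is an arbitrary choice.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, replicas, vnodes=100):
        # Place each replica on the ring at `vnodes` points to even out load.
        self._points = sorted(
            (_hash(f"{r}#{i}"), r) for r in replicas for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._points]

    def lookup(self, session_id: str) -> str:
        # Find the first ring point at or after the session's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._keys, _hash(session_id)) % len(self._points)
        return self._points[idx][1]


ring = HashRing(["replica-a", "replica-b", "replica-c"])  # hypothetical names
first = ring.lookup("session-123")
second = ring.lookup("session-123")
```

Because the lookup depends only on the session ID and the set of ring points, first and second are always the same replica as long as the replica set does not change.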

Questions

What happens when a new replica comes up?

Some percentage of requests (how many?) will be re-routed from an existing replica to the new one. Note that if you use sticky sessions for features like KV cache reuse, some of these re-routed requests will be slower while the new replica builds up its cache.
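
One way to get a feel for the "how many?" question is to measure it on a toy ring. The sketch below assumes a textbook ring hash with 100 virtual points per replica, MD5 as the hash, and hypothetical replica names, so the number it prints is only indicative of what a real deployment would see. In an idealized ring, growing from N to N+1 replicas moves roughly 1/(N+1) of session IDs on average.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Arbitrary stable hash for the sketch.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(replicas, vnodes=100):
    # Returns parallel lists: sorted ring positions and the replica at each one.
    points = sorted((_hash(f"{r}#{i}"), r) for r in replicas for i in range(vnodes))
    return [h for h, _ in points], [r for _, r in points]


def lookup(ring, session_id: str) -> str:
    keys, owners = ring
    return owners[bisect.bisect_left(keys, _hash(session_id)) % len(keys)]


sessions = [f"session-{i}" for i in range(10_000)]
before = build_ring(["replica-a", "replica-b", "replica-c"])
after = build_ring(["replica-a", "replica-b", "replica-c", "replica-d"])

# Sessions whose replica changed once replica-d joined the ring.
moved = [s for s in sessions if lookup(before, s) != lookup(after, s)]
moved_fraction = len(moved) / len(sessions)

# In a ring hash, sessions only ever move to the newly added replica.
all_moved_to_new = all(lookup(after, s) == "replica-d" for s in moved)

print(f"{moved_fraction:.1%} of sessions moved to the new replica")
```

Sessions that do not move keep hitting the same replica, so their caches are unaffected. Only the moved sessions start cold on the new replica.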

What happens when a replica goes down?

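Same as above, in reverse: requests that were mapped to the replica that went down are re-routed to the remaining replicas, and may be slower until those replicas build up their cache.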