# Autoscaling

> Configure autoscaling to dynamically adjust replicas based on traffic while minimizing idle compute costs.
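
As a rough illustration of the replica calculation that the simulator on this page displays (average in-flight requests over `autoscaling_window`, divided by per-replica capacity, rounded up), here is a minimal JavaScript sketch. The function name `desiredReplicas` is hypothetical, the config keys borrow the simulator's internal names, and clamping the result to the minimum and maximum replica counts is an assumption rather than documented behavior.

```javascript
// Sketch only: mirrors the three steps shown in the simulator's math panel.
// `desiredReplicas` and the config keys are illustrative names, not a Baseten API.
function desiredReplicas(avgInFlight, cfg) {
  // Per-replica capacity = concurrency × utilization (as a fraction).
  const capacity = cfg.concurrency * (cfg.utilization / 100);
  // Desired replicas = ceil(windowed average / capacity).
  const raw = Math.ceil(avgInFlight / capacity);
  // Assumption: the result is bounded by the configured replica range.
  return Math.min(cfg.maxR, Math.max(cfg.minR, raw));
}

const cfg = { concurrency: 4, utilization: 70, minR: 0, maxR: 8 };
console.log(desiredReplicas(10, cfg)); // 4, because 10 / 2.8 ≈ 3.57 rounds up
```

Lowering `utilization` shrinks per-replica capacity, so the same average load asks for more replicas and leaves more headroom for bursts.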

export const AutoscalerSimPanels = () => {
  const ref = React.useRef(null);
  const init = React.useRef(false);
  React.useEffect(() => {
    if (!ref.current || init.current) return;
    init.current = true;
    let cleanup = null, destroyed = false, timer = null, retries = 0;
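    // Poll until the shared globals (engine, state, slots) are available:
    // up to 120 retries, 30 ms apart, so roughly 3.6 s before giving up.
    const tryMount = () => {
      if (destroyed || !ref.current) return;
      if (window._asEngine && window._asState && window._asSlots) {
        cleanup = mount();
      } else if (retries++ < 120) {
        timer = setTimeout(tryMount, 30);
      }
    };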
    function mount() {
      const E = window._asEngine, S = window._asSlots;
      const math = document.createElement("div");
      const mHead = document.createElement("div");
      mHead.style.cssText = "display:flex;justify-content:space-between;align-items:baseline;gap:8px;cursor:pointer;user-select:none";
      const mTitle = document.createElement("span");
      mTitle.style.cssText = "display:flex;align-items:center;gap:6px;font:500 11px ui-monospace,Menlo,monospace;text-transform:lowercase;color:#869089";
      const mArrow = document.createElement("span");
      mArrow.textContent = "▸";
      mArrow.style.cssText = "font-size:9px;display:inline-block;transition:transform 0.2s ease";
      const mName = document.createElement("span");
      mName.textContent = "autoscaler math";
      mTitle.appendChild(mArrow);
      mTitle.appendChild(mName);
      const mSub = document.createElement("span");
      mSub.style.cssText = "font:400 10px ui-monospace,Menlo,monospace;color:#869089";
      mHead.appendChild(mTitle);
      mHead.appendChild(mSub);
      math.appendChild(mHead);
      const mBody = document.createElement("div");
      mBody.style.cssText = "display:none;margin:8px 0 0";
      math.appendChild(mBody);
      mHead.onclick = () => {
        const open = mBody.style.display !== "none";
        mBody.style.display = open ? "none" : "";
        mArrow.style.transform = open ? "" : "rotate(90deg)";
      };
      const mRows = [];
      for (let i = 0; i < 4; i++) {
        const r = document.createElement("div");
        r.style.cssText = "display:flex;justify-content:space-between;padding:5px 0;font:400 11.5px system-ui,-apple-system,sans-serif;align-items:baseline;gap:8px";
        if (i === 3) r.style.fontWeight = "500";
        const a = document.createElement("span");
        a.style.cssText = "color:#869089";
        const b = document.createElement("span");
        b.style.cssText = "font-family:ui-monospace,Menlo,monospace;font-size:11.5px;text-align:right;white-space:nowrap";
        r.appendChild(a);
        r.appendChild(b);
        mBody.appendChild(r);
        mRows.push({
          a,
          b,
          r
        });
      }
      const mCount = document.createElement("div");
      mCount.style.cssText = "margin:10px 0 0;opacity:0;transition:opacity 0.25s ease;min-height:58px";
      const cdLabel = document.createElement("div");
      cdLabel.style.cssText = "display:flex;justify-content:space-between;font:500 10.5px ui-monospace,Menlo,monospace;color:#1960d3;margin:0 0 4px";
      const cdL = document.createElement("span"), cdR = document.createElement("span");
      cdLabel.appendChild(cdL);
      cdLabel.appendChild(cdR);
      const cdTrack = document.createElement("div");
      cdTrack.style.cssText = "height:4px;border-radius:2px;background:rgba(33,118,255,0.15);overflow:hidden";
      const cdFill = document.createElement("div");
      cdFill.style.cssText = "height:100%;background:#2176ff;width:0%;transition:width 0.25s linear";
      cdTrack.appendChild(cdFill);
      const cdHelp = document.createElement("p");
      cdHelp.style.cssText = "margin:6px 0 0;font:400 11px/1.4 system-ui,-apple-system,sans-serif";
      mCount.appendChild(cdLabel);
      mCount.appendChild(cdTrack);
      mCount.appendChild(cdHelp);
      mBody.appendChild(mCount);
      S.math.appendChild(math);
      const params = document.createElement("div");
      const pHead = document.createElement("div");
      pHead.style.cssText = "display:flex;justify-content:space-between;align-items:baseline;gap:8px;cursor:pointer;user-select:none";
      const pTitle = document.createElement("span");
      pTitle.style.cssText = "display:flex;align-items:center;gap:6px;font:500 11px ui-monospace,Menlo,monospace;text-transform:lowercase;color:#869089";
      const pArrow = document.createElement("span");
      pArrow.textContent = "▸";
      pArrow.style.cssText = "font-size:9px;display:inline-block;transition:transform 0.2s ease";
      const pName = document.createElement("span");
      pName.textContent = "parameters";
      pTitle.appendChild(pArrow);
      pTitle.appendChild(pName);
      const ppSub = document.createElement("span");
      ppSub.style.cssText = "font:400 10px ui-monospace,Menlo,monospace;color:#869089";
      pHead.appendChild(pTitle);
      pHead.appendChild(ppSub);
      params.appendChild(pHead);
      const ppGrid = document.createElement("div");
      ppGrid.style.cssText = "display:none;grid-template-columns:repeat(auto-fit,minmax(300px,1fr));gap:4px 24px;margin:8px 0 0";
      params.appendChild(ppGrid);
      pHead.onclick = () => {
        const open = ppGrid.style.display !== "none";
        ppGrid.style.display = open ? "none" : "grid";
        pArrow.style.transform = open ? "" : "rotate(90deg)";
      };
      const pcRefs = {};
      E.PARAMS.forEach(p => {
        const row = document.createElement("div");
        row.style.cssText = "display:flex;align-items:center;gap:10px;padding:4px 0;min-width:0";
        row.title = p[5].replaceAll("`", "");
        const lbl = document.createElement("code");
        lbl.style.cssText = "font:500 11.5px ui-monospace,Menlo,monospace;letter-spacing:-0.28px;flex:1;min-width:0;overflow:hidden;text-overflow:ellipsis;white-space:nowrap";
        lbl.textContent = p[1];
        const val = document.createElement("span");
        val.style.cssText = "font:500 12px ui-monospace,Menlo,monospace;min-width:46px;text-align:right;font-variant-numeric:tabular-nums";
        const inp = document.createElement("input");
        inp.type = "range";
        inp.min = p[2];
        inp.max = p[3];
        inp.style.cssText = "width:110px;height:3px;accent-color:#0e863f;margin:0;flex:none";
        row.appendChild(lbl);
        row.appendChild(val);
        row.appendChild(inp);
        ppGrid.appendChild(row);
        inp.oninput = () => {
          const cfg = window._asState && window._asState.cfg;
          if (!cfg) return;
          let v = parseInt(inp.value, 10);
          // Keep minR <= maxR: clamp the slider that moved against the other bound.
          if (p[0] === "minR") v = Math.min(v, cfg.maxR);
          if (p[0] === "maxR") v = Math.max(v, cfg.minR);
          cfg[p[0]] = v;
          val.textContent = v + p[4];
        };
        pcRefs[p[0]] = {
          lbl,
          val,
          inp,
          p
        };
      });
      S.params.appendChild(params);
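      let visible = true, raf = 0;
      // Declared before the observers so their callbacks never reference an uninitialized binding.
      let dirty = true, lastUpdate = 0;
      const obs = new IntersectionObserver(e => visible = e[0].isIntersecting, {
        threshold: 0.05
      });
      obs.observe(math);
      // Mark colors stale whenever the root element's class changes (for example, a theme switch).
      const tObs = new MutationObserver(() => {
        dirty = true;
      });
      tObs.observe(document.documentElement, {
        attributes: true,
        attributeFilter: ["class"]
      });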
      function update() {
        const st = window._asState;
        if (!st) return;
        const sim = st.sim, cfg = st.cfg, c = E.P();
        const m = E.computeMath(sim, cfg);
        mSub.textContent = "next decision in " + E.fmt(m.nextDecision, 0) + "s";
        E.setRich(mRows[0].a, "1. avg over `autoscaling_window`");
        mRows[0].b.textContent = E.fmt(m.avg, 1);
        mRows[1].a.textContent = "2. per-replica capacity";
        mRows[1].b.textContent = cfg.concurrency + " × " + E.fmt(cfg.utilization / 100, 2) + " = " + E.fmt(m.eff, 2);
        mRows[2].a.textContent = "3. desired = ⌈avg / capacity⌉";
        mRows[2].b.textContent = "⌈" + E.fmt(m.avg, 1) + " / " + E.fmt(m.eff, 2) + "⌉ = " + m.desired;
        mRows[3].a.textContent = "current → desired";
        const badge = m.desired > m.cur ? " · +" + (m.desired - m.cur) + " starting" : m.desired === m.cur ? " · stable" : " · step to " + (m.cur - m.willRemove);
        mRows[3].b.textContent = m.cur + " → " + m.desired + badge;
        mRows[3].b.style.transition = "color 0.3s ease";
        mRows[3].b.style.color = m.desired > m.cur ? c.p : m.desired < m.cur ? c.q : c.sub;
        mRows.forEach(r => {
          r.a.style.color = c.sub;
          r.r.style.borderBottom = r === mRows[3] ? "0" : "1px dashed " + c.brdM;
        });
        mCount.style.opacity = m.sdEta != null ? "1" : "0";
        if (m.sdEta != null) {
          E.setRich(cdL, "`scale_down_delay`");
          cdR.textContent = E.fmt(m.sdEta, 0) + "s left";
          cdFill.style.width = 100 * (1 - m.sdEta / Math.max(1, cfg.delay)) + "%";
          cdHelp.textContent = "Then removes " + m.willRemove + " replica" + (m.willRemove === 1 ? "" : "s") + " (at most " + cfg.sdRate + "%) and waits again.";
          cdHelp.style.color = c.sub;
        }
        ppSub.textContent = ppGrid.style.display === "none" ? "click to tune all 7 settings" : "changes apply at the next decision";
        Object.values(pcRefs).forEach(r => {
          if (parseInt(r.inp.value, 10) !== cfg[r.p[0]]) r.inp.value = cfg[r.p[0]];
          r.val.textContent = cfg[r.p[0]] + r.p[4];
          if (dirty) {
            r.lbl.style.color = c.txt;
            r.val.style.color = c.txt;
          }
        });
        dirty = false;
      }
      function loop(now) {
        raf = requestAnimationFrame(loop);
        // Skip work while off-screen; otherwise refresh every 250 ms, or immediately when marked dirty.
        if (!visible || (now - lastUpdate < 250 && !dirty)) return;
        lastUpdate = now;
        update();
      }
      update();
      raf = requestAnimationFrame(loop);
      return () => {
        cancelAnimationFrame(raf);
        obs.disconnect();
        tObs.disconnect();
        math.remove();
        params.remove();
      };
    }
    tryMount();
    return () => {
      destroyed = true;
      if (timer) clearTimeout(timer);
      if (cleanup) cleanup();
      init.current = false;
    };
  }, []);
  return <span ref={ref} />;
};

export const AutoscalerSimScenarios = () => {
  const ref = React.useRef(null);
  const init = React.useRef(false);
  React.useEffect(() => {
    if (!ref.current || init.current) return;
    init.current = true;
    let cleanup = null, destroyed = false, timer = null, retries = 0;
    const tryMount = () => {
      if (destroyed || !ref.current) return;
      if (window._asEngine && window._asState && window._asSlots) {
        cleanup = mount();
      } else if (retries++ < 120) {
        timer = setTimeout(tryMount, 30);
      }
    };
    function mount() {
      const E = window._asEngine, S = window._asSlots;
      const DEFAULT_CAP = "Sandbox mode: shape the traffic below, tune the parameters, or send a burst. Pick a scenario to stage a situation worth understanding.";
      const SANDBOX = {
        id: "sandbox",
        label: "Sandbox"
      };
      const LESSONS = [{
        id: "cold",
        label: "Cold start",
        pattern: "steady",
        knobs: {
          rate: 2,
          noise: 10
        },
        cfg: {
          minR: 0,
          maxR: 8,
          concurrency: 4,
          utilization: 70,
          window: 30,
          delay: 120,
          sdRate: 50
        },
        burst: [25, 40],
        cap: "A burst just hit a deployment running near `min_replica` 0. New replicas take about 30 seconds to come online, so the red band is requests queueing while they start. Set `min_replica` to 2 below and send another burst to see warm capacity absorb it."
      }, {
        id: "osc",
        label: "Oscillation",
        pattern: "jittery",
        knobs: {
          mean: 10,
          variance: 120
        },
        cfg: {
          minR: 0,
          maxR: 8,
          concurrency: 4,
          utilization: 70,
          window: 20,
          delay: 15,
          sdRate: 50
        },
        cap: "`scale_down_delay` is only 15 seconds, so every dip in this jittery traffic tears capacity down and every rebound cold-starts it back. Count the ▲▼ churn on the time axis. That flapping is oscillation, and each cycle pays a cold start."
      }, {
        id: "fix",
        label: "The fix",
        pattern: "jittery",
        knobs: {
          mean: 10,
          variance: 120
        },
        cfg: {
          minR: 0,
          maxR: 8,
          concurrency: 4,
          utilization: 70,
          window: 60,
          delay: 180,
          sdRate: 25
        },
        cap: "Same jittery traffic, but `scale_down_delay` is 180 seconds and `max_scale_down_rate` is 25%. Dips now expire before the countdown does, so replicas hold steady through the noise. Stability costs a little idle headroom; check the meters."
      }, {
        id: "head",
        label: "Headroom",
        pattern: "bursty",
        knobs: {
          baseline: 4,
          peak: 30,
          period: 60
        },
        cfg: {
          minR: 1,
          maxR: 8,
          concurrency: 4,
          utilization: 95,
          window: 30,
          delay: 120,
          sdRate: 50
        },
        cap: "`target_utilization_percentage` is 95, so replicas run nearly full before a scale-up triggers. Each burst outruns the threshold and queues while new replicas start. Drop it to 60 and the same bursts land in spare headroom instead."
      }, {
        id: "zero",
        label: "Scale to zero",
        pattern: "scheduled",
        knobs: {
          low: 0,
          high: 12,
          cycle: 240
        },
        cfg: {
          minR: 0,
          maxR: 8,
          concurrency: 4,
          utilization: 70,
          window: 30,
          delay: 30,
          sdRate: 50
        },
        cap: "Off-peak traffic falls to zero. Once `scale_down_delay` elapses, the deployment drains all the way to zero replicas and costs nothing, then the first request of the next wave pays the cold start. The meters show both sides of that trade."
      }];
      const cap = document.createElement("p");
      cap.style.cssText = "margin:0;font:400 12.5px/1.5 system-ui,-apple-system,sans-serif";
      S.cap.appendChild(cap);
      const mWrap = document.createElement("div");
      mWrap.style.cssText = "display:flex;align-items:baseline;gap:6px;flex-wrap:wrap;font:500 11px ui-monospace,Menlo,monospace";
      mWrap.title = "Replica-minutes billed and request-seconds spent queued over the trailing 5 simulated minutes";
      const mLabel = document.createElement("span");
      mLabel.style.cssText = "color:#869089;text-transform:lowercase";
      mLabel.textContent = "last 5 sim-min";
      const mCost = document.createElement("span");
      mCost.style.cssText = "font-variant-numeric:tabular-nums;white-space:nowrap";
      const mCostSub = document.createElement("span");
      mCostSub.style.cssText = "color:#869089";
      mCostSub.textContent = "paid ·";
      const mPain = document.createElement("span");
      mPain.style.cssText = "font-variant-numeric:tabular-nums;white-space:nowrap";
      const mPainSub = document.createElement("span");
      mPainSub.style.cssText = "color:#869089";
      mPainSub.textContent = "queued";
      mWrap.appendChild(mLabel);
      mWrap.appendChild(mCost);
      mWrap.appendChild(mCostSub);
      mWrap.appendChild(mPain);
      mWrap.appendChild(mPainSub);
      S.meters.appendChild(mWrap);
      let active = null;
      function setCaption(text) {
        E.setRich(cap, text);
      }
      function pick(l) {
        const st = window._asState;
        if (!st || !st.apply) return;
        if (l.id === "sandbox") {
          active = null;
          S.sandbox(true);
          setCaption(DEFAULT_CAP);
          styleChips();
          return;
        }
        active = l.id;
        S.sandbox(false);
        st.apply({
          pattern: l.pattern,
          knobs: l.knobs,
          cfg: l.cfg,
          reset: true
        });
        if (l.burst) setTimeout(() => {
          const s = window._asState;
          if (s && s.burst && active === l.id) s.burst(l.burst[0], l.burst[1]);
        }, 1200);
        setCaption(l.cap);
        styleChips();
      }
      [SANDBOX, ...LESSONS].forEach(l => {
        const b = document.createElement("button");
        b.textContent = l.label;
        b.style.cssText = "padding:5px 12px;border-radius:6px;cursor:pointer;font:500 12px system-ui,-apple-system,sans-serif";
        b.onclick = () => pick(l);
        b.dataset.id = l.id;
        S.chips.appendChild(b);
      });
      const burstBtn = document.createElement("button");
      burstBtn.textContent = "⚡ Send a burst";
      burstBtn.title = "Inject a 45-second traffic spike on top of the current pattern";
      burstBtn.onclick = () => {
        const s = window._asState;
        if (s && s.burst) s.burst();
        burstBtn.disabled = true;
        setTimeout(() => {
          burstBtn.disabled = false;
        }, 1600);
      };
      S.chips.appendChild(burstBtn);
      let visible = true, raf = 0;
      const obs = new IntersectionObserver(e => visible = e[0].isIntersecting, {
        threshold: 0.05
      });
      obs.observe(mWrap);
      const tObs = new MutationObserver(() => applyTheme());
      tObs.observe(document.documentElement, {
        attributes: true,
        attributeFilter: ["class"]
      });
      function styleChips() {
        const c = E.P();
        Array.from(S.chips.children).forEach(b => {
          if (b === burstBtn) {
            b.style.cssText = "padding:5px 13px;border-radius:6px;cursor:pointer;font:500 12px system-ui,-apple-system,sans-serif;white-space:nowrap;margin-left:auto;background:" + (E.isDark() ? "#0C1D13" : "#fff") + ";color:" + c.txt + ";border:1px solid " + c.w;
            return;
          }
          const on = b.dataset.id === (active == null ? "sandbox" : active);
          b.style.background = on ? E.isDark() ? "#005934" : "#0e863f" : E.isDark() ? "#0C1D13" : "#fff";
          b.style.color = on ? "#fff" : c.txt;
          b.style.border = "1px solid " + (on ? E.isDark() ? "#005934" : "#0e863f" : c.brd);
        });
      }
      function applyTheme() {
        const c = E.P();
        cap.style.color = c.txt;
        mCost.style.color = c.txt;
        styleChips();
        setCaption(active ? LESSONS.find(l => l.id === active).cap : DEFAULT_CAP);
      }
      let lastUp = 0;
      function update() {
        const st = window._asState;
        if (!st) return;
        const sim = st.sim;
        const H = sim.history;
        if (H.length < 2) return;
        // Sum replicas × dt and queue depth × dt across consecutive history samples.
        let repS = 0, qS = 0;
        for (let i = 1; i < H.length; i++) {
          const dt = H[i].t - H[i - 1].t;
          repS += (H[i].replicas || 0) * dt;
          qS += (H[i].queue || 0) * dt;
        }
        mCost.textContent = E.fmt(repS / 60, 1) + " replica-min";
        mPain.textContent = E.fmt(qS, 0) + " req-sec";
        mPain.style.color = qS >= 1 ? E.isDark() ? "#ff6b6b" : "#c23b3b" : mCost.style.color;
      }
      function loop(now) {
        raf = requestAnimationFrame(loop);
        if (!visible || now - lastUp < 250) return;
        lastUp = now;
        update();
      }
      applyTheme();
      update();
      raf = requestAnimationFrame(loop);
      return () => {
        cancelAnimationFrame(raf);
        obs.disconnect();
        tObs.disconnect();
        cap.remove();
        mWrap.remove();
        S.chips.replaceChildren();
      };
    }
    tryMount();
    return () => {
      destroyed = true;
      if (timer) clearTimeout(timer);
      if (cleanup) cleanup();
      init.current = false;
    };
  }, []);
  return <span ref={ref} />;
};

export const AutoscalerSim = () => {
  const ref = React.useRef(null);
  const init = React.useRef(false);
  React.useEffect(() => {
    if (!ref.current || init.current) return;
    init.current = true;
    const root = ref.current;
    let cleanup = null, destroyed = false, timer = null, retries = 0;
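    // Poll until the engine and the chart renderer are available:
    // up to 60 retries, 30 ms apart, so roughly 1.8 s before giving up.
    const tryMount = () => {
      if (destroyed || !ref.current) return;
      if (window._asEngine && window._asDraw) {
        cleanup = mount();
      } else if (retries++ < 60) {
        timer = setTimeout(tryMount, 30);
      }
    };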
    function mount() {
      const E = window._asEngine;
      const cfg = {
        concurrency: 4,
        utilization: 70,
        window: 60,
        delay: 120,
        sdRate: 50,
        minR: 0,
        maxR: 8,
        coldStart: 30,
        procTime: 1.0
      };
      let pattern = "steady";
      let knobs = Object.fromEntries(E.PATTERNS.steady.knobs.map(k => [k[0], k[5]]));
      let sim = E.prefill(pattern, knobs, cfg);
      let burst = {
        t0: -1e9,
        dur: 45,
        amp: 0
      };
      function applySpec(spec) {
        if (spec.pattern && E.PATTERNS[spec.pattern]) {
          pattern = spec.pattern;
          knobs = Object.fromEntries(E.PATTERNS[pattern].knobs.map(x => [x[0], x[5]]));
        }
        if (spec.knobs) Object.assign(knobs, spec.knobs);
        if (spec.cfg) Object.assign(cfg, spec.cfg);
        if (spec.reset) {
          sim = E.prefill(pattern, knobs, cfg);
          burst = {
            t0: -1e9,
            dur: 45,
            amp: 0
          };
        }
        rebuildTabs();
        rebuildKnobs();
        applyTheme();
      }
      function fireBurst(amp, dur) {
        burst = {
          t0: sim.t,
          dur: dur || 45,
          amp: amp == null ? 20 : amp
        };
      }
      window._asState = {
        get sim() {
          return sim;
        },
        get cfg() {
          return cfg;
        },
        get pattern() {
          return pattern;
        },
        get knobs() {
          return knobs;
        },
        apply: applySpec,
        burst: fireBurst
      };
      const W = 1100, CH = 260;
      const card = document.createElement("div");
      const chipsSlot = document.createElement("div");
      chipsSlot.style.cssText = "display:flex;gap:6px;flex-wrap:wrap;align-items:center;margin:0 0 8px";
      card.appendChild(chipsSlot);
      const capSlot = document.createElement("div");
      capSlot.style.cssText = "margin:0 0 10px";
      card.appendChild(capSlot);
      const strip = document.createElement("div");
      const stripL = document.createElement("div");
      stripL.style.cssText = "display:flex;flex-direction:column;gap:6px;flex:1;min-width:240px";
      const stripLabel = document.createElement("span");
      stripLabel.style.cssText = "font:500 10px ui-monospace,Menlo,monospace;text-transform:lowercase;color:#869089";
      stripLabel.textContent = "traffic shape";
      const tabRow = document.createElement("div");
      tabRow.style.cssText = "display:flex;gap:6px;flex-wrap:wrap";
      const stripDesc = document.createElement("span");
      stripDesc.style.cssText = "font:400 12px system-ui,-apple-system,sans-serif;color:#869089";
      stripL.appendChild(stripLabel);
      stripL.appendChild(tabRow);
      stripL.appendChild(stripDesc);
      const knobsBox = document.createElement("div");
      knobsBox.style.cssText = "display:flex;gap:14px;flex-wrap:wrap";
      strip.appendChild(stripL);
      strip.appendChild(knobsBox);
      card.appendChild(strip);
      function rebuildTabs() {
        tabRow.replaceChildren();
        Object.keys(E.PATTERNS).forEach(k => {
          const b = document.createElement("button");
          b.style.cssText = "padding:4px 10px;border-radius:6px;cursor:pointer;font:500 11px system-ui,-apple-system,sans-serif";
          b.textContent = E.PATTERNS[k].label;
          b.onclick = () => {
            pattern = k;
            knobs = Object.fromEntries(E.PATTERNS[k].knobs.map(x => [x[0], x[5]]));
            sim = E.prefill(pattern, knobs, cfg);
            rebuildTabs();
            rebuildKnobs();
            applyTheme();
          };
          tabRow.appendChild(b);
        });
      }
      function rebuildKnobs() {
        knobsBox.replaceChildren();
        const SLOTS = 3;
        for (let i = 0; i < SLOTS; i++) {
          const kd = E.PATTERNS[pattern].knobs[i];
          const lbl = document.createElement("label");
          lbl.style.cssText = "display:flex;flex-direction:column;gap:2px;width:130px;font:500 11px ui-monospace,Menlo,monospace;color:#869089";
          if (!kd) {
            lbl.style.visibility = "hidden";
            knobsBox.appendChild(lbl);
            continue;
          }
          const top = document.createElement("div");
          top.style.cssText = "display:flex;justify-content:space-between";
          const name = document.createElement("span");
          name.textContent = kd[1];
          const valS = document.createElement("span");
          valS.style.cssText = "font-weight:500";
          valS.textContent = knobs[kd[0]] + kd[2];
          top.appendChild(name);
          top.appendChild(valS);
          const inp = document.createElement("input");
          inp.type = "range";
          inp.min = kd[3];
          inp.max = kd[4];
          inp.value = knobs[kd[0]];
          inp.style.cssText = "width:100%;height:3px;accent-color:#0e863f";
          inp.oninput = () => {
            const v = parseInt(inp.value, 10);
            knobs[kd[0]] = v;
            valS.textContent = v + kd[2];
          };
          lbl.appendChild(top);
          lbl.appendChild(inp);
          knobsBox.appendChild(lbl);
        }
        stripDesc.textContent = E.PATTERNS[pattern].desc;
      }
      const chartBox = document.createElement("div");
      chartBox.style.cssText = "min-width:0";
      const tValue = document.createElement("span");
      tValue.style.cssText = "display:block;text-align:right;font:500 11px ui-monospace,Menlo,monospace;color:#869089;font-variant-numeric:tabular-nums;margin:0 0 4px";
      chartBox.appendChild(tValue);
      const cv = document.createElement("canvas");
      cv.style.cssText = "display:block;width:100%;touch-action:pan-y";
      chartBox.appendChild(cv);
      const ctx = cv.getContext("2d");
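      // Size the backing store in device pixels, then scale the context so drawing code can use CSS pixels.
      const dpr = window.devicePixelRatio || 1;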
      cv.width = W * dpr;
      cv.height = CH * dpr;
      cv.style.height = CH + "px";
      ctx.scale(dpr, dpr);
      const leg = document.createElement("div");
      leg.style.cssText = "display:flex;gap:12px;flex-wrap:wrap;margin:6px 2px 0;font:500 10px ui-monospace,Menlo,monospace";
      function legKey(text) {
        const w = document.createElement("span");
        w.style.cssText = "display:inline-flex;align-items:center;gap:5px";
        const sw = document.createElement("span");
        const tx = document.createElement("span");
        tx.textContent = text;
        w.appendChild(sw);
        w.appendChild(tx);
        leg.appendChild(w);
        return {
          sw,
          tx
        };
      }
      const lAvg = legKey("in-flight (windowed avg)"), lThr = legKey("scale threshold"), lRep = legKey("ready replicas"), lStart = legKey("starting (cold start)"), lQ = legKey("queued"), lUp = legKey("scale up"), lDn = legKey("scale down");
      chartBox.appendChild(leg);
      const sideRow = document.createElement("div");
      sideRow.style.cssText = "display:flex;gap:10px;flex-wrap:wrap;align-items:flex-start;margin:12px 0 0";
      const mathSlot = document.createElement("div");
      mathSlot.style.flex = "1 1 260px";
      const metersSlot = document.createElement("div");
      metersSlot.style.flex = "1 1 200px";
      sideRow.appendChild(mathSlot);
      sideRow.appendChild(metersSlot);
      card.appendChild(chartBox);
      card.appendChild(sideRow);
      const paramsSlot = document.createElement("div");
      paramsSlot.style.cssText = "margin:12px 0 0";
      card.appendChild(paramsSlot);
      let sandboxOn = true;
      window._asSlots = {
        chips: chipsSlot,
        cap: capSlot,
        math: mathSlot,
        meters: metersSlot,
        params: paramsSlot,
        sandbox: v => {
          sandboxOn = v;
          strip.style.display = v ? "flex" : "none";
        }
      };
      const view = {
        maxLoad: 8,
        hover: null
      };
      cv.addEventListener("pointermove", e => {
        const r = cv.getBoundingClientRect();
        view.hover = (e.clientX - r.left) * (W / r.width);
      });
      cv.addEventListener("pointerleave", () => {
        view.hover = null;
      });
      root.appendChild(card);
      let visible = true, raf = 0;
      const obs = new IntersectionObserver(e => visible = e[0].isIntersecting, {
        threshold: 0.05
      });
      obs.observe(cv);
      const tObs = new MutationObserver(() => applyTheme());
      tObs.observe(document.documentElement, {
        attributes: true,
        attributeFilter: ["class"]
      });
      function applyTheme() {
        const c = E.P();
        card.style.cssText = "border:1px solid " + c.brd + ";border-radius:10px;padding:16px 18px;margin:14px 0;background:" + c.bg + ";max-width:" + W + "px";
        strip.style.cssText = "display:" + (sandboxOn ? "flex" : "none") + ";flex-wrap:wrap;align-items:center;gap:14px;padding:10px 14px;border:1px solid " + c.brdM + ";background:" + c.surf + ";border-radius:8px;margin:0 0 12px";
        const surf = "border:1px solid " + c.brdM + ";background:" + c.surf + ";border-radius:8px;padding:12px 14px";
        mathSlot.style.cssText = surf + ";flex:1 1 260px;min-width:0";
        metersSlot.style.cssText = surf + ";flex:1 1 200px;min-width:0";
        paramsSlot.style.cssText = surf + ";margin:12px 0 0";
        Array.from(tabRow.children).forEach((b, i) => {
          const k = Object.keys(E.PATTERNS)[i], active = k === pattern;
          b.style.background = active ? E.isDark() ? "#005934" : "#0e863f" : E.isDark() ? "#0C1D13" : "#fff";
          b.style.color = active ? "#fff" : c.txt;
          b.style.border = "1px solid " + (active ? E.isDark() ? "#005934" : "#0e863f" : c.brd);
        });
        lAvg.sw.style.cssText = "width:14px;border-top:2.5px solid " + c.q;
        lThr.sw.style.cssText = "width:14px;border-top:2px dashed " + c.w;
        lRep.sw.style.cssText = "width:9px;height:9px;border-radius:2px;background:" + c.p;
        lStart.sw.style.cssText = "width:9px;height:9px;border-radius:2px;background:" + c.rbf + ";border:1px dashed " + c.p;
        lQ.sw.style.cssText = "width:9px;height:9px;border-radius:2px;background:" + c.badBg + ";border:1px solid " + c.bad;
        lUp.sw.style.cssText = "color:" + c.p + ";font-size:8px";
        lUp.sw.textContent = "▲";
        lDn.sw.style.cssText = "color:" + c.sub + ";font-size:8px";
        lDn.sw.textContent = "▼";
        [lAvg, lThr, lRep, lStart, lQ, lUp, lDn].forEach(k => k.tx.style.color = c.sub);
      }
      function drawChart() {
        const D = window._asDraw;
        if (D) D.drawChart(ctx, sim, cfg, W, CH, view);
      }
      let last = performance.now();
      function loop(now) {
        raf = requestAnimationFrame(loop);
        if (!visible) {
          last = now;
          return;
        }
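        // Clamp the frame delta to 200 ms so a stalled frame cannot produce one huge simulation step.
        const dt = Math.min(200, now - last) / 1000;
        last = now;
        const k = E.safeKnobs(pattern, knobs);
        let rps = E.PATTERNS[pattern].fn(sim.t, k);
        // Overlay any active burst as a half-sine bump that peaks at burst.amp halfway through burst.dur.
        const bPh = (sim.t - burst.t0) / burst.dur;
        if (bPh >= 0 && bPh <= 1) rps += burst.amp * Math.sin(bPh * Math.PI);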
        E.tickSim(sim, dt * E.SIM_SPEED, cfg, Number.isFinite(rps) ? rps : 0);
        tValue.textContent = "t = " + E.fmtTime(sim.t) + " · " + E.SIM_SPEED + "× wall time · " + Math.round(sim.inFlight) + " in-flight · " + sim.replicas + " ready" + (sim.starting.length ? " +" + sim.starting.length + " starting" : "");
        drawChart();
      }
      rebuildTabs();
      rebuildKnobs();
      applyTheme();
      drawChart();
      raf = requestAnimationFrame(loop);
      return () => {
        cancelAnimationFrame(raf);
        obs.disconnect();
        tObs.disconnect();
        card.remove();
        delete window._asState;
        delete window._asSlots;
      };
    }
    tryMount();
    return () => {
      destroyed = true;
      if (timer) clearTimeout(timer);
      if (cleanup) cleanup();
      init.current = false;
    };
  }, []);
  return <div ref={ref} />;
};

export const AutoscalerSimDraw = () => {
  React.useEffect(() => {
    if (window._asDraw) return;
    function drawChart(ctx, sim, cfg, W, CH, view) {
      const E = window._asEngine;
      if (!E || sim.history.length < 2) return;
      const c = E.P();
      const PAD = {
        l: 50,
        r: 50,
        t: 16,
        b: 26
      };
      const iW = W - PAD.l - PAD.r, iH = CH - PAD.t - PAD.b;
      const t1 = sim.t, t0 = Math.max(0, sim.t - E.HISTORY_S);
      const xS = t => PAD.l + (t - t0) / Math.max(1, t1 - t0) * iW;
      let tgt = 8;
      for (const h of sim.history) {
        const v = h.inFlight || 0;
        if (v > tgt) tgt = v;
        if ((h.rps || 0) > tgt) tgt = h.rps;
      }
      if (!Number.isFinite(tgt) || tgt < 1) tgt = 8;
      view.maxLoad = Math.max(tgt, view.maxLoad + (tgt - view.maxLoad) * 0.04);
      const maxLoad = view.maxLoad;
      const maxR = Math.max(cfg.maxR, 4);
      const yL = v => PAD.t + iH - Math.min(v, maxLoad) / maxLoad * iH;
      const yR = v => PAD.t + iH - v / maxR * iH;
      ctx.clearRect(0, 0, W, CH);
      const wx0 = xS(Math.max(t0, sim.t - cfg.window)), wx1 = xS(sim.t);
      ctx.fillStyle = c.qBg;
      ctx.fillRect(wx0, PAD.t, Math.max(0, wx1 - wx0), iH);
      ctx.font = "500 9px ui-monospace,Menlo,monospace";
      ctx.fillStyle = c.q;
      ctx.textAlign = "left";
      ctx.textBaseline = "top";
      ctx.fillText(cfg.window + "s window", wx0 + 4, PAD.t + 4);
      ctx.strokeStyle = c.grid;
      ctx.lineWidth = 0.6;
      ctx.setLineDash([2, 3]);
      for (let g = 0; g <= 4; g++) {
        const y = PAD.t + iH * (1 - g / 4);
        ctx.beginPath();
        ctx.moveTo(PAD.l, y);
        ctx.lineTo(W - PAD.r, y);
        ctx.stroke();
        ctx.fillStyle = c.sub;
        ctx.font = "500 9px ui-monospace,Menlo,monospace";
        ctx.textAlign = "end";
        ctx.textBaseline = "middle";
        ctx.fillText(Math.round(maxLoad * g / 4), PAD.l - 6, y);
        ctx.textAlign = "start";
        ctx.fillText(Math.round(maxR * g / 4), W - PAD.r + 6, y);
      }
      ctx.setLineDash([]);
      ctx.fillStyle = c.sub;
      ctx.textAlign = "end";
      ctx.textBaseline = "alphabetic";
      ctx.fillText("requests", PAD.l - 6, PAD.t - 4);
      ctx.textAlign = "start";
      ctx.fillText("replicas", W - PAD.r + 6, PAD.t - 4);
      const H = sim.history;
      const steps = key => {
        const p = [];
        let py = yR(H[0][key]);
        p.push([xS(H[0].t), py]);
        for (let i = 1; i < H.length; i++) {
          const x = xS(H[i].t);
          p.push([x, py]);
          py = yR(H[i][key]);
          p.push([x, py]);
        }
        return p;
      };
      const ready = steps("ready"), total = steps("replicas");
      const base = PAD.t + iH;
      ctx.fillStyle = c.pBg;
      ctx.beginPath();
      ctx.moveTo(ready[0][0], base);
      for (const pt of ready) ctx.lineTo(pt[0], pt[1]);
      ctx.lineTo(ready[ready.length - 1][0], base);
      ctx.closePath();
      ctx.fill();
      ctx.fillStyle = c.rbf;
      ctx.globalAlpha = 0.5;
      ctx.beginPath();
      ctx.moveTo(total[0][0], total[0][1]);
      for (const pt of total) ctx.lineTo(pt[0], pt[1]);
      for (let i = ready.length - 1; i >= 0; i--) ctx.lineTo(ready[i][0], ready[i][1]);
      ctx.closePath();
      ctx.fill();
      ctx.globalAlpha = 1;
      let anyStarting = false;
      for (const h of H) {
        if (h.replicas > h.ready) {
          anyStarting = true;
          break;
        }
      }
      if (anyStarting) {
        ctx.strokeStyle = c.p;
        ctx.lineWidth = 1.2;
        ctx.setLineDash([3, 3]);
        ctx.beginPath();
        ctx.moveTo(total[0][0], total[0][1]);
        for (const pt of total) ctx.lineTo(pt[0], pt[1]);
        ctx.stroke();
        ctx.setLineDash([]);
      }
      ctx.strokeStyle = c.p;
      ctx.lineWidth = 2;
      ctx.beginPath();
      ctx.moveTo(ready[0][0], ready[0][1]);
      for (const pt of ready) ctx.lineTo(pt[0], pt[1]);
      ctx.stroke();
      let anyQueue = false;
      for (const h of H) {
        if ((h.queue || 0) > 0.5) {
          anyQueue = true;
          break;
        }
      }
      if (anyQueue) {
        ctx.fillStyle = c.badBg;
        ctx.beginPath();
        ctx.moveTo(xS(H[0].t), base);
        for (const h of H) ctx.lineTo(xS(h.t), yL(h.queue || 0));
        ctx.lineTo(xS(H[H.length - 1].t), base);
        ctx.closePath();
        ctx.fill();
        ctx.strokeStyle = c.bad;
        ctx.lineWidth = 1;
        ctx.globalAlpha = 0.7;
        ctx.beginPath();
        for (let i = 0; i < H.length; i++) {
          const x = xS(H[i].t), y = yL(H[i].queue || 0);
          if (i === 0) ctx.moveTo(x, y); else ctx.lineTo(x, y);
        }
        ctx.stroke();
        ctx.globalAlpha = 1;
      }
      ctx.strokeStyle = c.q;
      ctx.lineWidth = 2.2;
      ctx.beginPath();
      let mAvg = false;
      for (let i = 0; i < H.length; i++) {
        if (H[i].avg == null) continue;
        const x = xS(H[i].t), y = yL(H[i].avg);
        if (!mAvg) {
          ctx.moveTo(x, y);
          mAvg = true;
        } else ctx.lineTo(x, y);
      }
      ctx.stroke();
      ctx.strokeStyle = c.w;
      ctx.lineWidth = 1.6;
      ctx.setLineDash([5, 3]);
      ctx.beginPath();
      let pty = null, mThr = false;
      for (let i = 0; i < H.length; i++) {
        if (H[i].thr == null) continue;
        const x = xS(H[i].t), y = yL(H[i].thr);
        if (!mThr) {
          ctx.moveTo(x, y);
          mThr = true;
        } else {
          ctx.lineTo(x, pty);
          ctx.lineTo(x, y);
        }
        pty = y;
      }
      ctx.stroke();
      ctx.setLineDash([]);
      if (mThr && pty != null) {
        ctx.fillStyle = c.w;
        ctx.font = "500 9px ui-monospace,Menlo,monospace";
        ctx.textAlign = "end";
        ctx.textBaseline = "bottom";
        ctx.fillText("scale threshold", W - PAD.r - 4, pty - 3);
      }
      ctx.font = "500 9px ui-monospace,Menlo,monospace";
      ctx.textBaseline = "alphabetic";
      for (const e of sim.events) {
        if (e.t < t0 || e.type !== "scale-up" && e.type !== "scale-down") continue;
        const ex = xS(e.t), n = e.n || 0;
        if (e.type === "scale-up") {
          ctx.fillStyle = c.p;
          ctx.beginPath();
          ctx.moveTo(ex, base + 3);
          ctx.lineTo(ex - 3, base + 9);
          ctx.lineTo(ex + 3, base + 9);
          ctx.closePath();
          ctx.fill();
          if (n > 0) {
            ctx.textAlign = "center";
            ctx.fillText("+" + n, ex, base + 19);
          }
        } else {
          ctx.fillStyle = c.sub;
          ctx.beginPath();
          ctx.moveTo(ex - 3, base + 3);
          ctx.lineTo(ex + 3, base + 3);
          ctx.lineTo(ex, base + 9);
          ctx.closePath();
          ctx.fill();
          if (n > 0) {
            ctx.textAlign = "center";
            ctx.fillText("−" + n, ex, base + 19);
          }
        }
      }
      ctx.fillStyle = c.sub;
      ctx.font = "500 9px ui-monospace,Menlo,monospace";
      ctx.textAlign = "center";
      ctx.textBaseline = "alphabetic";
      for (let g = 0; g <= 4; g++) {
        const x = PAD.l + g / 4 * iW, tv = t0 + g / 4 * (t1 - t0);
        ctx.fillText(E.fmtTime(tv), x, CH - 8);
      }
      if (view.hover != null) drawHover(ctx, sim, cfg, W, CH, view, {
        PAD,
        iH,
        t0,
        t1,
        xS,
        yL,
        c,
        E
      });
    }
    function drawHover(ctx, sim, cfg, W, CH, view, g) {
      const t = g.t0 + (view.hover - g.PAD.l) / Math.max(1, W - g.PAD.l - g.PAD.r) * (g.t1 - g.t0);
      if (t < g.t0 || t > g.t1) return;
      const H = sim.history;
      let h = H[0];
      for (let i = 1; i < H.length; i++) {
        if (H[i].t > t) break;
        h = H[i];
      }
      if (!h) return;
      const x = g.xS(h.t);
      ctx.strokeStyle = g.c.sub;
      ctx.lineWidth = 1;
      ctx.setLineDash([2, 3]);
      ctx.globalAlpha = 0.8;
      ctx.beginPath();
      ctx.moveTo(x, g.PAD.t);
      ctx.lineTo(x, g.PAD.t + g.iH);
      ctx.stroke();
      ctx.setLineDash([]);
      ctx.globalAlpha = 1;
      const rows = [[g.E.fmtTime(h.t), ""], ["in-flight", g.E.fmt(h.inFlight, 0)], ["windowed avg", g.E.fmt(h.avg == null ? h.inFlight : h.avg, 1)], ["threshold", g.E.fmt(h.thr == null ? 0 : h.thr, 1)], ["replicas", h.ready + (h.replicas > h.ready ? " +" + (h.replicas - h.ready) + " starting" : "")], ["queued", g.E.fmt(h.queue || 0, 0)]];
      ctx.font = "500 10px ui-monospace,Menlo,monospace";
      let bw = 0;
      for (const r of rows) bw = Math.max(bw, ctx.measureText(r[0] + "  " + r[1]).width);
      bw += 20;
      const bh = rows.length * 15 + 12;
      const bx = x + 10 + bw > W - g.PAD.r ? x - 10 - bw : x + 10;
      const by = g.PAD.t + 6;
      ctx.fillStyle = g.c.bg;
      ctx.globalAlpha = 0.94;
      ctx.strokeStyle = g.c.brd;
      ctx.lineWidth = 1;
      ctx.beginPath();
      ctx.roundRect(bx, by, bw, bh, 6);
      ctx.fill();
      ctx.globalAlpha = 1;
      ctx.stroke();
      rows.forEach((r, i) => {
        const ry = by + 16 + i * 15;
        ctx.textAlign = "left";
        ctx.textBaseline = "alphabetic";
        ctx.fillStyle = i === 0 ? g.c.txt : g.c.sub;
        ctx.fillText(r[0], bx + 10, ry);
        if (r[1]) {
          ctx.textAlign = "right";
          ctx.fillStyle = g.c.txt;
          ctx.fillText(r[1], bx + bw - 10, ry);
        }
      });
    }
    window._asDraw = {
      drawChart
    };
    return () => {
      delete window._asDraw;
    };
  }, []);
  return <span />;
};

export const AutoscalerSimEngine = () => {
  React.useEffect(() => {
    if (window._asEngine) return;
    const SIM_SPEED = 30, HISTORY_S = 300;
    const PATTERNS = {
      steady: {
        label: "Steady",
        desc: "Constant rate with mild jitter",
        knobs: [["rate", "rate", " rps", 1, 30, 8], ["noise", "noise", "%", 0, 50, 10]],
        fn: (t, k) => {
          const n = k.noise / 100 * Math.sin(t * 0.7) + k.noise / 100 * 0.5 * Math.sin(t * 1.9);
          return Math.max(0, k.rate * (1 + n));
        }
      },
      bursty: {
        label: "Bursty",
        desc: "Baseline with sharp peaks",
        knobs: [["baseline", "baseline", " rps", 0, 20, 3], ["peak", "peak", " rps", 5, 60, 25], ["period", "every", "s", 30, 300, 90]],
        fn: (t, k) => {
          const ph = t % k.period / k.period;
          const burst = Math.exp(-Math.pow((ph - 0.3) / 0.08, 2));
          return k.baseline + (k.peak - k.baseline) * burst;
        }
      },
      scheduled: {
        label: "Scheduled",
        desc: "Business-hours wave",
        knobs: [["low", "off-peak", " rps", 0, 10, 1], ["high", "peak", " rps", 5, 50, 18], ["cycle", "cycle", "s", 60, 600, 240]],
        fn: (t, k) => {
          const ph = t % k.cycle / k.cycle;
          const wave = 0.5 - 0.5 * Math.cos(ph * Math.PI * 2);
          return k.low + (k.high - k.low) * wave;
        }
      },
      jittery: {
        label: "Jittery",
        desc: "Unpredictable, large variance",
        knobs: [["mean", "mean", " rps", 1, 30, 10], ["variance", "variance", "%", 20, 200, 100]],
        fn: (t, k) => {
          const n = Math.sin(t * 0.3) * 0.4 + Math.sin(t * 1.7 + 1.3) * 0.3 + Math.sin(t * 4.1 + 0.7) * 0.2 + Math.sin(t * 0.11) * 0.5;
          return Math.max(0, k.mean * (1 + k.variance / 100 * n));
        }
      }
    };
    function makeSim() {
      return {
        t: 0,
        inFlight: 0,
        replicas: 0,
        starting: [],
        target: 0,
        lastDecisionAt: -1e9,
        scaleDownStartedAt: null,
        arrAcc: 0,
        comAcc: 0,
        history: [],
        rpsAcc: {
          sum: 0
        },
        lastSec: 0,
        events: []
      };
    }
    function safeKnobs(pattern, knobs) {
      const out = {};
      for (const k of PATTERNS[pattern].knobs) {
        const v = knobs?.[k[0]];
        out[k[0]] = Number.isFinite(v) ? v : k[5];
      }
      return out;
    }
    function fmt(v, d) {
      return (Number.isFinite(v) ? v : 0).toFixed(d == null ? 1 : d);
    }
    function fmtTime(s) {
      const m = Math.floor(s / 60), sec = Math.floor(s % 60);
      return m + ":" + String(sec).padStart(2, "0");
    }
    function tickSim(sim, dt, cfg, rps) {
      sim.t += dt;
      sim.arrAcc += rps * dt;
      while (sim.arrAcc >= 1) {
        sim.inFlight++;
        sim.arrAcc -= 1;
      }
      const cap = sim.replicas * cfg.concurrency;
      const proc = Math.min(sim.inFlight, cap);
      sim.comAcc += proc / cfg.procTime * dt;
      while (sim.comAcc >= 1) {
        if (sim.inFlight > 0) sim.inFlight--;
        sim.comAcc -= 1;
      }
      sim.starting = sim.starting.filter(s => {
        if (sim.t >= s.readyAt) {
          sim.replicas++;
          sim.events.push({
            t: sim.t,
            type: "ready"
          });
          return false;
        }
        return true;
      });
      sim.rpsAcc.sum += rps * dt;
      if (sim.t - sim.lastSec >= 1) {
        const queue = Math.max(0, sim.inFlight - cap);
        const eff = cfg.concurrency * (cfg.utilization / 100);
        const wStart = sim.t - cfg.window;
        let wSum = sim.inFlight, wN = 1;
        for (let i = sim.history.length - 1; i >= 0; i--) {
          if (sim.history[i].t < wStart) break;
          wSum += sim.history[i].inFlight;
          wN++;
        }
        sim.history.push({
          t: sim.t,
          rps: sim.rpsAcc.sum / Math.max(0.001, sim.t - sim.lastSec),
          inFlight: sim.inFlight,
          replicas: sim.replicas + sim.starting.length,
          ready: sim.replicas,
          queue: queue,
          avg: wSum / wN,
          thr: (sim.replicas + sim.starting.length) * eff
        });
        while (sim.history.length > 0 && sim.history[0].t < sim.t - HISTORY_S) sim.history.shift();
        sim.rpsAcc = {
          sum: 0
        };
        sim.lastSec = sim.t;
      }
      if (sim.t - sim.lastDecisionAt >= cfg.window) {
        sim.lastDecisionAt = sim.t;
        const winStart = sim.t - cfg.window;
        const winS = sim.history.filter(h => h.t >= winStart);
        const avg = winS.length > 0 ? winS.reduce((s, h) => s + h.inFlight, 0) / winS.length : sim.inFlight;
        const eff = cfg.concurrency * (cfg.utilization / 100);
        let desired = Math.max(cfg.minR, Math.ceil(avg / Math.max(0.01, eff)));
        desired = Math.min(desired, cfg.maxR);
        if (avg > 0 && desired < 1) desired = Math.min(1, cfg.maxR);
        const cur = sim.replicas + sim.starting.length;
        sim.target = desired;
        if (desired > cur) {
          for (let i = 0; i < desired - cur; i++) sim.starting.push({
            readyAt: sim.t + cfg.coldStart
          });
          sim.events.push({
            t: sim.t,
            type: "scale-up",
            n: desired - cur
          });
          sim.scaleDownStartedAt = null;
        } else if (desired < cur) {
          if (sim.scaleDownStartedAt == null) sim.scaleDownStartedAt = sim.t;
        } else {
          sim.scaleDownStartedAt = null;
        }
      }
      if (sim.scaleDownStartedAt != null && sim.t - sim.scaleDownStartedAt >= cfg.delay) {
        const cur = sim.replicas + sim.starting.length;
        const excess = cur - sim.target;
        if (excess > 0) {
          const rate = Number.isFinite(cfg.sdRate) ? cfg.sdRate : 50;
          const toRemove = Math.min(excess, Math.max(1, Math.floor(cur * rate / 100)));
          let removed = 0;
          while (removed < toRemove && sim.starting.length > 0) {
            sim.starting.pop();
            removed++;
          }
          while (removed < toRemove && sim.replicas > sim.target) {
            sim.replicas--;
            removed++;
          }
          sim.events.push({
            t: sim.t,
            type: "scale-down",
            n: removed
          });
          sim.scaleDownStartedAt = sim.replicas + sim.starting.length > sim.target ? sim.t : null;
        } else {
          sim.scaleDownStartedAt = null;
        }
      }
      if (sim.events.length > 80) sim.events.splice(0, sim.events.length - 80);
    }
    function prefill(pattern, knobs, cfg) {
      const sim = makeSim();
      const k = safeKnobs(pattern, knobs);
      const fn = PATTERNS[pattern].fn;
      let rpsSum = 0;
      for (let i = 0; i < 60; i++) rpsSum += Math.max(0, fn(i / 60 * 600, k));
      const avgRps = rpsSum / 60;
      const avgInF = avgRps * cfg.procTime;
      const eff = Math.max(0.01, cfg.concurrency * (cfg.utilization / 100));
      const seed = Math.min(cfg.maxR, Math.max(cfg.minR, Math.ceil(avgInF / eff)));
      sim.replicas = seed;
      sim.target = seed;
      const dt = 0.5;
      const steps = Math.floor(HISTORY_S / dt);
      for (let i = 0; i < steps; i++) {
        const r = fn(sim.t, k);
        tickSim(sim, dt, cfg, Number.isFinite(r) ? r : 0);
      }
      sim.events = [];
      sim.lastDecisionAt = sim.t;
      return sim;
    }
    function computeMath(sim, cfg) {
      const winStart = sim.t - cfg.window;
      const winS = sim.history.filter(h => h.t >= winStart);
      const avg = winS.length > 0 ? winS.reduce((s, h) => s + h.inFlight, 0) / winS.length : sim.inFlight;
      const eff = cfg.concurrency * (cfg.utilization / 100);
      const desired = Math.min(cfg.maxR, Math.max(cfg.minR, Math.ceil(avg / Math.max(0.01, eff))));
      const cur = sim.replicas + sim.starting.length;
      const excess = Math.max(0, cur - desired);
      const rate = Number.isFinite(cfg.sdRate) ? cfg.sdRate : 50;
      const willRemove = excess > 0 ? Math.min(excess, Math.max(1, Math.floor(cur * rate / 100))) : 0;
      const nextDecision = Math.max(0, cfg.window - (sim.t - sim.lastDecisionAt));
      const sdEta = sim.scaleDownStartedAt != null ? Math.max(0, cfg.delay - (sim.t - sim.scaleDownStartedAt)) : null;
      return {
        avg,
        eff,
        desired,
        cur,
        excess,
        willRemove,
        nextDecision,
        sdEta
      };
    }
    const isDark = () => document.documentElement.classList.contains("dark");
    function P() {
      const d = isDark();
      return {
        bg: d ? "#021309" : "#fff",
        surf: d ? "#0C1D13" : "#f4f9f3",
        sub: "#869089",
        brd: d ? "#344339" : "#dee4de",
        brdM: d ? "#203026" : "#f4f9f3",
        txt: d ? "#dee4de" : "#0c1d13",
        q: d ? "#4a90ff" : "#2176ff",
        qBg: d ? "rgba(74,144,255,0.16)" : "rgba(199,220,255,0.6)",
        w: d ? "#f7c42f" : "#9c7400",
        p: d ? "#19E76E" : "#0e863f",
        pBg: d ? "rgba(25,231,110,0.18)" : "rgba(178,247,207,0.45)",
        rb: "#005934",
        rbf: d ? "rgba(25,231,110,0.22)" : "rgba(178,247,207,0.55)",
        rsf: d ? "rgba(247,196,47,0.18)" : "rgba(253,237,188,0.55)",
        bad: d ? "#ff6b6b" : "#c23b3b",
        badBg: d ? "rgba(255,107,107,0.22)" : "rgba(214,69,69,0.18)",
        grid: d ? "#203026" : "#dee4de"
      };
    }
    function setRich(el, s) {
      el.replaceChildren();
      const parts = s.split("`");
      for (let i = 0; i < parts.length; i++) {
        if (i % 2 === 0) {
          if (parts[i]) el.appendChild(document.createTextNode(parts[i]));
        } else {
          const c = document.createElement("code");
          c.textContent = parts[i];
          c.style.cssText = "font-family:ui-monospace,Menlo,monospace;font-size:0.92em;background:" + (isDark() ? "rgba(255,255,255,0.08)" : "rgba(0,0,0,0.05)") + ";padding:1px 4px;border-radius:3px";
          el.appendChild(c);
        }
      }
    }
    const PARAMS = [["concurrency", "concurrency_target", 1, 32, "", "How many requests each replica can handle simultaneously. This directly determines replica count for a given load.", "Controls requests sent to a replica, not requests processed inside it. `predict_concurrency` in `config.yaml` controls the inside-the-container limit."], ["utilization", "target_utilization_percentage", 1, 100, "%", "Headroom before scaling triggers. The autoscaler scales when utilization reaches this percentage of `concurrency_target`, not when replicas are fully loaded.", "Lower values (50 to 60%) absorb spikes but cost more. Higher values (80%+) are cost-efficient for steady traffic but absorb spikes less effectively."], ["window", "autoscaling_window", 10, 300, "s", "How far back (in seconds) the autoscaler looks when measuring traffic. Traffic is averaged over this window to make scaling decisions.", "Shorter windows (30 to 60s) react quickly, which suits bursty workloads. Longer windows (2 to 5 minutes) ignore short-lived fluctuations."], ["delay", "scale_down_delay", 0, 600, "s", "How long (in seconds) the autoscaler waits after load drops before removing replicas.", "When the timer elapses, the autoscaler removes at most `max_scale_down_rate` of running replicas, waits a full delay, and repeats. This is the primary lever against oscillation."], ["sdRate", "max_scale_down_rate", 1, 50, "%", "The maximum percentage of running replicas the autoscaler can remove in one scale-down step.", "The default of 50% halves capacity each step. Lower values release capacity more gradually, which keeps replicas warm when traffic rebounds shortly after it drops."], ["minR", "min_replica", 0, 8, "", "The floor for your deployment's capacity. The autoscaler will not scale below this number.", "The default of 0 enables scale-to-zero. For production, set this to 2 or more to eliminate cold starts and add redundancy."], ["maxR", "max_replica", 1, 16, "", "The ceiling for your deployment's capacity. The autoscaler will not scale above this number.", "Protects against runaway scaling. If traffic exceeds `max_replica` × `concurrency_target`, requests queue rather than triggering new replicas."]];
    window._asEngine = {
      SIM_SPEED,
      HISTORY_S,
      PATTERNS,
      PARAMS,
      makeSim,
      safeKnobs,
      fmt,
      fmtTime,
      tickSim,
      prefill,
      computeMath,
      isDark,
      P,
      setRich
    };
  }, []);
  return <span />;
};

export const MiniScaleDown = () => {
  const ref = React.useRef(null);
  const init = React.useRef(false);
  React.useEffect(() => {
    if (!ref.current || init.current) return;
    init.current = true;
    const PER = 7000;
    function traffic(p) {
      if (p < 0.20) return 0.7 + Math.sin(p * 40) * 0.05;
      if (p < 0.30) return 0.7 * (1 - (p - 0.20) / 0.10);
      return 0;
    }
    const opts = {
      title: "`scale_down_delay`",
      desc: "Replicas must stay idle this long before they are removed, which dampens oscillation.",
      formula: "idle > `scale_down_delay` → terminate",
      W: 580,
      H: 180,
      draw: function (ctx, t, p) {
        const h = window._mdHelpers;
        const ph = t % PER / PER;
        const x0 = 40, x1 = 540;
        const cx = h.lerp(x0, x1, ph);
        const idleStart = 0.30, idleEnd = 0.78;
        const idleProg = ph < idleStart ? 0 : ph > idleEnd ? 1 : (ph - idleStart) / (idleEnd - idleStart);
        const repAlive = ph < idleEnd;
        const repStop = ph >= idleEnd && ph < idleEnd + 0.06;
        ctx.font = "500 9px ui-monospace,Menlo,monospace";
        ctx.fillStyle = p.sub;
        ctx.textAlign = "left";
        ctx.textBaseline = "middle";
        ctx.fillText("in_flight", 20, 22);
        ctx.strokeStyle = p.brdM;
        ctx.lineWidth = 1;
        ctx.beginPath();
        ctx.moveTo(x0, 54);
        ctx.lineTo(x1, 54);
        ctx.stroke();
        ctx.fillStyle = p.qFill;
        ctx.beginPath();
        ctx.moveTo(x0, 54);
        for (let q = 0; q <= 1; q += 0.01) ctx.lineTo(h.lerp(x0, x1, q), h.lerp(54, 14, traffic(q)));
        ctx.lineTo(x1, 54);
        ctx.closePath();
        ctx.fill();
        ctx.strokeStyle = p.q;
        ctx.lineWidth = 1.5;
        ctx.beginPath();
        for (let q = 0; q <= 1; q += 0.01) {
          const px = h.lerp(x0, x1, q), py = h.lerp(54, 14, traffic(q));
          if (q === 0) ctx.moveTo(px, py); else ctx.lineTo(px, py);
        }
        ctx.stroke();
        ctx.font = "500 9px ui-monospace,Menlo,monospace";
        ctx.fillStyle = p.sub;
        ctx.fillText("idle timer", 20, 84);
        ctx.strokeStyle = p.brd;
        ctx.lineWidth = 0.8;
        ctx.beginPath();
        ctx.roundRect(x0, 88, x1 - x0, 8, 2);
        ctx.stroke();
        if (idleProg > 0) {
          ctx.fillStyle = idleProg >= 1 ? p.qDark : p.q;
          ctx.beginPath();
          ctx.roundRect(x0, 88, (x1 - x0) * idleProg, 8, 2);
          ctx.fill();
        }
        ctx.setLineDash([2, 2]);
        ctx.strokeStyle = p.qDark;
        ctx.beginPath();
        ctx.moveTo(h.lerp(x0, x1, idleEnd), 84);
        ctx.lineTo(h.lerp(x0, x1, idleEnd), 100);
        ctx.stroke();
        ctx.setLineDash([]);
        ctx.font = "500 9px ui-monospace,Menlo,monospace";
        ctx.fillStyle = p.sub;
        ctx.fillText("replica", 20, 122);
        // Keep drawing through the fade-out window so the alpha ramp below takes effect.
        if (repAlive || repStop) {
          ctx.globalAlpha = repStop ? 1 - h.fade(ph, idleEnd, idleEnd + 0.06) : 1;
          h.repBox(ctx, x0, 106, 130, 24, ph < 0.30 ? "busy" : "ready", "Replica", 0, p, "ACTIVE");
          ctx.globalAlpha = 1;
        }
        if (ph >= idleEnd && ph < idleEnd + 0.10) {
          ctx.globalAlpha = h.fade(ph, idleEnd, idleEnd + 0.04);
          ctx.font = "500 10px ui-monospace,Menlo,monospace";
          ctx.fillStyle = p.sub;
          ctx.textAlign = "left";
          ctx.textBaseline = "middle";
          ctx.fillText("Deployment is now SCALED_TO_ZERO", x0 + 150, 118);
          ctx.globalAlpha = 1;
        }
        ctx.strokeStyle = p.txt;
        ctx.globalAlpha = 0.25;
        ctx.lineWidth = 0.5;
        ctx.beginPath();
        ctx.moveTo(cx, 10);
        ctx.lineTo(cx, 140);
        ctx.stroke();
        ctx.globalAlpha = 1;
        ctx.font = "500 10px ui-monospace,Menlo,monospace";
        ctx.fillStyle = p.sub;
        ctx.textAlign = "left";
        ctx.textBaseline = "middle";
        let cap = "";
        if (ph < 0.30) cap = "Serving traffic, deployment is ACTIVE"; else if (ph < idleEnd) {
          const rem = Math.round((idleEnd - ph) / (idleEnd - idleStart) * 20) * 5;
          cap = "Idle, " + rem + "% of scale_down_delay remains, still ACTIVE";
        } else cap = "Replica reclaimed, deployment is now SCALED_TO_ZERO";
        ctx.fillText(cap, x0, 168);
      }
    };
    let cleanup = null, destroyed = false, timer = null, retries = 0;
    const tryMount = () => {
      if (destroyed || !ref.current) return;
      if (window._mdMount) {
        cleanup = window._mdMount(ref.current, opts);
      } else if (retries++ < 60) {
        timer = setTimeout(tryMount, 30);
      }
    };
    tryMount();
    return () => {
      destroyed = true;
      if (timer) clearTimeout(timer);
      if (cleanup) cleanup();
      init.current = false;
    };
  }, []);
  return <div ref={ref} />;
};

export const MiniConcurrency = () => {
  const ref = React.useRef(null);
  const init = React.useRef(false);
  React.useEffect(() => {
    if (!ref.current || init.current) return;
    init.current = true;
    const PER = 7000, N = 8, UTIL = 0.5;
    const THRESHOLD = N * UTIL;
    const STAGES = [["Capacity available", 0.08, 0.08], ["At threshold", 0.36, 0.32], ["Scaling up", 0.55, 0.50], ["Two replicas", 0.85, 0.80]];
    let frozen = null;
    const opts = {
      title: "`concurrency_target`",
      desc: "Click any stage below to freeze on it. Click again to resume the loop.",
      formula: "load > replicas × `concurrency_target` × `target_utilization` → scale up",
      W: 580,
      H: 224,
      onClick: function (x, y) {
        if (y > 6 && y < 30) {
          let nearest = null, nearestDist = Infinity;
          STAGES.forEach(s => {
            const sx = 40 + s[1] * 500;
            const d = Math.abs(x - sx);
            if (d < nearestDist) {
              nearestDist = d;
              nearest = s[0];
            }
          });
          if (nearestDist < 75) frozen = frozen === nearest ? null : nearest;
        } else {
          frozen = null;
        }
      },
      draw: function (ctx, t, p) {
        const h = window._mdHelpers;
        const ph = frozen != null ? (STAGES.find(s => s[0] === frozen) || [null, 0, 0])[2] : t % PER / PER;
        const yOff = 28;
        STAGES.forEach(s => {
          const sx = 40 + s[1] * 500;
          let cur = false;
          if (s[0] === "Capacity available") cur = ph < 0.32; else if (s[0] === "At threshold") cur = ph >= 0.32 && ph < 0.42; else if (s[0] === "Scaling up") cur = ph >= 0.42 && ph < 0.65; else cur = ph >= 0.65;
          ctx.font = (cur ? "600 " : "500 ") + "9.5px ui-monospace,Menlo,monospace";
          ctx.fillStyle = cur ? p.txt : p.sub;
          ctx.textAlign = "center";
          ctx.textBaseline = "middle";
          ctx.fillText(s[0], sx, 18);
        });
        const slotY = 72 + yOff, slotH = 16, slotW = 12, slotGap = 3;
        const slotXFor = i => 244 + i * (slotW + slotGap);
        const slotFill = i => {
          if (i >= THRESHOLD) return 0;
          const s = 0.06 + i * 0.07, e = s + 0.07;
          return ph < s ? 0 : ph > e ? 1 : (ph - s) / (e - s);
        };
        const overflowVis = ph > 0.32 && ph < 0.55;
        const overflowX = overflowVis ? h.lerp(40, 200, (ph - 0.32) / 0.10) : 40;
        const r2Vis = ph > 0.42;
        const r2Cold = ph > 0.42 && ph < 0.62;
        const r2Prog = h.fade(ph, 0.42, 0.62);
        const r2Busy = ph >= 0.65 && ph < 0.95;
        h.repBox(ctx, 200, 28 + yOff, 140, 32, ph < 0.10 ? "ready" : "busy", "Replica 1", 0, p, "Active");
        for (let i = 0; i < N; i++) {
          const f = slotFill(i);
          ctx.strokeStyle = p.brd;
          ctx.globalAlpha = 0.7;
          ctx.lineWidth = 0.8;
          ctx.beginPath();
          ctx.roundRect(slotXFor(i), slotY, slotW, slotH, 2);
          ctx.stroke();
          ctx.globalAlpha = 1;
          ctx.fillStyle = p.p;
          ctx.globalAlpha = 0.8;
          ctx.fillRect(slotXFor(i), slotY + slotH * (1 - f), slotW, slotH * f);
          ctx.globalAlpha = 1;
        }
        const thrX = slotXFor(THRESHOLD) - slotGap / 2;
        ctx.strokeStyle = p.q;
        ctx.globalAlpha = 0.85;
        ctx.lineWidth = 1.3;
        ctx.setLineDash([3, 3]);
        ctx.beginPath();
        ctx.moveTo(thrX, slotY - 4);
        ctx.lineTo(thrX, slotY + slotH + 4);
        ctx.stroke();
        ctx.setLineDash([]);
        ctx.globalAlpha = 1;
        ctx.font = "500 8.5px ui-monospace,Menlo,monospace";
        ctx.fillStyle = p.q;
        ctx.textAlign = "center";
        ctx.textBaseline = "bottom";
        ctx.fillText(Math.round(UTIL * 100) + "% threshold", thrX, slotY - 6);
        ctx.font = "500 9.5px ui-monospace,Menlo,monospace";
        ctx.fillStyle = p.sub;
        ctx.textAlign = "left";
        ctx.textBaseline = "middle";
        ctx.fillText("concurrency_target = " + N, slotXFor(N) + 8, slotY + slotH / 2);
        if (r2Vis) {
          ctx.globalAlpha = h.fade(ph, 0.42, 0.46);
          h.repBox(ctx, 200, 122 + yOff, 140, 32, r2Cold ? "starting" : r2Busy ? "busy" : "ready", "Replica 2", r2Cold ? r2Prog : 0, p, r2Cold ? "Waking up" : "Active");
          ctx.globalAlpha = 1;
        }
        if (overflowVis) {
          ctx.globalAlpha = h.fade(ph, 0.32, 0.36) * (1 - h.fade(ph, 0.50, 0.55));
          ctx.fillStyle = p.q;
          ctx.beginPath();
          ctx.arc(overflowX, 44 + yOff, 4.5, 0, Math.PI * 2);
          ctx.fill();
          ctx.globalAlpha = 1;
        }
        if (ph > 0.36 && ph < 0.50) {
          ctx.globalAlpha = h.fade(ph, 0.36, 0.40) * (1 - h.fade(ph, 0.46, 0.50));
          ctx.strokeStyle = p.p;
          ctx.lineWidth = 1.5;
          ctx.beginPath();
          ctx.moveTo(200, 60 + yOff);
          ctx.quadraticCurveTo(180, 100 + yOff, 200, 130 + yOff);
          ctx.stroke();
          const tx = 200 - 180, ty = 130 + yOff - (100 + yOff), tl = Math.hypot(tx, ty);
          const ux = tx / tl, uy = ty / tl, px = -uy, py = ux;
          const tipX = 200, tipY = 130 + yOff, headLen = 9, headW = 4;
          const baseX = tipX - ux * headLen, baseY = tipY - uy * headLen;
          ctx.fillStyle = p.p;
          ctx.beginPath();
          ctx.moveTo(tipX, tipY);
          ctx.lineTo(baseX + px * headW, baseY + py * headW);
          ctx.lineTo(baseX - px * headW, baseY - py * headW);
          ctx.closePath();
          ctx.fill();
          ctx.font = "500 10px ui-monospace,Menlo,monospace";
          ctx.fillStyle = p.p;
          ctx.textAlign = "left";
          ctx.fillText("scale up", 134, 100 + yOff);
          ctx.globalAlpha = 1;
        }
        for (let i = 0; i < 4; i++) {
          const off = i * 0.07;
          const local = ((ph - off) % 1 + 1) % 1;
          if (local > 0.3) continue;
          if (ph > 0.10 + i * 0.07) continue;
          const x = h.lerp(40, 220, local / 0.3);
          ctx.fillStyle = p.q;
          ctx.beginPath();
          ctx.arc(x, 44 + yOff, 4.5, 0, Math.PI * 2);
          ctx.fill();
        }
        ctx.font = "500 10px ui-monospace,Menlo,monospace";
        ctx.fillStyle = p.sub;
        ctx.textAlign = "left";
        ctx.textBaseline = "middle";
        const reps = r2Vis ? 2 : 1;
        ctx.fillText("threshold " + reps * THRESHOLD + " (" + reps + " × concurrency_target " + N + " × target_utilization " + UTIL + ")", 20, 216);
      }
    };
    let cleanup = null, destroyed = false, timer = null, retries = 0;
    const tryMount = () => {
      if (destroyed || !ref.current) return;
      if (window._mdMount) {
        cleanup = window._mdMount(ref.current, opts);
      } else if (retries++ < 60) {
        timer = setTimeout(tryMount, 30);
      }
    };
    tryMount();
    return () => {
      destroyed = true;
      if (timer) clearTimeout(timer);
      if (cleanup) cleanup();
      init.current = false;
    };
  }, []);
  return <div ref={ref} />;
};

export const MiniDiagramEngine = () => {
  React.useEffect(() => {
    if (window._mdMount) return;
    const isDark = () => document.documentElement.classList.contains("dark");
    const lerp = (a, b, t) => a + (b - a) * Math.min(1, Math.max(0, t));
    const fade = (p, a, b) => Math.min(1, Math.max(0, (p - a) / (b - a)));
    function P() {
      const d = isDark();
      return {
        bg: d ? "#021309" : "#fff",
        sub: "#869089",
        brd: d ? "#344339" : "#dee4de",
        brdM: d ? "#203026" : "#f4f9f3",
        q: d ? "#4a90ff" : "#2176ff",
        qFill: d ? "rgba(74,144,255,0.18)" : "rgba(199,220,255,0.7)",
        qDark: "#114aa6",
        w: d ? "#f7c42f" : "#9c7400",
        p: d ? "#19E76E" : "#0e863f",
        rb: "#005934",
        rbf: d ? "rgba(25,231,110,0.22)" : "rgba(178,247,207,0.55)",
        rsf: d ? "rgba(247,196,47,0.18)" : "rgba(253,237,188,0.55)",
        txt: d ? "#dee4de" : "#0c1d13"
      };
    }
    function repBox(ctx, x, y, w, h, state, label, prog, p, displayState) {
      let fl = p.bg, st = p.p, tc = p.p, ds = [];
      if (state === "stopped") {
        fl = "transparent";
        st = p.brd;
        tc = p.sub;
        ds = [3, 3];
      } else if (state === "starting") {
        fl = p.rsf;
        st = p.w;
        tc = p.w;
      } else if (state === "busy") {
        fl = p.rbf;
        st = p.rb;
        tc = p.rb;
      } else if (state === "stopping") {
        st = p.sub;
        tc = p.sub;
      }
      ctx.globalAlpha = state === "stopped" || state === "stopping" ? 0.55 : 1;
      ctx.setLineDash(ds);
      ctx.beginPath();
      ctx.roundRect(x, y, w, h, 6);
      ctx.fillStyle = fl;
      ctx.fill();
      ctx.strokeStyle = st;
      ctx.lineWidth = 1.3;
      ctx.stroke();
      ctx.setLineDash([]);
      ctx.globalAlpha = 1;
      ctx.font = "500 11px ui-monospace,Menlo,monospace";
      ctx.fillStyle = tc;
      ctx.textAlign = "left";
      ctx.textBaseline = "middle";
      ctx.fillText(label, x + 10, y + h / 2);
      ctx.font = "500 9.5px ui-monospace,Menlo,monospace";
      ctx.textAlign = "right";
      ctx.fillText(displayState || state, x + w - 10, y + h / 2);
      if (state === "starting" && prog > 0) {
        ctx.fillStyle = p.w;
        ctx.fillRect(x, y + h - 2, w * prog, 2);
      }
    }
    function setRich(el, s) {
      el.replaceChildren();
      const parts = s.split("`");
      for (let i = 0; i < parts.length; i++) {
        if (i % 2 === 0) {
          if (parts[i]) el.appendChild(document.createTextNode(parts[i]));
        } else {
          const code = document.createElement("code");
          code.textContent = parts[i];
          code.style.cssText = "font-family:ui-monospace,Menlo,monospace;font-size:0.92em;background:" + (isDark() ? "rgba(255,255,255,0.08)" : "rgba(0,0,0,0.05)") + ";padding:1px 4px;border-radius:3px";
          el.appendChild(code);
        }
      }
    }
    window._mdHelpers = {
      lerp,
      fade,
      repBox
    };
    window._mdMount = function (root, opts) {
      const W = opts.W || 580, H = opts.H || 200;
      const card = document.createElement("div");
      const tit = document.createElement("div");
      tit.style.cssText = "font:500 12px ui-monospace,Menlo,monospace;letter-spacing:-0.28px;margin:0 0 4px";
      const desc = document.createElement("p");
      desc.style.cssText = "margin:0 0 10px;font-size:12px;line-height:1.45;font-family:system-ui,-apple-system,sans-serif";
      const cv = document.createElement("canvas");
      cv.style.cssText = "display:block;width:100%;max-width:" + W + "px;touch-action:pan-y";
      const formula = document.createElement("div");
      formula.style.cssText = "margin:8px 0 0;font:500 11px ui-monospace,Menlo,monospace;letter-spacing:-0.28px;border-radius:4px;padding:4px 8px;display:inline-block";
      setRich(tit, opts.title);
      setRich(desc, opts.desc);
      if (opts.formula) setRich(formula, opts.formula);
      card.appendChild(tit);
      card.appendChild(desc);
      card.appendChild(cv);
      if (opts.formula) card.appendChild(formula);
      root.appendChild(card);
      const ctx = cv.getContext("2d");
      const dpr = window.devicePixelRatio || 1;
      cv.width = W * dpr;
      cv.height = H * dpr;
      cv.style.height = H + "px";
      ctx.scale(dpr, dpr);
      if (opts.onClick) {
        cv.style.cursor = "pointer";
        cv.addEventListener("click", e => {
          const r = cv.getBoundingClientRect();
          opts.onClick((e.clientX - r.left) / r.width * W, (e.clientY - r.top) / r.height * H);
        });
      }
      function applyTheme() {
        const d = isDark();
        card.style.cssText = "border:1px solid " + (d ? "#344339" : "#f4f9f3") + ";border-radius:8px;padding:16px 18px;margin:12px 0;background:" + (d ? "#021309" : "#fff") + ";max-width:" + W + "px";
        tit.style.color = d ? "#dee4de" : "#0c1d13";
        desc.style.color = d ? "#9CA59E" : "#5a675e";
        formula.style.background = d ? "#0C1D13" : "#f4f9f3";
        formula.style.border = "1px solid " + (d ? "#203026" : "#dee4de");
        formula.style.color = d ? "#19E76E" : "#0e863f";
        if (opts.title) setRich(tit, opts.title);
        if (opts.desc) setRich(desc, opts.desc);
        if (opts.formula) setRich(formula, opts.formula);
      }
      applyTheme();
      let visible = true, raf = 0, t0 = performance.now(), dirty = true;
      const obs = new IntersectionObserver(e => visible = e[0].isIntersecting, {
        threshold: 0.15
      });
      obs.observe(cv);
      const themeObs = new MutationObserver(() => {
        dirty = true;
        applyTheme();
      });
      themeObs.observe(document.documentElement, {
        attributes: true,
        attributeFilter: ["class"]
      });
      function loop(ts) {
        raf = requestAnimationFrame(loop);
        if (!visible) {
          dirty = true;
          return;
        }
        const t = ts - t0;
        ctx.clearRect(0, 0, W, H);
        opts.draw(ctx, t, P());
        dirty = false;
      }
      raf = requestAnimationFrame(loop);
      return () => {
        cancelAnimationFrame(raf);
        obs.disconnect();
        themeObs.disconnect();
        card.remove();
      };
    };
    return () => {
      delete window._mdMount;
      delete window._mdHelpers;
    };
  }, []);
  return <span />;
};

<MiniDiagramEngine />

<AutoscalerSimEngine />

<AutoscalerSimDraw />

Without autoscaling, you'd choose between two bad options: pay for enough GPUs to handle your peak traffic 24/7, or accept that requests fail when load exceeds your fixed capacity. Autoscaling eliminates this tradeoff by adjusting the number of **replicas** backing a deployment based on demand. When traffic rises, the autoscaler adds replicas. When it falls, it removes them. The goal is to match capacity to load so you pay for what you use without sacrificing latency.

Baseten [bills per minute](/organization/billing) for every minute a replica is observed as up, including the builder workload after `truss push` and any training workloads. A deployment scaled to zero replicas incurs no charges, but the time a fresh replica spends loading the model is billed. See [Billing and usage](/organization/billing) for the full lifecycle breakdown, and [Cold starts](/deployment/autoscaling/cold-starts) for techniques to minimize startup time.

<Accordion title="Reference">
  Baseten provides default settings that work for most workloads.
  Tune your autoscaling settings based on your model and traffic.

  | Parameter           | Default | Range    | What it controls                             |
  | ------------------- | ------- | -------- | -------------------------------------------- |
  | Min replicas        | 0       | ≥ 0      | Baseline capacity (0 = scale to zero).       |
  | Max replicas        | 1       | ≥ 1      | Cost/capacity ceiling.                       |
  | Autoscaling window  | 60s     | 10-3600s | Time window for traffic analysis.            |
  | Scale-down delay    | 900s    | 0-3600s  | Wait time before removing idle replicas.     |
  | Max scale-down rate | 50%     | 1-50%    | Cap on replicas removed per scale-down step. |
  | Concurrency target  | 1       | ≥ 1      | Requests per replica before scaling.         |
  | Target utilization  | 70%     | 1-100%   | Headroom before scaling triggers.            |
</Accordion>

You can configure autoscaling settings through the Baseten UI or API.

<Tabs>
  <Tab title="UI">
    1. Select your deployment.
    2. Under **Replicas** for your production environment, choose **Configure**.
    3. Configure the autoscaling settings and choose **Update**.

    <Accordion title="Show the configure-autoscaling panel">
      <img src="https://mintcdn.com/baseten-preview/LAFcl_wGfUIg5k4I/_images/configure-autoscaling.png?fit=max&auto=format&n=LAFcl_wGfUIg5k4I&q=85&s=d572ba832d684960535846e062b2380e" alt="UI view to configure autoscaling" width="1330" height="1550" data-path="_images/configure-autoscaling.png" />
    </Accordion>
  </Tab>

  <Tab title="cURL">
    Send a PATCH request to the autoscaling settings endpoint:

    ```bash Request theme={"system"}
    curl -X PATCH \
      https://api.baseten.co/v1/models/{model_id}/deployments/{deployment_id}/autoscaling_settings \
      -H "Authorization: Bearer $BASETEN_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "min_replica": 2,
        "max_replica": 10,
        "concurrency_target": 32,
        "target_utilization_percentage": 70,
        "autoscaling_window": 60,
        "scale_down_delay": 900
      }'
    ```

    For more information, see the [API reference](/reference/management-api/deployments/autoscaling/updates-a-deployments-autoscaling-settings).
  </Tab>

  <Tab title="Python">
    Use the `requests` library to send the same PATCH:

    ```python update_autoscaling.py theme={"system"}
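    import os

    import requests

    API_KEY = os.environ["BASETEN_API_KEY"]
    MODEL_ID = "your_model_id"  # placeholder: replace with your model ID
    DEPLOYMENT_ID = "your_deployment_id"  # placeholder: replace with your deployment ID

    response = requests.patch(
        f"https://api.baseten.co/v1/models/{MODEL_ID}/deployments/{DEPLOYMENT_ID}/autoscaling_settings",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "min_replica": 2,
            "max_replica": 10,
            "concurrency_target": 32,
            "target_utilization_percentage": 70,
            "autoscaling_window": 60,
            "scale_down_delay": 900,
        },
    )
    response.raise_for_status()  # raise on HTTP errors instead of printing an error body

    print(response.json())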
    ```

    For more information, see the [API reference](/reference/management-api/deployments/autoscaling/updates-a-deployments-autoscaling-settings).
  </Tab>
</Tabs>
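
Before sending a request, it can help to check a payload against the documented ranges. The following is a minimal sketch, not part of any Baseten client: the `validate_autoscaling_settings` helper is hypothetical, the ranges are copied from the reference table above, and the rule that `min_replica` should not exceed `max_replica` is an assumption.

```python
# Documented ranges from the reference table; None means "no upper bound".
RANGES = {
    "min_replica": (0, None),
    "max_replica": (1, None),
    "autoscaling_window": (10, 3600),
    "scale_down_delay": (0, 3600),
    "max_scale_down_rate": (1, 50),
    "concurrency_target": (1, None),
    "target_utilization_percentage": (1, 100),
}


def validate_autoscaling_settings(settings: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in settings.items():
        if name not in RANGES:
            problems.append(f"{name}: unknown setting")
            continue
        low, high = RANGES[name]
        if value < low or (high is not None and value > high):
            bound = f"{low}-{high}" if high is not None else f">= {low}"
            problems.append(f"{name}: {value} is outside {bound}")
    # Assumption: the replica floor should not exceed the ceiling.
    floor, ceiling = settings.get("min_replica"), settings.get("max_replica")
    if floor is not None and ceiling is not None and floor > ceiling:
        problems.append("min_replica exceeds max_replica")
    return problems
```

An empty list means the payload is within the documented ranges; the API remains the source of truth for what it accepts.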

## How autoscaling works

The autoscaler matches replica count to demand by continuously sampling in-flight requests into a sliding window that spans `autoscaling_window` (60 seconds by default). It averages the load over that window, divides by each replica's effective capacity (`concurrency_target` × `target_utilization_percentage`), and rounds up to set the desired replica count. Scaling up happens at the next decision, but scaling down is deliberately patient: load has to stay below the threshold for an entire `scale_down_delay` before the autoscaler halves the excess, and it keeps halving on each subsequent delay rather than dropping replicas all at once. That asymmetry is what keeps the deployment from oscillating when traffic dips and recovers.

The simulator below runs that exact loop on live traffic: every scale-up fires at the moment the windowed average crosses the scale threshold. Start from a scenario to stage a cold start or oscillation, or stay in the sandbox and shape the traffic yourself. Every parameter is live, and the meters track what your settings cost in idle capacity and queued requests.

<AutoscalerSim />

<AutoscalerSimScenarios />

<AutoscalerSimPanels />

To put numbers on it, consider a deployment with `concurrency_target` set to 10 and `target_utilization_percentage` at 70%. Each replica's effective capacity is 7 concurrent requests (10 × 0.70). If the windowed average rises from 5 to 25 in-flight requests, the autoscaler computes ⌈25 / 7⌉ = 4 desired replicas at the next decision and starts provisioning the difference. Scale-up continues until the deployment reaches `max_replica`; beyond that ceiling, additional load queues until capacity frees up.
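
The arithmetic above can be sketched in a few lines of Python. This is an illustration of the formula as described on this page, not the autoscaler's actual implementation; the `desired_replicas` function name is hypothetical.

```python
import math


def desired_replicas(
    avg_in_flight: float,
    concurrency_target: int,
    target_utilization_percentage: int,
    min_replica: int = 0,
    max_replica: int = 1,
) -> int:
    """Replica count for a windowed average load, clamped to the configured bounds."""
    # Multiply by 100 first so the divisor stays an integer (10 x 70 = 700).
    raw = math.ceil(
        avg_in_flight * 100 / (concurrency_target * target_utilization_percentage)
    )
    return max(min_replica, min(max_replica, raw))


print(desired_replicas(25, 10, 70, max_replica=10))  # 4
print(desired_replicas(25, 10, 70, max_replica=2))   # 2, capped at max_replica
```

The second call shows the ceiling in action: the formula asks for four replicas, but `max_replica` holds the deployment at two and the remaining load queues.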

Scale-down is slower by design. When the windowed average drops below the threshold, the autoscaler waits a full `scale_down_delay` (900 seconds by default), removes the excess at a pace capped by `max_scale_down_rate` (50% of running replicas by default), and resets the timer. At the default rate, a deployment with eight excess replicas drains to four, then two, then one, with a full delay between steps. If traffic recovers inside the delay, no scale event fires and the replicas stay warm. Scale-down stops at `min_replica`; production deployments typically hold it at two or more so a healthy replica is always available.
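
To see how the delay and the rate cap interact, the sketch below steps a deployment down while load stays low the whole time. It is a simplified model under stated assumptions: the `scale_down_schedule` function is hypothetical, the cap is applied to running replicas and rounded down, and at least one replica is removed per step.

```python
def scale_down_schedule(
    running: int,
    desired: int,
    scale_down_delay: int = 900,
    max_scale_down_rate: int = 50,
) -> list[tuple[int, int]]:
    """Return (seconds_elapsed, replicas_after_step) for each scale-down step."""
    steps = []
    elapsed = 0
    while running > desired:
        elapsed += scale_down_delay  # a full delay passes before every step
        # Assumption: the cap rounds down but always allows one removal.
        cap = max(1, running * max_scale_down_rate // 100)
        running -= min(cap, running - desired)
        steps.append((elapsed, running))
    return steps


print(scale_down_schedule(running=8, desired=1))
# [(900, 4), (1800, 2), (2700, 1)]
```

With the defaults, eight running replicas and a desired count of one follow the four, two, one sequence, and the last step lands 45 minutes after load first dropped.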

## Replicas

Each replica is an independent instance of your model, running on its own hardware and capable of serving requests in parallel with other replicas. The autoscaler controls how many replicas are active at any given time, but you set the boundaries.

<ParamField body="min_replica" type="integer" default="0">
  The floor for your deployment's capacity. The autoscaler won't scale below this number.

  **Range:** ≥ 0

  The default of 0 enables *scale-to-zero*: when no requests arrive for long enough, all replicas shut down and your deployment incurs no charges. The tradeoff is that the next request triggers a [cold start](/deployment/autoscaling/cold-starts), which can take minutes for large models. During that wake-up period, [billing is per minute](/organization/billing) even though the replica isn't yet serving responses.

  <Note>
    For production deployments, set `min_replica` to at least 2. This eliminates cold starts and provides redundancy if one replica fails.
  </Note>
</ParamField>

<ParamField body="max_replica" type="integer" default="1">
  The ceiling for your deployment's capacity. The autoscaler won't scale above this number.

  **Range:** ≥ 1

  This setting protects against runaway scaling and unexpected costs. If traffic exceeds what your maximum replicas can handle, requests queue rather than triggering new replicas. See [Request lifecycle](/deployment/autoscaling/request-lifecycle) for details on queuing and load shedding behavior. The default of 1 effectively disables autoscaling: you get exactly one replica regardless of load.

  Estimate max replicas:

  $$
  (peak\_requests\_per\_second / throughput\_per\_replica) + buffer
  $$
</ParamField>
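
As a quick worked example of the `max_replica` estimate, the sketch below applies the formula to made-up numbers. The `estimate_max_replica` function is hypothetical, the traffic figures are illustrative, and rounding the division up is an assumption, since a fractional replica cannot be provisioned.

```python
import math


def estimate_max_replica(
    peak_rps: float, throughput_per_replica: float, buffer: int = 1
) -> int:
    """Replicas needed at peak traffic, rounded up, plus spare replicas."""
    if throughput_per_replica <= 0:
        raise ValueError("throughput_per_replica must be positive")
    return math.ceil(peak_rps / throughput_per_replica) + buffer


# Illustrative numbers: 120 requests/s at peak, 15 requests/s per replica.
print(estimate_max_replica(peak_rps=120, throughput_per_replica=15, buffer=2))  # 10
```

Measure `throughput_per_replica` under realistic load rather than estimating it, because it depends on the model, hardware, and request mix.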

For high-volume workloads requiring guaranteed capacity, [contact Baseten](mailto:support@baseten.co) about reserved capacity options.

## Scaling triggers

The autoscaler decides when a replica is "full" by comparing in-flight requests against a per-replica threshold. `concurrency_target` caps how many simultaneous requests each replica accepts, and `target_utilization_percentage` cuts the threshold lower so the autoscaler can trigger a scale-up before any replica is completely saturated, leaving room for new replicas to come online without queueing requests in the meantime. Scale-up fires when:

$$
load > replicas \times concurrency\_target \times target\_utilization
$$

The following diagram shows a replica with `concurrency_target` of 8 and `target_utilization_percentage` of 50%, so the per-replica threshold sits at 4. The first four requests fill capacity within headroom; the fifth crosses the threshold, and the autoscaler provisions a second replica to absorb the overflow before the remaining slots saturate.

<MiniConcurrency />
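
The trigger condition can be written as a small predicate. This sketch mirrors the inequality above with the diagram's values; the `scale_up_threshold` and `should_scale_up` names are hypothetical.

```python
def scale_up_threshold(
    replicas: int, concurrency_target: int, target_utilization_percentage: int
) -> float:
    """Total in-flight requests the current replicas absorb before scale-up."""
    return replicas * concurrency_target * target_utilization_percentage / 100


def should_scale_up(
    load: float,
    replicas: int,
    concurrency_target: int,
    target_utilization_percentage: int,
) -> bool:
    """True when load is strictly above the threshold."""
    threshold = scale_up_threshold(
        replicas, concurrency_target, target_utilization_percentage
    )
    return load > threshold


# One replica, concurrency_target 8, target utilization 50%: threshold is 4.
for load in (4, 5):
    print(load, should_scale_up(load, 1, 8, 50))
# 4 False
# 5 True
```

Once the second replica is counted, the threshold doubles to 8, so the same load of 5 no longer triggers a scale-up.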

<ParamField body="concurrency_target" type="integer" default="1">
  How many requests each replica can handle simultaneously. This directly determines replica count for a given load.

  **Range:** ≥ 1

  Given the current load, the autoscaler calculates desired replicas:

  $$
  desired\_replicas = \lceil in\_flight\_requests / (concurrency\_target \times target\_utilization) \rceil
  $$

  *In-flight requests* are requests sent to your model that haven't returned a response (for streaming, until the stream completes). [Async inference requests](/inference/async) are not included in this count. This count is exposed as [`baseten_concurrent_requests`](/observability/export-metrics/supported-metrics#baseten_concurrent_requests) in the metrics dashboard and metrics export.

  The right value depends on how your model uses hardware. Image generation models that consume all GPU memory per request can only process one at a time, so a concurrency target of 1 is correct. LLMs and embedding models batch requests internally and can handle dozens simultaneously, so higher targets (32 or more) reduce cost by packing more work onto each replica.

  **Tradeoff:** Higher concurrency = fewer replicas (lower cost) but more per-replica queueing (higher latency). Lower concurrency = more replicas (higher cost) but less queueing (lower latency).
</ParamField>

**Starting points by model type:**

| Model type              | Starting concurrency |
| ----------------------- | -------------------- |
| Standard Truss model    | 1                    |
| vLLM / LLM inference    | 32-128               |
| SGLang                  | 32                   |
| Text embeddings (TEI)   | 32                   |
| BEI embeddings          | 96+ (min ≥ 8)        |
| Whisper (async batch)   | 256                  |
| Image generation (SDXL) | 1                    |

For engine-specific guidance, see [Autoscaling engines](/engines/performance-concepts/autoscaling-engines).

<Note>
  **Concurrency target** controls requests sent *to* a replica and triggers autoscaling.
  **predict\_concurrency** (Truss config.yaml) controls requests processed *inside* the container.
  Concurrency target should be less than or equal to predict\_concurrency.
  See the `predict_concurrency` field in the [Truss configuration reference](/reference/truss-configuration) for details.
</Note>

<ParamField body="target_utilization_percentage" type="integer" default="70">
  Headroom before scaling triggers. The autoscaler scales up when load exceeds this percentage of the concurrency target, not when replicas are fully loaded.

  **Range:** 1-100%

  The effective threshold is:

  $$
  concurrency\_target \times target\_utilization
  $$

  With a concurrency target of 10 and utilization of 70%, the per-replica threshold is 7 concurrent requests (10 × 0.70). Load above that triggers a scale-up, which leaves 30% headroom for absorbing spikes while new replicas start.

  Lower values (50-60%) provide more headroom for spikes but cost more. Higher values (80%+) are cost-efficient for steady traffic but absorb spikes less effectively.
</ParamField>
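
To make the cost side of that tradeoff concrete, the sketch below compares three utilization settings at the same steady load. The `utilization_tradeoff` function is hypothetical, and the load of 100 in-flight requests is illustrative.

```python
import math


def utilization_tradeoff(
    load: float, concurrency_target: int, target_utilization_percentage: int
) -> tuple[int, float]:
    """Return (replicas needed, spare request slots per replica at threshold)."""
    replicas = math.ceil(
        load * 100 / (concurrency_target * target_utilization_percentage)
    )
    spare = concurrency_target * (100 - target_utilization_percentage) / 100
    return replicas, spare


# Steady load of 100 in-flight requests with concurrency_target = 10.
for pct in (50, 70, 90):
    print(pct, utilization_tradeoff(100, 10, pct))
# 50 (20, 5.0)
# 70 (15, 3.0)
# 90 (12, 1.0)
```

Dropping from 90% to 50% adds eight replicas for the same load, and in exchange each replica keeps five free slots instead of one while new replicas start.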

<Warning>
  Target utilization is **not** GPU utilization. It measures request slot usage relative to your concurrency target, not hardware utilization.
</Warning>

## Scaling dynamics

Once the autoscaler decides to scale, the settings here control the pace. `autoscaling_window` determines how much history feeds into each decision, so a longer window averages out short spikes while a shorter one reacts to traffic changes faster. `scale_down_delay` gates removal in the other direction by holding replicas warm even after load drops, so a brief dip does not trigger a teardown that the next request would have to wait through. `max_scale_down_rate` caps how much capacity each scale-down step can remove once that delay has passed. Together, these settings tune the tradeoff between responsiveness and stability. The diagram below shows traffic falling to zero, the idle timer filling up, and the replica being reclaimed only once the timer crosses the `scale_down_delay` threshold.

<ParamField body="autoscaling_window" type="integer" default="60">
  How far back (in seconds) the autoscaler looks when measuring traffic. Traffic is averaged over this window to make scaling decisions.

  **Range:** 10-3600 seconds

  A 60-second window smooths out momentary spikes by averaging load over the past minute. Shorter windows (30-60s) react quickly to traffic changes, which suits bursty workloads. Longer windows (2-5 min) ignore short-lived fluctuations and prevent the autoscaler from chasing noise.
</ParamField>
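
The effect of window length is easiest to see with a toy sliding-window average. This is a minimal sketch, not the autoscaler's implementation: the `SlidingWindowAverage` class is hypothetical, and the one-sample-per-second rate is an assumption made for illustration.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of in-flight request samples over the last `window` seconds."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self.samples = deque()  # (timestamp_seconds, in_flight), oldest first

    def add(self, timestamp: float, in_flight: int) -> None:
        self.samples.append((timestamp, in_flight))
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= timestamp - self.window:
            self.samples.popleft()

    def average(self) -> float:
        if not self.samples:
            return 0.0
        return sum(value for _, value in self.samples) / len(self.samples)


# Baseline of 4 in-flight requests, then a 10-second spike to 40.
long_window, short_window = SlidingWindowAverage(60), SlidingWindowAverage(10)
for t in range(60):
    load = 40 if t >= 50 else 4
    long_window.add(t, load)
    short_window.add(t, load)

print(long_window.average(), short_window.average())  # 10.0 40.0
```

The 60-second window reports an average of 10 and dampens the spike, while the 10-second window sees the full 40 and would call for more replicas right away.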

<MiniScaleDown />

<ParamField body="scale_down_delay" type="integer" default="900">
  How long (in seconds) the autoscaler waits after load drops before removing replicas.

  **Range:** 0-3600 seconds

  When load drops, the autoscaler starts a countdown. If load stays low for the full delay, it removes replicas in stages (half the excess, wait, half again). If traffic returns before the countdown finishes, the replicas stay active and the countdown resets.

  This is your primary lever for preventing *oscillation*. If replicas repeatedly scale up and down, increase this value first.
</ParamField>
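
The countdown-and-reset behavior can be modeled as a small state machine. This sketch is an assumption-laden illustration rather than the real autoscaler logic: the `ScaleDownTimer` class is hypothetical, and it treats any observation at or above the threshold as traffic returning.

```python
class ScaleDownTimer:
    """Tracks how long load has stayed below the threshold."""

    def __init__(self, scale_down_delay: float = 900.0):
        self.scale_down_delay = scale_down_delay
        self.low_since = None  # timestamp when load first dropped, or None

    def observe(self, timestamp: float, load: float, threshold: float) -> bool:
        """Return True when a scale-down step may fire at this timestamp."""
        if load >= threshold:
            self.low_since = None  # traffic returned: cancel the countdown
            return False
        if self.low_since is None:
            self.low_since = timestamp  # load just dropped: start counting
        if timestamp - self.low_since >= self.scale_down_delay:
            self.low_since = timestamp  # restart the timer for the next step
            return True
        return False


timer = ScaleDownTimer(scale_down_delay=900)
print(timer.observe(0, load=2, threshold=7))     # False, countdown starts
print(timer.observe(600, load=9, threshold=7))   # False, traffic is back: reset
print(timer.observe(700, load=2, threshold=7))   # False, countdown restarts
print(timer.observe(1600, load=2, threshold=7))  # True, 900 s of low load
```

The brief recovery at 600 seconds pushes the first scale-down step from 900 seconds out to 1600 seconds, which is how a longer delay suppresses oscillation.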

<ParamField body="max_scale_down_rate" type="integer" default="50">
  The maximum percentage of running replicas the autoscaler can remove in one scale-down step.

  **Range:** 1-50%

  Each time a `scale_down_delay` elapses, the autoscaler removes at most this percentage of running replicas. The default of 50% produces the halve-and-wait pattern described above. Lower values release capacity more gradually, which keeps more replicas warm when traffic tends to rebound shortly after it drops.
</ParamField>

<Tip>
  A **short window** with a **long delay** gives you fast scale-up while maintaining capacity during temporary dips. This is a good starting configuration for most workloads.
</Tip>

## Development deployments

Development deployments are designed for iteration, not production traffic. Replicas are fixed at 0-1 to match the [`truss watch`](/reference/cli/truss/watch) workflow, where you're testing changes on a single instance rather than handling concurrent users. You can still adjust timing and concurrency settings.

| Setting            | Value       | Modifiable |
| ------------------ | ----------- | ---------- |
| Min replicas       | 0           | No         |
| Max replicas       | 1           | No         |
| Autoscaling window | 60 seconds  | Yes        |
| Scale-down delay   | 900 seconds | Yes        |
| Concurrency target | 1           | Yes        |
| Target utilization | 70%         | Yes        |

To enable full autoscaling with configurable replica settings, [promote the deployment to production](/deployment/deployments).

## Next steps

<CardGroup cols={2}>
  <Card title="Traffic patterns" href="/deployment/autoscaling/traffic-patterns">
    Identify your traffic pattern and get recommended starting settings.
  </Card>

  <Card title="Cold starts" href="/deployment/autoscaling/cold-starts">
    Understand cold starts and how to minimize their impact.
  </Card>

  <Card title="API reference" href="/reference/management-api/deployments/autoscaling/updates-a-deployments-autoscaling-settings">
    Complete autoscaling API documentation.
  </Card>

  <Card title="Engine-specific autoscaling" href="/engines/performance-concepts/autoscaling-engines">
    Recommended settings for BEI and Engine-Builder-LLM with dynamic batching.
  </Card>
</CardGroup>

## Troubleshooting

Having issues with autoscaling? See [Autoscaling troubleshooting](/troubleshooting/deployments#autoscaling-issues) for solutions to common problems like oscillation, slow scale-up, and unexpected costs.
