Interactive sessions use rSSH (remote SSH) to connect your local IDE to a training container. Unlike traditional SSH, rSSH doesn't require SSH keys, open ports, or direct network access. Instead, it uses VS Code Remote Tunnels or Cursor's equivalent: you authenticate through a device code flow with Microsoft or GitHub, and the tunnel securely connects your IDE to the container. Use rSSH to debug a failed training job, inspect the state of a running job, or develop interactively without resubmitting.

Prerequisites

  • VS Code or Cursor installed locally.
  • The Remote - Tunnels extension installed in your IDE.
  • A Microsoft or GitHub account for device flow authentication.

Quick start

This walkthrough uses the MNIST PyTorch example to push a training job with rSSH enabled and then connect to the container.

1. Clone the example

Clone the ml-cookbook repository and navigate to the MNIST training example:
git clone https://github.com/basetenlabs/ml-cookbook.git
cd ml-cookbook/examples/mnist-pytorch/training

2. Configure and push the job

Add an interactive session to your config.py:
config.py
from truss_train import TrainingProject, TrainingJob, Image, Compute, Runtime
from truss_train.definitions import (
    InteractiveSession,
    InteractiveSessionTrigger,
    InteractiveSessionAuthProvider,
)
from truss.base.truss_config import AcceleratorSpec

training_job = TrainingJob(
    image=Image(base_image="pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime"),
    compute=Compute(
        accelerator=AcceleratorSpec(accelerator="H100", count=1),
    ),
    runtime=Runtime(
        start_commands=["python train.py"],
    ),
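    # Enable an interactive (rSSH) session for this job.
    interactive_session=InteractiveSession(
        # ON_STARTUP: the session is active as soon as the job starts.
        trigger=InteractiveSessionTrigger.ON_STARTUP,
        # Identity provider for the device code flow (Microsoft here; GitHub accounts also work).
        auth_provider=InteractiveSessionAuthProvider.MICROSOFT,
    ),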
)

training_project = TrainingProject(name="mnist-training", job=training_job)
Push the job:
truss train push config.py
Once the job is running, retrieve the auth code using truss train isession:
truss train isession --job-id <job_id>
Interactive Sessions for Job: <job_id>
Replica ID  Tunnel Name              Auth Code  Auth URL                             Generated At (Local)
r0          bt-session-<job_id>-0    AB12-CD34  https://login.microsoftonline.com/…  14:30:00 PST
You can also view this table in truss train logs --job-id <job_id> --tail, where it auto-refreshes every 30 seconds alongside your training logs.
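If you script your workflow, it can help to pull the tunnel name and auth code out of this table programmatically. The following is a minimal sketch, assuming the plain whitespace-separated layout shown above, where only the last column ("Generated At (Local)") contains spaces; the real CLI may render the table differently. SessionRow and parse_isession_table are hypothetical helpers, not part of truss.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SessionRow:
    """One data row of the isession table (hypothetical helper type)."""
    replica_id: str
    tunnel_name: str
    auth_code: str
    auth_url: str
    generated_at: str


# Data rows begin with a replica ID such as "r0" or "r1".
_ROW_START = re.compile(r"^r\d+\s")


def parse_isession_table(output: str) -> List[SessionRow]:
    """Extract session rows from plain-text isession output.

    Assumes whitespace-separated columns where only the last column
    ("Generated At (Local)") can contain spaces.
    """
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not _ROW_START.match(line):
            continue  # skip the title line and the column header
        replica_id, tunnel, code, url, generated = line.split(maxsplit=4)
        rows.append(SessionRow(replica_id, tunnel, code, url, generated))
    return rows
```

You could feed this function the captured stdout of truss train isession --job-id <job_id> (for example, through subprocess.run with capture_output=True and text=True) and read row.auth_code for each replica.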

3. Authenticate and connect

Connecting to the tunnel relies on the Remote - Tunnels extension in your IDE.
  1. Open the Auth URL from the table in your browser.
  2. Enter the Auth Code shown in the table.
  3. Connect to the tunnel in your IDE:
       1. Open the command palette (Cmd+Shift+P on macOS, Ctrl+Shift+P on Windows/Linux).
       2. Select Remote-Tunnels: Connect to Tunnel.
       3. Select the tunnel named bt-session-<job_id>-<node_rank> (for example, bt-session-abc123-0).
Open the folder you want to work in (typically /app or /workspace) to start debugging, editing your training script, or running commands.

Trigger modes

The trigger mode controls when the rSSH session’s container stays alive for interactive use. Baseten generates the tunnel and auth code for all modes. The trigger determines the session lifecycle:
| Mode       | When to use                                                                    | Behavior                                                                                                                  |
| ---------- | ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------- |
| on_startup | Develop interactively, run commands, or test code while training runs.         | The session is active from job start. Your start_commands still run alongside the session.                               |
| on_failure | Debug a failing training run. This is the most common choice for production jobs. | The session activates when training exits with a non-zero exit code. The container stays alive so you can inspect the failure. |
| on_demand  | Decide later whether you need a session. This is the default.                  | The session activates when you authenticate through the device code flow, or when you change the trigger on a running job. |
Auth codes appear in truss train isession as soon as the tunnel starts, regardless of trigger mode. With on_failure, the container stays alive for interactive use only after training fails. With on_demand, the container stays alive only after you authenticate or explicitly change the trigger.
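The rules above can be summarized as a small decision function. This is an illustrative model of the documented behavior, not Baseten's implementation; Trigger and container_held_for_session are hypothetical names, and the enum values simply mirror the CLI spellings used in this guide.

```python
from enum import Enum
from typing import Optional


class Trigger(Enum):
    # Values mirror the CLI spellings used in this guide.
    ON_STARTUP = "on_startup"
    ON_FAILURE = "on_failure"
    ON_DEMAND = "on_demand"


def container_held_for_session(
    trigger: Trigger,
    training_exit_code: Optional[int],
    authenticated: bool,
) -> bool:
    """Return True if the documented rules keep the container alive for rSSH.

    training_exit_code is None while training is still running.
    Illustrative model only, not Baseten's implementation.
    """
    if trigger is Trigger.ON_STARTUP:
        # Active from job start, alongside start_commands.
        return True
    if trigger is Trigger.ON_FAILURE:
        # Held only after training exits with a non-zero code.
        return training_exit_code is not None and training_exit_code != 0
    # ON_DEMAND: held only once you complete the device code flow.
    return authenticated
```

Changing the trigger on a running job corresponds to calling the function again with the new trigger value, which is why on_demand only needs to check authentication.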

Activating an on-demand session

If you pushed a job with on_demand (the default), activate the session by completing the device code flow: open the Auth URL and enter the Auth Code from truss train isession. You can also activate the session by changing the trigger on a running job:
truss train update_session <job_id> --trigger on_startup

Configuration

Configure interactive sessions with CLI flags or the Python SDK. CLI flags override SDK values when both are set.
Pass --interactive to truss train push with a trigger mode:
truss train push config.py \
  --interactive on_startup \
  --interactive-timeout-minutes 120
See the CLI reference for all push options.
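To make the precedence rule concrete, here is a minimal sketch of how a CLI value, an SDK value, and the documented defaults (on_demand and 480 minutes) could combine. It assumes the override applies field by field; SessionSettings and resolve_settings are hypothetical names and not part of the truss SDK.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults stated in this guide.
DEFAULT_TRIGGER = "on_demand"
DEFAULT_TIMEOUT_MINUTES = 480


@dataclass
class SessionSettings:
    """Hypothetical container for session settings; None means "not set"."""
    trigger: Optional[str] = None
    timeout_minutes: Optional[int] = None


def _pick(cli_value, sdk_value, default):
    # A CLI value wins when set, then the SDK value, then the default.
    if cli_value is not None:
        return cli_value
    if sdk_value is not None:
        return sdk_value
    return default


def resolve_settings(sdk: SessionSettings, cli: SessionSettings) -> SessionSettings:
    """Merge SDK and CLI settings, field by field (assumed behavior)."""
    return SessionSettings(
        trigger=_pick(cli.trigger, sdk.trigger, DEFAULT_TRIGGER),
        timeout_minutes=_pick(cli.timeout_minutes, sdk.timeout_minutes, DEFAULT_TIMEOUT_MINUTES),
    )
```

For example, an SDK config with trigger on_failure and a 240-minute timeout, combined with only a CLI timeout of 120 minutes, resolves to on_failure with 120 minutes under this model.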

Session management

View session status

Check auth codes and connection status:
truss train isession --job-id <job_id>

Monitor with live logs

The --tail flag displays a live view with the session table pinned at the top and training logs streaming below:
truss train logs --job-id <job_id> --tail

Timeout and expiry

Sessions expire based on the timeout_minutes setting (default: 480 minutes, or 8 hours).
  1. When the tunnel starts successfully, Baseten sets the expiry to now + timeout_minutes.
  2. Each time the tunnel reconnects, the expiry resets to now + timeout_minutes.
  3. When the expiry passes, the session ends and the container shuts down.
The timeout resets on tunnel reconnection, not on general IDE activity. If you disconnect and reconnect, the timer resets. If you stay connected but idle, the session expires after the configured timeout.
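The expiry rules above behave like a timer that only tunnel (re)connections can reset. The following is a minimal sketch of that logic; SessionClock is a hypothetical name used for illustration, not a Baseten API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_TIMEOUT_MINUTES = 480  # 8 hours


@dataclass
class SessionClock:
    """Models the documented expiry rules (illustrative only)."""
    timeout_minutes: int = DEFAULT_TIMEOUT_MINUTES
    expires_at: Optional[datetime] = None

    def on_tunnel_connect(self, now: datetime) -> None:
        # The first successful start and every reconnect reset the expiry.
        self.expires_at = now + timedelta(minutes=self.timeout_minutes)

    def is_expired(self, now: datetime) -> bool:
        # IDE activity never moves expires_at; only reconnects do.
        # Before the tunnel first starts there is no expiry yet.
        return self.expires_at is not None and now >= self.expires_at
```

With a 120-minute timeout, a reconnect 90 minutes after the first start moves the expiry from 120 to 210 minutes after that start; staying connected but idle past that point ends the session.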

What happens when a session expires

When a session expires, Baseten signals the container to shut down gracefully. Baseten doesn't hard-kill the container; the container receives the signal and exits cleanly. Baseten preserves any files you saved to $BT_CHECKPOINT_DIR, but any work that exists only elsewhere in the container's local filesystem is lost.
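Because only $BT_CHECKPOINT_DIR survives, a training or debugging script can save its state there when the shutdown signal arrives. The following is a minimal sketch that assumes the graceful-shutdown signal is SIGTERM (the conventional signal for stopping containers; this guide does not name the signal). The ./checkpoints fallback and the save_debug_notes function are hypothetical.

```python
import os
import signal
import sys
from pathlib import Path
from typing import Callable


def checkpoint_dir() -> Path:
    """Directory whose contents survive session expiry.

    Falls back to ./checkpoints when BT_CHECKPOINT_DIR is unset,
    for example when running the script locally.
    """
    return Path(os.environ.get("BT_CHECKPOINT_DIR", "checkpoints"))


def install_shutdown_handler(save: Callable[[Path], None]) -> None:
    """Call save(checkpoint_dir()) on SIGTERM, then exit cleanly.

    Assumption: the graceful-shutdown signal is SIGTERM.
    """
    def handle(signum, frame):
        target = checkpoint_dir()
        target.mkdir(parents=True, exist_ok=True)
        save(target)
        sys.exit(0)  # exit with status 0 after saving

    signal.signal(signal.SIGTERM, handle)


def save_debug_notes(target: Path) -> None:
    # Hypothetical example: persist notes written during the rSSH session.
    (target / "debug_notes.txt").write_text("findings from the rSSH session\n")
```

Call install_shutdown_handler(save_debug_notes), or pass your own function that writes model state, early in your script so the handler is in place before the session can expire.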

Multi-node sessions

For multi-node training jobs, Baseten creates one rSSH session per node. Each node gets its own auth code, and you connect to each node independently. Tunnel names follow the format bt-session-<job_id>-<node_rank>, where node_rank starts at 0. For example, a 2-node job produces:
  • bt-session-abc123-0 (node 0)
  • bt-session-abc123-1 (node 1)
The truss train isession command displays auth codes for all nodes in a single table.
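Because tunnel names follow a fixed pattern, you can compute the expected names for a job and check that every node appears in the isession output. This is a minimal sketch; tunnel_names and missing_tunnels are hypothetical helpers, not part of truss.

```python
from typing import Iterable, List


def tunnel_names(job_id: str, node_count: int) -> List[str]:
    """Expected tunnel names, following bt-session-<job_id>-<node_rank>."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # node_rank starts at 0, so a 2-node job yields ranks 0 and 1.
    return [f"bt-session-{job_id}-{rank}" for rank in range(node_count)]


def missing_tunnels(job_id: str, node_count: int, reported: Iterable[str]) -> List[str]:
    """Expected tunnels that do not appear among the reported names."""
    seen = set(reported)
    return [name for name in tunnel_names(job_id, node_count) if name not in seen]


print(tunnel_names("abc123", 2))  # → ['bt-session-abc123-0', 'bt-session-abc123-1']
```

If missing_tunnels returns a non-empty list, those nodes have not reported a tunnel yet, so wait and rerun truss train isession before trying to connect to them.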