Prerequisites
- VS Code or Cursor installed locally.
- The Remote - Tunnels extension installed in your IDE.
- A Microsoft or GitHub account for device flow authentication.
Quick start
This walkthrough uses the MNIST PyTorch example to push a training job with rSSH enabled, then connects to the container.
1. Clone the example
Clone the ml-cookbook and navigate to the MNIST training example:
2. Configure and push the job
Add an interactive session to your config.py:
config.py
truss train isession:
3. Authenticate and connect
Connecting to the tunnel relies on the Remote - Tunnels extension in your IDE.
- Open the Auth URL from the table in your browser.
- Enter the Auth Code shown in the table.
- Connect to the tunnel in your IDE (VS Code or Cursor):
  - Open the command palette (Cmd+Shift+P on macOS, Ctrl+Shift+P on Windows/Linux).
  - Select Remote-Tunnels: Connect to Tunnel.
  - Select the tunnel named bt-session-<job_id>-<node_rank> (for example, bt-session-abc123-0).
Once connected, open a folder in the container (/app or /workspace) to start debugging, editing your training script, or running commands.
Trigger modes
The trigger mode controls when the rSSH session’s container stays alive for interactive use. Baseten generates the tunnel and auth code for all modes. The trigger determines the session lifecycle:

| Mode | When to use | Behavior |
|---|---|---|
| on_startup | Develop interactively, run commands, and test code while training runs. | Session is active from job start. Your start_commands still run alongside the session. |
| on_failure | Debug a failing training run. The most common choice for production jobs. | Session activates when training exits with a non-zero exit code. The container stays alive for you to inspect the failure. |
| on_demand | Decide later whether you need a session. This is the default. | Session activates when you authenticate through the device code flow, or when you change the trigger on a running job. |
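
The Behavior column can be read as a small decision function. The sketch below is only an illustrative model of the table, not Baseten's implementation; the function name `container_stays_alive` and its parameters are hypothetical and are not part of the Truss SDK or CLI.

```python
from typing import Optional


def container_stays_alive(
    mode: str,
    exit_code: Optional[int] = None,   # None means training is still running
    authenticated: bool = False,       # device code flow completed
    trigger_changed: bool = False,     # trigger changed on the running job
) -> bool:
    """Hypothetical model of the trigger-mode table above."""
    if mode == "on_startup":
        # Session is active from job start.
        return True
    if mode == "on_failure":
        # Session activates only when training exits with a non-zero code.
        return exit_code is not None and exit_code != 0
    if mode == "on_demand":
        # Session activates after authentication or a trigger change.
        return authenticated or trigger_changed
    raise ValueError(f"unknown trigger mode: {mode!r}")
```

In this model, an on_failure job whose training exits with code 0 never keeps the container alive, which matches the table: only a non-zero exit activates the session.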
Auth codes appear in truss train isession as soon as the tunnel starts, regardless of trigger mode.
With on_failure, the container stays alive for interactive use only after training fails.
With on_demand, the container stays alive only after you authenticate or explicitly change the trigger.
Activating an on-demand session
If you pushed a job with on_demand (the default), activate the session by completing the device code flow: open the Auth URL and enter the Auth Code from truss train isession.
You can also activate the session by changing the trigger on a running job:
Configuration
Configure interactive sessions with CLI flags or the Python SDK. CLI flags override SDK values when both are set.
Pass --interactive to truss train push with a trigger mode:
See the CLI reference for all push options.
Session management
View session status
Check auth codes and connection status:
Monitor with live logs
The --tail flag displays a live view with the session table pinned at the top and training logs streaming below:
Timeout and expiry
Sessions expire based on the timeout_minutes setting (default: 480 minutes, or 8 hours).
- When the tunnel starts successfully, Baseten sets the expiry to now + timeout_minutes.
- Each time the tunnel reconnects, the expiry resets to now + timeout_minutes.
- When the expiry passes, the session ends and the container shuts down.
The timeout resets on tunnel reconnection, not on general IDE activity.
If you disconnect and reconnect, the timer resets.
If you stay connected but idle, the session expires after the configured timeout.
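
The expiry rule can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Baseten's code: the `SessionTimer` class and its method names are hypothetical, and only the `now + timeout_minutes` rule and the 480-minute default come from the text above.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_TIMEOUT_MINUTES = 480  # 8 hours, the documented default


class SessionTimer:
    """Hypothetical model of the session expiry rule."""

    def __init__(self, timeout_minutes: int = DEFAULT_TIMEOUT_MINUTES) -> None:
        self.timeout = timedelta(minutes=timeout_minutes)
        self.expires_at: Optional[datetime] = None  # unset until the tunnel starts

    def on_tunnel_connect(self, now: datetime) -> None:
        # Called on the first tunnel start and on every reconnect:
        # both set the expiry to now + timeout_minutes.
        self.expires_at = now + self.timeout

    def is_expired(self, now: datetime) -> bool:
        # IDE activity never moves expires_at; only (re)connection does.
        return self.expires_at is not None and now >= self.expires_at


timer = SessionTimer()
timer.on_tunnel_connect(datetime(2025, 1, 1, 9, 0))   # expiry becomes 17:00
timer.on_tunnel_connect(datetime(2025, 1, 1, 12, 0))  # reconnect: expiry becomes 20:00
```

Note that the model has no hook for editor activity: staying connected and typing does not extend the session, while a disconnect followed by a reconnect does.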
What happens when a session expires
When a session expires, Baseten signals the container to shut down gracefully. Baseten doesn’t hard-kill the container; the container receives the signal and exits cleanly. Baseten preserves any files you saved to $BT_CHECKPOINT_DIR, but you lose unsaved work in the container’s local filesystem.
Multi-node sessions
For multi-node training jobs, Baseten creates one rSSH session per node. Each node gets its own auth code, and you connect to each node independently. Tunnel names follow the format bt-session-<job_id>-<node_rank>, where node_rank starts at 0. For example, a 2-node job produces:
- bt-session-abc123-0 (node 0)
- bt-session-abc123-1 (node 1)
The truss train isession command displays auth codes for all nodes in a single table.
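
If you script connections to several nodes, the naming format can be reproduced locally. The helper below is a hypothetical convenience, not part of Truss; it assumes only the bt-session-<job_id>-<node_rank> format and zero-based ranks described above.

```python
from typing import List


def tunnel_names(job_id: str, node_count: int) -> List[str]:
    """Return the expected tunnel name for each node, ranks starting at 0."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [f"bt-session-{job_id}-{rank}" for rank in range(node_count)]


# For the 2-node example above:
print(tunnel_names("abc123", 2))
# → ['bt-session-abc123-0', 'bt-session-abc123-1']
```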