When a model is serving hundreds of requests per minute, a single slow or failing prediction can be difficult to isolate from the surrounding noise. Baseten addresses this by assigning a unique request ID to every predict call and returning it in the X-Baseten-Request-Id response header. Because each request carries its own ID, you can trace a single prediction through your model’s logs without sifting through unrelated entries.
Per-request log filtering requires Truss version 0.15.5 or later. Upgrade with:
pip install --upgrade truss
Scope by environment or deployment
The Logs tab can show entries from a single deployment or from every deployment in an environment. Use the dropdowns at the top of the tab to switch.
Environment scope aggregates logs across every deployment in that environment, including past deployments still serving traffic during a rollout. Use it to follow a request across deployment boundaries or to watch a promotion in progress.
Deployment scope restricts logs to a single deployment ID. Use it to isolate behavior to one version, such as a development deployment.
The same scope applies to live tail and historical search.
Getting the request ID
The first step is capturing the request ID from the response. Baseten includes it in every predict response, regardless of whether the call is synchronous, asynchronous, or gRPC. The exact location depends on the protocol you’re using:
- HTTP
- gRPC
- Async
For HTTP, the request ID appears as a response header. When you make a predict call with curl, include the -sD- flag to print response headers alongside the body.
Filtering logs by request ID
Once you have a request ID, open the model’s logs page and enter it in the search filter bar using the requestId: prefix.
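As a rough illustration of the capture-then-filter flow, the Python sketch below pulls the ID out of a mapping of response headers and builds the string to paste into the filter bar. The helper names (extract_request_id, log_filter_for) and the sample ID are hypothetical, and the sketch assumes the ID follows the requestId: prefix with no space in between.

```python
from typing import Mapping, Optional

REQUEST_ID_HEADER = "X-Baseten-Request-Id"


def extract_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the request ID from response headers, or None if it is absent."""
    wanted = REQUEST_ID_HEADER.lower()
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == wanted:
            return value
    return None


def log_filter_for(request_id: str) -> str:
    """Build the filter-bar query (assumption: no space after the prefix)."""
    return f"requestId:{request_id}"


# Stand-in header mapping; the ID value here is made up for illustration.
headers = {
    "content-type": "application/json",
    "x-baseten-request-id": "example-id-123",
}
request_id = extract_request_id(headers)
if request_id is not None:
    print(log_filter_for(request_id))
```

With an HTTP client such as requests, passing response.headers to extract_request_id should behave the same way, since that object is also a mapping of header names to values.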
Logging with request context
For standard Truss models, Baseten automatically attaches the request ID to any log emitted via Python’s logging module during a predict call. No configuration is required; just use a logger.
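A minimal sketch of what that can look like in a Truss model file. The Model class with load and predict follows the usual Truss layout; the prompt input key, the placeholder inference step, and the log messages are assumptions made for illustration.

```python
import logging

# A module-level logger; no request-ID-specific setup is added here.
logger = logging.getLogger(__name__)


class Model:
    def __init__(self, **kwargs):
        self._loaded = False

    def load(self):
        # Load weights or other resources here.
        self._loaded = True

    def predict(self, model_input: dict) -> dict:
        prompt = model_input.get("prompt", "")
        # Emitted during a predict call, which is when the request ID is attached.
        logger.info("Received prompt with %d characters", len(prompt))
        result = {"output": prompt.upper()}  # placeholder for real inference
        logger.info("Prediction finished")
        return result
```

Logging through the standard library, rather than print, is what lets the request context travel with each line.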
Custom servers
For standard Truss models, Baseten handles request ID logging automatically through the framework’s built-in JSON formatter. No configuration is required. Custom servers don’t have this built-in support, so you’ll need to do two things: extract the x-baseten-request-id header from incoming requests, and include it as a top-level request_id key in your JSON log output. Both steps are covered in the setup guides for custom HTTP servers and custom gRPC servers.
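The two steps for custom servers could be sketched with the Python standard library as follows. This is one possible approach, not the method from the setup guides: the context variable, the formatter class, and the level and message keys are all assumptions. Only the x-baseten-request-id header name and the top-level request_id key come from the text above.

```python
import contextvars
import json
import logging

# Holds the current request's ID; set it at the start of each request handler.
current_request_id = contextvars.ContextVar("current_request_id", default=None)


def request_id_from_headers(headers):
    """Step 1: read x-baseten-request-id from incoming headers (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "x-baseten-request-id":
            return value
    return None


class RequestIdJsonFormatter(logging.Formatter):
    """Step 2: emit JSON logs with request_id as a top-level key."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        request_id = current_request_id.get()
        if request_id is not None:
            payload["request_id"] = request_id
        return json.dumps(payload)


def configure_logging() -> logging.Logger:
    """Attach the formatter to a stream handler on a hypothetical server logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(RequestIdJsonFormatter())
    logger = logging.getLogger("custom_server")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

In a request handler, you would call current_request_id.set(request_id_from_headers(headers)) before doing any work, so every log line written while handling that request carries the ID.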
Export logs to an OTLP endpoint
You can stream the same logs that appear in the Baseten UI to any backend that accepts OTLP over HTTP, including Honeycomb, Datadog, and Grafana Cloud. Once configured, every new log line is forwarded to your endpoint in near real time, so you can build dashboards, alerts, and long-term retention on top of your inference traffic without scraping the UI.
Log export is rolling out gradually. If the OTEL connection card isn’t visible in your settings, contact Baseten support to enable it for your organization.
What gets exported
The exporter forwards every log you would see in the Baseten UI, which includes:
- Build logs: image builds for new deployments.
- Deploy and promotion logs: lifecycle events emitted as a deployment activates, scales, or is promoted to an environment.
- Serving logs: stdout and stderr from your model replicas, including anything you write through Python’s logging module.
Each log is exported as an OTLP LogRecord with service.name = "baseten" and an allowlisted set of attributes:
| Attribute | Description |
|---|---|
| message | The log line. |
| model_id | Stable ID of the model the log came from. |
| model_version_id | Deployment (model version) the log came from. |
| environment | Environment name, such as production or staging, when the deployment is attached to one. |
| replica | Replica ID for serving logs. |
| request_id | Per-prediction request ID. Matches the X-Baseten-Request-Id header. |
| training_job_id | Training job ID for training logs. |
| chainlet_id | Chainlet ID for Chains. |
| exc_info | Formatted Python traceback, when the log carries an exception. |
Each record also carries SeverityNumber and SeverityText (DEBUG, INFO, WARN, ERROR, FATAL). Internal labels that aren’t on the allowlist are stripped before export, so your backend receives only the same fields you see in the UI.
Exports start from the moment the connection is enabled. Historical logs are not backfilled, and delivery is best-effort: Baseten retries transient failures with exponential backoff, but records can be dropped if your endpoint is unreachable for an extended period.
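To make the allowlist behavior concrete, here is a small sketch of allowlist filtering in Python. It is not Baseten's exporter code. The attribute names come from the table above, while the function name and the sample internal label are hypothetical.

```python
from typing import Any, Dict

# Attribute names taken from the table of exported attributes.
EXPORTED_ATTRIBUTES = frozenset({
    "message",
    "model_id",
    "model_version_id",
    "environment",
    "replica",
    "request_id",
    "training_job_id",
    "chainlet_id",
    "exc_info",
})


def keep_allowlisted(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the attributes whose names are on the allowlist."""
    return {k: v for k, v in attributes.items() if k in EXPORTED_ATTRIBUTES}


# Made-up record: "internal_label" stands in for any label not on the allowlist.
raw = {"message": "ready", "request_id": "example-id-123", "internal_label": "x"}
exported = keep_allowlisted(raw)
```

A consequence for backend queries is that only the attribute names in the table are safe to build dashboards and alerts on.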
Configure a connection
Each Baseten organization can have one OTLP destination at a time.
Add a connection
Click Add connection and fill in:
- Endpoint URL: The full URL of your OTLP/HTTP logs receiver, including the path (typically /v1/logs). See the integration notes below for per-vendor examples.
- Header name: The HTTP header your backend uses to authenticate.
- Header value: The credential for that header. The value is stored encrypted and never displayed again after you save it.
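Because the endpoint URL must include the path, a quick local check before saving can catch the most common mistake. The helper below is a hypothetical sketch, not part of any Baseten tooling, and its rules (an http or https scheme, a host, and a non-empty path) are assumptions about what a usable OTLP/HTTP logs URL looks like.

```python
from typing import List
from urllib.parse import urlparse


def check_otlp_logs_endpoint(url: str) -> List[str]:
    """Return a list of likely problems with an OTLP/HTTP logs URL (empty if none)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL should start with http:// or https://")
    if not parsed.netloc:
        problems.append("URL is missing a host")
    if parsed.path in ("", "/"):
        problems.append("URL has no path; OTLP/HTTP log receivers typically use /v1/logs")
    return problems


# A full URL with a path passes; a bare host is flagged for the missing path.
print(check_otlp_logs_endpoint("https://api.honeycomb.io/v1/logs"))
print(check_otlp_logs_endpoint("https://api.honeycomb.io"))
```

The check is deliberately loose: it does not require /v1/logs specifically, since the text only says that path is typical.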
Save and verify
Save the connection. New log records start flowing to your endpoint within a few seconds. Click Test on the saved connection to send a probe log and confirm the endpoint and credentials are accepted. For an end-to-end check, send a prediction to a deployment and look for its request ID in your backend.
Integration notes
The endpoint and header values below come from each vendor’s OTLP/HTTP documentation. Check those docs for the most current values for your account and region.
- Honeycomb
- Datadog
- Grafana Cloud
Honeycomb accepts OTLP/HTTP at https://api.honeycomb.io/v1/logs (or a region-specific host such as https://api.eu1.honeycomb.io/v1/logs). Authenticate with an ingest API key:
- Endpoint URL: https://api.honeycomb.io/v1/logs
- Header name: x-honeycomb-team
- Header value: Your Honeycomb ingest API key.
Because records are exported with service.name set to baseten, logs land in a dataset named baseten. Honeycomb Classic accounts and other dataset-routing setups may differ. See Honeycomb’s OTLP/HTTP reference for dataset routing and regional endpoints.