Metrics export is currently in beta.

baseten_inference_requests_total

Cumulative number of requests to the model.
Type: counter
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
status_code
label
required
The status code of the response.
is_async
label
required
Whether the request was an async inference request.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"
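
Because baseten_inference_requests_total is a cumulative counter, a request rate comes from the difference between two scrapes rather than from a single value. The following is a minimal Python sketch of that arithmetic, assuming two samples taken from the same label set. The `CounterSample` type, the function names, and the way a counter reset is handled are illustrative assumptions, not part of the Baseten API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One scraped value of a cumulative counter (hypothetical helper type)."""
    timestamp: float  # Unix time of the scrape, in seconds
    value: float      # cumulative counter value at that time


def counter_increase(earlier: CounterSample, later: CounterSample) -> float:
    """Return how much the counter grew between two samples.

    Assumption: if the value went down, the counter was reset to zero
    (for example after a restart), so the later value is the increase.
    """
    if later.value < earlier.value:
        return later.value
    return later.value - earlier.value


def requests_per_second(earlier: CounterSample, later: CounterSample) -> float:
    """Average request rate over the interval between two samples."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("later sample must be newer than earlier sample")
    return counter_increase(earlier, later) / elapsed
```

The same calculation applied only to samples whose status_code label indicates an error, divided by the rate across all status codes, gives an error ratio.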

baseten_end_to_end_response_time_seconds

End-to-end response time in seconds.
Type: histogram
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
status_code
label
required
The status code of the response.
is_async
label
required
Whether the request was an async inference request.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"
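
A histogram such as baseten_end_to_end_response_time_seconds is typically consumed as a percentile (for example p50 or p95) rather than bucket by bucket. The sketch below estimates a quantile by linear interpolation inside the bucket that contains the target rank. It assumes the histogram is exposed as cumulative buckets with upper bounds, as in the Prometheus convention; the function name and the bucket boundaries in the usage note are made up for illustration and are not the bounds Baseten actually uses.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-th quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, with the last bound being +infinity.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("at least one bucket is required")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so no meaningful quantile

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate into the open-ended bucket; report the
                # highest finite bound instead.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with hypothetical buckets `[(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]`, the median falls in the 0.1 to 0.5 second bucket. The estimate is only as precise as the bucket boundaries allow.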

baseten_container_cpu_usage_seconds_total

Cumulative CPU time consumed by the container in core-seconds.
Type: counter
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
replica
label
required
The ID of the replica.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"

baseten_replicas_active

Number of replicas ready to serve model requests.
Type: gauge
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"
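
The labels make it possible to aggregate a gauge such as baseten_replicas_active across deployments, for instance to count the active replicas in one environment. The sketch below assumes the metrics are available as lines in the Prometheus text exposition format, which this page does not state explicitly. The parser is deliberately simplified (it does not handle a `}` inside a label value), and the function names and sample values are hypothetical.

```python
import re

# name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one exposition line into (metric name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def total_active_replicas(lines: list[str], environment: str) -> float:
    """Sum baseten_replicas_active over samples in the given environment.

    Pass an empty string to select deployments with no environment.
    """
    total = 0.0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, labels, value = parse_sample(stripped)
        if name != "baseten_replicas_active":
            continue
        if labels.get("environment", "") == environment:
            total += value
    return total
```

Filtering on rollout_phase in the same way separates replicas of a deployment that is still "promoting" from those that are "stable".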

baseten_replicas_starting

Number of replicas starting up, that is, either waiting for resources to become available or loading the model.
Type: gauge
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"

baseten_container_cpu_memory_working_set_bytes

Working set memory usage of the container in bytes.
Type: gauge
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
replica
label
required
The ID of the replica.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"

baseten_request_size_bytes

Request size in bytes. Proxy for input tokens.
Type: histogram
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
status_code
label
required
The status code of the response.
is_async
label
required
Whether the request was an async inference request.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"

baseten_response_size_bytes

Response size in bytes. Proxy for generated tokens.
Type: histogram
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
status_code
label
required
The status code of the response.
is_async
label
required
Whether the request was an async inference request.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"

baseten_time_to_first_byte_seconds

Time to first byte/write in seconds. Proxy for time-to-first-token (TTFT).
Type: histogram
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
status_code
label
required
The status code of the response.
is_async
label
required
Whether the request was an async inference request.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"

baseten_time_in_async_queue_seconds

Time async requests spend queued before processing.
Type: histogram
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"

baseten_async_queue_size

Number of queued async requests over time.
Type: gauge
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"

baseten_gpu_memory_used

GPU memory used in MiB.
Type: gauge
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
replica
label
required
The ID of the replica.
gpu
label
required
The ID of the GPU.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"
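
baseten_gpu_memory_used is reported in MiB, whereas baseten_container_cpu_memory_working_set_bytes is reported in bytes, so the two need a unit conversion before they can be compared or plotted on one axis. A minimal sketch of that conversion follows; the helper names are illustrative.

```python
BYTES_PER_MIB = 1024 * 1024  # 1 MiB = 2**20 bytes


def mib_to_bytes(mib: float) -> int:
    """Convert a value in MiB (as in baseten_gpu_memory_used) to bytes."""
    return int(mib * BYTES_PER_MIB)


def bytes_to_mib(num_bytes: float) -> float:
    """Convert a value in bytes (as in the working set metric) to MiB."""
    return num_bytes / BYTES_PER_MIB
```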

baseten_gpu_utilization

GPU utilization as a percentage (between 0 and 100).
Type: gauge
Labels:
model_id
label
required
The ID of the model.
model_name
label
required
The name of the model.
deployment_id
label
required
The ID of the deployment.
replica
label
required
The ID of the replica.
gpu
label
required
The ID of the GPU.
environment
label
The environment that the deployment corresponds to. Empty if the deployment is not associated with an environment.
rollout_phase
label
The phase of the deployment in the promote to production process. Empty if the deployment is not associated with an environment. Possible values:
  • "promoting"
  • "stable"