Model health
Get statistics on your model's traffic and resource utilization.
To review statistics on your model's traffic and resource utilization, visit the "Health" tab on a custom model's page. The Health tab is available only for your own models deployed on Baseten, not for pre-trained models.
Narrow the view by selecting a time range from the dropdown, anywhere from the last five minutes to the last week.
At a glance, you can reference two important stats for the selected time range: average requests per minute and average 50th-percentile response time. If your model is under unusually high load, response times may increase unless you provision more resources.
Prediction volume and response time stats

Prediction volume

This graph shows the number of requests per minute made to your deployed model. Use it together with the graphs below to confirm that your model is handling the expected load.
Prediction volume graph showing requests over time
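To make the requests-per-minute figure concrete, here is a minimal Python sketch that buckets request timestamps by minute and averages them over a window. It assumes you keep your own list of request timestamps. The function names and the sample log are hypothetical and are not part of Baseten's API, and the dashboard may aggregate differently.

```python
from collections import Counter
from datetime import datetime


def requests_per_minute(timestamps):
    """Count requests in each one-minute bucket, keyed by the minute's start."""
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return dict(sorted(counts.items()))


def average_requests_per_minute(timestamps, window_minutes):
    """Average over the whole window, so idle minutes pull the average down."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return len(timestamps) / window_minutes


# Hypothetical request log covering a five-minute window.
log = [
    datetime(2024, 1, 1, 12, 0, 5),
    datetime(2024, 1, 1, 12, 0, 40),
    datetime(2024, 1, 1, 12, 1, 10),
]
per_minute = requests_per_minute(log)
average = average_requests_per_minute(log, window_minutes=5)
```

Dividing by the full window length, rather than by the number of non-empty buckets, keeps quiet minutes from inflating the average.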

Response time

Monitor your model's 50th-, 90th-, 95th-, and 99th-percentile response times for incoming requests. Response time is measured in milliseconds; a higher value means that your users are experiencing a slower application.
Response time graph showing 50th-, 90th-, 95th-, and 99th-percentile response times
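As an illustration of what these percentiles mean, the sketch below computes them from a list of response times using the nearest-rank definition. This is one common definition chosen for simplicity; it is an assumption, not a description of how Baseten computes the graph. The function names and sample latencies are hypothetical.

```python
def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be an integer between 1 and 100")
    ordered = sorted(samples_ms)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def response_time_summary(samples_ms):
    """Return the four percentiles shown on the response time graph."""
    return {p: percentile(samples_ms, p) for p in (50, 90, 95, 99)}


# Hypothetical latencies in milliseconds; one slow request dominates the tail.
latencies_ms = [120, 80, 100, 400, 90]
summary = response_time_summary(latencies_ms)
```

With only five samples, the 90th, 95th, and 99th percentiles all land on the slowest request (400 ms) while the 50th percentile stays at 100 ms. This shows why the higher percentiles are useful for spotting slow outliers that the median hides.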

CPU and memory usage

Use this graph to examine the total CPU and memory usage across all replicas of your model. If CPU or RAM usage is consistently near the provisioned capacity, click "Support" in the left-side navbar to request additional model resources.
Usage graph showing CPU and RAM utilization
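"Consistently near the provisioned capacity" is a judgment call. The sketch below shows one way to turn it into a rule of thumb, assuming you have usage samples and know the provisioned capacity in the same unit. The 90% threshold, the 80% sample fraction, and the function name are illustrative assumptions, not Baseten recommendations.

```python
def is_consistently_near_capacity(samples, capacity, threshold=0.9, min_fraction=0.8):
    """Return True when at least min_fraction of samples are at or above threshold * capacity.

    threshold and min_fraction are illustrative defaults, not official guidance.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not samples:
        return False
    limit = threshold * capacity
    high = sum(1 for value in samples if value >= limit)
    return high / len(samples) >= min_fraction


# Hypothetical RAM readings in GiB against 4 GiB of provisioned memory.
sustained = is_consistently_near_capacity([3.7, 3.8, 3.9, 3.95, 3.75], capacity=4.0)
brief_spike = is_consistently_near_capacity([1.0, 3.9, 1.2, 1.1, 0.9], capacity=4.0)
```

Here `sustained` is true because every reading is above 90% of capacity, while `brief_spike` is false because only one reading is. A single spike does not on its own suggest requesting more resources.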