Get statistics on your model's traffic and resource utilization.
To review these statistics, scroll down on the model page. Model health information is not available for pre-trained models.
Narrow the statistics to a specific time range, from the last five minutes to the last week, by selecting it from the dropdown.
At a glance, you can see two key stats for the selected time range: average requests per minute and median (50th-percentile) response time. If your model is under unusually high load, response times may increase unless you provision more resources.
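To make these two numbers concrete, here is a minimal sketch that derives them from a request log. The log format (timestamp in seconds, response time in milliseconds) and all values are hypothetical; this illustrates the arithmetic only and is not how the platform computes its stats. It also shows why the 50th-percentile (median) response time can differ sharply from the mean when a single request is slow.

```python
from statistics import mean, median

# Hypothetical request log: (timestamp in seconds, response time in ms).
# Values are illustrative only, not real model data.
requests = [
    (0.0, 120.0), (15.0, 95.0), (42.0, 130.0),
    (61.0, 110.0), (90.0, 400.0), (119.0, 105.0),
]

def average_requests_per_minute(timestamps, window_seconds):
    """Number of requests divided by the window length in minutes."""
    return len(timestamps) / (window_seconds / 60.0)

window_seconds = 120.0  # a two-minute time range
response_times = [ms for _, ms in requests]

rpm = average_requests_per_minute([t for t, _ in requests], window_seconds)
p50 = median(response_times)          # 50th percentile
mean_ms = mean(response_times)        # pulled upward by the 400 ms outlier

print(f"{rpm:.1f} requests/min, p50 {p50:.1f} ms, mean {mean_ms:.1f} ms")
```

With this sample, six requests over two minutes give 3.0 requests per minute, and the median response time (115 ms) stays well below the mean (160 ms) because one slow request skews the average.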
Prediction volume and response time stats
This graph shows the number of requests per minute made to your deployed model. Use it alongside the graphs below to confirm that your model is handling the expected load.
Prediction volume graph showing requests over time
Monitor your model's 50th-, 90th-, 95th-, and 99th-percentile response times for incoming requests. Response times are measured in milliseconds; higher values mean your users experience a slower application.
Response time graph showing 50th-, 90th-, 95th-, and 99th-percentile response times
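The higher percentiles describe the slowest requests, which the median hides. The sketch below uses the nearest-rank percentile definition on a hypothetical set of 20 latencies; the platform may compute percentiles with a different method (for example, interpolation), so treat this as an illustration of what each line on the graph means rather than a reproduction of it.

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies for 20 requests: most are fast, a few are slow.
latencies_ms = [100] * 17 + [250, 600, 1200]

for p in (50, 90, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```

Here p50 is 100 ms while p90, p95, and p99 are 250, 600, and 1200 ms: three slow requests out of twenty barely affect the median but dominate the tail, which is why the tail percentiles are worth watching.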
Use this graph to examine total CPU and memory usage across all replicas of your model. If CPU or memory usage is consistently near the provisioned capacity, update your model's resources to add capacity.
Usage graph showing CPU and RAM utilization
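"Consistently near capacity" can be made precise with a simple rule. The sketch below flags a resource when at least 90% of samples are at or above 80% of the provisioned total. Both thresholds, the sample values, and the provisioned totals are hypothetical choices for illustration, not platform defaults.

```python
def consistently_near_capacity(samples, capacity, threshold=0.8, min_fraction=0.9):
    """True if at least min_fraction of samples are >= threshold * capacity.
    threshold and min_fraction are illustrative, hypothetical defaults."""
    if not samples:
        return False
    near = sum(1 for used in samples if used >= threshold * capacity)
    return near / len(samples) >= min_fraction

# Hypothetical per-minute totals of usage across all replicas.
cpu_cores_used = [3.5, 3.7, 3.8, 3.6, 3.9, 3.7]
memory_gib_used = [6.0, 6.2, 5.9, 6.1, 6.0, 6.3]

# Hypothetical provisioned totals (per-replica resources x replica count).
cpu_cores_provisioned = 4.0
memory_gib_provisioned = 16.0

needs_more_cpu = consistently_near_capacity(cpu_cores_used, cpu_cores_provisioned)
needs_more_memory = consistently_near_capacity(memory_gib_used, memory_gib_provisioned)
print(f"CPU near capacity: {needs_more_cpu}, memory near capacity: {needs_more_memory}")
```

In this example CPU usage sits above 3.2 of 4 cores in every sample, so it is flagged, while memory stays well under 12.8 of 16 GiB. Requiring a high fraction of samples, rather than a single spike, avoids reacting to brief bursts.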
This graph shows the number of active replicas of your model. The replica count autoscales in response to demand on your model. You can adjust the replica scaling range by updating your model's resources.
Usage graph showing replica count
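The exact autoscaling policy is not described here, so the following is only a generic target-tracking sketch, similar in spirit to the Kubernetes Horizontal Pod Autoscaler: compute enough replicas to keep each one near a target load, then clamp the result to the configured scaling range. The per-replica target of 60 requests per minute and the range of 1 to 5 replicas are hypothetical.

```python
import math

def desired_replicas(requests_per_minute, target_rpm_per_replica, min_replicas, max_replicas):
    """Replicas needed to keep each near the target load, clamped to the
    scaling range. A generic rule, assumed for illustration."""
    if target_rpm_per_replica <= 0:
        raise ValueError("target_rpm_per_replica must be positive")
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("require 0 <= min_replicas <= max_replicas")
    needed = math.ceil(requests_per_minute / target_rpm_per_replica)
    return max(min_replicas, min(max_replicas, needed))

# Hypothetical scaling range of 1 to 5 replicas, each targeting 60 requests/min.
for rpm in (0, 90, 200, 1000):
    print(rpm, "->", desired_replicas(rpm, 60, min_replicas=1, max_replicas=5))
```

Under these assumptions, 0, 90, 200, and 1000 requests per minute map to 1, 2, 4, and 5 replicas. The last case is capped by the maximum, which is the situation where widening the scaling range would help.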