Model logs

Review model logs to debug deployment or operation issues.
From the moment you deploy a model, we provide comprehensive logs to help you debug any issues. Use live view to see streaming logs in real time, or use the date and time picker to review logs for a specific time range.
Troubleshooting tips

Here are a few common errors and their root causes:

Missing dependencies

Error log:
The build fails with the following error:
Exception while loading model
Traceback ...(omitted for brevity)
ModuleNotFoundError: No module named 'my-dependency'
Solution:
Make sure all of your model's dependencies are specified in the requirements section of your Truss's config.yaml. Here's an example for WizardLM:
requirements:
- accelerate==0.20.3
- bitsandbytes==0.39.1
- peft==0.3.0
- protobuf==4.23.3
- sentencepiece==0.1.99
- torch==2.0.1
- transformers==4.30.2
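
One subtlety: the module name in the error isn't always the same as the pip package name you need to list. Pillow is a well-known example of this mismatch (shown here for illustration; it is not part of the WizardLM config, and the pinned version is arbitrary):

# The pip package "Pillow" provides the module `PIL`, so config.yaml needs:
#
#   requirements:
#     - Pillow==10.0.0
#
# ...even though the code imports it as:
from PIL import Image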

Out of memory

Error log:
A model fails during deployment or invocation with:
torch.cuda.OutOfMemoryError: CUDA out of memory.
Or:
Model terminated unexpectedly. This model does not have enough resources to run. Try upgrading to a larger instance type. Exit code: 0, reason: OOMKilled, restart count: 1
Solution:
Ensure that your model has sufficient hardware resources. In particular, make sure the model has access to a GPU if it needs one and that the selected GPU has enough VRAM to hold the model weights.
Model resources are set in your Truss during the initial deployment and can be updated on the model page on Baseten. Here's an example model resource config for WizardLM:
resources:
cpu: "3"
memory: 14Gi
use_gpu: true
accelerator: A10G
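
If the weights are simply too large for the selected GPU, loading them in half precision roughly halves the VRAM they need. Here's a minimal sketch of a Truss model.py that does this with Hugging Face transformers; the checkpoint name is a placeholder, and whether half precision is acceptable depends on your model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "your-org/your-model"  # placeholder: substitute your checkpoint

class Model:
    def __init__(self, **kwargs):
        self._model = None
        self._tokenizer = None

    def load(self):
        self._tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
        # torch.float16 halves the memory the weights need compared to
        # float32; device_map="auto" places layers on the available GPU
        # (requires the accelerate package, listed in requirements above).
        self._model = AutoModelForCausalLM.from_pretrained(
            CHECKPOINT,
            torch_dtype=torch.float16,
            device_map="auto",
        )

    def predict(self, model_input):
        inputs = self._tokenizer(model_input["prompt"], return_tensors="pt").to("cuda")
        outputs = self._model.generate(**inputs, max_new_tokens=256)
        return {"output": self._tokenizer.decode(outputs[0], skip_special_tokens=True)}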

Server crash restart loop

Error log:
The model gets stuck in a cycle of crashing and restarting during load, with the following error:
Inference server seems to have crashed, restarting
Solution:
This error usually shares a root cause with the out-of-memory failures above: the model doesn't have sufficient hardware resources. As before, make sure the model has access to a GPU if it needs one and that the selected GPU has enough VRAM to hold the model weights, updating the resources section of your Truss's config.yaml if necessary.
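To confirm whether missing or undersized GPU resources are the cause, you can log the hardware the container actually sees at the top of load(). This is a small diagnostic sketch using standard PyTorch calls:

import torch

def log_gpu_info():
    # Confirms the container sees a GPU and reports how much VRAM it has.
    if not torch.cuda.is_available():
        print("No GPU visible; check use_gpu and accelerator in config.yaml")
        return
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")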

Live but not loaded

Error log:
When invoking a draft model that successfully deployed, you get the following error as a response:
ERROR Failed to invoke model. Model version 32pvyxq is currently being loaded, please retry in a bit.
Solution:
Wait to invoke the model until you see the following log line:
Completed model.load() execution in 68264 ms
A draft model will not show as active in the Baseten UI until model.load() has completed.
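If you need to call the model programmatically during this window, you can retry with a delay until loading completes. The sketch below assumes a generic HTTPS invocation endpoint; the URL, API key, and exact response shape are placeholders to substitute for your deployment:

import time
import requests

# Placeholder values: substitute your model's invocation URL and API key.
MODEL_URL = "https://app.baseten.co/model_versions/YOUR_VERSION/predict"
API_KEY = "YOUR_API_KEY"

def invoke_with_retry(payload, attempts=10, delay=15):
    for _ in range(attempts):
        resp = requests.post(
            MODEL_URL,
            headers={"Authorization": f"Api-Key {API_KEY}"},
            json=payload,
        )
        # While model.load() is still running, the response contains the
        # "currently being loaded" error; wait and try again.
        if "currently being loaded" in resp.text:
            time.sleep(delay)
            continue
        resp.raise_for_status()
        return resp.json()
    raise TimeoutError("Model did not finish loading in time")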