When you need custom preprocessing or postprocessing, or want to run a model that isn’t supported by Baseten’s built-in engines, you can write Python code in a model.py file. Truss provides a Model class with three methods (__init__, load, and predict) that give you full control over how your model initializes, loads weights, and handles requests.
Most deployments don’t need custom Python at all. If you’re deploying a supported open-source model, see Build your first model for the config-only approach. Use custom model code when you need to:
- Run a model architecture that Baseten’s engines don’t support.
- Add custom preprocessing or postprocessing around inference.
- Combine multiple models or libraries in a single endpoint.
Prerequisites
You need uv installed and a Baseten account with an API key.
Initialize your model
Create a new Truss project with truss init. The project contains:
- config.yaml: Configuration for dependencies, resources, and deployment settings.
- model/model.py: Your model code.
- packages/: Optional local Python packages.
- data/: Optional data files bundled with your model.
config.yaml
The config.yaml file configures dependencies, resources, and other settings. Here’s the default:
config.yaml
truss init writes python_version based on your local Python (3.9 through 3.13). use_gpu is computed from accelerator; setting it manually has no effect.
- requirements: Python packages installed at build time (pip format).
- resources: CPU, memory, and GPU allocation.
- secrets: Secret names your model needs at runtime, such as Hugging Face API keys.
model.py
The model.py file defines a Model class with three methods:
- __init__: Runs when the class is created. Initialize variables and store configuration here.
- load: Runs once at startup, before any requests. Load model weights, tokenizers, and other heavy resources here. Separating this from __init__ keeps expensive operations out of the request path.
- predict: Runs on every API request. Process input, run inference, and return the response.
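The lifecycle described above can be sketched as a minimal model.py. This is an illustrative skeleton, not necessarily the template that truss init generates: the "model" is a placeholder function standing in for real weights, and the text and output keys are hypothetical names chosen for the example.

```python
class Model:
    def __init__(self, **kwargs):
        # Runs when the class is created: keep this cheap.
        # Declare attributes and store configuration; nothing is loaded yet.
        self._model = None

    def load(self):
        # Runs once at startup, before any requests.
        # A real model would load weights or a tokenizer here. This
        # placeholder reverses its input so the sketch runs anywhere.
        self._model = lambda text: text[::-1]

    def predict(self, model_input):
        # Runs on every request: read the input, run inference, return output.
        # The "text" and "output" keys are hypothetical.
        return {"output": self._model(model_input["text"])}
```

Because load is separate from __init__, constructing the class stays fast and the expensive work happens exactly once before the first request.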
Give your model access to files
Most models need additional files at runtime, such as weights, tokenizers, configs, or reference datasets. For local files under ~1 GB total, bundle them in your Truss’s data/ directory and access them in __init__ through kwargs["data_dir"]:
model.py
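A minimal sketch of reading a bundled file, assuming a hypothetical data/labels.json that holds a JSON list of label names. The file name, the index input key, and the label output key are illustrative assumptions; only kwargs["data_dir"] comes from the text above.

```python
import json
from pathlib import Path


class Model:
    def __init__(self, **kwargs):
        # data_dir points at the bundled data/ directory.
        # Wrapping in Path() works whether a str or a Path is passed.
        self._data_dir = Path(kwargs["data_dir"])
        self._labels = None

    def load(self):
        # Read the bundled file once at startup, not on every request.
        # labels.json is a hypothetical file name for this sketch.
        with open(self._data_dir / "labels.json") as f:
            self._labels = json.load(f)

    def predict(self, model_input):
        # Look up a label by position; "index" is a hypothetical input key.
        return {"label": self._labels[model_input["index"]]}
```

Storing only the path in __init__ and reading the file in load keeps file I/O out of both construction and the request path.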
Deploy your model
Deploy with truss push --watch.
Invoke your model
After deployment, call your model at the invocation URL:
Example: text classification
To see the Model class in action, deploy a text classification model from Hugging Face using the transformers library.
Update config.yaml
Add transformers and torch as dependencies:
config.yaml
Update model.py
Load the classification pipeline in load and run it in predict:
model.py
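One way this could look, as a sketch that may differ from the original example. It assumes the request body is a plain string and relies on the default model that transformers selects for the text-classification task. The import sits inside load here only so the class can be imported without transformers installed; a module-level import works equally well.

```python
class Model:
    def __init__(self, **kwargs):
        # Nothing heavy here; the pipeline is created in load().
        self._model = None

    def load(self):
        # Imported lazily (a choice made for this sketch) so the module
        # imports even where transformers is not installed.
        from transformers import pipeline

        # Downloads and initializes the default text-classification model.
        self._model = pipeline("text-classification")

    def predict(self, model_input):
        # Assumes model_input is a string (or list of strings). The pipeline
        # returns a list of dicts with "label" and "score" keys.
        return self._model(model_input)
```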
Deploy and call
Deploy with truss push --watch, then call the endpoint:
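A hedged sketch of calling the endpoint from Python using only the standard library. The URL pattern, the Api-Key authorization scheme, and the BASETEN_API_KEY variable name are assumptions here; copy the actual invocation URL shown for your deployment. The helper names build_predict_request and call_model are hypothetical, and the payload assumes predict accepts a plain string.

```python
import json
import urllib.request


def build_predict_request(model_id, api_key, payload):
    """Build (but do not send) a POST request for a deployed model."""
    # Assumed URL pattern for a development deployment; replace it with
    # the invocation URL reported for your own deployment.
    url = f"https://model-{model_id}.api.baseten.co/development/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_model(model_id, api_key, payload):
    """Send the request and decode the JSON response body."""
    request = build_predict_request(model_id, api_key, payload)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a real deployment and network access):
# import os
# print(call_model("YOUR_MODEL_ID", os.environ["BASETEN_API_KEY"], "Truss is awesome!"))
```

Splitting request construction from sending makes it easy to inspect the URL and headers before any network call is made.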
Next steps
- Implementation: Core Model lifecycle, method signatures, and sync vs. async predict patterns.
- HTTP endpoints: Add chat_completions, completions, embeddings, messages, or responses to serve matching /v1/* routes from custom model code.
- Streaming output: Return generated output incrementally instead of waiting for the full response.
- Custom health checks: Define readiness and liveness behavior for custom model logic.
- Configuration: Full reference for config.yaml options.
- Build your first model: Deploy a model with just a config file, no custom Python needed.