Setup
To get started, sign in to Baseten with Truss and then install the Python requests library.
Sign in to Baseten
Install requests
Hardware: H100
Engine: TRT-LLM
Write the config
Create and move into the project directory. Then create config.yaml and paste the following:
config.yaml
Key parameters
Baseten Embeddings Inference (BEI) reads these fields from the trt_llm block. Each one shapes how the engine is built and served:
| Parameter | Value |
|---|---|
| Quantization | fp8 |
| Base model type | encoder |
Deploy
Push the config to Baseten with truss push. Copy the model ID from the truss push output (abcd1234 in the example). Use it wherever you see {model_id} in the next section.
Call the model
Your deployment exposes a cross-encoder scoring endpoint at /predict. Replace {model_id} with your model ID and make sure BASETEN_API_KEY is set.
Now call your deployment to score candidates:
- Python
- cURL
main.py
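As a rough illustration of the call described above, here is a minimal sketch using requests. It rests on several assumptions that you should check against your own deployment: the URL pattern `https://model-{model_id}.api.baseten.co/environments/production/predict`, the `Api-Key` authorization header, and in particular the payload shape (a list of query/candidate pairs under `inputs`), which is hypothetical. The helper names `build_request` and `score_candidates` are illustrative only.

```python
import os


def build_request(model_id, api_key, query, candidates):
    """Assemble the URL, headers, and JSON body for a scoring call.

    The URL pattern and the payload shape are assumptions, not confirmed
    by this guide; adjust them to match your deployment.
    """
    url = f"https://model-{model_id}.api.baseten.co/environments/production/predict"
    headers = {"Authorization": f"Api-Key {api_key}"}
    # Hypothetical payload: one [query, candidate] pair per candidate.
    payload = {"inputs": [[query, candidate] for candidate in candidates]}
    return url, headers, payload


def score_candidates(model_id, query, candidates, timeout=30.0):
    """POST the candidates to /predict and return the parsed JSON response."""
    # Imported here so build_request stays usable without requests installed.
    import requests

    api_key = os.environ["BASETEN_API_KEY"]  # raises KeyError if unset
    url, headers, payload = build_request(model_id, api_key, query, candidates)
    response = requests.post(url, headers=headers, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return response.json()


if __name__ == "__main__":
    # Replace abcd1234 with the model ID from your truss push output.
    result = score_candidates(
        "abcd1234",
        "What is the capital of France?",
        ["Paris is the capital of France.", "Berlin is in Germany."],
    )
    print(result)
```

Keeping request construction separate from the network call makes it easy to inspect the exact URL and body before sending anything.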