Loading model weights and data

Bundle model weights and other data with your ML model.
Serving a model requires loading data such as model weights and tokenizers. This data is stored in files, which can run to many gigabytes for larger models.
For many models, this data comes from HuggingFace. You can also bundle the data directly with your Truss, even if the files are stored elsewhere, such as AWS S3. Model data may also be private: Baseten supports private models from both HuggingFace and external data stores like AWS S3.

Public models on HuggingFace

This is the standard approach when using the transformers library and requires no configuration. From the WizardLM example:
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

def load(self):
    base_model = "TheBloke/wizardLM-7B-HF"
    self._tokenizer = LlamaTokenizer.from_pretrained(base_model)
    self._model = LlamaForCausalLM.from_pretrained(
        base_model,
        load_in_8bit=False,
        torch_dtype=torch.float16,
        device_map="auto",
    )
This load() function will pull the public weights and tokenizer from HuggingFace when the model is loaded on the model server. No additional configuration is needed.

Private models on HuggingFace

Often, model files on HuggingFace require an access token to download. Fortunately, it's easy to securely authenticate your model on Baseten with HuggingFace.
First, add your access token to Baseten:
  1. Create a user access token from your HuggingFace account settings.
  2. Create a secret in your Baseten account with the key hf_access_token. Set the value to your access token (the string of letters starting with hf_).
Then, enable your model to use the token. First, add the secret key to your config.yaml by updating the secrets field as follows:
secrets:
  hf_access_token: null
Do NOT include the actual value of your HuggingFace access token or any other secret in your config.yaml. Only include the secret's name, and set the value to null.
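To make the wiring concrete, here is a minimal sketch (not Truss internals) of how the secret reaches your model: at runtime, the model server resolves the null placeholder in config.yaml to the real value stored in your Baseten account and passes it to your model class via the secrets keyword argument. The token value below is a stand-in, not a real credential.

```python
# Minimal sketch (not Truss internals): the model server resolves secrets
# declared in config.yaml and passes them to Model(**kwargs).
class Model:
    def __init__(self, **kwargs) -> None:
        # kwargs["secrets"] maps each declared key to its resolved value
        self._secrets = kwargs.get("secrets")

# Stand-in for the value stored in your Baseten account settings
resolved_secrets = {"hf_access_token": "hf_example_token"}

model = Model(secrets=resolved_secrets)
print(model._secrets["hf_access_token"])  # the value never lives in config.yaml
```

This is why config.yaml only needs the key: the value is injected at runtime from your account's secret store.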
You can now load model data like weights and tokenizers that require an access key in your model/model.py file by using the use_auth_token parameter. Here it is in context:
from transformers import T5ForConditionalGeneration, T5Tokenizer

class Model:
    def __init__(self, **kwargs) -> None:
        self._secrets = kwargs.get("secrets")
        self._tokenizer = None
        self._model = None

    def load(self):
        self._tokenizer = T5Tokenizer.from_pretrained(
            "google/flan-t5-xl", use_auth_token=self._secrets["hf_access_token"]
        )
        self._model = T5ForConditionalGeneration.from_pretrained(
            "google/flan-t5-xl",
            device_map="auto",
            use_auth_token=self._secrets["hf_access_token"],
        )
When you're ready to deploy your model, make sure to pass is_trusted=True to baseten.deploy():
import baseten
import truss

my_model = truss.load("my-model")
baseten.deploy(
    my_model,
    model_name="My model",
    is_trusted=True,
)
For further details, see docs on using secrets in models.

Bundling data with the model

You can bundle model data directly with your model in Truss. To do so, use the data folder of your Truss to store any necessary files.
Here's an example of the data folder for a Truss for Stable Diffusion 2.1:
data/
    scheduler/
        scheduler_config.json
    text_encoder/
        config.json
        diffusion_pytorch_model.bin
    tokenizer/
        merges.txt
        special_tokens_map.json
        tokenizer_config.json
        vocab.json
    unet/
        config.json
        diffusion_pytorch_model.bin
    vae/
        config.json
        diffusion_pytorch_model.bin
    model_index.json
To access the data in the model, use the self._data_dir attribute, set in __init__(), inside the load() function of model/model.py:
import torch
from diffusers import StableDiffusionPipeline

class Model:
    def __init__(self, **kwargs) -> None:
        self._data_dir = kwargs["data_dir"]

    def load(self):
        self.model = StableDiffusionPipeline.from_pretrained(
            str(self._data_dir),  # Set to "data" by default from config.yaml
            revision="fp16",
            torch_dtype=torch.float16,
        ).to("cuda")

Accessing large files in S3

Bundling multi-gigabyte files with your Truss can be difficult if you have limited local storage. As a workaround, you can instead keep your model weights and other large files in an external file store like S3 and have them pulled in when the model is loaded.
Some example Trusses for models in Baseten's model library take advantage of S3 for faster model load times.
Using files from S3 requires four steps:
  1. Uploading the content of your data directory to S3
  2. Setting external_data in config.yaml
  3. Removing unneeded files from the data directory
  4. Accessing data correctly in the model
Here's an example of that setup for Stable Diffusion, where we have already uploaded the content of our data/ directory to S3.
First, add the URLs for hosted versions of the large files to config.yaml:
external_data:
- url: https://baseten-public.s3.us-west-2.amazonaws.com/models/stable-diffusion-truss/unet/diffusion_pytorch_model.bin
  local_data_path: unet/diffusion_pytorch_model.bin
- url: https://baseten-public.s3.us-west-2.amazonaws.com/models/stable-diffusion-truss/text_encoder/pytorch_model.bin
  local_data_path: text_encoder/pytorch_model.bin
- url: https://baseten-public.s3.us-west-2.amazonaws.com/models/stable-diffusion-truss/vae/diffusion_pytorch_model.bin
  local_data_path: vae/diffusion_pytorch_model.bin
Each URL corresponds to a local data path that represents where the file would be stored if everything were bundled together locally. This is how your model code knows where to look for the data.
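As an illustration, each local_data_path resolves relative to the data directory, so a downloaded file lands exactly where its bundled counterpart would have been. The path-joining below is a sketch of the idea, not Truss internals; the filenames come from the config.yaml example above.

```python
from pathlib import Path

# Entries mirroring the local_data_path values in config.yaml's external_data
external_data = [
    {"local_data_path": "unet/diffusion_pytorch_model.bin"},
    {"local_data_path": "text_encoder/pytorch_model.bin"},
    {"local_data_path": "vae/diffusion_pytorch_model.bin"},
]

data_dir = Path("data")  # the default data directory for a Truss
# Each file is downloaded to data/<local_data_path> before load() runs
targets = [data_dir / entry["local_data_path"] for entry in external_data]
print(targets[0].as_posix())  # data/unet/diffusion_pytorch_model.bin
```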
Then, remove the large files from your data folder. The Stable Diffusion Truss has the following directory structure after the large files are removed:
data/
    scheduler/
        scheduler_config.json
    text_encoder/
        config.json
    tokenizer/
        merges.txt
        special_tokens_map.json
        tokenizer_config.json
        vocab.json
    unet/
        config.json
    vae/
        config.json
    model_index.json
The code in model/model.py does not need to be changed and will automatically pull the large files from the provided links.

Accessing private data from S3

If your model weights are proprietary, you'll be storing them in a private S3 bucket or similar access-restricted data store. Accessing these model files works exactly the same as above, but first uses secrets to securely authenticate your model with the data store.
First, set the following secrets in config.yaml. Set the values to null; only the keys are needed here.
secrets:
  aws_access_key_id: null
  aws_secret_access_key: null
  aws_region: null # e.g. us-east-1
  aws_bucket: null
Then, add secrets to your Baseten account for your AWS access key ID, secret access key, region, and bucket. This time, use the actual values, as they will be securely stored and provided to your model at runtime.
In your model code, authenticate with AWS in the __init__() function:
def __init__(self, **kwargs) -> None:
    self._config = kwargs.get("config")
    secrets = kwargs.get("secrets")
    self.s3_config = {
        "aws_access_key_id": secrets["aws_access_key_id"],
        "aws_secret_access_key": secrets["aws_secret_access_key"],
        "aws_region": secrets["aws_region"],
    }
    self.s3_bucket = secrets["aws_bucket"]
You can then use the boto3 package to access your model weights in load().
When you're ready to deploy your model, make sure to pass is_trusted=True to baseten.deploy():
import baseten
import truss

my_model = truss.load("my-model")
baseten.deploy(
    my_model,
    model_name="My model",
    is_trusted=True,
)
For further details, see docs on using secrets in models.