Private Hugging Face model
Load a model that requires authentication with Hugging Face
Summary
To load a gated or private model from Hugging Face:
- Create an access token on your Hugging Face account.
- Add the `hf_access_token` key to your `config.yaml` secrets and the token value to your Baseten account.
- Add `use_auth_token` to the appropriate line in `model.py`.
Step-by-step example
BERT base (uncased) is a masked language model that can be used to infer missing words in a sentence.
While the model is publicly available on Hugging Face, we copied it into a gated model to use in this tutorial. The process is the same for a gated model as it is for a private model.
You can see the code for the finished private model Truss in the steps below, with step-by-step instructions on how to build it.
This example will cover:
- Implementing a `transformers.pipeline` model in Truss
- Securely accessing secrets in your model server
- Using a gated or private model with an access token
Step 0: Initialize Truss
Get started by creating a new Truss:
```sh
truss init private-bert
```
Give your model a name when prompted, like `Private Model Demo`. Then, navigate to the newly created directory:

```sh
cd private-bert
```
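The scaffold includes the two files we'll edit in this tutorial (the exact set of files generated by `truss init` may vary by version):

```
private-bert/
├── config.yaml    # model server configuration
└── model/
    └── model.py   # model implementation
```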
Step 1: Implement the `Model` class
BERT base (uncased) is a pipeline model, so it is straightforward to implement in Truss.
In `model/model.py`, we write the class `Model` with three member functions:

- `__init__`, which creates an instance of the object with a `_model` property
- `load`, which runs once when the model server is spun up and loads the `pipeline` model
- `predict`, which runs each time the model is invoked and handles the inference. It can use any JSON-serializable type as input and output.

Read the quickstart guide for more details on `Model` class implementation.
```python
from transformers import pipeline


class Model:
    def __init__(self, **kwargs) -> None:
        # Truss passes workspace secrets into the model via kwargs
        self._secrets = kwargs["secrets"]
        self._model = None

    def load(self):
        # Runs once when the model server spins up
        self._model = pipeline(
            "fill-mask",
            model="baseten/docs-example-gated-model"
        )

    def predict(self, model_input):
        # Runs on every invocation
        return self._model(model_input)
```
Step 2: Set Python dependencies
Now, we can turn our attention to configuring the model server in `config.yaml`.
BERT base (uncased) has two dependencies:
```yaml
requirements:
  - torch==2.0.1
  - transformers==4.30.2
```
Always pin exact versions for your Python dependencies. The ML/AI space moves fast, so you want to have an up-to-date version of each package while also being protected from breaking changes.
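If you're not sure which versions you have installed locally, a minimal sketch using only the standard library can print them in requirements-style pins:

```python
# Print exact installed versions as requirements-style pins (a sketch):
from importlib.metadata import version

for pkg in ("torch", "transformers"):
    print(f"{pkg}=={version(pkg)}")
```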
Step 3: Set required secret
Now it's time to mix in access to the gated model:

- Go to the model page on Hugging Face and accept the terms to access the model.
- Create an access token on your Hugging Face account.
- Add the `hf_access_token` key and value to your Baseten workspace secret manager.
- In your `config.yaml`, add the key `hf_access_token`:
```yaml
secrets:
  hf_access_token: null
```
Never set the actual value of a secret in the `config.yaml` file. Only put secret values in secure places, like the Baseten workspace secret manager.
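For reference, the relevant sections of `config.yaml` now look roughly like this (other generated keys omitted; `model_name` is whatever you chose at init):

```yaml
model_name: Private Model Demo
requirements:
  - torch==2.0.1
  - transformers==4.30.2
secrets:
  hf_access_token: null
```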
Step 4: Use the access token in `load`
In `model/model.py`, you can give your model access to secrets in the `__init__` function:
```python
def __init__(self, **kwargs) -> None:
    self._secrets = kwargs["secrets"]
    self._model = None
```
Then, update the `load` function with `use_auth_token`:
```python
self._model = pipeline(
    "fill-mask",
    model="baseten/docs-example-gated-model",
    use_auth_token=self._secrets["hf_access_token"]
)
```
This will allow the `pipeline` function to load the specified model from Hugging Face.
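If you want to sanity-check the class before deploying, here is a minimal local sketch. It assumes your dependencies are installed locally and that a valid token lives in a (hypothetical) `HF_ACCESS_TOKEN` environment variable, and it substitutes a plain dict for Baseten's secret store:

```python
import os

from model.model import Model

# Stand in for Baseten's secret store with a plain dict (a sketch):
model = Model(secrets={"hf_access_token": os.environ["HF_ACCESS_TOKEN"]})
model.load()
print(model.predict("It is a [MASK] world"))
```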
Step 5: Deploy the model
You'll need a Baseten API key for this step.
We have successfully packaged a gated model as a Truss. Let's deploy!
Use `--trusted` with `truss push` to give the model server access to secrets stored on the remote host:

```sh
truss push --trusted
```
Wait for the model to finish deployment before invoking.
You can invoke the model with:

```sh
truss predict -d '"It is a [MASK] world"'
```
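The fill-mask pipeline returns a list of candidate completions, each with a confidence score. The exact candidates and scores will vary; the values below are placeholders showing the rough shape of the response:

```json
[
  {
    "score": 0.42,
    "token": 2235,
    "token_str": "small",
    "sequence": "it is a small world"
  }
]
```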