Private Hugging Face model
Load a model that requires authentication with Hugging Face
Summary
To load a gated or private model from Hugging Face:
- Create an access token on your Hugging Face account.
- Add the `hf_access_token` key to your `config.yaml` secrets and its value to your Baseten account.
- Add `use_auth_token` to the appropriate line in `model.py`.
Step-by-step example
BERT base (uncased) is a masked language model that can be used to infer missing words in a sentence.
While the model is publicly available on Hugging Face, we copied it into a gated model to use in this tutorial. The process is the same for using a gated model as it is for a private model.
Keep reading for step-by-step instructions on how to build the finished private model Truss.
This example will cover:
- Implementing a `transformers.pipeline` model in Truss
- Securely accessing secrets in your model server
- Using a gated or private model with an access token
Step 0: Initialize Truss
Get started by creating a new Truss:
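A new Truss is created with `truss init`; the directory name below is just an example:

```shell
truss init private-model-demo
```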
Give your model a name when prompted, like `Private Model Demo`. Then, navigate to the newly created directory:
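Assuming the example directory name from `truss init`:

```shell
cd private-model-demo
```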
Step 1: Implement the `Model` class
BERT base (uncased) is a pipeline model, so it is straightforward to implement in Truss.
In `model/model.py`, we write the class `Model` with three member functions:
- `__init__`, which creates an instance of the object with a `_model` property
- `load`, which runs once when the model server is spun up and loads the `pipeline` model
- `predict`, which runs each time the model is invoked and handles the inference. It can use any JSON-serializable type as input and output.
Read the quickstart guide for more details on `Model` class implementation.
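The three member functions above can be sketched as a minimal `model/model.py`, using the public `bert-base-uncased` checkpoint for now (switching to the gated copy with an access token comes in the later steps):

```python
class Model:
    def __init__(self, **kwargs):
        # No model loaded yet; load() populates this when the server starts.
        self._model = None

    def load(self):
        # Imported here so the heavyweight dependency is only needed at load time.
        from transformers import pipeline

        self._model = pipeline("fill-mask", model="bert-base-uncased")

    def predict(self, model_input):
        # model_input is a string containing a [MASK] token;
        # the pipeline output is JSON-serializable.
        return self._model(model_input)
```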
Step 2: Set Python dependencies
Now, we can turn our attention to configuring the model server in `config.yaml`.
BERT base (uncased) has two dependencies:
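For a `transformers` pipeline model, those dependencies are typically `transformers` and `torch`. In `config.yaml` they would look like the following (the pinned versions are examples; check PyPI for current releases):

```yaml
requirements:
  - torch==2.0.1
  - transformers==4.30.2
```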
Always pin exact versions for your Python dependencies. The ML/AI space moves fast, so you want to have an up-to-date version of each package while also being protected from breaking changes.
Step 3: Set required secret
Now it's time to mix in access to the gated model:
- Go to the model page on Hugging Face and accept the terms to access the model.
- Create an access token on your Hugging Face account.
- Add the `hf_access_token` key and value to your Baseten workspace secret manager.
- In your `config.yaml`, add the key `hf_access_token`:
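In `config.yaml`, the secret is declared with a placeholder value only; the real token lives in your Baseten workspace secret manager:

```yaml
secrets:
  hf_access_token: null
```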
Never set the actual value of a secret in the `config.yaml` file. Only put secret values in secure places, like the Baseten workspace secret manager.
Step 4: Use access token in `load`
In `model/model.py`, you can give your model access to secrets in the `__init__` function:
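Truss passes secrets declared in `config.yaml` to the model via a `secrets` keyword argument, which can be stored for later use:

```python
class Model:
    def __init__(self, **kwargs):
        # Secrets declared in config.yaml arrive via the "secrets" keyword argument.
        self._secrets = kwargs["secrets"]
        self._model = None
```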
Then, update the `load` function with `use_auth_token`:
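A sketch of the updated `load`, where `your-org/your-gated-model` is a placeholder for the gated or private repo you accepted the terms for in Step 3:

```python
class Model:
    def __init__(self, **kwargs):
        self._secrets = kwargs["secrets"]
        self._model = None

    def load(self):
        from transformers import pipeline

        # use_auth_token authenticates the download of the gated model;
        # replace the placeholder repo name with your own.
        self._model = pipeline(
            "fill-mask",
            model="your-org/your-gated-model",
            use_auth_token=self._secrets["hf_access_token"],
        )
```

Note that recent `transformers` releases deprecate `use_auth_token` in favor of the `token` parameter; check the version you pinned in `config.yaml`.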
This will allow the `pipeline` function to load the specified model from Hugging Face.
Step 5: Deploy the model
You'll need a Baseten API key for this step.
We have successfully packaged a gated model as a Truss. Let's deploy!
Use `--trusted` with `truss push` to give the model server access to secrets stored on the remote host.
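Run the push from the Truss directory:

```shell
truss push --trusted
```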
Wait for the model to finish deployment before invoking.
You can invoke the model with:
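For a fill-mask model, the input is a string containing a `[MASK]` token; the sentence below is just an example:

```shell
truss predict -d '"It is a [MASK] day."'
```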