To load a gated or private model from Hugging Face, follow these steps:

  1. Create an access token in your Hugging Face account.
  2. Store the token securely as hf_access_token in Baseten’s secret manager.
  3. Reference the token in model.py using use_auth_token.

Configuring Secrets

Add the token reference to config.yaml:

config.yaml
secrets:
  hf_access_token: null

Then, update model.py:

model/model.py
from transformers import pipeline

class Model:
    def __init__(self, **kwargs) -> None:
        self._secrets = kwargs["secrets"]
        self._model = None

    def load(self):
        self._model = pipeline(
            "fill-mask",
            model="baseten/docs-example-gated-model",
            use_auth_token=self._secrets["hf_access_token"]
        )

    def predict(self, model_input):
        return self._model(model_input)
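For reference, a `fill-mask` pipeline returns a list of candidate completions, one dict per candidate. A hypothetical sketch of what `predict` returns (the scores, token IDs, and words below are made up for illustration; real values depend on the model):

```python
# Illustrative shape of a fill-mask pipeline result. Each candidate has a
# confidence score, the predicted token ID, the decoded token string, and
# the full sequence with [MASK] filled in. Values here are placeholders.
sample_output = [
    {"score": 0.18, "token": 3376, "token_str": "beautiful",
     "sequence": "it is a beautiful world"},
    {"score": 0.09, "token": 2235, "token_str": "small",
     "sequence": "it is a small world"},
]

# Pick the highest-scoring fill for the [MASK] slot
top = max(sample_output, key=lambda c: c["score"])
print(top["token_str"])
```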

Never store secrets directly in config.yaml — use the Baseten secret manager.

Example: Deploying a Private BERT Model

We’ll deploy a gated version of BERT base (uncased), a masked language model.

Step 1: Initialize Truss

Get started by creating a new Truss:

truss init private-bert && cd private-bert

Step 2: Set Dependencies

config.yaml
requirements:
  - torch==2.0.1
  - transformers==4.30.2

Step 3: Store the Access Token

Store your Hugging Face token as hf_access_token in the Baseten secret manager, then reference it in config.yaml as described in Configuring Secrets above.
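Putting steps 2 and 3 together, the config.yaml for this example would contain both the dependencies and the secret reference (a sketch combining the fragments above; the null value is a placeholder, and the real token lives only in Baseten's secret manager):

```yaml
# config.yaml
requirements:
  - torch==2.0.1
  - transformers==4.30.2
secrets:
  hf_access_token: null  # placeholder; set the actual value in the secret manager
```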

Step 4: Deploy and Invoke

truss push

Invoke the model:

truss predict -d '"It is a [MASK] world"'
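Deployed models can also be invoked over HTTP. A minimal sketch, assuming Baseten's standard predict endpoint format (`https://model-{model_id}.api.baseten.co/production/predict` with an `Api-Key` authorization header); the model ID and API key below are placeholders, so check your model's dashboard for the exact URL:

```python
def build_predict_request(model_id: str, api_key: str, payload):
    """Assemble the URL, headers, and body for a Baseten predict call.

    The endpoint format is an assumption based on Baseten's standard
    deployment URLs, not taken from this guide.
    """
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    headers = {"Authorization": f"Api-Key {api_key}"}
    return url, headers, payload

url, headers, body = build_predict_request(
    "abc123",              # placeholder model ID
    "PLACEHOLDER_API_KEY", # placeholder Baseten API key
    "It is a [MASK] world",
)
# Send with e.g. requests.post(url, headers=headers, json=body)
```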
