Building an LLM
In this example, we go through a Truss that serves an LLM. We use Mistral-7B, a general-purpose LLM that can be used for a variety of tasks, such as summarization, question answering, and translation.
Set up the imports and key constants
In this example, we use the Hugging Face `transformers` library to build a text generation model.
We use the 7B version of the Mistral model.
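As a sketch, the imports and checkpoint constant might look like the following; the exact checkpoint name (`mistralai/Mistral-7B-v0.1`) is an assumption, as the example may pin a different model revision:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; the example uses the 7B version of Mistral.
CHECKPOINT = "mistralai/Mistral-7B-v0.1"
```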
Define the `Model` class and `load` function
In the `load` function of the Truss, we implement the logic for downloading and setting up the model. For this LLM, we use the `Auto` classes in `transformers` to instantiate our Mistral model.
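A minimal sketch of what this looks like, assuming the weights are loaded in float16 and placed on the GPU with `device_map` (both assumptions, not necessarily the example's exact settings):

```python
class Model:
    def __init__(self, **kwargs):
        self._model = None
        self._tokenizer = None

    def load(self):
        # Download the tokenizer and weights from the Hugging Face Hub
        # using the transformers Auto classes.
        self._tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
        self._model = AutoModelForCausalLM.from_pretrained(
            CHECKPOINT,
            torch_dtype=torch.float16,  # assumed precision
            device_map="auto",          # assumed placement; requires accelerate
        )
```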
Define the `predict` function
In the `predict` function, we implement the actual inference logic. The steps, shown in the sketch after this list, are:
- Set up the generation params. Both have defaults, but adjusting their values will change the model output
- Tokenize the input
- Generate the output
- Use the tokenizer to decode the output
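Putting those steps together, a sketch of `predict` might look like this; the parameter names `max_new_tokens` and `temperature` and their default values are assumptions for illustration:

```python
    def predict(self, request: dict):
        prompt = request.pop("prompt")
        # Generation params; both have defaults, but callers can
        # override them in the request to change the output.
        max_new_tokens = request.pop("max_new_tokens", 512)
        temperature = request.pop("temperature", 0.9)

        # Tokenize the input and move it to the model's device.
        inputs = self._tokenizer(prompt, return_tensors="pt").to(self._model.device)

        # Generate the output token ids.
        with torch.no_grad():
            output_ids = self._model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=temperature,
                do_sample=True,
            )

        # Use the tokenizer to decode the output ids back into text.
        return self._tokenizer.decode(output_ids[0], skip_special_tokens=True)
```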
Setting up the `config.yaml`
Running Mistral 7B requires a few libraries, such as `torch` and `transformers`, plus a couple of others.
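For illustration, the `requirements` section of `config.yaml` might look like the following; the exact package list and pinned versions are assumptions:

```yaml
requirements:
  - torch==2.0.1
  - transformers==4.34.0
  - accelerate==0.23.0
  - sentencepiece==0.1.99
```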
Configure resources for Mistral
Note that we need an A10G GPU to run this model.
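In Truss, compute is declared in the `resources` section of `config.yaml`. A sketch, where the CPU and memory values are assumptions:

```yaml
resources:
  accelerator: A10G
  use_gpu: true
  cpu: "4"
  memory: 16Gi
```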
Deploy the model
Deploy the model as you would any other Truss, with:
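Assuming the standard Truss CLI workflow with a Baseten API key configured:

```sh
truss push
```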
You can then invoke the model with:
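For example, via the Truss CLI; the prompt here is a placeholder, and any extra generation params can be passed in the same JSON payload:

```sh
truss predict -d '{"prompt": "What is the Mistral wind?"}'
```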