# Resources

> Manage and configure model resources

Every AI/ML model on Baseten runs on an **instance**, a dedicated set of hardware allocated to the model server. Selecting the right instance type ensures **optimal performance** while controlling **compute costs**.

* **Insufficient resources**: Slow inference or failures.
* **Excess resources**: Higher costs without added benefit.

<img noZoom src="https://mintcdn.com/baseten-preview/W3NbEem9OZkF5rdB/images/deployment-resources.png?fit=max&auto=format&n=W3NbEem9OZkF5rdB&q=85&s=028f7b8a7c5e1d92f0b55f2eec8aad11" width="964" height="552" data-path="images/deployment-resources.png" />

## Instance type resource components

* **Instance**: The allocated hardware for inference.
* **Node**: The compute unit within an instance, comprising 8 GPUs with associated vCPU, RAM, and VRAM.
* **vCPU**: Virtual CPU cores for general computing.
* **RAM**: Memory available to the CPU.
* **GPU**: Specialized hardware for accelerated ML workloads.
* **VRAM**: Dedicated GPU memory for model execution.

***

# Configuring model resources

Define resources **before deployment** in Truss or **adjust them later** via the Baseten UI.

### Defining resources in Truss

Define resource requirements in [`config.yaml`](/development/model/configuration) before running `truss push`.

* **Published deployment** (`truss push`): Creates a new deployment (named sequentially: `deployment-1`, `deployment-2`, etc.) using the resources in [`config.yaml`](/development/model/configuration).
* **Development deployment** (`truss push --watch`): Overwrites the existing development deployment with the specified resource configuration and starts watching for changes. Use [`truss watch`](/development/model/deploy-and-iterate) to resume watching an existing development deployment.
* **Production deployment** (`truss push --promote`): Creates a new deployment and promotes it to production, replacing the active deployment.
* **Environment deployment** (`truss push --environment <name>`): Deploys directly to a [custom environment](/deployment/environments) like staging.

<Info>
  Changes to `config.yaml` only affect new deployments. To update resources on an existing published deployment, edit resources in the [Baseten UI](#updating-resources-in-the-baseten-ui).
</Info>

You can configure resources in two ways:

**Option 1: Specify individual resource fields**

```yaml config.yaml theme={"system"}
resources:
  accelerator: L4
  cpu: "4"
  memory: 16Gi
```

Baseten provisions the **smallest instance that meets the specified constraints**:

* `cpu: "3"` or `cpu: "4"` → Maps to a 4-core instance.
* `cpu: "5"` through `cpu: "8"` → Maps to an 8-core instance.
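
The mapping above can be sketched as a lookup over an ordered list of instance sizes. This is an illustration only: `CPU_INSTANCES` and `smallest_cpu_instance` are hypothetical names, the sizes are taken from the CPU-only table in the instance type reference on this page, and Baseten's actual selection logic may differ (for example, it also has to honor `accelerator`).

```python
# Hypothetical sketch of "smallest instance that meets the constraints".
# The (vCPU, RAM in GiB) pairs come from the CPU-only table on this page;
# the function itself is an assumption, not Baseten's implementation.
CPU_INSTANCES = [(1, 2), (1, 4), (2, 8), (4, 16), (8, 32), (16, 64)]


def smallest_cpu_instance(cpu: int, memory_gib: int) -> str:
    """Return the first (smallest) SKU with enough vCPUs and RAM."""
    for vcpu, ram in CPU_INSTANCES:  # ordered from smallest to largest
        if vcpu >= cpu and ram >= memory_gib:
            return f"{vcpu}x{ram}"
    raise ValueError("no CPU-only instance satisfies the request")


print(smallest_cpu_instance(3, 16))  # 4x16
print(smallest_cpu_instance(5, 16))  # 8x32
```

Requesting 3 vCPUs rounds up to the 4-core size, and requesting 5 rounds up to the 8-core size, which matches the two bullets above.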

<Info>
  `Gi` in `resources.memory` refers to **Gibibytes** (2^30 bytes), which are about 7%
  larger than **Gigabytes** (10^9 bytes).
</Info>
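
To make the difference between the two units concrete, the conversion can be worked out directly. The helper name `gib_to_gb` is made up for this example.

```python
GIB = 2**30  # bytes in one gibibyte (binary prefix)
GB = 10**9   # bytes in one gigabyte (decimal prefix)


def gib_to_gb(gib: float) -> float:
    """Convert gibibytes to decimal gigabytes."""
    return gib * GIB / GB


print(round(gib_to_gb(16), 2))  # 17.18
```

So `memory: 16Gi` asks for roughly 17.18 GB of RAM.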

**Option 2: Specify an exact instance type**

An instance type is the full SKU name that uniquely identifies a specific hardware configuration. When you specify individual resource fields like `cpu` and `accelerator`, Baseten selects the smallest instance that meets your requirements. With `instance_type`, you name the exact instance you want, so the result does not depend on that selection step.

Use `instance_type` when you:

* Know the exact hardware configuration you need.
* Want to ensure consistent instance selection across deployments.
* Are following a recommendation for a specific model (for example, "use an L4 with 4 vCPUs and 16 GiB RAM").

```yaml config.yaml theme={"system"}
resources:
  instance_type: "L4:4x16"
```

The format encodes the hardware specs: `<GPU>:<vCPU>x<MEMORY>`. For example, `L4:4x16` means an L4 GPU with 4 vCPUs and 16 GiB of RAM. When `instance_type` is specified, other resource fields (`cpu`, `memory`, `accelerator`, `use_gpu`) are ignored.
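
Because the other fields are ignored once `instance_type` is present, a config that mixes both styles can be misleading to read. A small check like the following could flag it before `truss push`. It is a sketch under assumptions: `ignored_fields` and `OVERRIDDEN_FIELDS` are hypothetical names (not part of Truss), and the `resources` dict stands in for the parsed `resources` section of `config.yaml`.

```python
# Fields this page says are ignored when `instance_type` is set.
OVERRIDDEN_FIELDS = ("cpu", "memory", "accelerator", "use_gpu")


def ignored_fields(resources: dict) -> list[str]:
    """List the keys in a `resources` mapping that `instance_type` overrides."""
    if "instance_type" not in resources:
        return []
    return [key for key in OVERRIDDEN_FIELDS if key in resources]


resources = {"instance_type": "L4:4x16", "cpu": "8", "memory": "32Gi"}
print(ignored_fields(resources))  # ['cpu', 'memory']
```

Here the deployment would use the `L4:4x16` instance, and the `cpu` and `memory` values would have no effect.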

### Updating resources in the Baseten UI

After a model is deployed, you can change its resource configuration only **through the Baseten UI**. Changing the instance type deploys a copy of the deployment on the newly selected instance type.

For a list of available instance types, see the [instance type reference](/deployment/resources#instance-type-reference).

***

# Instance type reference

Specs and benchmarks for every Baseten instance type.

Choosing the right instance for model inference means balancing performance and cost. This page lists all available instance types on Baseten to help you deploy and serve models effectively.

## CPU-only instances

Cost-effective options for lighter workloads. No GPU.

* **Starts at**: \$0.00058/min
* **Best for**: Transformers pipelines, small QA models, text embeddings

| Instance | \$/min    | vCPU | RAM    |
| -------- | --------- | ---- | ------ |
| 1x2      | \$0.00058 | 1    | 2 GiB  |
| 1x4      | \$0.00086 | 1    | 4 GiB  |
| 2x8      | \$0.00173 | 2    | 8 GiB  |
| 4x16     | \$0.00346 | 4    | 16 GiB |
| 8x32     | \$0.00691 | 8    | 32 GiB |
| 16x64    | \$0.01382 | 16   | 64 GiB |

To select a CPU-only instance, use the bare `<vCPU>x<MEMORY>` SKU (for example, `instance_type: "4x16"`).

**Example workloads:**

* `1x2`: Text classification (for example, Truss quickstart)
* `4x16`: LayoutLM Document QA
* `4x16` or larger: Sentence Transformers embeddings on larger corpora
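
To put the per-minute prices from the CPU-only table in more familiar terms, they can be scaled to an hour or a month. This is a rough sketch: the `cost` helper is hypothetical, and it assumes a single instance that runs continuously and is billed linearly per minute, which ignores autoscaling and any other pricing details not covered on this page.

```python
# Per-minute prices in USD, copied from the CPU-only table above.
PRICE_PER_MIN = {
    "1x2": 0.00058, "1x4": 0.00086, "2x8": 0.00173,
    "4x16": 0.00346, "8x32": 0.00691, "16x64": 0.01382,
}


def cost(instance: str, minutes: float) -> float:
    """Cost in USD of running one `instance` for `minutes` (linear billing assumed)."""
    return PRICE_PER_MIN[instance] * minutes


print(f"{cost('4x16', 60):.4f}")            # one hour: 0.2076
print(f"{cost('4x16', 60 * 24 * 30):.2f}")  # 30 days, always on: 149.47
```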

## GPU instances

Accelerated inference for LLMs, diffusion models, and Whisper.

| Instance       | \$/min    | vCPU | RAM      | GPU                    | VRAM     |
| -------------- | --------- | ---- | -------- | ---------------------- | -------- |
| T4x4x16        | \$0.01052 | 4    | 16 GiB   | NVIDIA T4              | 16 GiB   |
| T4x8x32        | \$0.01504 | 8    | 32 GiB   | NVIDIA T4              | 16 GiB   |
| T4x16x64       | \$0.02408 | 16   | 64 GiB   | NVIDIA T4              | 16 GiB   |
| T4:2x24x96     | \$0.03912 | 24   | 96 GiB   | 2 NVIDIA T4s           | 32 GiB   |
| T4:4x48x192    | \$0.07824 | 48   | 192 GiB  | 4 NVIDIA T4s           | 64 GiB   |
| L4:4x16        | \$0.01414 | 4    | 16 GiB   | NVIDIA L4              | 24 GiB   |
| L4:2x24x96     | \$0.04002 | 24   | 96 GiB   | 2 NVIDIA L4s           | 48 GiB   |
| L4:4x48x192    | \$0.08003 | 48   | 192 GiB  | 4 NVIDIA L4s           | 96 GiB   |
| A10Gx4x16      | \$0.02012 | 4    | 16 GiB   | NVIDIA A10G            | 24 GiB   |
| A10Gx8x32      | \$0.02424 | 8    | 32 GiB   | NVIDIA A10G            | 24 GiB   |
| A10Gx16x64     | \$0.03248 | 16   | 64 GiB   | NVIDIA A10G            | 24 GiB   |
| A10G:2x24x96   | \$0.05672 | 24   | 96 GiB   | 2 NVIDIA A10Gs         | 48 GiB   |
| A10G:4x48x192  | \$0.11344 | 48   | 192 GiB  | 4 NVIDIA A10Gs         | 96 GiB   |
| A10G:8x192x768 | \$0.32576 | 192  | 768 GiB  | 8 NVIDIA A10Gs         | 188 GiB  |
| A100:12x144    | \$0.06667 | 12   | 144 GiB  | 1 NVIDIA A100          | 80 GiB   |
| A100:2x24x288  | \$0.13333 | 24   | 288 GiB  | 2 NVIDIA A100s         | 160 GiB  |
| A100:3x36x432  | \$0.20000 | 36   | 432 GiB  | 3 NVIDIA A100s         | 240 GiB  |
| A100:4x48x576  | \$0.26667 | 48   | 576 GiB  | 4 NVIDIA A100s         | 320 GiB  |
| A100:5x60x720  | \$0.33333 | 60   | 720 GiB  | 5 NVIDIA A100s         | 400 GiB  |
| A100:6x72x864  | \$0.40000 | 72   | 864 GiB  | 6 NVIDIA A100s         | 480 GiB  |
| A100:7x84x1008 | \$0.46667 | 84   | 1008 GiB | 7 NVIDIA A100s         | 560 GiB  |
| A100:8x96x1152 | \$0.53333 | 96   | 1152 GiB | 8 NVIDIA A100s         | 640 GiB  |
| H100           | \$0.10833 | 16   | 118 GiB  | 1 NVIDIA H100          | 80 GiB   |
| H100:2         | \$0.21667 | 32   | 236 GiB  | 2 NVIDIA H100s         | 160 GiB  |
| H100:4         | \$0.43333 | 64   | 472 GiB  | 4 NVIDIA H100s         | 320 GiB  |
| H100:8         | \$0.86667 | 128  | 944 GiB  | 8 NVIDIA H100s         | 640 GiB  |
| H100MIG        | \$0.06250 | 8    | 59 GiB   | Fractional NVIDIA H100 | 40 GiB   |
| H200           | \$0.12500 | 28   | 384 GiB  | 1 NVIDIA H200          | 141 GiB  |
| H200:2         | \$0.25000 | 58   | 768 GiB  | 2 NVIDIA H200s         | 282 GiB  |
| H200:4         | \$0.50000 | 112  | 1536 GiB | 4 NVIDIA H200s         | 564 GiB  |
| H200:8         | \$1.00000 | 224  | 3072 GiB | 8 NVIDIA H200s         | 1128 GiB |
| B200           | \$0.16633 | 28   | 384 GiB  | 1 NVIDIA B200          | 180 GiB  |
| B200:2         | \$0.33267 | 58   | 768 GiB  | 2 NVIDIA B200s         | 360 GiB  |
| B200:4         | \$0.66533 | 112  | 1536 GiB | 4 NVIDIA B200s         | 720 GiB  |
| B200:8         | \$1.33067 | 224  | 3072 GiB | 8 NVIDIA B200s         | 1440 GiB |

<Note>
  H200 and B200 instances are available on request. [Contact us](mailto:support@baseten.co) to get access.
</Note>

To select a GPU instance with `instance_type`:

* **Single GPU**: `<GPU>:<vCPU>x<MEMORY>` (for example, `"L4:4x16"`).
* **Multi-GPU**: `<GPU>:<COUNT>x<vCPU>x<MEMORY>` (for example, `"A100:2x24x288"`).
* **H100/H200/B200**: `<GPU>` or `<GPU>:<COUNT>` (for example, `"H100:2"`, `"B200:4"`).
* **Fractional H100**: `"H100_40GB"`.
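
The SKU formats listed above, together with the bare `<vCPU>x<MEMORY>` form for CPU-only instances, are regular enough to parse mechanically. The following is a minimal sketch, not an official parser: `InstanceSpec` and `parse_instance_type` are hypothetical names, and the function only checks the shape of the string, not whether the SKU actually exists.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceSpec:
    gpu: Optional[str]         # None for CPU-only SKUs
    gpu_count: int             # 0 for CPU-only SKUs
    vcpu: Optional[int]        # None when the SKU does not encode it ("H100:2")
    memory_gib: Optional[int]  # None when the SKU does not encode it


def parse_instance_type(sku: str) -> InstanceSpec:
    """Split an `instance_type` string into its parts (shape check only)."""
    gpu, colon, tail = sku.partition(":")
    if not colon:
        # No colon: a CPU-only SKU ("4x16") or a bare GPU name ("H100", "H100_40GB").
        if re.fullmatch(r"\d+x\d+", sku):
            vcpu, memory = (int(n) for n in sku.split("x"))
            return InstanceSpec(None, 0, vcpu, memory)
        if re.fullmatch(r"[A-Za-z]\w*", sku):
            return InstanceSpec(sku, 1, None, None)
        raise ValueError(f"unrecognized instance type: {sku!r}")
    if not re.fullmatch(r"[A-Za-z]\w*", gpu) or not re.fullmatch(r"\d+(x\d+){0,2}", tail):
        raise ValueError(f"unrecognized instance type: {sku!r}")
    nums = [int(n) for n in tail.split("x")]
    if len(nums) == 1:  # "<GPU>:<COUNT>"
        return InstanceSpec(gpu, nums[0], None, None)
    if len(nums) == 2:  # "<GPU>:<vCPU>x<MEMORY>", a single GPU
        return InstanceSpec(gpu, 1, nums[0], nums[1])
    return InstanceSpec(gpu, nums[0], nums[1], nums[2])  # "<GPU>:<COUNT>x<vCPU>x<MEMORY>"


print(parse_instance_type("A100:2x24x288"))
# InstanceSpec(gpu='A100', gpu_count=2, vcpu=24, memory_gib=288)
```

The number of `x`-separated values after the colon is what distinguishes the forms: one value is a GPU count, two are vCPUs and memory for a single GPU, and three are count, vCPUs, and memory.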

## GPU details and workloads

### T4

Turing-series GPU

* 2,560 CUDA / 320 Tensor cores
* 16 GiB VRAM
* **Best for**: Whisper, small LLMs like StableLM 3B

### L4

Ada Lovelace-series GPU

* 7,680 CUDA / 240 Tensor cores
* 24 GiB VRAM, 300 GiB/s memory bandwidth
* 121 TFLOPS (fp16)
* **Best for**: Stable Diffusion XL
* **Limit**: Not well suited to LLMs because of its comparatively low memory bandwidth
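
One common way to reason about the bandwidth limit is a back-of-the-envelope bound: when generating one token at a time, the GPU has to read roughly all model weights from VRAM per token, so throughput cannot exceed bandwidth divided by weight size. The sketch below applies that rule of thumb to a 7B-parameter model. It assumes fp16 weights (2 bytes per parameter) and a single request, and it ignores batching, KV cache traffic, and compute limits, so treat the numbers as rough upper bounds rather than benchmarks. `max_decode_tokens_per_s` is a hypothetical helper.

```python
def max_decode_tokens_per_s(bandwidth_gib_s: float, params_billions: float,
                            bytes_per_param: int = 2) -> float:
    """Rough upper bound on single-stream decode speed (weights read once per token)."""
    weights_gib = params_billions * 1e9 * bytes_per_param / 2**30
    return bandwidth_gib_s / weights_gib


print(round(max_decode_tokens_per_s(300, 7)))  # L4 at 300 GiB/s: 23
print(round(max_decode_tokens_per_s(600, 7)))  # A10G at 600 GiB/s: 46
```

Under these assumptions, the A10G's 600 GiB/s (listed in the next section) doubles the ceiling for the same model.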

### A10G

Ampere-series GPU

* 9,216 CUDA / 288 Tensor cores
* 24 GiB VRAM, 600 GiB/s memory bandwidth
* 70 TFLOPS (fp16)
* **Best for**: Mistral 7B, Whisper, Stable Diffusion/SDXL

### A100

Ampere-series GPU

* 6,912 CUDA / 432 Tensor cores
* 80 GiB VRAM, 1.94 TB/s memory bandwidth
* 312 TFLOPS (fp16)
* **Best for**: Mixtral, Llama 2 70B (2 A100s), Falcon 180B (5 A100s), SDXL
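
The GPU counts in parentheses are consistent with a simple weights-only estimate: parameter count times bytes per parameter, divided by VRAM per GPU and rounded up. The sketch below assumes fp16 weights (2 bytes per parameter) and counts only the weights, so real deployments also need headroom for the KV cache and activations. `gpus_for_weights` is a hypothetical helper.

```python
import math


def gpus_for_weights(params_billions: float, vram_gib_per_gpu: int = 80,
                     bytes_per_param: int = 2) -> int:
    """Minimum number of GPUs whose combined VRAM holds the model weights."""
    weights_gib = params_billions * 1e9 * bytes_per_param / 2**30
    return math.ceil(weights_gib / vram_gib_per_gpu)


print(gpus_for_weights(70))   # Llama 2 70B on 80 GiB GPUs: 2
print(gpus_for_weights(180))  # Falcon 180B on 80 GiB GPUs: 5
```

Quantized weights (for example, 1 byte per parameter) would lower these counts, which is why the precision assumption matters.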

### H100

Hopper-series GPU

* 16,896 CUDA / 640 Tensor cores
* 80 GiB VRAM, 3.35 TB/s memory bandwidth
* 990 TFLOPS (fp16)
* **Best for**: Mixtral 8x7B, Llama 2 70B (2 H100s), SDXL

### H100MIG

Fractional H100 (3/7 compute, ½ memory)

* 7,242 CUDA cores, 40 GiB VRAM
* 1.675 TB/s bandwidth
* **Best for**: Efficient LLM inference at lower cost than A100
