Choosing the right resources for your model inference workload requires carefully balancing performance and cost. This page lists every instance type currently available on Baseten to help you pick the best fit for serving your model.

CPU-only instance reference

Instances with no GPU start at $0.00058 per minute.

Available instance types

Instance | Cost/minute | vCPU | RAM
1×2 | $0.00058 | 1 | 2 GiB
1×4 | $0.00086 | 1 | 4 GiB
2×8 | $0.00173 | 2 | 8 GiB
4×16 | $0.00346 | 4 | 16 GiB
8×32 | $0.00691 | 8 | 32 GiB
16×64 | $0.01382 | 16 | 64 GiB
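Per-minute pricing makes it straightforward to estimate what a deployment will cost over longer windows. A minimal sketch, using the prices from the table above (the helper function is illustrative, not part of any Baseten SDK):

```python
# Per-minute prices for CPU-only instances, from the table above
# (instance names written with a plain "x").
CPU_PRICES_PER_MINUTE = {
    "1x2": 0.00058,
    "1x4": 0.00086,
    "2x8": 0.00173,
    "4x16": 0.00346,
    "8x32": 0.00691,
    "16x64": 0.01382,
}

def estimated_cost(instance: str, minutes: float) -> float:
    """Estimated cost in dollars for running one replica for `minutes`."""
    return CPU_PRICES_PER_MINUTE[instance] * minutes

# One 4x16 replica running around the clock for 30 days:
monthly = estimated_cost("4x16", minutes=60 * 24 * 30)
print(f"${monthly:.2f}")  # $149.47
```

Actual spend also depends on autoscaling behavior, since replicas that scale to zero stop accruing per-minute charges.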

What can it run?

CPU-only instances are a cost-effective way to run inference on a variety of smaller models.

GPU instance reference

Available instance types

Instance | Cost/minute | vCPU | RAM | GPU | VRAM
T4x4x16 | $0.01052 | 4 | 16 GiB | NVIDIA T4 | 16 GiB
T4x8x32 | $0.01504 | 8 | 32 GiB | NVIDIA T4 | 16 GiB
T4x16x64 | $0.02408 | 16 | 64 GiB | NVIDIA T4 | 16 GiB
A10Gx4x16 | $0.02012 | 4 | 16 GiB | NVIDIA A10 | 24 GiB
A10Gx8x32 | $0.02424 | 8 | 32 GiB | NVIDIA A10 | 24 GiB
A10Gx16x64 | $0.03248 | 16 | 64 GiB | NVIDIA A10 | 24 GiB
A10G:2x24x96 | $0.05672 | 24 | 96 GiB | 2 NVIDIA A10s | 48 GiB
A10G:4x48x192 | $0.11344 | 48 | 192 GiB | 4 NVIDIA A10s | 96 GiB
A10G:8x192x768 | $0.32576 | 192 | 768 GiB | 8 NVIDIA A10s | 192 GiB
V100x8x61 | $0.06120 | 8 | 61 GiB | NVIDIA V100 | 16 GiB
A100x12x144 | $0.10240 | 12 | 144 GiB | 1 NVIDIA A100 | 80 GiB
A100:2x24x288 | $0.20480 | 24 | 288 GiB | 2 NVIDIA A100s | 160 GiB
A100:3x36x432 | $0.30720 | 36 | 432 GiB | 3 NVIDIA A100s | 240 GiB
A100:4x48x576 | $0.40960 | 48 | 576 GiB | 4 NVIDIA A100s | 320 GiB
A100:5x60x720 | $0.51200 | 60 | 720 GiB | 5 NVIDIA A100s | 400 GiB
A100:6x72x864 | $0.61440 | 72 | 864 GiB | 6 NVIDIA A100s | 480 GiB
A100:7x84x1008 | $0.71680 | 84 | 1008 GiB | 7 NVIDIA A100s | 560 GiB
A100:8x96x1152 | $0.81920 | 96 | 1152 GiB | 8 NVIDIA A100s | 640 GiB

NVIDIA T4

Instances with an NVIDIA T4 GPU start at $0.01052 per minute.

GPU specs

The T4 is a Turing-series GPU with:

  • 2,560 CUDA cores
  • 320 Tensor cores
  • 16 GiB VRAM

Available instance types

Instance | Cost/minute | vCPU | RAM | GPU | VRAM
T4x4x16 | $0.01052 | 4 | 16 GiB | NVIDIA T4 | 16 GiB
T4x8x32 | $0.01504 | 8 | 32 GiB | NVIDIA T4 | 16 GiB
T4x16x64 | $0.02408 | 16 | 64 GiB | NVIDIA T4 | 16 GiB

What can it run?

T4-equipped instances can run inference for models like:

  • Whisper, transcribing 5 minutes of audio in 31.4 seconds with Whisper small.
  • While the T4’s 16 GiB of VRAM is insufficient for 7 billion parameter LLMs, it can run smaller 3B parameter models like StableLM.
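The VRAM rule of thumb behind that claim is simple back-of-the-envelope arithmetic: weights stored in fp16 take roughly two bytes per parameter, before counting activations and KV cache. A sketch (the helper is illustrative):

```python
def fp16_weight_gib(params_billion: float) -> float:
    """Approximate GiB needed just for fp16 model weights (2 bytes/param)."""
    return params_billion * 1e9 * 2 / 2**30

# A 7B-parameter model's weights alone take ~13 GiB, leaving almost no
# headroom on a 16 GiB T4 once activations and KV cache are added:
print(round(fp16_weight_gib(7), 1))  # 13.0

# A 3B model's weights take ~5.6 GiB, which fits comfortably:
print(round(fp16_weight_gib(3), 1))  # 5.6
```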

NVIDIA A10

Instances with the NVIDIA A10 GPU start at $0.02012 per minute.

GPU specs

The A10 is an Ampere-series GPU with:

  • 9,216 CUDA cores
  • 288 Tensor cores
  • 24 GiB VRAM
  • 600 GB/s memory bandwidth

This enables the card to reach 125 teraFLOPS in fp16 operations, the most common precision for serving large language models.

Available instance types

Instance | Cost/minute | vCPU | RAM | GPU | VRAM
A10Gx4x16 | $0.02012 | 4 | 16 GiB | NVIDIA A10 | 24 GiB
A10Gx8x32 | $0.02424 | 8 | 32 GiB | NVIDIA A10 | 24 GiB
A10Gx16x64 | $0.03248 | 16 | 64 GiB | NVIDIA A10 | 24 GiB
A10G:2x24x96 | $0.05672 | 24 | 96 GiB | 2 NVIDIA A10s | 48 GiB
A10G:4x48x192 | $0.11344 | 48 | 192 GiB | 4 NVIDIA A10s | 96 GiB
A10G:8x192x768 | $0.32576 | 192 | 768 GiB | 8 NVIDIA A10s | 192 GiB

What can it run?

Single A10s are great for running 7 billion parameter LLMs, and multi-A10 instances can work together to run larger models.
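One way to act on that guidance is to pick the cheapest A10 instance whose total VRAM covers a model's estimated footprint. A sketch using the prices and VRAM figures from the A10 table (the helper and the example footprints are illustrative):

```python
# (instance, $/minute, total VRAM in GiB) from the A10 instance table.
A10_INSTANCES = [
    ("A10Gx4x16", 0.02012, 24),
    ("A10Gx8x32", 0.02424, 24),
    ("A10Gx16x64", 0.03248, 24),
    ("A10G:2x24x96", 0.05672, 48),
    ("A10G:4x48x192", 0.11344, 96),
    ("A10G:8x192x768", 0.32576, 192),
]

def cheapest_fit(required_vram_gib: float):
    """Cheapest A10 instance with at least the required total VRAM, or None."""
    candidates = [i for i in A10_INSTANCES if i[2] >= required_vram_gib]
    return min(candidates, key=lambda i: i[1])[0] if candidates else None

print(cheapest_fit(14))  # A10Gx4x16   (a 7B fp16 model fits on a single A10)
print(cheapest_fit(40))  # A10G:2x24x96 (a larger model needs two A10s)
```

Note that a multi-GPU footprint assumes the model server can shard the model across cards; total VRAM alone does not guarantee a single large model will fit.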

A10-equipped instances can run inference for many popular open-source models.

NVIDIA V100

Instances with the NVIDIA V100 GPU start at $0.06120 per minute.

GPU specs

The V100 is a Volta-series GPU with 16 GiB of VRAM.

Available instance types

Instance | Cost/minute | vCPU | RAM | GPU | VRAM
V100x8x61 | $0.06120 | 8 | 61 GiB | NVIDIA V100 | 16 GiB

NVIDIA A100

A100s are not enabled by default. Reach out to support@baseten.co to get A100s enabled for your workspace.

Instances with the NVIDIA A100 GPU start at $0.10240 per minute.

GPU specs

The A100 is an Ampere-series GPU with:

  • 6,912 CUDA cores
  • 432 Tensor cores
  • 80 GiB VRAM
  • 1,935 GB/s memory bandwidth

This enables the card to reach 312 teraFLOPS in fp16 operations, the most common precision for serving large language models.

Available instance types

Instance | Cost/minute | vCPU | RAM | GPU | VRAM
A100x12x144 | $0.10240 | 12 | 144 GiB | 1 NVIDIA A100 | 80 GiB
A100:2x24x288 | $0.20480 | 24 | 288 GiB | 2 NVIDIA A100s | 160 GiB
A100:3x36x432 | $0.30720 | 36 | 432 GiB | 3 NVIDIA A100s | 240 GiB
A100:4x48x576 | $0.40960 | 48 | 576 GiB | 4 NVIDIA A100s | 320 GiB
A100:5x60x720 | $0.51200 | 60 | 720 GiB | 5 NVIDIA A100s | 400 GiB
A100:6x72x864 | $0.61440 | 72 | 864 GiB | 6 NVIDIA A100s | 480 GiB
A100:7x84x1008 | $0.71680 | 84 | 1008 GiB | 7 NVIDIA A100s | 560 GiB
A100:8x96x1152 | $0.81920 | 96 | 1152 GiB | 8 NVIDIA A100s | 640 GiB
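A100 pricing scales linearly with GPU count: each additional A100, along with its 12 vCPU and 144 GiB of RAM, adds $0.10240 per minute. A quick sketch of that relationship:

```python
BASE = 0.10240  # $/minute for one A100 with 12 vCPU and 144 GiB RAM

# Listed per-minute prices for 1 through 8 A100s, from the table above.
listed = [0.10240, 0.20480, 0.30720, 0.40960,
          0.51200, 0.61440, 0.71680, 0.81920]

for n, price in enumerate(listed, start=1):
    # Each row is exactly n times the single-GPU price.
    assert abs(price - n * BASE) < 1e-9
```

So there is no volume discount or premium at this tier; you pay for exactly the number of A100s you provision.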

What can it run?

A100s are the largest and most powerful GPUs currently available on Baseten. They’re great for large language models, high-performance image generation, and other demanding tasks.

A100-equipped instances can run inference for models like: