Baseten provides two sets of API endpoints:

  1. An inference API for calling deployed models
  2. A management API for managing your models and workspace

Many inference and management API endpoints have different routes for the three types of deployments (development, production, and individual published deployments). These routes are listed separately in the sidebar.

Inference API

Each model deployed on Baseten has its own subdomain to enable faster routing. This subdomain is used for inference endpoints, whose URLs are built from the following components:

  • model_id is the alphanumeric ID of the model, which you can find in your model dashboard.
  • deployment_type_or_id is one of development, production, or a separate alphanumeric ID for a specific published deployment of the model.
  • endpoint is a supported endpoint such as predict that you want to call.
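
As an illustration, the three components above can be combined into an inference URL. This is a minimal sketch, not the official client: the host pattern model-{model_id}.api.baseten.co and the path layout are assumptions made for the example, and build_inference_url is a hypothetical helper. Check the endpoint reference for the exact route of each deployment type.

```python
# Assumption: each model's subdomain follows this pattern. The text above only
# states that every model has its own subdomain; the exact host is illustrative.
INFERENCE_HOST_TEMPLATE = "model-{model_id}.api.baseten.co"


def build_inference_url(model_id, deployment_type_or_id, endpoint="predict"):
    """Combine the three URL components into an inference endpoint URL.

    Hypothetical helper: model_id, deployment_type_or_id, and endpoint
    correspond to the components described in the list above.
    """
    parts = {
        "model_id": model_id,
        "deployment_type_or_id": deployment_type_or_id,
        "endpoint": endpoint,
    }
    # Reject empty components early so a malformed URL is never returned.
    for name, value in parts.items():
        if not value:
            raise ValueError(f"{name} must be a non-empty string")

    host = INFERENCE_HOST_TEMPLATE.format(model_id=model_id)
    return f"https://{host}/{deployment_type_or_id}/{endpoint}"


if __name__ == "__main__":
    # "abc123" is a placeholder, not a real model ID.
    print(build_inference_url("abc123", "production", "predict"))
```

Keeping the host pattern in a single constant means only one line changes if the actual subdomain format differs from the one assumed here.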

The inference API also supports asynchronous inference for long-running tasks and priority queuing.

Management API

Management API endpoints all run through the base subdomain. Use management API endpoints for monitoring, CI/CD, and building both model-level and workspace-level automations.
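
For a CI/CD or monitoring script, a management API call might be prepared as follows. This is a sketch under stated assumptions: the base URL https://api.baseten.co, the /v1/models path, and the "Api-Key" authorization scheme are all assumptions for illustration, and build_management_request is a hypothetical helper. The request is only constructed here, not sent.

```python
import urllib.request

# Assumption: management endpoints share one base host, unlike the per-model
# inference subdomains. The exact host is illustrative.
MANAGEMENT_BASE_URL = "https://api.baseten.co"


def build_management_request(path, api_key):
    """Build (but do not send) a GET request for a management endpoint.

    Hypothetical helper: the path and the "Api-Key" header scheme are
    assumptions and should be checked against the endpoint reference.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")

    # Normalize the path so "/v1/models" and "v1/models" behave the same.
    url = MANAGEMENT_BASE_URL + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Api-Key {api_key}"},
        method="GET",
    )


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; load real keys from the environment.
    request = build_management_request("/v1/models", "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. Building and sending are kept separate so that the URL and headers can be inspected in tests without network access.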