Get all model environments
Gets all environments for a given model.
Authorizations
You must specify the scheme 'Api-Key' in the Authorization header. For example, Authorization: Api-Key <Your_Api_Key>
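As a minimal sketch of the header format described above, the following Python snippet builds a GET request that carries the `Api-Key` scheme. The base URL, the `/models/{model_id}/environments` path, and the function name are assumptions made for illustration; they do not appear on this page and should be checked against the actual endpoint definition.

```python
import urllib.request

# Assumption: placeholder base URL, not taken from this page.
HYPOTHETICAL_BASE_URL = "https://api.example.com/v1"


def build_environments_request(
    model_id: str,
    api_key: str,
    base_url: str = HYPOTHETICAL_BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a GET request for a model's environments."""
    # Assumption: the model ID is the path parameter and the path has this shape.
    url = f"{base_url}/models/{model_id}/environments"
    return urllib.request.Request(
        url,
        method="GET",
        # The scheme is 'Api-Key', followed by a space and the key itself.
        headers={"Authorization": f"Api-Key {api_key}"},
    )
```

The returned request could then be sent with `urllib.request.urlopen(request)` and the body decoded as JSON.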
Path Parameters
Response
List of environments
Name of the environment
Time the environment was created in ISO 8601 format
Unique identifier of the model
Current deployment of the environment
Unique identifier of the deployment
Time the deployment was created in ISO 8601 format
Name of the deployment
Unique identifier of the model
Whether the deployment is the production deployment of the model
Whether the deployment is the development deployment of the model
Status of the deployment. One of: BUILDING, DEPLOYING, DEPLOY_FAILED, LOADING_MODEL, ACTIVE, UNHEALTHY, BUILD_FAILED, BUILD_STOPPED, DEACTIVATING, INACTIVE, FAILED, UPDATING, SCALED_TO_ZERO, WAKING_UP
Number of active replicas
Autoscaling settings for the deployment. If null, the model has not finished deploying
Minimum number of replicas
Maximum number of replicas
Timeframe of traffic considered for autoscaling decisions
Waiting period before scaling down any active replica
Number of requests per replica before scaling up
The environment associated with the deployment
Autoscaling settings for the environment
Minimum number of replicas
Maximum number of replicas
Timeframe of traffic considered for autoscaling decisions
Waiting period before scaling down any active replica
Number of requests per replica before scaling up
Promotion settings for the environment
Whether to redeploy on every promotion. Enabling this flag allows model code to safely handle environment-specific logic. When it is enabled, promoting a deployment creates a new deployment from a copy of the image.
Whether to ramp up traffic while promoting
Duration of the ramp up in seconds
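The response fields above can be consumed along the following lines. This is a sketch under stated assumptions: the status values and the ISO 8601 timestamp format come from this page, but the JSON key names (`name`, `created_at`, `current_deployment`, `status`, `active_replica_count`) are hypothetical, because the field names are not shown here.

```python
from datetime import datetime
from enum import Enum


class DeploymentStatus(Enum):
    """Deployment statuses as listed in the response description."""
    BUILDING = "BUILDING"
    DEPLOYING = "DEPLOYING"
    DEPLOY_FAILED = "DEPLOY_FAILED"
    LOADING_MODEL = "LOADING_MODEL"
    ACTIVE = "ACTIVE"
    UNHEALTHY = "UNHEALTHY"
    BUILD_FAILED = "BUILD_FAILED"
    BUILD_STOPPED = "BUILD_STOPPED"
    DEACTIVATING = "DEACTIVATING"
    INACTIVE = "INACTIVE"
    FAILED = "FAILED"
    UPDATING = "UPDATING"
    SCALED_TO_ZERO = "SCALED_TO_ZERO"
    WAKING_UP = "WAKING_UP"


def parse_iso8601(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat rejects 'Z' before Python 3.11, so normalize it.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def summarize_environment(environment: dict) -> dict:
    """Reduce one environment entry to a few fields of interest.

    Assumption: every key name used below is hypothetical.
    """
    deployment = environment["current_deployment"]
    status = DeploymentStatus(deployment["status"])
    return {
        "environment": environment["name"],
        "created_at": parse_iso8601(environment["created_at"]),
        "status": status,
        "is_active": status is DeploymentStatus.ACTIVE,
        "replicas": deployment["active_replica_count"],
    }
```

Modeling the status as an enum means an unexpected value raises a `ValueError` at parse time instead of passing through silently.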