import requests
import os

model_id = ""  # Replace with your model ID

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Define the request payload
payload = {
    "whisper_input": {
        "audio": {
            "url": "https://example.com/audio.wav",  # Replace with an actual URL
            # "audio_b64": "BASE64_ENCODED_AUDIO",  # Uncomment if using Base64
        },
        "whisper_params": {
            "prompt": "Optional transcription prompt",
            "audio_language": "en",
        },
    }
}

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/environments/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=payload,
)

print(resp.json())

Example response:

{
  "language_code": "en",
  "language_prob": null,
  "segments": [
    {
      "text": "That's one small step for man, one giant leap for mankind.",
      "log_prob": -0.8644908666610718,
      "start_time": 0,
      "end_time": 9.92
    }
  ]
}
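The segments in the response can be stitched into a single transcript string. A minimal sketch, assuming the response shape shown above (`transcript_from_response` is a hypothetical helper name, not part of the API):

```python
def transcript_from_response(resp_json: dict) -> str:
    """Concatenate segment texts in order of start_time."""
    segments = sorted(resp_json.get("segments", []), key=lambda s: s["start_time"])
    return " ".join(seg["text"].strip() for seg in segments)

# The example response from above
example = {
    "language_code": "en",
    "language_prob": None,
    "segments": [
        {
            "text": "That's one small step for man, one giant leap for mankind.",
            "log_prob": -0.8644908666610718,
            "start_time": 0,
            "end_time": 9.92,
        }
    ],
}
print(transcript_from_response(example))
```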

Use this endpoint to call the production environment of your model.

https://model-{model_id}.api.baseten.co/environments/production/predict

If you deployed this model as a chain, call it at the following endpoint:

https://chain-{chain_id}.api.baseten.co/environments/production/run_remote
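Constructing the chain endpoint URL mirrors the model endpoint above. A small sketch (`chain_url` is a hypothetical helper; the JSON body for run_remote depends on your chain's entrypoint signature, so it is not shown):

```python
def chain_url(chain_id: str) -> str:
    """Production run_remote endpoint for a chain deployment."""
    return f"https://chain-{chain_id}.api.baseten.co/environments/production/run_remote"

# Send the request exactly as in the model example: requests.post with the
# same "Api-Key" Authorization header, and a JSON body matching your
# chain's entrypoint inputs.
print(chain_url("abcd1234"))
```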

Parameters

Model ID
string
required

The ID of the model you want to call.

Authorization
string
required

Your Baseten API key, formatted with the prefix Api-Key (e.g. {"Authorization": "Api-Key abcd1234.abcd1234"}).

Body

whisper_input.audio
object
required

The audio input options. You must provide one of url, audio_b64, or audio_bytes.

  • url (string): URL of the audio file.
  • audio_b64 (string): Base64-encoded audio content.
  • audio_bytes (bytes): Raw audio bytes.
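For local files, the audio can be sent inline via audio_b64 instead of a URL. A minimal sketch (`b64_audio_payload` is a hypothetical helper name, and the bytes below are a placeholder, not a real WAV file):

```python
import base64

def b64_audio_payload(audio_bytes: bytes, language: str = "en") -> dict:
    """Build a whisper_input payload carrying Base64-encoded audio."""
    return {
        "whisper_input": {
            "audio": {"audio_b64": base64.b64encode(audio_bytes).decode("utf-8")},
            "whisper_params": {"audio_language": language},
        }
    }

# In practice: audio_bytes = open("audio.wav", "rb").read()
payload = b64_audio_payload(b"RIFF....WAVE")  # placeholder bytes
print(payload["whisper_input"]["audio"]["audio_b64"])
```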
whisper_input.whisper_params
object

Parameters for controlling Whisper’s behavior.

  • prompt (string, optional): A prompt used to condition the transcription.
  • audio_language (string, default="en"): Language of the input audio. Set to "auto" for automatic detection.
  • language_detection_only (boolean, default=false): If true, return only the automatic language detection result without transcribing.
  • language_options (list[string], default=[]): Language codes to consider during language detection, for example ["en", "zh"]. Scoping detection to the languages relevant to your use case can improve detection accuracy. By default, all languages supported by the Whisper model are considered. [Added in v0.5.0]
  • use_dynamic_preprocessing (boolean, default=false): Enables dynamic range compression to handle audio with variable loudness.
  • show_word_timestamps (boolean, default=false): If true, include word-level timestamps in the output. [Added in v0.4.0]
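The whisper_params above compose into the request payload like this. A sketch of automatic language detection restricted to a candidate set, with word-level timestamps enabled (the URL is a placeholder):

```python
payload = {
    "whisper_input": {
        "audio": {"url": "https://example.com/audio.wav"},  # placeholder URL
        "whisper_params": {
            "audio_language": "auto",          # let the model detect the language
            "language_options": ["en", "zh"],  # only consider English and Chinese
            "show_word_timestamps": True,      # requires model version >= 0.4.0
        },
    }
}
print(payload["whisper_input"]["whisper_params"]["audio_language"])
```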
whisper_input.asr_options
object

Advanced settings for the automatic speech recognition (ASR) process.

  • beam_size (integer, default=1): Beam search size for decoding; values up to 5 are supported.
  • length_penalty (float, default=2.0): Length penalty applied to ASR output. Takes effect only when beam_size is greater than 1.
whisper_input.vad_config
object

Parameters for controlling the voice activity detection (VAD) process.

  • max_speech_duration_s (integer, default=29): Maximum duration, in seconds, of audio to treat as a single speech segment. Cannot exceed 30, because the Whisper model accepts at most 30 seconds of audio input. [Added in v0.4.0]
  • min_silence_duration_ms (integer, default=3000): Wait this long, in milliseconds, after the end of each speech chunk before separating it. [Added in v0.4.0]
  • threshold (float, default=0.5): Speech threshold. VAD outputs a speech probability for each audio chunk; probabilities above this value are treated as speech. Tuning this per dataset is best, but the default of 0.5 works well for most datasets. [Added in v0.4.0]
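The asr_options and vad_config constraints above can be checked client-side before sending a request. A minimal sketch (`tuned_whisper_input` is a hypothetical helper name, and the URL is a placeholder):

```python
def tuned_whisper_input(url: str, beam_size: int = 5,
                        max_speech_duration_s: int = 29) -> dict:
    """Build a payload with tuned ASR and VAD settings, validating
    the documented limits before the request is sent."""
    if not 1 <= beam_size <= 5:
        raise ValueError("beam_size must be between 1 and 5")
    if max_speech_duration_s > 30:
        raise ValueError("max_speech_duration_s cannot exceed 30")
    return {
        "whisper_input": {
            "audio": {"url": url},
            "asr_options": {
                "beam_size": beam_size,
                "length_penalty": 2.0,  # only takes effect when beam_size > 1
            },
            "vad_config": {
                "max_speech_duration_s": max_speech_duration_s,
                "min_silence_duration_ms": 3000,
                "threshold": 0.5,
            },
        }
    }

payload = tuned_whisper_input("https://example.com/audio.wav")  # placeholder URL
print(payload["whisper_input"]["asr_options"]["beam_size"])
```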