Api-Key (e.g. {"Authorization": "Api-Key abcd1234.abcd1234"}).
(float): The probability threshold for detecting speech, between 0.0 and 1.0. Frames with a probability above this value are considered speech. A higher threshold makes the VAD more selective, reducing false positives from background noise.
(int): The minimum duration of silence (in milliseconds) required to determine that speech has ended.
(int): Padding (in milliseconds) added to both the start and end of detected speech segments to avoid cutting off words prematurely.
(string, default="pcm_s16le"): Audio encoding format.
(int, default=16000): Audio sample rate in Hz. Whisper models are optimized for a sample rate of 16,000 Hz.
(boolean, optional): If set to true, intermediate (partial) transcripts are sent over the WebSocket as audio is received. For most voice AI use cases, we recommend setting this to false.
(float, default=0.5): Interval in seconds that the model waits before sending a partial transcript, if partials are enabled.
(int, default=30): The maximum duration of buffered audio (in seconds) before a final transcript is forcibly returned. This value should not exceed 30.
(string, optional): Optional transcription prompt.
(string, default="en"): Language of the input audio. Set to "auto" for automatic detection.
(boolean, default=false): If true, only return the automatic language detection result without transcribing.
(list[string], default=[]): List of language codes to consider for language detection, for example ["en", "zh"]. This can improve detection accuracy by restricting detection to the languages that make sense for your use case. By default, all languages supported by the Whisper model are considered. [Added in v0.5.0]
(boolean, default=false): Enables dynamic range compression to process audio with variable loudness.
(boolean, default=false): If true, include word-level timestamps in the output. [Added in v0.4.0]
(integer, optional): Beam search width for decoding. Controls the number of candidate sequences maintained during beam search. [Added in v0.6.0]
(float, optional): Length penalty applied to the output. Higher values encourage longer outputs. [Added in v0.6.0]
(float, optional): Penalty for repeating tokens. Higher values discourage repetition. [Added in v0.6.0]
(float, optional): Controls diversity in beam search. Higher values increase diversity among beam candidates. [Added in v0.6.0]
(integer, optional): Prevents repetition of n-grams of the specified size. [Added in v0.6.0]
(integer, default=1): Beam search size for decoding. Beam sizes up to 5 are supported. [Deprecated since v0.6.0. Use whisper_input.whisper_params.whisper_sampling_params.beam_width instead.]
(float, default=2.0): Length penalty applied to ASR output. The length penalty only takes effect when beam_size is greater than 1. [Deprecated since v0.6.0. Use whisper_input.whisper_params.whisper_sampling_params.length_penalty instead.]
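
As an illustration of the deprecation notes above, the following is a minimal Python sketch of how a client might build the Authorization header and move the deprecated beam_size and length_penalty fields under whisper_sampling_params. The helper names (auth_header, migrate_sampling_params) are hypothetical. The sketch also assumes that the deprecated fields sit directly inside the whisper_params dictionary and that the documented limit of 5 on beam_size also applies to beam_width; neither assumption is confirmed by this reference.

```python
import copy

# The reference states that beam sizes up to 5 are supported for beam_size.
# Applying the same limit to beam_width is an assumption made for this sketch.
MAX_BEAM_SIZE = 5


def auth_header(api_key: str) -> dict:
    """Build the Authorization header in the 'Api-Key <key>' form shown above."""
    return {"Authorization": f"Api-Key {api_key}"}


def migrate_sampling_params(whisper_params: dict) -> dict:
    """Return a copy of whisper_params with deprecated fields moved.

    beam_size      -> whisper_sampling_params.beam_width
    length_penalty -> whisper_sampling_params.length_penalty

    Values already present under whisper_sampling_params take precedence.
    Assumes (hypothetically) that the deprecated fields are top-level keys
    of whisper_params.
    """
    params = copy.deepcopy(whisper_params)
    sampling = params.setdefault("whisper_sampling_params", {})

    if "beam_size" in params:
        sampling.setdefault("beam_width", params.pop("beam_size"))
    if "length_penalty" in params:
        sampling.setdefault("length_penalty", params.pop("length_penalty"))

    beam_width = sampling.get("beam_width")
    if beam_width is not None and not 1 <= beam_width <= MAX_BEAM_SIZE:
        raise ValueError(
            f"beam_width must be between 1 and {MAX_BEAM_SIZE}, got {beam_width}"
        )
    return params


if __name__ == "__main__":
    legacy = {"beam_size": 3, "length_penalty": 2.0}
    print(auth_header("abcd1234.abcd1234"))
    print(migrate_sampling_params(legacy))
```

The function works on a deep copy so that the caller's original configuration is left untouched, which makes it safe to apply to a shared default configuration.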