
Installation
Python
Node.js
Getting Started
Python
Node.js
base_url
.
Embeddings
The client provides efficient embedding requests with configurable batching, concurrency, and latency optimizations.
Example (Python)
- max_chars_per_request, batch_size: Packs requests into batches by number of entries or by character count, whichever limit is reached first. Useful for optimal distribution across all your replicas on Baseten.
- hedge_delay: Sends a duplicate request after a delay (≥0.2s) to reduce p99.5 latency. Once hedge_delay (in seconds) has elapsed, the request is cloned once and the clone races the original request. Hedging is limited by a 5% budget. Default: disabled.
- timeout_s: Timeout for each request. Raises a request.TimeoutError once a single request can't be retried. 429 and 5xx errors are always retried.
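The packing rule described for max_chars_per_request and batch_size can be sketched in plain Python. This is an illustration only, not the client's implementation: pack_batches is a hypothetical helper, and the choice to give an oversized single text its own batch is an assumption.

```python
from typing import List


def pack_batches(texts: List[str], batch_size: int, max_chars_per_request: int) -> List[List[str]]:
    """Group texts into batches, closing a batch when either limit is reached.

    Hypothetical helper for illustration; not part of the client.
    """
    if not texts:
        raise ValueError("texts must not be empty")
    if batch_size < 1 or max_chars_per_request < 1:
        raise ValueError("limits must be positive")

    batches: List[List[str]] = []
    current: List[str] = []
    current_chars = 0
    for text in texts:
        # Close the current batch if it is full by count, or if adding this
        # text would push it past the character limit.
        full_by_count = len(current) >= batch_size
        full_by_chars = bool(current) and current_chars + len(text) > max_chars_per_request
        if full_by_count or full_by_chars:
            batches.append(current)
            current, current_chars = [], 0
        # Assumption: a single text longer than the limit still gets its own batch.
        current.append(text)
        current_chars += len(text)
    batches.append(current)
    return batches


if __name__ == "__main__":
    for batch in pack_batches(["aaaa", "bb", "cccccc", "d", "e"], batch_size=3, max_chars_per_request=8):
        print(batch)
```

With these inputs the character limit closes the first batch after two entries, even though batch_size would have allowed a third.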
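The hedge_delay behaviour (clone a slow request once and race it against the original) can be sketched with asyncio. This is a minimal sketch under stated assumptions, not the client's code: hedged_call and make_fake_request are hypothetical names, the fake request simulates latency with sleeps, and the 5% hedging budget is not modelled.

```python
import asyncio
from typing import Any, Callable, Coroutine, List, Tuple, TypeVar

T = TypeVar("T")


async def hedged_call(make_request: Callable[[], Coroutine[Any, Any, T]], hedge_delay: float) -> T:
    """Run make_request(); if it is still pending after hedge_delay seconds,
    start one duplicate and return whichever finishes first."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()

    # The original is slow: clone the request once and race the two.
    hedge = asyncio.create_task(make_request())
    done, pending = await asyncio.wait({primary, hedge}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower request is no longer needed
    return next(iter(done)).result()


def make_fake_request(latencies: List[float]) -> Tuple[Callable[[], Coroutine[Any, Any, int]], List[int]]:
    """Hypothetical stand-in for a network call: attempt n sleeps latencies[n]."""
    calls: List[int] = []

    async def fake_request() -> int:
        attempt = len(calls)
        calls.append(attempt)
        await asyncio.sleep(latencies[attempt])
        return attempt

    return fake_request, calls


if __name__ == "__main__":
    request, calls = make_fake_request([1.0, 0.01])
    winner = asyncio.run(hedged_call(request, hedge_delay=0.2))
    print(f"attempt {winner} won after {len(calls)} attempts")
```

The sketch returns as soon as either attempt completes, so a slow first attempt costs roughly hedge_delay plus the latency of the clone rather than the full latency of the original.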
Example (Node.js)
Batch POST
Use batch_post for sending POST requests to any URL.
Built for benchmarks (p90/p95/p99 timings). Useful for kicking off massive batch tasks or for benchmarking the performance of individual requests, while keeping concurrency capped.
The GIL is released during all calls, so you can do other work in parallel without impacting performance.
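To make "capped concurrency with p90/p95/p99 timings" concrete, here is a rough standard-library sketch. It does not call batch_post and does not reflect how the client is implemented: timed_batch and fake_send are hypothetical names, and fake_send stands in for a real POST so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles
from typing import Callable, Dict, List, Tuple


def timed_batch(
    send: Callable[[dict], dict],
    payloads: List[dict],
    max_concurrent: int,
) -> Tuple[List[dict], Dict[str, float]]:
    """Send every payload with at most max_concurrent calls in flight,
    timing each call individually."""

    def timed(payload: dict) -> Tuple[dict, float]:
        start = time.perf_counter()
        response = send(payload)
        return response, time.perf_counter() - start

    # The pool size is the concurrency cap; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(timed, payloads))

    responses = [response for response, _ in results]
    durations = [duration for _, duration in results]
    cuts = quantiles(durations, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    stats = {"p90": cuts[89], "p95": cuts[94], "p99": cuts[98]}
    return responses, stats


def fake_send(payload: dict) -> dict:
    """Hypothetical stand-in for one POST request."""
    time.sleep(0.01)
    return {"echo": payload}


if __name__ == "__main__":
    responses, stats = timed_batch(fake_send, [{"i": i} for i in range(50)], max_concurrent=8)
    print(len(responses), {name: round(value, 4) for name, value in stats.items()})
```

Threads work here because the stand-in spends its time sleeping, which releases the GIL in the same way that blocking network I/O does.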
Example (Python) - completions/chat completions
Example (Node.js)
Reranking
Compatible with BEI and text-embeddings-inference.
Example (Python)
Classification
Supports classification endpoints such as BEI or text-embeddings-inference.
Example (Python)
Error Handling
The client raises standard Python/Node.js errors:
- HTTPError: Authentication failures, 4xx/5xx responses.
- ValueError: Invalid inputs (e.g., empty list, invalid batch size).
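One way a caller might tell these two error types apart is sketched below. The HTTPError class here is a local stand-in defined only so the example runs; the exception class the client actually raises, and whether it exposes a status code attribute, are assumptions. The retry rule (429 and 5xx) is taken from the timeout_s description.

```python
class HTTPError(Exception):
    """Local stand-in for illustration; assumes the real error carries a status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"{status_code} {message}".strip())
        self.status_code = status_code


def is_retryable(status_code: int) -> bool:
    # 429 and 5xx responses are the ones the client retries.
    return status_code == 429 or 500 <= status_code <= 599


def describe_failure(exc: Exception) -> str:
    """Map an error to a short hint about what the caller can do next."""
    if isinstance(exc, ValueError):
        return "invalid input: fix the request before resending"
    if isinstance(exc, HTTPError):
        if exc.status_code in (401, 403):
            return "authentication failure: check the API key"
        if is_retryable(exc.status_code):
            return "transient server error: safe to try again later"
        return "request rejected: inspect the payload and URL"
    raise exc  # anything else is unexpected, so let it propagate


if __name__ == "__main__":
    for error in (ValueError("empty list"), HTTPError(401), HTTPError(429), HTTPError(404)):
        print(f"{error!r} -> {describe_failure(error)}")
```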