Inference on Baseten is designed for flexibility, efficiency, and scalability. Models can be served synchronously, asynchronously, or with streaming to meet different performance and latency needs.
- Synchronous inference is ideal for low-latency, real-time responses.
- Asynchronous inference handles long-running tasks efficiently without blocking resources.
- Streaming inference delivers partial results as they become available, reducing time to first output (see the sketch after this list).
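As a minimal sketch of the three modes, the Python snippet below calls a deployed Baseten model with the `requests` library. The model ID, API key, and input schema are placeholders; in particular, the `stream` flag is an assumption that depends on how the model's `predict` function is implemented, so treat this as an illustration rather than a definitive API reference.

```python
import requests

# Hypothetical placeholders: substitute your own model ID and API key.
MODEL_ID = "abcd1234"
API_KEY = "YOUR_API_KEY"
BASE_URL = f"https://model-{MODEL_ID}.api.baseten.co/environments/production"
HEADERS = {"Authorization": f"Api-Key {API_KEY}"}

# Synchronous inference: the request blocks until the full result is ready.
resp = requests.post(
    f"{BASE_URL}/predict",
    headers=HEADERS,
    json={"prompt": "Hello"},
)
print(resp.json())

# Asynchronous inference: the request returns immediately with a request ID,
# and results are delivered later (for example, to a webhook you configure).
async_resp = requests.post(
    f"{BASE_URL}/async_predict",
    headers=HEADERS,
    json={"model_input": {"prompt": "Hello"}},
)
print(async_resp.json())  # e.g. {"request_id": "..."}

# Streaming inference: partial output is consumed as it arrives, assuming the
# model's predict function yields chunks when `stream` is set in the input.
with requests.post(
    f"{BASE_URL}/predict",
    headers=HEADERS,
    json={"prompt": "Hello", "stream": True},
    stream=True,
) as stream_resp:
    for chunk in stream_resp.iter_content(chunk_size=None):
        print(chunk.decode(), end="", flush=True)
```

In practice, the right mode depends on the workload: synchronous calls suit interactive requests that finish quickly, asynchronous calls free the client from holding a connection open during long-running jobs, and streaming improves perceived latency for generative models that produce output incrementally.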