Making one-off requests with the requests library is a convenient way to get started with model invocation. But for latency-sensitive high-throughput applications, connection pooling is a better technique that can save hundreds of milliseconds per request.

Co-author: Bola