AI Proxy Advanced

AI License Required

Load balancing: Least-connections

Available in version 3.13 and later. Configure the plugin to use two OpenAI models and route each request to the backend with the most spare capacity, based on in-flight connection counts.

In this example, both models have equal weight (2), so requests are distributed purely by which backend has fewer active connections: if one model currently has three in-flight requests and the other has one, the next request goes to the latter. Because the algorithm always routes new requests to the backend with the most spare capacity, it is particularly effective when backends have varying response times. The configuration for this setup is shown under Set up the plugin below.

Prerequisites

  • An OpenAI account

Environment variables

  • OPENAI_API_KEY: Your OpenAI API key, used to authenticate requests to OpenAI.

Set up the plugin
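The following declarative (decK-style) snippet is a minimal sketch of this setup, assuming the `ai-proxy-advanced` plugin schema for your Kong Gateway version. The model names (`gpt-4o` and `gpt-4o-mini`) are placeholders, the plugin is applied globally for brevity (it can instead be scoped to a service or route), and `$OPENAI_API_KEY` stands in for however your tooling injects the environment variable.

```yaml
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        # Route each request to the target with the fewest in-flight connections.
        algorithm: least-connections
      targets:
        # First OpenAI model. Weight 2, equal to the second target, so weights
        # don't bias selection and only active connection counts decide routing.
        - route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer $OPENAI_API_KEY
          model:
            provider: openai
            name: gpt-4o            # placeholder model name
          weight: 2
        # Second OpenAI model with the same weight.
        - route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer $OPENAI_API_KEY
          model:
            provider: openai
            name: gpt-4o-mini       # placeholder model name
          weight: 2
```

Because both targets carry the same weight, the balancer's in-flight connection counts alone determine where each request lands, so a slower backend naturally accumulates fewer new requests while it works through its queue.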
