Load balancing: Semantic with fallback
v3.13+: Configure the plugin to use three OpenAI models and route requests based on semantic similarity between the prompt and each target's description.
In this example, two targets share the same description (“Specialist in programming problems”). When a prompt matches this description, the plugin first routes to the target with weight 75 (gpt-4o). If that target fails, it falls back to the target with weight 25 (gpt-4o-mini) via round-robin selection. The third target has a different description (“Specialist in real life topics”) and handles prompts about non-technical topics.
Prerequisites
- An OpenAI account
- A Redis instance for vector storage
Environment variables
- OPENAI_API_KEY: The API key used to connect to OpenAI.
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
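Once the file is saved, you can apply it with decK. This is a sketch that assumes decK 1.28 or later (which uses the deck gateway subcommands) and that OPENAI_API_KEY is already set in your shell:

```shell
# Export the key under the name decK substitutes into the configuration.
export DECK_OPENAI_API_KEY="$OPENAI_API_KEY"

# Validate the file, then sync it to the running gateway.
deck gateway validate kong.yaml
deck gateway sync kong.yaml
```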
Make the following request:
curl -i -X POST http://localhost:8001/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
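With the plugin created, you can exercise the semantic routing with a test request. This sketch assumes Kong's proxy listens on localhost:8000 and that a route matching /chat is covered by the plugin; adjust the host, port, and path for your deployment:

```shell
# A programming prompt should match the "Specialist in programming problems"
# description and be served by gpt-4o (weight 75), falling back to
# gpt-4o-mini (weight 25) on failure. The /chat path is a placeholder.
curl -s -X POST http://localhost:8000/chat \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {"role": "user", "content": "How do I reverse a linked list in Python?"}
    ]
  }'
```

A non-technical prompt (for example, asking for a dinner recipe) should instead match the “Specialist in real life topics” description and be answered by that target.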
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
labels:
global: 'true'
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
} ]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
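As noted above, you can supply the variable value at runtime through a TF_VAR_ environment variable instead of writing it into the manifest. A minimal sketch, assuming OPENAI_API_KEY is already set in your shell:

```shell
# Terraform reads TF_VAR_openai_api_key into var.openai_api_key,
# so the secret never needs to appear in a .tfvars file.
export TF_VAR_openai_api_key="$OPENAI_API_KEY"

terraform init
terraform plan
terraform apply
```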
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
service: serviceName|Id
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/services/{serviceName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/services/{serviceId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- serviceId: The id of the service the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the service resource:
kubectl annotate -n kong service SERVICE_NAME konghq.com/plugins=ai-proxy-advanced
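You can then confirm that the KongPlugin resource exists and that the annotation landed on the service. This sketch uses the same SERVICE_NAME placeholder as above:

```shell
# Check that the plugin resource was created in the kong namespace.
kubectl get kongplugin ai-proxy-advanced -n kong

# Confirm the konghq.com/plugins annotation on the service.
kubectl describe service SERVICE_NAME -n kong | grep konghq.com/plugins
```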
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
} ]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
service = {
id = konnect_gateway_service.my_service.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
route: routeName|Id
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/routes/{routeName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- routeId: The id of the route the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the httproute or ingress resource:
kubectl annotate -n kong httproute HTTPROUTE_NAME konghq.com/plugins=ai-proxy-advanced
kubectl annotate -n kong ingress INGRESS_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
} ]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
route = {
id = konnect_gateway_route.my_route.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
consumer: consumerName|Id
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumers/{consumerName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
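The fallback behavior this configuration produces (try the weight-75 gpt-4o first for programming prompts, then fall back to the weight-25 gpt-4o-mini when a failover criterion such as http_429 or http_503 is met) can be sketched as follows. This is an illustration of the behavior, not Kong's source: the send callback and the per-target retry loop are hypothetical simplifications (Kong applies balancer.retries across the request, not per target).

```python
# Illustrative sketch of weight-ordered fallback, not Kong internals.
FAILOVER_STATUSES = {429, 503}  # mirrors http_429 / http_503 in failover_criteria

PROGRAMMING_TARGETS = [
    {"model": "gpt-4o-mini", "weight": 25},
    {"model": "gpt-4o", "weight": 75},
]

def call_with_fallback(targets, send, retries=3):
    """Try targets heaviest-first; move to the next target when calls keep failing."""
    ordered = sorted(targets, key=lambda t: t["weight"], reverse=True)
    last_status = None
    for target in ordered:
        for _ in range(retries):
            status = send(target["model"])
            if status not in FAILOVER_STATUSES:
                return target["model"], status
            last_status = status
    return None, last_status

# Simulated upstream: gpt-4o is rate limited, so traffic falls back to gpt-4o-mini.
responses = {"gpt-4o": 429, "gpt-4o-mini": 200}
model, status = call_with_fallback(PROGRAMMING_TARGETS, lambda m: responses[m])
assert (model, status) == ("gpt-4o-mini", 200)
```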
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumers/{consumerId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: The geographic region where your Kong Konnect instance is hosted and operates.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- consumerId: The id of the consumer the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
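Whichever interface you use, the semantic algorithm works the same way: the prompt is embedded, compared against an embedding of each target description, and routed to the closest match that clears the configured threshold (0.7 here, with the cosine distance_metric). Below is a minimal sketch of that selection step, not Kong's implementation; the 3-dimensional vectors are toy stand-ins for real text-embedding-3-small output.

```python
import math

# Toy description embeddings; real ones come from the embeddings model.
TARGET_DESCRIPTIONS = {
    "Specialist in programming problems": [0.9, 0.1, 0.2],
    "Specialist in real life topics": [0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def route(prompt_embedding, threshold=0.7):
    """Return the best-matching description, or None if nothing clears the threshold."""
    best_desc, best_score = None, -1.0
    for desc, emb in TARGET_DESCRIPTIONS.items():
        score = cosine_similarity(prompt_embedding, emb)
        if score > best_score:
            best_desc, best_score = desc, score
    return best_desc if best_score >= threshold else None

# A prompt embedding near the "programming" description routes to that group:
assert route([0.85, 0.15, 0.25]) == "Specialist in programming problems"
```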
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumer resource:
kubectl annotate -n kong kongconsumer CONSUMER_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
}
]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
consumer = {
id = konnect_gateway_consumer.my_consumer.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
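The variable can then be populated at plan or apply time from your shell; Terraform maps any TF_VAR_<name> environment variable onto variable "<name>". A hypothetical invocation (adjust to your workflow):

```shell
# Supply the OpenAI key to Terraform without writing it into the manifest.
export TF_VAR_openai_api_key="$OPENAI_API_KEY"
terraform plan    # var.openai_api_key is now set from the environment
```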
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
consumer_group: consumerGroupName|Id
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumer_groups/{consumerGroupName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumer_groups/{consumerGroupId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: The geographic region where your Kong Konnect instance is hosted and operates.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- consumerGroupId: The id of the consumer group the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumerGroup resource:
kubectl annotate -n kong kongconsumergroup CONSUMERGROUP_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
}
]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
consumer_group = {
id = konnect_gateway_consumer_group.my_consumer_group.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}