Load balancing: Semantic with fallback
v3.13+: Configure the plugin to use three OpenAI models and route requests based on semantic similarity between the prompt and each target's description.
In this example, two targets share the same description (“Specialist in programming problems”). When a prompt matches this description, the plugin first routes to the target with weight 75 (gpt-4o). If that target fails, it falls back to the target with weight 25 (gpt-4o-mini) via round-robin selection. The third target has a different description (“Specialist in real life topics”) and handles prompts about non-technical topics.
Prerequisites
- An OpenAI account
- A Redis instance for vector storage
Environment variables
- OPENAI_API_KEY: The API key used to connect to OpenAI.
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
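Once the file is saved, you can apply it with decK. This is a sketch that assumes decK 1.28 or later (which uses the deck gateway subcommands) and that OPENAI_API_KEY is already set in your shell:

```shell
# Export the key under the name decK substitutes into the configuration.
export DECK_OPENAI_API_KEY="$OPENAI_API_KEY"

# Validate the file, then sync it to the running gateway.
deck gateway validate kong.yaml
deck gateway sync kong.yaml
```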
Make the following request:
curl -i -X POST http://localhost:8001/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
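With the plugin created, you can exercise the semantic routing with a test request. This sketch assumes Kong's proxy listens on localhost:8000 and that a route matching /chat is covered by the plugin; adjust the host, port, and path for your deployment:

```shell
# A programming prompt should match the "Specialist in programming problems"
# description and be served by gpt-4o (weight 75), falling back to
# gpt-4o-mini (weight 25) on failure. The /chat path is a placeholder.
curl -s -X POST http://localhost:8000/chat \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {"role": "user", "content": "How do I reverse a linked list in Python?"}
    ]
  }'
```

A non-technical prompt (for example, asking for a dinner recipe) should instead match the “Specialist in real life topics” description and be answered by that target.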
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
labels:
global: 'true'
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
} ]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
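As noted above, you can supply the variable value at runtime through a TF_VAR_ environment variable instead of writing it into the manifest. A minimal sketch, assuming OPENAI_API_KEY is already set in your shell:

```shell
# Terraform reads TF_VAR_openai_api_key into var.openai_api_key,
# so the secret never needs to appear in a .tfvars file.
export TF_VAR_openai_api_key="$OPENAI_API_KEY"

terraform init
terraform plan
terraform apply
```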
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
service: serviceName|Id
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/services/{serviceName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/services/{serviceId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- serviceId: The id of the service the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the service resource:
kubectl annotate -n kong service SERVICE_NAME konghq.com/plugins=ai-proxy-advanced
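You can then confirm that the KongPlugin resource exists and that the annotation landed on the service. This sketch uses the same SERVICE_NAME placeholder as above:

```shell
# Check that the plugin resource was created in the kong namespace.
kubectl get kongplugin ai-proxy-advanced -n kong

# Confirm the konghq.com/plugins annotation on the service.
kubectl describe service SERVICE_NAME -n kong | grep konghq.com/plugins
```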
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
} ]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
service = {
id = konnect_gateway_service.my_service.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
route: routeName|Id
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/routes/{routeName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- routeId: The id of the route the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the httproute or ingress resource:
kubectl annotate -n kong httproute HTTPROUTE_NAME konghq.com/plugins=ai-proxy-advanced
kubectl annotate -n kong ingress INGRESS_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
} ]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
route = {
id = konnect_gateway_route.my_route.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
consumer: consumerName|Id
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumers/{consumerName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
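The fallback behavior this configuration produces (try the weight-75 gpt-4o first for programming prompts, then fall back to the weight-25 gpt-4o-mini when a failover criterion such as http_429 or http_503 is met) can be sketched as follows. This is an illustration of the behavior, not Kong's source: the send callback and the per-target retry loop are hypothetical simplifications (Kong applies balancer.retries across the request, not per target).

```python
# Illustrative sketch of weight-ordered fallback, not Kong internals.
FAILOVER_STATUSES = {429, 503}  # mirrors http_429 / http_503 in failover_criteria

PROGRAMMING_TARGETS = [
    {"model": "gpt-4o-mini", "weight": 25},
    {"model": "gpt-4o", "weight": 75},
]

def call_with_fallback(targets, send, retries=3):
    """Try targets heaviest-first; move to the next target when calls keep failing."""
    ordered = sorted(targets, key=lambda t: t["weight"], reverse=True)
    last_status = None
    for target in ordered:
        for _ in range(retries):
            status = send(target["model"])
            if status not in FAILOVER_STATUSES:
                return target["model"], status
            last_status = status
    return None, last_status

# Simulated upstream: gpt-4o is rate limited, so traffic falls back to gpt-4o-mini.
responses = {"gpt-4o": 429, "gpt-4o-mini": 200}
model, status = call_with_fallback(PROGRAMMING_TARGETS, lambda m: responses[m])
assert (model, status) == ("gpt-4o-mini", 200)
```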
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumers/{consumerId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: The geographic region where your Kong Konnect instance is hosted and operates.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- consumerId: The id of the consumer the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
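Whichever interface you use, the semantic algorithm works the same way: the prompt is embedded, compared against an embedding of each target description, and routed to the closest match that clears the configured threshold (0.7 here, with the cosine distance_metric). Below is a minimal sketch of that selection step, not Kong's implementation; the 3-dimensional vectors are toy stand-ins for real text-embedding-3-small output.

```python
import math

# Toy description embeddings; real ones come from the embeddings model.
TARGET_DESCRIPTIONS = {
    "Specialist in programming problems": [0.9, 0.1, 0.2],
    "Specialist in real life topics": [0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def route(prompt_embedding, threshold=0.7):
    """Return the best-matching description, or None if nothing clears the threshold."""
    best_desc, best_score = None, -1.0
    for desc, emb in TARGET_DESCRIPTIONS.items():
        score = cosine_similarity(prompt_embedding, emb)
        if score > best_score:
            best_desc, best_score = desc, score
    return best_desc if best_score >= threshold else None

# A prompt embedding near the "programming" description routes to that group:
assert route([0.85, 0.15, 0.25]) == "Specialist in programming problems"
```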
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumer resource:
kubectl annotate -n kong kongconsumer CONSUMER_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
}
]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
consumer = {
id = konnect_gateway_consumer.my_consumer.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}
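The variable can then be populated at plan or apply time from your shell; Terraform maps any TF_VAR_<name> environment variable onto variable "<name>". A hypothetical invocation (adjust to your workflow):

```shell
# Supply the OpenAI key to Terraform without writing it into the manifest.
export TF_VAR_openai_api_key="$OPENAI_API_KEY"
terraform plan    # var.openai_api_key is now set from the environment
```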
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
consumer_group: consumerGroupName|Id
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
logging:
log_statistics: true
log_payloads: true
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumer_groups/{consumerGroupName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumer_groups/{consumerGroupId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "ai-proxy-advanced",
"config": {
"balancer": {
"algorithm": "semantic",
"retries": 3,
"failover_criteria": [
"error",
"timeout",
"http_429",
"http_503",
"non_idempotent"
]
},
"embeddings": {
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"model": {
"name": "text-embedding-3-small",
"provider": "openai"
}
},
"vectordb": {
"strategy": "redis",
"distance_metric": "cosine",
"threshold": 0.7,
"dimensions": 1024,
"redis": {
"host": "localhost",
"port": 6379
}
},
"targets": [
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 2,
"description": "Specialist in real life topics",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 75,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
},
{
"model": {
"name": "gpt-4o-mini",
"provider": "openai",
"options": {
"max_tokens": 1024,
"temperature": 1.0
}
},
"route_type": "llm/v1/chat",
"weight": 25,
"description": "Specialist in programming problems",
"auth": {
"header_name": "Authorization",
"header_value": "Bearer '$OPENAI_API_KEY'"
},
"logging": {
"log_statistics": true,
"log_payloads": true
}
}
]
},
"tags": []
}
'
Make sure to replace the following placeholders with your own values:
- region: The geographic region where your Kong Konnect instance is hosted and operates.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- controlPlaneId: The id of the control plane.
- consumerGroupId: The id of the consumer group the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
name: ai-proxy-advanced
namespace: kong
annotations:
kubernetes.io/ingress.class: kong
konghq.com/tags: ''
config:
balancer:
algorithm: semantic
retries: 3
failover_criteria:
- error
- timeout
- http_429
- http_503
- non_idempotent
embeddings:
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
model:
name: text-embedding-3-small
provider: openai
vectordb:
strategy: redis
distance_metric: cosine
threshold: 0.7
dimensions: 1024
redis:
host: localhost
port: 6379
targets:
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 2
description: Specialist in real life topics
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 75
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
- model:
name: gpt-4o-mini
provider: openai
options:
max_tokens: 1024
temperature: 1.0
route_type: llm/v1/chat
weight: 25
description: Specialist in programming problems
auth:
header_name: Authorization
header_value: Bearer $OPENAI_API_KEY
logging:
log_statistics: true
log_payloads: true
plugin: ai-proxy-advanced
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumerGroup resource:
kubectl annotate -n kong kongconsumergroup CONSUMERGROUP_NAME konghq.com/plugins=ai-proxy-advanced
Prerequisite: Configure your Personal Access Token
terraform {
required_providers {
konnect = {
source = "kong/konnect"
}
}
}
provider "konnect" {
personal_access_token = "$KONNECT_TOKEN"
server_url = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
enabled = true
config = {
balancer = {
algorithm = "semantic"
retries = 3
failover_criteria = ["error", "timeout", "http_429", "http_503", "non_idempotent"]
}
embeddings = {
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
model = {
name = "text-embedding-3-small"
provider = "openai"
}
}
vectordb = {
strategy = "redis"
distance_metric = "cosine"
threshold = 0.7
dimensions = 1024
redis = {
host = "localhost"
port = 6379
}
}
targets = [
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 2
description = "Specialist in real life topics"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 75
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
},
{
model = {
name = "gpt-4o-mini"
provider = "openai"
options = {
max_tokens = 1024
temperature = 1.0
}
}
route_type = "llm/v1/chat"
weight = 25
description = "Specialist in programming problems"
auth = {
header_name = "Authorization"
header_value = "Bearer ${var.openai_api_key}"
}
logging = {
log_statistics = true
log_payloads = true
}
}
]
}
tags = []
control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
consumer_group = {
id = konnect_gateway_consumer_group.my_consumer_group.id
}
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
type = string
}