Route OpenAI chat traffic using semantic balancing and Vault-stored keys
Configure the AI Proxy Advanced plugin to resolve OpenAI API keys dynamically from HashiCorp Vault, then route chat traffic to the most relevant model using semantic balancing based on user input.
Prerequisites
Series Prerequisites
This page is part of the Configure dynamic authentication to LLM providers series.
Complete the previous page, Configure dynamic authentication to LLM providers using HashiCorp Vault, before completing this page.
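Before you continue, you can optionally confirm that the secrets created on the previous page are readable from Vault. This is a quick sanity check that assumes a KV v2 secrets engine mounted at secret/, with openai and mistral paths each containing a key field, matching the vault references used later in this tutorial; adjust the mount and paths to your setup:
# Assumed mount and paths; change these if your Vault layout differs.
vault kv get -mount=secret openai
vault kv get -mount=secret mistral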
decK
decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial, you first need to install decK.
Required entities
For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function, but installing them isn’t the focus of this guide. Run the following command to pre-configure them:
echo '
_format_version: "3.0"
services:
- name: example-service
  url: http://httpbin.konghq.com/anything
routes:
- name: example-route
  paths:
  - "/anything"
  service:
    name: example-service
' | deck gateway apply -
To learn more about entities, you can read our entities documentation.
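To verify the entities were created, you can send a request through the proxy. This assumes Kong Gateway’s proxy is listening on the default localhost:8000:
# Expect an echo response from httpbin if the Service and Route are in place.
curl -i "http://localhost:8000/anything"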
Redis stack
To complete this tutorial, make sure you have the following:
- A Redis Stack running and accessible from the environment where Kong is deployed.
- Port 6379, or your custom Redis port, open and reachable from Kong.
- The Redis host set as an environment variable so the plugin can connect:
export DECK_REDIS_HOST='YOUR-REDIS-HOST'
If you’re testing locally with Docker, use host.docker.internal as the host value.
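Before continuing, you can confirm that Redis is reachable from your environment, assuming redis-cli is installed and Redis listens on the default port:
# Expect "PONG" if the connection succeeds.
redis-cli -h "$DECK_REDIS_HOST" -p 6379 ping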
Configure the plugin
We configure the AI Proxy Advanced plugin to route chat requests to different LLM providers based on semantic similarity, using API keys stored securely in HashiCorp Vault. Secrets for OpenAI and Mistral are referenced using the {vault://...} syntax. The plugin uses OpenAI’s text-embedding-3-small model to embed incoming requests and compares them against the target descriptions stored in a Redis vector database. Based on this similarity, the semantic balancer chooses the best-matching target:
- GPT-3.5 Turbo for programming queries.
- GPT-4o for prompts related to mathematics.
- Mistral Tiny as the catchall fallback when no close semantic match is found.
echo '
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
config:
embeddings:
auth:
header_name: Authorization
header_value: "{vault://hashicorp-vault/openai/key}"
model:
provider: openai
name: text-embedding-3-small
vectordb:
dimensions: 1536
distance_metric: cosine
strategy: redis
threshold: 0.8
redis:
host: "${{ env "DECK_DECK_REDIS_HOST" }}"
port: 6379
balancer:
algorithm: semantic
targets:
- route_type: llm/v1/chat
auth:
header_name: Authorization
header_value: "{vault://hashicorp-vault/openai/key}"
model:
provider: openai
name: gpt-3.5-turbo
options:
max_tokens: 826
temperature: 0
input_cost: 1.0
output_cost: 2.0
      description: programming, coding, software development, Python, JavaScript, APIs, debugging
- route_type: llm/v1/chat
auth:
header_name: Authorization
header_value: "{vault://hashicorp-vault/openai/key}"
model:
provider: openai
name: gpt-4o
options:
max_tokens: 512
temperature: 0.3
input_cost: 1.0
output_cost: 2.0
      description: mathematics, algebra, calculus, trigonometry, equations, integrals, derivatives, theorems
- route_type: llm/v1/chat
auth:
header_name: Authorization
header_value: "{vault://hashicorp-vault/mistral/key}"
model:
provider: mistral
name: mistral-tiny
options:
mistral_format: openai
upstream_url: https://api.mistral.ai/v1/chat/completions
description: CATCHALL
' | deck gateway apply -
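If you’re running Kong Gateway locally, you can confirm that the plugin was created by querying the Admin API, assuming it listens on the default localhost:8001:
# Prints the plugin name once per configured instance.
curl -s "http://localhost:8001/plugins" | grep -o 'ai-proxy-advanced'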
Validate configuration
You can test the plugin’s semantic routing logic by sending prompts that align with the intent of each configured target. The AI Proxy Advanced plugin uses dynamic authentication to inject the appropriate API key from HashiCorp Vault based on the selected model. Each response should include the expected "model" value, confirming that the request was both routed and authenticated correctly. The examples below use $KONNECT_PROXY_URL as the proxy address; if you’re running Kong Gateway locally, replace it with http://localhost:8000.
Programming questions
These prompts are routed to OpenAI GPT-3.5 Turbo, since it performs well on technical and programming-related tasks. The responses should include "model": "gpt-3.5-turbo".
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "How can I build a REST API using Flask?"
}
]
}'
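To check only the routing decision, you can extract the "model" field from the response, assuming jq is installed:
curl -s "$KONNECT_PROXY_URL/anything" \
  -H "Content-Type: application/json" \
  --json '{
    "messages": [
      {
        "role": "user",
        "content": "How can I build a REST API using Flask?"
      }
    ]
  }' | jq -r '.model'
# Expected output: gpt-3.5-turbo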
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "How can I build a REST API using Flask?"
}
]
}'
You can also try a question regarding debugging JavaScript code:
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "How can you effectively debug asynchronous code in JavaScript to identify where a Promise or callback might be failing?"
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "How can you effectively debug asynchronous code in JavaScript to identify where a Promise or callback might be failing?"
}
]
}'
Math questions
These prompts should match the OpenAI GPT-4o target, which is designated for mathematics topics like algebra and calculus. The responses should include "model": "gpt-4o".
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "What is the derivative of sin(x)?"
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "What is the derivative of sin(x)?"
}
]
}'
You can also try asking a question related to theorems:
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "Explain me Gödel`s incompleteness theorem."
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "Explain me Gödel`s incompleteness theorem."
}
]
}'
Fallback questions
These general-purpose or unmatched prompts are routed to Mistral Tiny, which acts as the fallback target. The responses should include "model": "mistral-tiny".
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "What is Wulfila Bible?"
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "What is Wulfila Bible?"
}
]
}'
You can also try another general question:
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "Who was Edward Gibbon and what he is famous for?"
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "Who was Edward Gibbon and what he is famous for?"
}
]
}'
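Optionally, after a few requests have been embedded, you can check that the plugin has written its vector index to Redis. This assumes the RediSearch module from Redis Stack is available; the index name shown is generated by the plugin:
# Lists all RediSearch indexes on the Redis instance.
redis-cli -h "$DECK_REDIS_HOST" -p 6379 FT._LIST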