Route OpenAI chat traffic using semantic balancing and Vault-stored keys
Configure the AI Proxy Advanced plugin to resolve OpenAI API keys dynamically from HashiCorp Vault, then route chat traffic to the most relevant model using semantic balancing based on user input.
Prerequisites
Series Prerequisites
This page is part of the Configure dynamic authentication to LLM providers series.
Complete the previous page, Configure dynamic authentication to LLM providers using HashiCorp Vault, before completing this page.
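Before you continue, you can optionally confirm that the secrets created on the previous page are readable from Vault. This is a quick sanity check that assumes a KV v2 secrets engine mounted at secret/, with openai and mistral paths each containing a key field, matching the vault references used later in this tutorial; adjust the mount and paths to your setup:
# Assumed mount and paths; change these if your Vault layout differs.
vault kv get -mount=secret openai
vault kv get -mount=secret mistral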
decK
decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial, you first need to install decK.
Required entities
For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function, but installing them isn’t the focus of this guide. Run the following command to pre-configure them:
echo '
_format_version: "3.0"
services:
- name: example-service
  url: http://httpbin.konghq.com/anything
routes:
- name: example-route
  paths:
  - "/anything"
  service:
    name: example-service
' | deck gateway apply -
To learn more about entities, you can read our entities documentation.
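To verify the entities were created, you can send a request through the proxy. This assumes Kong Gateway’s proxy is listening on the default localhost:8000:
# Expect an echo response from httpbin if the Service and Route are in place.
curl -i "http://localhost:8000/anything"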
Redis stack
To complete this tutorial, make sure you have the following:
- A Redis Stack running and accessible from the environment where Kong is deployed.
- Port 6379, or your custom Redis port, open and reachable from Kong.
- The Redis host set as an environment variable so the plugin can connect:
export DECK_REDIS_HOST='YOUR-REDIS-HOST'
If you’re testing locally with Docker, use host.docker.internal as the host value.
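Before continuing, you can confirm that Redis is reachable from your environment, assuming redis-cli is installed and Redis listens on the default port:
# Expect "PONG" if the connection succeeds.
redis-cli -h "$DECK_REDIS_HOST" -p 6379 ping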
Configure the plugin
We configure the AI Proxy Advanced plugin to route chat requests to different LLM providers based on semantic similarity, using API keys stored securely in HashiCorp Vault. Secrets for OpenAI and Mistral are referenced using the {vault://...} syntax. The plugin uses OpenAI’s text-embedding-3-small model to embed incoming requests and compares them against the target descriptions stored in a Redis vector database. Based on this similarity, the semantic balancer chooses the best-matching target:
- GPT-3.5 Turbo for programming queries.
- GPT-4o for prompts related to mathematics.
- Mistral Tiny as the catchall fallback when no close semantic match is found.
echo '
_format_version: "3.0"
plugins:
- name: ai-proxy-advanced
config:
embeddings:
auth:
header_name: Authorization
header_value: "{vault://hashicorp-vault/openai/key}"
model:
provider: openai
name: text-embedding-3-small
vectordb:
dimensions: 1536
distance_metric: cosine
strategy: redis
threshold: 0.8
redis:
host: "${{ env "DECK_DECK_REDIS_HOST" }}"
port: 6379
balancer:
algorithm: semantic
targets:
- route_type: llm/v1/chat
auth:
header_name: Authorization
header_value: "{vault://hashicorp-vault/openai/key}"
model:
provider: openai
name: gpt-3.5-turbo
options:
max_tokens: 826
temperature: 0
input_cost: 1.0
output_cost: 2.0
      description: programming, coding, software development, Python, JavaScript, APIs, debugging
- route_type: llm/v1/chat
auth:
header_name: Authorization
header_value: "{vault://hashicorp-vault/openai/key}"
model:
provider: openai
name: gpt-4o
options:
max_tokens: 512
temperature: 0.3
input_cost: 1.0
output_cost: 2.0
      description: mathematics, algebra, calculus, trigonometry, equations, integrals, derivatives, theorems
- route_type: llm/v1/chat
auth:
header_name: Authorization
header_value: "{vault://hashicorp-vault/mistral/key}"
model:
provider: mistral
name: mistral-tiny
options:
mistral_format: openai
upstream_url: https://api.mistral.ai/v1/chat/completions
description: CATCHALL
' | deck gateway apply -
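If you’re running Kong Gateway locally, you can confirm that the plugin was created by querying the Admin API, assuming it listens on the default localhost:8001:
# Prints the plugin name once per configured instance.
curl -s "http://localhost:8001/plugins" | grep -o 'ai-proxy-advanced'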
Validate configuration
You can test the plugin’s semantic routing logic by sending prompts that align with the intent of each configured target. The AI Proxy Advanced plugin uses dynamic authentication to inject the appropriate API key from HashiCorp Vault based on the selected model. Each response should include the expected "model" value, confirming that the request was both routed and authenticated correctly. The examples below use $KONNECT_PROXY_URL as the proxy address; if you’re running Kong Gateway locally, replace it with http://localhost:8000.
Programming questions
These prompts are routed to OpenAI GPT-3.5 Turbo, since it performs well on technical and programming-related tasks. The responses should include "model": "gpt-3.5-turbo".
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "How can I build a REST API using Flask?"
}
]
}'
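To check only the routing decision, you can extract the "model" field from the response, assuming jq is installed:
curl -s "$KONNECT_PROXY_URL/anything" \
  -H "Content-Type: application/json" \
  --json '{
    "messages": [
      {
        "role": "user",
        "content": "How can I build a REST API using Flask?"
      }
    ]
  }' | jq -r '.model'
# Expected output: gpt-3.5-turbo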
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "How can I build a REST API using Flask?"
}
]
}'
You can also try a question regarding debugging JavaScript code:
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "How can you effectively debug asynchronous code in JavaScript to identify where a Promise or callback might be failing?"
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "How can you effectively debug asynchronous code in JavaScript to identify where a Promise or callback might be failing?"
}
]
}'
Math questions
These prompts should match the OpenAI GPT-4o target, which is designated for mathematics topics like algebra and calculus. The responses should include "model": "gpt-4o".
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "What is the derivative of sin(x)?"
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "What is the derivative of sin(x)?"
}
]
}'
You can also try asking a question related to theorems:
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "Explain me Gödel`s incompleteness theorem."
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "Explain me Gödel`s incompleteness theorem."
}
]
}'
Fallback questions
These general-purpose or unmatched prompts are routed to Mistral Tiny, which acts as the fallback target. The responses should include "model": "mistral-tiny".
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "What is Wulfila Bible?"
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "What is Wulfila Bible?"
}
]
}'
You can also try another general question:
curl "$KONNECT_PROXY_URL/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "Who was Edward Gibbon and what he is famous for?"
}
]
}'
curl "http://localhost:8000/anything" \
-H "Content-Type: application/json" \
--json '{
"messages": [
{
"role": "user",
"content": "Who was Edward Gibbon and what he is famous for?"
}
]
}'
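Optionally, after a few requests have been embedded, you can check that the plugin has written its vector index to Redis. This assumes the RediSearch module from Redis Stack is available; the index name shown is generated by the plugin:
# Lists all RediSearch indexes on the Redis instance.
redis-cli -h "$DECK_REDIS_HOST" -p 6379 FT._LIST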