Configure the AI LLM as Judge plugin (v3.12+)
Evaluate responses by assigning a correctness score for AI-assisted learning and assessment.
See this how-to guide for an example of the plugin in a real-life scenario.
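The judge model's reply should be a bare integer from 1 to 100, as the evaluation prompt below demands. If you consume that score downstream (for example, from payload logs), it is worth validating it defensively. The helper below is an illustrative sketch, not part of the plugin:

```shell
parse_judge_score() {
  # Accept only a bare integer from 1 to 100; print it on success,
  # return non-zero otherwise so callers can discard bad evaluations.
  local raw
  raw=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$raw" in
    ''|*[!0-9]*) return 1 ;;   # reject empty input and anything non-numeric
  esac
  [ "$raw" -ge 1 ] && [ "$raw" -le 100 ] || return 1
  printf '%s\n' "$raw"
}

parse_judge_score " 87 "        # prints 87
parse_judge_score "Score: 87"   # fails: labels violate the prompt's contract
parse_judge_score "150"         # fails: out of the 1-100 range
```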
Prerequisites
- You have a working OpenAI API key
- You have enabled the AI Proxy or AI Proxy Advanced plugin
Environment variables
- OPENAI_API_KEY: Your OpenAI API key
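Before running the examples, export the key in your shell. decK substitutes `${{ env "..." }}` references from environment variables that carry a DECK_ prefix, so set both forms if you plan to use decK (the value below is a placeholder):

```shell
# Placeholder value: substitute your real OpenAI API key.
export OPENAI_API_KEY="sk-your-key-here"

# decK resolves ${{ env "DECK_OPENAI_API_KEY" }} from the DECK_-prefixed variable.
export DECK_OPENAI_API_KEY="$OPENAI_API_KEY"
```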
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
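One option above is easy to misread: sampling_rate: 1 means every request is evaluated, while a value below 1 would judge only a sampled fraction of traffic, trading coverage for lower judge-model cost. The loop below is a toy illustration of that sampling behavior under this reading, not plugin code:

```shell
# Hypothetical illustration of rate-based sampling (not plugin internals).
sampling_rate=1   # judge every request; try 0.25 to judge roughly one in four
judged=0
total=100
for i in $(seq "$total"); do
  # awk draws a uniform random number in [0,1); the request is judged
  # when the draw falls below the configured rate.
  if awk -v r="$sampling_rate" -v seed="$i" \
      'BEGIN { srand(seed); exit !(rand() < r) }'; then
    judged=$((judged + 1))
  fi
done
echo "judged $judged of $total requests"
```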
Make the following request:
curl -i -X POST http://localhost:8001/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: 'true'
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
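Terraform maps TF_VAR_-prefixed environment variables onto input variables, so you can supply the key without hard-coding the secret in your manifest (the value below is a placeholder):

```shell
# Populates var.openai_api_key at plan/apply time.
# Placeholder value: substitute your real OpenAI API key.
export TF_VAR_openai_api_key="sk-your-key-here"

# Terraform would then read it when you run, e.g.:
#   terraform plan
```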
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    service: serviceName|Id
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/services/{serviceName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/services/{serviceId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- serviceId: The id of the service the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the service resource:
kubectl annotate -n kong service SERVICE_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  service = {
    id = konnect_gateway_service.my_service.id
  }
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    route: routeName|Id
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/routes/{routeName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- routeId: The id of the route the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the httproute or ingress resource:
kubectl annotate -n kong httproute ROUTE_NAME konghq.com/plugins=ai-llm-as-judge
kubectl annotate -n kong ingress INGRESS_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  route = {
    id = konnect_gateway_route.my_route.id
  }
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    consumer: consumerName|Id
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumers/{consumerName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumers/{consumerId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- consumerId: The id of the consumer the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumer resource:
kubectl annotate -n kong kongconsumer CONSUMER_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  consumer = {
    id = konnect_gateway_consumer.my_consumer.id
  }
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    consumer_group: consumerGroupName|Id
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumer_groups/{consumerGroupName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumer_groups/{consumerGroupId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- consumerGroupId: The id of the consumer group the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumerGroup resource:
kubectl annotate -n kong kongconsumergroup CONSUMERGROUP_NAME konghq.com/plugins=ai-llm-as-judge
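The annotation can also be declared on the KongConsumerGroup manifest itself. A minimal sketch; the group name is a placeholder, and the apiVersion shown (v1beta1) may differ depending on your Kong Ingress Controller version, so check the CRDs installed in your cluster:

```yaml
apiVersion: configuration.konghq.com/v1beta1
kind: KongConsumerGroup
metadata:
  name: example-consumer-group   # placeholder: your consumer group's name
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
    konghq.com/plugins: ai-llm-as-judge
```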
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  consumer_group = {
    id = konnect_gateway_consumer_group.my_consumer_group.id
  }
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
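The prompt above instructs the judge to return a single bare integer from 1 to 100, but an LLM can still occasionally emit stray text, so downstream code that consumes the verdict should validate it defensively. A minimal sketch in Python; where the raw verdict string surfaces in your pipeline depends on your logging setup and is not part of this plugin's configuration:

```python
def parse_judge_score(raw: str) -> int:
    """Validate a judge verdict that should be a bare integer in [1, 100]."""
    text = raw.strip()
    if not text.isdigit():
        raise ValueError(f"judge returned a non-numeric verdict: {raw!r}")
    score = int(text)
    if not 1 <= score <= 100:
        raise ValueError(f"judge score {score} is outside the 1-100 range")
    return score
```

Rejecting malformed verdicts outright (rather than coercing them) keeps bad samples out of any model that learns from these scores.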