Configure the AI LLM as Judge plugin (v3.12+)
Evaluate responses by assigning a correctness score for AI-assisted learning and assessment.
See this how-to guide for an example of the plugin in a real-life scenario.
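The judge model's reply should be a bare integer from 1 to 100, as the evaluation prompt below demands. If you consume that score downstream (for example, from payload logs), it is worth validating it defensively. The helper below is an illustrative sketch, not part of the plugin:

```shell
parse_judge_score() {
  # Accept only a bare integer from 1 to 100; print it on success,
  # return non-zero otherwise so callers can discard bad evaluations.
  local raw
  raw=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$raw" in
    ''|*[!0-9]*) return 1 ;;   # reject empty input and anything non-numeric
  esac
  [ "$raw" -ge 1 ] && [ "$raw" -le 100 ] || return 1
  printf '%s\n' "$raw"
}

parse_judge_score " 87 "        # prints 87
parse_judge_score "Score: 87"   # fails: labels violate the prompt's contract
parse_judge_score "150"         # fails: out of the 1-100 range
```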
Prerequisites
- You have a working OpenAI API key
- You have enabled the AI Proxy or AI Proxy Advanced plugin
Environment variables
- OPENAI_API_KEY: Your OpenAI API key
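Before running the examples, export the key in your shell. decK substitutes `${{ env "..." }}` references from environment variables that carry a DECK_ prefix, so set both forms if you plan to use decK (the value below is a placeholder):

```shell
# Placeholder value: substitute your real OpenAI API key.
export OPENAI_API_KEY="sk-your-key-here"

# decK resolves ${{ env "DECK_OPENAI_API_KEY" }} from the DECK_-prefixed variable.
export DECK_OPENAI_API_KEY="$OPENAI_API_KEY"
```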
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
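One option above is easy to misread: sampling_rate: 1 means every request is evaluated, while a value below 1 would judge only a sampled fraction of traffic, trading coverage for lower judge-model cost. The loop below is a toy illustration of that sampling behavior under this reading, not plugin code:

```shell
# Hypothetical illustration of rate-based sampling (not plugin internals).
sampling_rate=1   # judge every request; try 0.25 to judge roughly one in four
judged=0
total=100
for i in $(seq "$total"); do
  # awk draws a uniform random number in [0,1); the request is judged
  # when the draw falls below the configured rate.
  if awk -v r="$sampling_rate" -v seed="$i" \
      'BEGIN { srand(seed); exit !(rand() < r) }'; then
    judged=$((judged + 1))
  fi
done
echo "judged $judged of $total requests"
```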
Make the following request:
curl -i -X POST http://localhost:8001/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: 'true'
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
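Terraform maps TF_VAR_-prefixed environment variables onto input variables, so you can supply the key without hard-coding the secret in your manifest (the value below is a placeholder):

```shell
# Populates var.openai_api_key at plan/apply time.
# Placeholder value: substitute your real OpenAI API key.
export TF_VAR_openai_api_key="sk-your-key-here"

# Terraform would then read it when you run, e.g.:
#   terraform plan
```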
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    service: serviceName|Id
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/services/{serviceName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- serviceName|Id: The id or name of the service the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/services/{serviceId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- serviceId: The id of the service the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the service resource:
kubectl annotate -n kong service SERVICE_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  service = {
    id = konnect_gateway_service.my_service.id
  }
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    route: routeName|Id
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/routes/{routeName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- routeName|Id: The id or name of the route the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- routeId: The id of the route the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the httproute or ingress resource:
kubectl annotate -n kong httproute ROUTE_NAME konghq.com/plugins=ai-llm-as-judge
kubectl annotate -n kong ingress INGRESS_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  route = {
    id = konnect_gateway_route.my_route.id
  }
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    consumer: consumerName|Id
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumers/{consumerName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- consumerName|Id: The id or name of the consumer the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumers/{consumerId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- consumerId: The id of the consumer the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumer resource:
kubectl annotate -n kong kongconsumer CONSUMER_NAME konghq.com/plugins=ai-llm-as-judge
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  consumer = {
    id = konnect_gateway_consumer.my_consumer.id
  }
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
Add this section to your kong.yaml configuration file:
_format_version: "3.0"
plugins:
  - name: ai-llm-as-judge
    consumer_group: consumerGroupName|Id
    config:
      prompt: |
        You are a strict evaluator. You will be given a request and a response.
        Your task is to judge whether the response is correct or incorrect. You must
        assign a score between 1 and 100, where: 100 represents a completely correct
        and ideal response, 1 represents a completely incorrect or irrelevant response.
        Your score must be a single number only — no text, labels, or explanations.
        Use the full range of values (e.g., 13, 47, 86), not just round numbers like
        10, 50, or 100. Be accurate and consistent, as this score will be used by another
        model for learning and evaluation.
      http_timeout: 60000
      https_verify: true
      ignore_assistant_prompts: true
      ignore_system_prompts: true
      ignore_tool_prompts: true
      sampling_rate: 1
      llm:
        auth:
          allow_override: false
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        logging:
          log_payloads: true
          log_statistics: true
        model:
          name: gpt-4o
          provider: openai
          options:
            temperature: 2
            max_tokens: 5
            top_p: 1
            cohere:
              embedding_input_type: classification
        route_type: llm/v1/chat
      message_countback: 3
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -i -X POST http://localhost:8001/consumer_groups/{consumerGroupName|Id}/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- consumerGroupName|Id: The id or name of the consumer group the plugin configuration will target.
Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/consumer_groups/{consumerGroupId}/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
  "name": "ai-llm-as-judge",
  "config": {
    "prompt": "You are a strict evaluator. You will be given a request and a response.\nYour task is to judge whether the response is correct or incorrect. You must\nassign a score between 1 and 100, where: 100 represents a completely correct\nand ideal response, 1 represents a completely incorrect or irrelevant response.\nYour score must be a single number only — no text, labels, or explanations.\nUse the full range of values (e.g., 13, 47, 86), not just round numbers like\n10, 50, or 100. Be accurate and consistent, as this score will be used by another\nmodel for learning and evaluation.\n",
    "http_timeout": 60000,
    "https_verify": true,
    "ignore_assistant_prompts": true,
    "ignore_system_prompts": true,
    "ignore_tool_prompts": true,
    "sampling_rate": 1,
    "llm": {
      "auth": {
        "allow_override": false,
        "header_name": "Authorization",
        "header_value": "Bearer '$OPENAI_API_KEY'"
      },
      "logging": {
        "log_payloads": true,
        "log_statistics": true
      },
      "model": {
        "name": "gpt-4o",
        "provider": "openai",
        "options": {
          "temperature": 2,
          "max_tokens": 5,
          "top_p": 1,
          "cohere": {
            "embedding_input_type": "classification"
          }
        }
      },
      "route_type": "llm/v1/chat"
    },
    "message_countback": 3
  }
}
'
Make sure to replace the following placeholders with your own values:
- region: Geographic region where your Kong Konnect instance is hosted and operates.
- controlPlaneId: The id of the control plane.
- KONNECT_TOKEN: Your Personal Access Token (PAT) associated with your Konnect account.
- consumerGroupId: The id of the consumer group the plugin configuration will target.
See the Konnect API reference to learn about region-specific URLs and personal access tokens.
echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-llm-as-judge
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
config:
  prompt: |
    You are a strict evaluator. You will be given a request and a response.
    Your task is to judge whether the response is correct or incorrect. You must
    assign a score between 1 and 100, where: 100 represents a completely correct
    and ideal response, 1 represents a completely incorrect or irrelevant response.
    Your score must be a single number only — no text, labels, or explanations.
    Use the full range of values (e.g., 13, 47, 86), not just round numbers like
    10, 50, or 100. Be accurate and consistent, as this score will be used by another
    model for learning and evaluation.
  http_timeout: 60000
  https_verify: true
  ignore_assistant_prompts: true
  ignore_system_prompts: true
  ignore_tool_prompts: true
  sampling_rate: 1
  llm:
    auth:
      allow_override: false
      header_name: Authorization
      header_value: Bearer $OPENAI_API_KEY
    logging:
      log_payloads: true
      log_statistics: true
    model:
      name: gpt-4o
      provider: openai
      options:
        temperature: 2
        max_tokens: 5
        top_p: 1
        cohere:
          embedding_input_type: classification
    route_type: llm/v1/chat
  message_countback: 3
plugin: ai-llm-as-judge
" | kubectl apply -f -
Next, apply the KongPlugin resource by annotating the KongConsumerGroup resource:
kubectl annotate -n kong kongconsumergroup CONSUMERGROUP_NAME konghq.com/plugins=ai-llm-as-judge
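The annotation can also be declared on the KongConsumerGroup manifest itself. A minimal sketch; the group name is a placeholder, and the apiVersion shown (v1beta1) may differ depending on your Kong Ingress Controller version, so check the CRDs installed in your cluster:

```yaml
apiVersion: configuration.konghq.com/v1beta1
kind: KongConsumerGroup
metadata:
  name: example-consumer-group   # placeholder: your consumer group's name
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
    konghq.com/plugins: ai-llm-as-judge
```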
Prerequisite: Configure your Personal Access Token
terraform {
  required_providers {
    konnect = {
      source = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "$KONNECT_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}
Add the following to your Terraform configuration to create a Konnect Gateway Plugin:
resource "konnect_gateway_plugin_ai_llm_as_judge" "my_ai_llm_as_judge" {
  enabled = true
  config = {
    prompt = <<EOF
You are a strict evaluator. You will be given a request and a response.
Your task is to judge whether the response is correct or incorrect. You must
assign a score between 1 and 100, where: 100 represents a completely correct
and ideal response, 1 represents a completely incorrect or irrelevant response.
Your score must be a single number only — no text, labels, or explanations.
Use the full range of values (e.g., 13, 47, 86), not just round numbers like
10, 50, or 100. Be accurate and consistent, as this score will be used by another
model for learning and evaluation.
EOF
    http_timeout             = 60000
    https_verify             = true
    ignore_assistant_prompts = true
    ignore_system_prompts    = true
    ignore_tool_prompts      = true
    sampling_rate            = 1
    llm = {
      auth = {
        allow_override = false
        header_name    = "Authorization"
        header_value   = "Bearer ${var.openai_api_key}"
      }
      logging = {
        log_payloads   = true
        log_statistics = true
      }
      model = {
        name     = "gpt-4o"
        provider = "openai"
        options = {
          temperature = 2
          max_tokens  = 5
          top_p       = 1
          cohere = {
            embedding_input_type = "classification"
          }
        }
      }
      route_type = "llm/v1/chat"
    }
    message_countback = 3
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  consumer_group = {
    id = konnect_gateway_consumer_group.my_consumer_group.id
  }
}
This example requires the following variables to be added to your manifest. You can specify values at runtime by setting TF_VAR_name=value.
variable "openai_api_key" {
  type = string
}
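The prompt above instructs the judge to return a single bare integer from 1 to 100, but an LLM can still occasionally emit stray text, so downstream code that consumes the verdict should validate it defensively. A minimal sketch in Python; where the raw verdict string surfaces in your pipeline depends on your logging setup and is not part of this plugin's configuration:

```python
def parse_judge_score(raw: str) -> int:
    """Validate a judge verdict that should be a bare integer in [1, 100]."""
    text = raw.strip()
    if not text.isdigit():
        raise ValueError(f"judge returned a non-numeric verdict: {raw!r}")
    score = int(text)
    if not 1 <= score <= 100:
        raise ValueError(f"judge score {score} is outside the 1-100 range")
    return score
```

Rejecting malformed verdicts outright (rather than coercing them) keeps bad samples out of any model that learns from these scores.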