AI Rate Limiting Advanced

AI License Required

Enable LLM model rate limiting (v3.14+)

Protect your LLM services with model rate limiting. The AI Rate Limiting Advanced plugin analyzes query costs and token usage in responses to provide an enterprise-grade rate-limiting strategy.

The following example uses GPT 5.1, but you can apply the same strategies to any LLM.

Prerequisites

Set up the plugin
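As a sketch, the plugin can be enabled through Kong's declarative configuration. The provider name and the field names below (`llm_providers`, `limit`, `window_size`) are assumptions based on typical Kong plugin schemas; confirm them against the plugin's configuration reference before use.

```yaml
# Hypothetical sketch of enabling the plugin in declarative (kong.yml) config.
# Field names and values are assumptions; verify against the plugin reference.
plugins:
  - name: ai-rate-limiting-advanced
    config:
      llm_providers:
        - name: openai
          limit:
            - 10000      # tokens allowed per window
          window_size:
            - 60         # window length in seconds
```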
