Enforce AI rate limits with a custom function

Uses: Kong Gateway, AI Gateway, deck

Minimum version: Kong Gateway 3.6

TL;DR

Set up AI Proxy to route requests to Cohere, use a custom Lua function to count tokens via the x-prompt-count header, and enforce usage limits with Redis-based rate limiting.

Prerequisites

This is a Konnect tutorial and requires a Konnect personal access token.

  1. Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.

  2. Export your token to an environment variable:

     export KONNECT_TOKEN='YOUR_KONNECT_PAT'
    
  3. Run the quickstart script to automatically provision a Control Plane and Data Plane, and configure your environment:

     curl -Ls https://get.konghq.com/quickstart | bash -s -- -k $KONNECT_TOKEN --deck-output
    

    This sets up a Konnect Control Plane named quickstart, provisions a local Data Plane, and prints out the following environment variable exports:

     export DECK_KONNECT_TOKEN=$KONNECT_TOKEN
     export DECK_KONNECT_CONTROL_PLANE_NAME=quickstart
     export KONNECT_CONTROL_PLANE_URL=https://us.api.konghq.com
     export KONNECT_PROXY_URL='http://localhost:8000'
    

    Copy and paste these into your terminal to configure your session.

Alternatively, you can run this tutorial against a self-managed Kong Gateway Enterprise instance. If you don’t have Kong Gateway set up yet, you can use the quickstart script with an enterprise license to get an instance of Kong Gateway running almost instantly.

  1. Export your license to an environment variable:

     export KONG_LICENSE_DATA='LICENSE-CONTENTS-GO-HERE'
    
  2. Run the quickstart script:

    curl -Ls https://get.konghq.com/quickstart | bash -s -- -e KONG_LICENSE_DATA 
    

    Once Kong Gateway is ready, you will see the following message:

     Kong Gateway Ready
    

decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial, install decK version 1.43 or later.

This guide uses deck gateway apply, which directly applies entity configuration to your Gateway instance. We recommend upgrading your decK installation to take advantage of this tool.

You can check your current decK version with deck version.

For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function but installing them isn’t the focus of this guide. Follow these steps to pre-configure them:

  1. Run the following command:

    echo '
    _format_version: "3.0"
    services:
      - name: example-service
        url: http://httpbin.konghq.com/anything
    routes:
      - name: example-route
        paths:
        - "/anything"
        service:
          name: example-service
    ' | deck gateway apply -
    

To learn more about entities, you can read our entities documentation.
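Before adding any plugins, you can optionally confirm that the Service and Route are live. This is a quick sanity check, assuming `KONNECT_PROXY_URL` was exported as shown earlier:

```shell
# The route should proxy to httpbin and return HTTP 200.
curl -s -o /dev/null -w "%{http_code}\n" "$KONNECT_PROXY_URL/anything"
```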

For this task, you need a Cohere API key.

  1. Create a Cohere account.
  2. Generate an API key from the dashboard.
  3. Create a decK variable with your API key:
    export DECK_COHERE_API_KEY='COHERE API KEY'
    

To complete this tutorial, make sure you have the following:

  • A Redis Stack running and accessible from the environment where Kong is deployed.
  • Port 6379 (or your custom Redis port) is open and reachable from Kong.
  • Redis host set as an environment variable so the plugin can connect:

    export DECK_REDIS_HOST='YOUR-REDIS-HOST'
    

If you’re testing locally with Docker, use host.docker.internal as the host value.
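If you have `redis-cli` installed, you can confirm that Redis is reachable before configuring the plugin. This is an optional environment check; adjust the port if you use a custom one:

```shell
# Should print PONG if Redis is reachable from this environment.
redis-cli -h "$DECK_REDIS_HOST" -p 6379 ping
```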

Configure the plugin

Enable the AI Proxy plugin with your Cohere API key and the model details to proxy requests to Cohere. In this example, we’ll use the command-a-03-2025 model.

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_COHERE_API_KEY" }}
      model:
        provider: cohere
        name: command-a-03-2025
        options:
          max_tokens: 512
          temperature: 1.0
' | deck gateway apply -

Configure the AI Rate Limiting Advanced plugin

Now, configure the AI Rate Limiting Advanced plugin. This configuration enforces usage limits on AI model requests by counting tokens with a custom Lua function: the x-prompt-count HTTP header supplies the token count for each request, and rate limit counters are stored in Redis. Each entry in limit is paired with the entry at the same index in window_size, so the configuration below allows 100 tokens per 60-second window and 1000 tokens per 3600-second window. This setup helps prevent quota overruns and protects your AI infrastructure from excessive usage.

echo '
_format_version: "3.0"
plugins:
  - name: ai-rate-limiting-advanced
    config:
      strategy: redis
      redis:
        host: "${{ env "DECK_REDIS_HOST" }}"
        port: 6379
      sync_rate: 0
      llm_providers:
      - name: cohere
        limit:
        - 100
        - 1000
        window_size:
        - 60
        - 3600
      request_prompt_count_function: |
        -- Prefer the client-supplied token count; fall back to 0.
        local header_count = tonumber(kong.request.get_header("x-prompt-count"))
        if header_count then
          return header_count
        end
        return 0
' | deck gateway apply -
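The request_prompt_count_function above prefers the client-supplied header and falls back to 0 when it is missing or non-numeric. The same fallback logic can be sketched in shell (prompt_count is a hypothetical helper, shown only to illustrate the behavior):

```shell
# Mirror of the Lua fallback: print the x-prompt-count value if it is
# a non-negative integer, otherwise print 0.
prompt_count() {
  case "$1" in
    ''|*[!0-9]*) echo 0 ;;  # missing or non-numeric header -> 0
    *) echo "$1" ;;
  esac
}

prompt_count 100000  # -> 100000
prompt_count ""      # -> 0
prompt_count abc     # -> 0
```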

Validate the configuration

Now, you can test the rate limiting configuration.

  • The first request sends an x-prompt-count of 100000. Because the rate limit counters start empty, it should receive a 200 OK response.
  • The second request, sent shortly after with an x-prompt-count of 950000, finds the token quota already exhausted and is expected to return a 429 response.
 curl "$KONNECT_PROXY_URL/anything" \
     -H "Content-Type: application/json"\
     -H "x-prompt-count: 100000" \
     --json '{
       "messages": [
         {
           "role": "system",
           "content": "You are an IT specialist."
         },
         {
           "role": "user",
           "content": "Tell me about Google?"
         }
       ]
     }'

You should see the following response:

OK

Now, you can test the rate limiting function by sending the following request:

 curl "$KONNECT_PROXY_URL/anything" \
     -H "Content-Type: application/json"\
     -H "x-prompt-count: 950000" \
     --json '{
       "messages": [
         {
           "role": "system",
           "content": "You are an IT specialist."
         },
         {
           "role": "user",
           "content": "Tell me about Google?"
         }
       ]
     }'

You should see the following response:

Rate limit exceeded for provider: cohere

Cleanup

If you created a new control plane and want to conserve your free trial credits or avoid unnecessary charges, delete the new control plane used in this tutorial.

curl -Ls https://get.konghq.com/quickstart | bash -s -- -d