Use Kong AI Gateway to govern GitHub MCP traffic

Uses: Kong Gateway AI Gateway decK
TL;DR

Use Kong’s AI Proxy Advanced plugin to load balance MCP requests across multiple OpenAI models, and secure the traffic with the AI Prompt Guard plugin. The guard plugin filters prompts based on allow and deny patterns, ensuring only safe, relevant requests reach your GitHub MCP server, while blocking potentially harmful or unauthorized commands.

Prerequisites

decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial you will first need to install decK.

Reconfigure the AI Proxy Advanced plugin

This configuration uses the AI Proxy Advanced plugin to load balance requests between OpenAI’s gpt-4 and gpt-4o models using a round-robin algorithm. Both models are configured to call a GitHub-hosted remote MCP server via the llm/v1/responses route. The plugin injects the required OpenAI API key for authentication and logs both payloads and statistics. With equal weights assigned to each target, traffic is split evenly between the two models.

This configuration is for demonstration purposes only and is not intended for production use.

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: round-robin
      targets:
      - model:
          provider: openai
          name: gpt-4
          options:
            max_tokens: 512
            temperature: 1.0
        route_type: llm/v1/responses
        auth:
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        weight: 50
      - model:
          provider: openai
          name: gpt-4o
          options:
            max_tokens: 512
            temperature: 1.0
        route_type: llm/v1/responses
        auth:
          header_name: Authorization
          header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
        weight: 50
' | deck gateway apply -

Validate MCP Traffic balancing

Now that the AI Proxy Advanced plugin is configured with round-robin load balancing, you can verify that traffic is distributed across both OpenAI models. This script sends 10 test requests to the MCP server Route and prints the model used in each response. If load balancing is working correctly, the output should alternate between gpt-4 and gpt-4o based on their configured weights.

for i in {1..10}; do
  echo -n "Request #$i — Model: "
  curl -s -X POST "http://localhost:8000/anything" \
    -H "Accept: application/json" \
    -H "apikey: hello_world" \
    -H "Content-Type: application/json" \
    --json "{
      \"tools\": [
        {
          \"type\": \"mcp\",
          \"server_label\": \"gitmcp\",
          \"server_url\": \"https://api.githubcopilot.com/mcp/x/repos\",
          \"require_approval\": \"never\",
          \"headers\": {
            \"Authorization\": \"Bearer $GITHUB_PAT\"
          }
        }
      ],
      \"input\": \"Test\"
    }" | jq -r '.model'
  sleep 3
done

If successful, the script should give the following output:

Request #1 — Model: gpt-4o-2024-08-06
Request #2 — Model: gpt-4-0613
Request #3 — Model: gpt-4o-2024-08-06
Request #4 — Model: gpt-4-0613
Request #5 — Model: gpt-4o-2024-08-06
Request #6 — Model: gpt-4o-2024-08-06
Request #7 — Model: gpt-4o-2024-08-06
Request #8 — Model: gpt-4o-2024-08-06
Request #9 — Model: gpt-4o-2024-08-06
Request #10 — Model: gpt-4-0613

Configure the AI Prompt Guard plugin

In this step, we’ll secure our MCP traffic even further by adding the AI Prompt Guard plugin. This plugin enforces content-level filtering using allow and deny patterns. It ensures only safe, relevant prompts reach the model—for example, questions about GitHub MCP capabilities—while blocking potentially harmful or abusive inputs like exploit attempts or security threats.

echo '
_format_version: "3.0"
plugins:
  - name: ai-prompt-guard
    config:
      allow_patterns:
      - "(?i).*GitHub MCP.*"
      - "(?i).*MCP server.*"
      - "(?i).*(tools?|features?|capabilities?|options?) available.*"
      - "(?i).*what can I do.*"
      - "(?i).*how do I.*"
      - "(?i).*(notifications?|issues?|pull requests?|PRs?|code reviews?|repos?|branches?|scanning).*"
      - "(?i).*(create|update|view|get|comment|merge|manage).*"
      - "(?i).*(workflow|assistant|automatically|initialize|fork|diff).*"
      - "(?i).*(auth(entication)?|token|scan|quality|security|setup).*"
      - "(?i).*test*"
      deny_patterns:
      - ".*(hacking|hijacking|exploit|bypass|malware|backdoor|ddos|phishing|payload|sql
        injection).*"
      - ".*(root access|unauthorized|breach|exfiltrate|data leak|ransomware).*"
      - ".*(zero[- ]day|CVE-\\\\d{4}-\\\\d+|buffer overflow).*"
' | deck gateway apply -

Validate your configuration

Now you can validate your configuration by testing tool requests and sending requests that should be denied.

Allowed tool requests to GitHub MCP Server

Replace YOUR_REPOSITORY_NAME in the example requests below with your repository path, using the format: owner-name/repository-name.

Denied requests

Each input below matches a deny pattern like .*(backdoor|exfiltrate|CVE-\d{4}-\d+).*, which should trigger rejection by the AI Prompt Guard plugin. Let me know if you’d like boundary cases or false positives to test.

Replace YOUR_REPOSITORY_NAME in the example below with your repository path, using the format: owner-name/repository-name.

Enforce local rate limiting

Now you can enforce local rate limiting by configuring the AI Rate Limiting Advanced plugin to enforce strict rate limits on requests to the OpenAI provider. This uses a local strategy with a fixed window of 10 seconds, which only allows one request per 10-second window. Requests exceeding this limit within the same window will receive a 429 Too Many Requests response, effectively controlling request bursts and protecting backend resources.

Note: This configuration is for testing purposes only. In a production environment, rate limits and window sizes should be adjusted to match actual usage patterns and performance requirements.

echo '
_format_version: "3.0"
plugins:
  - name: ai-rate-limiting-advanced
    config:
      llm_providers:
      - name: openai
        limit:
        - 1
        window_size:
        - 10
' | deck gateway apply -

Validate rate limiting

We can test our simple rate limiting configuration by running the following script:

for i in {1..6}; do
  echo -n "Request #$i — Status: "
  http_status=$(curl -s -o /dev/null -w '%{http_code}' -X POST "http://localhost:8000/anything" \
    -H "Accept: application/json" \
    -H "apikey: hello_world" \
    -H "Content-Type: application/json" \
    --json "{
      \"tools\": [
        {
          \"type\": \"mcp\",
          \"server_label\": \"gitmcp\",
          \"server_url\": \"https://api.githubcopilot.com/mcp/x/repos\",
          \"require_approval\": \"never\",
          \"headers\": {
            \"Authorization\": \"Bearer $GITHUB_PAT\"
          }
        }
      ],
      \"input\": \"How do I\"
    }")
  echo "$http_status"

  if [[ $i == 3 ]]; then
    sleep 20
  else
    sleep 2
  fi
done

Which should give the following output:

Request #1 — Status: 200
Request #2 — Status: 429
Request #3 — Status: 429
Request #4 — Status: 200
Request #5 — Status: 429
Request #6 — Status: 429

Our rate limit configuration is allowing one request per 60 seconds, which means that:

  • Request #1 is allowed (status 200). The first request within the rate limit.

  • Request #2 gets status 429 (Too Many Requests). The limit is one request per 10 seconds, so this second request exceeds it.

  • Request #3 also gets 429 because it’s still within the same 10-second window, so rate limiting blocks it.

  • After sleeping 20 seconds (more than the 10-second window), the rate limit resets.

  • Request #4 falls into a fresh window and is allowed (status 200).

  • Requests #5 and #6 happen shortly after and exceed the limit again, getting 429.

Something wrong?

Help us make these docs great!

Kong Developer docs are open source. If you find these useful and want to make them better, contribute today!