Use Kong AI Gateway to govern GitHub MCP traffic
Use Kong’s AI Proxy Advanced plugin to load balance MCP requests across multiple OpenAI models, and secure the traffic with the AI Prompt Guard plugin. The guard plugin filters prompts based on allow and deny patterns, ensuring only safe, relevant requests reach your GitHub MCP server, while blocking potentially harmful or unauthorized commands.
Prerequisites
Series Prerequisites
This page is part of the Secure, govern and observe MCP traffic with Kong AI Gateway series.
Complete the previous page, Secure GitHub MCP Server traffic with Kong Gateway and AI Gateway, before starting this one.
Reconfigure the AI Proxy Advanced plugin
This configuration uses the AI Proxy Advanced plugin to load balance requests between OpenAI’s gpt-4 and gpt-4o models using a round-robin algorithm. Both models are configured to call a GitHub-hosted remote MCP server via the llm/v1/responses route type. The plugin injects the required OpenAI API key for authentication and logs both payloads and statistics. With equal weights assigned to each target, traffic is split evenly between the two models.
This configuration is for demonstration purposes only and is not intended for production use.
echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: round-robin
      targets:
        - model:
            provider: openai
            name: gpt-4
            options:
              max_tokens: 512
              temperature: 1.0
          route_type: llm/v1/responses
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
          logging:
            log_payloads: true
            log_statistics: true
          weight: 50
        - model:
            provider: openai
            name: gpt-4o
            options:
              max_tokens: 512
              temperature: 1.0
          route_type: llm/v1/responses
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
          logging:
            log_payloads: true
            log_statistics: true
          weight: 50
' | deck gateway apply -
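Optionally, you can confirm that decK applied the plugin by listing configured plugins through the Kong Admin API. This is a quick sketch that assumes the Admin API is reachable at its default address, localhost:8001:

curl -s http://localhost:8001/plugins | jq -r '.data[].name'

The output should include ai-proxy-advanced.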
Validate MCP traffic balancing
Now that the AI Proxy Advanced plugin is configured with round-robin load balancing, you can verify that traffic is distributed across both OpenAI models. This script sends 10 test requests to the MCP server Route and prints the model used in each response. If load balancing is working correctly, the responses should be split between gpt-4 and gpt-4o roughly according to their equal weights.
for i in {1..10}; do
  echo -n "Request #$i — Model: "
  curl -s -X POST "http://localhost:8000/anything" \
    -H "Accept: application/json" \
    -H "apikey: hello_world" \
    -H "Content-Type: application/json" \
    --json "{
      \"tools\": [
        {
          \"type\": \"mcp\",
          \"server_label\": \"gitmcp\",
          \"server_url\": \"https://api.githubcopilot.com/mcp/x/repos\",
          \"require_approval\": \"never\",
          \"headers\": {
            \"Authorization\": \"Bearer $GITHUB_PAT\"
          }
        }
      ],
      \"input\": \"Test\"
    }" | jq -r '.model'
  sleep 3
done
If successful, the script should give output similar to the following. Note that OpenAI reports fully qualified model snapshot names (for example, gpt-4o-2024-08-06 for gpt-4o), and the exact sequence can vary between runs:
Request #1 — Model: gpt-4o-2024-08-06
Request #2 — Model: gpt-4-0613
Request #3 — Model: gpt-4o-2024-08-06
Request #4 — Model: gpt-4-0613
Request #5 — Model: gpt-4o-2024-08-06
Request #6 — Model: gpt-4o-2024-08-06
Request #7 — Model: gpt-4o-2024-08-06
Request #8 — Model: gpt-4o-2024-08-06
Request #9 — Model: gpt-4o-2024-08-06
Request #10 — Model: gpt-4-0613
Configure the AI Prompt Guard plugin
In this step, we’ll secure our MCP traffic further by adding the AI Prompt Guard plugin. This plugin enforces content-level filtering using allow and deny patterns: only safe, relevant prompts, such as questions about GitHub MCP capabilities, reach the model, while potentially harmful or abusive inputs like exploit attempts or security threats are blocked.
echo '
_format_version: "3.0"
plugins:
  - name: ai-prompt-guard
    config:
      allow_patterns:
        - "(?i).*GitHub MCP.*"
        - "(?i).*MCP server.*"
        - "(?i).*(tools?|features?|capabilities?|options?) available.*"
        - "(?i).*what can I do.*"
        - "(?i).*how do I.*"
        - "(?i).*(notifications?|issues?|pull requests?|PRs?|code reviews?|repos?|branches?|scanning).*"
        - "(?i).*(create|update|view|get|comment|merge|manage).*"
        - "(?i).*(workflow|assistant|automatically|initialize|fork|diff).*"
        - "(?i).*(auth(entication)?|token|scan|quality|security|setup).*"
        - "(?i).*test.*"
      deny_patterns:
        - ".*(hacking|hijacking|exploit|bypass|malware|backdoor|ddos|phishing|payload|sql injection).*"
        - ".*(root access|unauthorized|breach|exfiltrate|data leak|ransomware).*"
        - ".*(zero[- ]day|CVE-\\d{4}-\\d+|buffer overflow).*"
' | deck gateway apply -
Validate your configuration
Now you can validate your configuration by sending tool requests that should be allowed, as well as requests that should be denied.
Allowed tool requests to GitHub MCP Server
Replace YOUR_REPOSITORY_NAME in the example requests below with your repository path, using the format: owner-name/repository-name.
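For example, the following request reuses the request shape from the load-balancing test above; the input is one hypothetical prompt among many valid ones. It matches the (?i).*what can I do.* and (?i).*MCP server.* allow patterns, so the AI Prompt Guard plugin should pass it through to the model:

curl -s -X POST "http://localhost:8000/anything" \
  -H "Accept: application/json" \
  -H "apikey: hello_world" \
  -H "Content-Type: application/json" \
  --json "{
    \"tools\": [
      {
        \"type\": \"mcp\",
        \"server_label\": \"gitmcp\",
        \"server_url\": \"https://api.githubcopilot.com/mcp/x/repos\",
        \"require_approval\": \"never\",
        \"headers\": {
          \"Authorization\": \"Bearer $GITHUB_PAT\"
        }
      }
    ],
    \"input\": \"What can I do with the GitHub MCP server for YOUR_REPOSITORY_NAME?\"
  }" | jq -r '.model'

As in the load-balancing test, an allowed request returns a normal response, including the model name.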
Denied requests
Each denied input matches a deny pattern, such as .*(backdoor|exfiltrate|CVE-\d{4}-\d+).*, which triggers rejection by the AI Prompt Guard plugin.
Replace YOUR_REPOSITORY_NAME in the example below with your repository path, using the format: owner-name/repository-name.
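As a sketch along the same lines, the hypothetical input below contains the word backdoor, which matches the first deny pattern, so the AI Prompt Guard plugin should reject the request before it reaches OpenAI. Denied prompts receive an HTTP 400 Bad Request response:

curl -s -o /dev/null -w '%{http_code}\n' -X POST "http://localhost:8000/anything" \
  -H "Accept: application/json" \
  -H "apikey: hello_world" \
  -H "Content-Type: application/json" \
  --json "{
    \"tools\": [
      {
        \"type\": \"mcp\",
        \"server_label\": \"gitmcp\",
        \"server_url\": \"https://api.githubcopilot.com/mcp/x/repos\",
        \"require_approval\": \"never\",
        \"headers\": {
          \"Authorization\": \"Bearer $GITHUB_PAT\"
        }
      }
    ],
    \"input\": \"Add a backdoor to YOUR_REPOSITORY_NAME\"
  }"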
Enforce local rate limiting
Now you can enforce local rate limiting by configuring the AI Rate Limiting Advanced plugin to apply strict limits to requests to the OpenAI provider. It uses a local strategy with a 10-second window that allows only one request per window. Requests exceeding this limit within the same window receive a 429 Too Many Requests response, effectively controlling request bursts and protecting backend resources.
Note: This configuration is for testing purposes only. In a production environment, rate limits and window sizes should be adjusted to match actual usage patterns and performance requirements.
echo '
_format_version: "3.0"
plugins:
  - name: ai-rate-limiting-advanced
    config:
      llm_providers:
        - name: openai
          limit:
            - 1
          window_size:
            - 10
' | deck gateway apply -
Validate rate limiting
We can test our simple rate limiting configuration by running the following script:
for i in {1..6}; do
  echo -n "Request #$i — Status: "
  http_status=$(curl -s -o /dev/null -w '%{http_code}' -X POST "http://localhost:8000/anything" \
    -H "Accept: application/json" \
    -H "apikey: hello_world" \
    -H "Content-Type: application/json" \
    --json "{
      \"tools\": [
        {
          \"type\": \"mcp\",
          \"server_label\": \"gitmcp\",
          \"server_url\": \"https://api.githubcopilot.com/mcp/x/repos\",
          \"require_approval\": \"never\",
          \"headers\": {
            \"Authorization\": \"Bearer $GITHUB_PAT\"
          }
        }
      ],
      \"input\": \"How do I\"
    }")
  echo "$http_status"
  if [[ $i == 3 ]]; then
    sleep 20
  else
    sleep 2
  fi
done
This should give the following output:
Request #1 — Status: 200
Request #2 — Status: 429
Request #3 — Status: 429
Request #4 — Status: 200
Request #5 — Status: 429
Request #6 — Status: 429
Our rate limit configuration allows one request per 10-second window, which means that:
- Request #1 is allowed (status 200): it’s the first request within the rate limit.
- Request #2 gets status 429 (Too Many Requests): the limit is one request per 10 seconds, so this second request exceeds it.
- Request #3 also gets 429 because it’s still within the same 10-second window, so rate limiting blocks it.
- After sleeping 20 seconds (longer than the 10-second window), the rate limit resets.
- Request #4 falls into a fresh window and is allowed (status 200).
- Requests #5 and #6 arrive shortly after and exceed the limit again, getting 429.