Now you can test the rate limiting configuration by sending two requests:
- The first request sends an `x-prompt-count` of 100000, which is within the configured token limit, so it should receive a `200 OK` response.
- The second request, sent shortly afterward with an `x-prompt-count` of 950000, exceeds the allowed token quota and should receive a `429` response.
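If you'd rather script the two checks than run them by hand, here is a minimal sketch in the same curl style. It assumes the plugin is already configured as described earlier and that you pass your proxy base URL (`$KONNECT_PROXY_URL` or `http://localhost:8000`). The functions `send_prompt`, `check_status`, and `run_rate_limit_test` are hypothetical helpers, not part of any Kong tooling. Instead of the full `curl -i` output, the sketch prints only each request's status code. Both requests count against the same rate limiting window, so run either this script or the manual commands below, not both, unless you wait for the configured window to reset.

```shell
# Hypothetical helper: POST the sample chat payload with the given
# x-prompt-count and print only the HTTP status code (e.g. 200 or 429).
send_prompt() {
  local url="$1" count="$2"
  curl --no-progress-meter -o /dev/null -w '%{http_code}' -X POST "$url/anything" \
    -H "x-prompt-count: $count" \
    --json '{"messages":[{"role":"system","content":"You are an IT specialist."},{"role":"user","content":"Tell me about Google?"}]}'
}

# Hypothetical helper: compare an expected and actual status code,
# print PASS/FAIL, and return non-zero on a mismatch.
check_status() {
  local label="$1" expected="$2" actual="$3"
  if [ "$expected" = "$actual" ]; then
    echo "PASS: $label returned $actual"
  else
    echo "FAIL: $label expected $expected, got $actual"
    return 1
  fi
}

# Hypothetical helper: send the two requests in order and check both codes.
run_rate_limit_test() {
  local base="$1" first second rc=0
  first=$(send_prompt "$base" 100000)
  second=$(send_prompt "$base" 950000)
  check_status "first request" 200 "$first" || rc=1
  check_status "second request" 429 "$second" || rc=1
  return "$rc"
}
```

For example, run `run_rate_limit_test "$KONNECT_PROXY_URL"` for Konnect or `run_rate_limit_test "http://localhost:8000"` for a local gateway. The function returns a non-zero status if either code differs from what the steps above expect, which makes it usable in CI.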
If you're using Konnect, send the first request to your proxy URL:

```shell
curl -i -X POST "$KONNECT_PROXY_URL/anything" \
  --no-progress-meter --fail-with-body \
  -H "Content-Type: application/json" \
  -H "x-prompt-count: 100000" \
  --json '{
    "messages": [
      {
        "role": "system",
        "content": "You are an IT specialist."
      },
      {
        "role": "user",
        "content": "Tell me about Google?"
      }
    ]
  }'
```
You should receive a `200 OK` response.
If you're running a self-managed Kong Gateway on `localhost:8000`:

```shell
curl -i -X POST "http://localhost:8000/anything" \
  --no-progress-meter --fail-with-body \
  -H "Content-Type: application/json" \
  -H "x-prompt-count: 100000" \
  --json '{
    "messages": [
      {
        "role": "system",
        "content": "You are an IT specialist."
      },
      {
        "role": "user",
        "content": "Tell me about Google?"
      }
    ]
  }'
```
You should receive a `200 OK` response.
Next, send the second request with an `x-prompt-count` of 950000 to trigger the rate limit. If you're using Konnect:

```shell
curl -i -X POST "$KONNECT_PROXY_URL/anything" \
  --no-progress-meter --fail-with-body \
  -H "Content-Type: application/json" \
  -H "x-prompt-count: 950000" \
  --json '{
    "messages": [
      {
        "role": "system",
        "content": "You are an IT specialist."
      },
      {
        "role": "user",
        "content": "Tell me about Google?"
      }
    ]
  }'
```
You should see the following response:
```text
HTTP/1.1 429 AI token rate limit exceeded for provider(s): cohere
```
If you're running a self-managed Kong Gateway on `localhost:8000`:

```shell
curl -i -X POST "http://localhost:8000/anything" \
  --no-progress-meter --fail-with-body \
  -H "Content-Type: application/json" \
  -H "x-prompt-count: 950000" \
  --json '{
    "messages": [
      {
        "role": "system",
        "content": "You are an IT specialist."
      },
      {
        "role": "user",
        "content": "Tell me about Google?"
      }
    ]
  }'
```
You should see the following response:
```text
HTTP/1.1 429 AI token rate limit exceeded for provider(s): cohere
```
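When you script this check, a bare `429` doesn't tell you which limiter rejected the request. In the sample output, the AI token limit uses the reason phrase `AI token rate limit exceeded for provider(s):` followed by the provider list. The sketch below assumes that format holds for your gateway version. It also only works over HTTP/1.1, because HTTP/2 responses carry no reason phrase. `ai_limited_providers` is a hypothetical helper: it takes the first line of `curl -i` output and prints the provider list, or returns a non-zero status for any other status line.

```shell
# Hypothetical helper: given the status line of a `curl -i` response, print
# the provider list if the request was rejected by the AI token rate limit;
# otherwise return 1. Assumes the reason phrase format from the sample output.
ai_limited_providers() {
  local line
  # HTTP/1.1 header lines end in CRLF; drop the carriage return.
  line=$(printf '%s' "$1" | tr -d '\r')
  case "$line" in
    "HTTP/"*" 429 AI token rate limit exceeded for provider(s): "*)
      # Keep everything after the last "provider(s): ".
      printf '%s\n' "${line##*"provider(s): "}"
      ;;
    *)
      return 1
      ;;
  esac
}
```

For example, pipe the `curl -i` output of the second request through `head -n 1` and pass the result to the function. With the sample status line above, it prints `cohere`.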