Use AWS Bedrock rerank API with AI Proxy

Uses: Kong Gateway, AI Gateway, decK

Deployment platform: Konnect or on-prem

Prerequisites

Kong Konnect

This is a Konnect tutorial and requires a Konnect personal access token.

Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.
Export your token to an environment variable:
```
 export KONNECT_TOKEN='YOUR_KONNECT_PAT'
```
Run the quickstart script to automatically provision a Control Plane and Data Plane, and configure your environment:
```
 curl -Ls https://get.konghq.com/quickstart | bash -s -- -k $KONNECT_TOKEN --deck-output
```
This sets up a Konnect Control Plane named quickstart, provisions a local Data Plane, and prints out the following environment variable exports:
```
 export DECK_KONNECT_TOKEN=$KONNECT_TOKEN
 export DECK_KONNECT_CONTROL_PLANE_NAME=quickstart
 export KONNECT_CONTROL_PLANE_URL=https://us.api.konghq.com
 export KONNECT_PROXY_URL='http://localhost:8000'
```
Copy and paste these into your terminal to configure your session.

Kong Gateway running

This tutorial requires Kong Gateway Enterprise. If you don’t have Kong Gateway set up yet, you can use the quickstart script with an enterprise license to get an instance of Kong Gateway running almost instantly.

Export your license to an environment variable:

 export KONG_LICENSE_DATA='LICENSE-CONTENTS-GO-HERE'

Run the quickstart script:

curl -Ls https://get.konghq.com/quickstart | bash -s -- -e KONG_LICENSE_DATA

Once Kong Gateway is ready, you will see the following message:

 Kong Gateway Ready

decK v1.43+

decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial, install decK version 1.43 or later.

This guide uses deck gateway apply, which directly applies entity configuration to your Gateway instance. We recommend upgrading your decK installation to take advantage of this tool.

You can check your current decK version with deck version.

Required entities

For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function but installing them isn’t the focus of this guide. Follow these steps to pre-configure them:

Run the following command:

echo '
_format_version: "3.0"
services:
  - name: rerank-service
    url: http://httpbin.konghq.com/rerank
routes:
  - name: rerank-route
    paths:
    - "/rerank"
    service:
      name: rerank-service
' | deck gateway apply -

To learn more about entities, you can read our entities documentation.

AWS credentials and Bedrock model access

Before you begin, you must have AWS credentials with Bedrock permissions:

AWS Access Key ID: Your AWS access key
AWS Secret Access Key: Your AWS secret key
Region: AWS region where Bedrock is available (for example, us-west-2)

Enable the rerank model in the AWS Bedrock console: navigate to Bedrock > Model access and request access to cohere.rerank-v3-5:0.
After model access is granted, construct the model ARN for your region:
```
arn:aws:bedrock:<region>::foundation-model/cohere.rerank-v3-5:0
```
Replace <region> with your AWS region (for example, us-west-2).
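
If you switch regions often, the same ARN can be assembled programmatically. The following is a minimal sketch; `build_model_arn` is a hypothetical helper name used for illustration and is not part of Kong or any AWS SDK.

```python
def build_model_arn(region: str, model_id: str = "cohere.rerank-v3-5:0") -> str:
    """Return the foundation-model ARN for a Bedrock model in a region.

    Foundation-model ARNs leave the account ID field empty, which is
    why two colons follow the region.
    """
    if not region:
        raise ValueError("region must be non-empty, for example 'us-west-2'")
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


# Prints the ARN used throughout this tutorial for us-west-2.
print(build_model_arn("us-west-2"))
```

The printed value can be exported as DECK_AWS_MODEL in the next step.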

Export the required values as environment variables:

export DECK_AWS_ACCESS_KEY_ID="<your-access-key-id>"
export DECK_AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export DECK_AWS_REGION="<region>"
export DECK_AWS_MODEL="arn:aws:bedrock:<region>::foundation-model/cohere.rerank-v3-5:0"

Replace <region> in both DECK_AWS_REGION and the DECK_AWS_MODEL ARN with your AWS Bedrock deployment region. The two values must name the same region. See FAQs below for more details.
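
Before applying the plugin configuration, it can help to confirm that all four variables are set and that the region inside the ARN agrees with DECK_AWS_REGION. The check below is a sketch under that assumption; `check_bedrock_env` is a hypothetical helper, and the placeholder test only looks for a leftover `<` character.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = (
    "DECK_AWS_ACCESS_KEY_ID",
    "DECK_AWS_SECRET_ACCESS_KEY",
    "DECK_AWS_REGION",
    "DECK_AWS_MODEL",
)


def check_bedrock_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        # A leftover '<' suggests a placeholder such as <region> was not replaced.
        if not value or "<" in value:
            problems.append(f"{name} is unset or still contains a placeholder")

    region = env.get("DECK_AWS_REGION", "")
    # ARN layout: arn:aws:bedrock:<region>::foundation-model/<model-id>
    parts = env.get("DECK_AWS_MODEL", "").split(":")
    if region and len(parts) > 3 and parts[3] != region:
        problems.append(
            f"DECK_AWS_MODEL names region '{parts[3]}' but DECK_AWS_REGION is '{region}'"
        )
    return problems


for problem in check_bedrock_env(os.environ):
    print(problem)
```

If the loop prints nothing, the exports are at least internally consistent. It does not verify that the credentials are valid or that model access has been granted.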

Python and requests library

Install Python 3 and the requests library:

pip install requests

Configure the plugin

Configure AI Proxy to use AWS Bedrock’s Rerank API. The plugin attaches to rerank-route, the dedicated route with the /rerank path that you created in the prerequisites:

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    route: rerank-route
    config:
      llm_format: bedrock
      route_type: llm/v1/chat
      logging:
        log_payloads: false
        log_statistics: true
      auth:
        allow_override: false
        aws_access_key_id: "${{ env "DECK_AWS_ACCESS_KEY_ID" }}"
        aws_secret_access_key: "${{ env "DECK_AWS_SECRET_ACCESS_KEY" }}"
      model:
        provider: bedrock
        name: "${{ env "DECK_AWS_MODEL" }}"
        options:
          bedrock:
            aws_region: "${{ env "DECK_AWS_REGION" }}"
' | deck gateway apply -

The config.llm_format: bedrock setting enables Kong to accept native AWS Bedrock API requests. Kong detects the /rerank URI pattern and automatically routes requests to the Bedrock Agent Runtime service.

Use AWS Bedrock Rerank API

AWS Bedrock’s Rerank API reorders candidate documents by semantic relevance to a query. Send a query and document list (typically from vector or keyword search). The API returns the top N documents ordered by relevance score. This reduces context size before LLM generation and prioritizes relevant information. The rerank API scores and orders documents. It does not generate answers or citations.

The following script sends a query with 5 candidate documents to AWS Bedrock’s rerank endpoint. Three documents discuss exercise and health benefits. Two documents are intentionally irrelevant (Eiffel Tower, Python programming).

The script shows the original document order, then the reranked order with relevance scores. The numberOfResults: 3 parameter limits the response to the top 3 documents. This demonstrates how reranking filters and reorders documents by semantic relevance before LLM generation.

Create the script:

cat > bedrock-rerank-demo.py << 'EOF'
#!/usr/bin/env python3
"""Demonstrate AWS Bedrock Rerank for improving RAG retrieval quality"""

import requests
import json

RERANK_URL = "http://localhost:8000/rerank"

print("AWS Bedrock Rerank Demo: RAG Pipeline Improvement")
print("=" * 60)

# Simulate documents retrieved from vector search
query = "What are the health benefits of regular exercise?"
documents = [
    "Regular exercise can improve cardiovascular health and reduce the risk of heart disease.",
    "The Eiffel Tower was completed in 1889 and stands 324 meters tall.",
    "Exercise helps maintain healthy weight by burning calories and building muscle mass.",
    "Python is a high-level programming language known for its simplicity and readability.",
    "Physical activity strengthens bones and muscles, reducing the risk of osteoporosis and falls in older adults."
]

print(f"\nQuery: {query}")
print(f"\nCandidate documents: {len(documents)}")

# Before rerank: show original order
print("\n--- BEFORE RERANK (Original retrieval order) ---")
for idx, doc in enumerate(documents):
    print(f"{idx}. {doc[:80]}...")

# Rerank the documents
print("\n--- RERANKING ---")
try:
    # Build Bedrock rerank request
    sources = []
    for doc in documents:
        sources.append({
            "type": "INLINE",
            "inlineDocumentSource": {
                "type": "TEXT",
                "textDocument": {
                    "text": doc
                }
            }
        })

    response = requests.post(
        RERANK_URL,
        headers={"Content-Type": "application/json"},
        json={
            "queries": [
                {
                    "type": "TEXT",
                    "textQuery": {
                        "text": query
                    }
                }
            ],
            "sources": sources,
            "rerankingConfiguration": {
                "type": "BEDROCK_RERANKING_MODEL",
                "bedrockRerankingConfiguration": {
                    "numberOfResults": 3,
                    "modelConfiguration": {
                        "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/cohere.rerank-v3-5:0"
                    }
                }
            }
        }
    )

    response.raise_for_status()
    result = response.json()

    print("✓ Reranking complete")

    # After rerank: show reordered results
    print("\n--- AFTER RERANK (Ordered by relevance) ---")
    for item in result['results']:
        idx = item['index']
        score = item['relevanceScore']
        print(f"{idx}. [Relevance: {score:.3f}] {documents[idx][:80]}...")

    # Show the top document that should be sent to LLM
    print("\n--- TOP RESULT FOR LLM CONTEXT ---")
    top_idx = result['results'][0]['index']
    top_score = result['results'][0]['relevanceScore']
    print(f"Relevance Score: {top_score:.3f}")
    print(f"Document: {documents[top_idx]}")

except Exception as e:
    print(f"✗ Failed: {e}")

print("\n" + "=" * 60)
print("Demo complete")
EOF

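The script reads the results array from the response and expects each item to carry index and relevanceScore fields, which matches the Bedrock Rerank response format. The modelArn in the request body is hardcoded to us-west-2. If you exported a different region in DECK_AWS_MODEL, update the ARN in the script to match.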
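
If you adapt the script for your own pipeline, you may want to validate the response shape explicitly instead of relying on a broad exception handler. The sketch below assumes only the two fields the demo script already reads (index and relevanceScore); `extract_ranked` is a hypothetical helper, and the sample dictionary is hand-written for illustration, not a captured response.

```python
from typing import Any, Dict, List, Tuple


def extract_ranked(
    result: Dict[str, Any], documents: List[str]
) -> List[Tuple[int, float, str]]:
    """Return (index, relevanceScore, document) tuples in response order."""
    items = result.get("results")
    if not isinstance(items, list):
        raise ValueError("response has no 'results' list")

    ranked = []
    for item in items:
        idx = item["index"]  # raises KeyError if the field is missing
        score = float(item["relevanceScore"])
        if not 0 <= idx < len(documents):
            raise ValueError(f"index {idx} is outside the submitted document list")
        ranked.append((idx, score, documents[idx]))
    return ranked


# Hand-written sample with the same fields the demo script reads.
sample = {"results": [{"index": 2, "relevanceScore": 0.91},
                      {"index": 0, "relevanceScore": 0.40}]}
docs = ["doc a", "doc b", "doc c"]
for idx, score, text in extract_ranked(sample, docs):
    print(f"{idx} {score:.2f} {text}")
```

Checking that each index falls inside the submitted list guards against mixing up responses and document lists when several rerank calls are in flight.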

Validate the configuration

Now, let’s run the script we created in the previous step:

python3 bedrock-rerank-demo.py

Example output:

AWS Bedrock Rerank Demo: RAG Pipeline Improvement
============================================================

Query: What are the health benefits of regular exercise?

Candidate documents: 5

--- BEFORE RERANK (Original retrieval order) ---
0. Regular exercise can improve cardiovascular health and reduce the risk of hea...
1. The Eiffel Tower was completed in 1889 and stands 324 meters tall....
2. Exercise helps maintain healthy weight by burning calories and building muscl...
3. Python is a high-level programming language known for its simplicity and read...
4. Physical activity strengthens bones and muscles, reducing the risk of osteopo...

--- RERANKING ---
✓ Reranking complete

--- AFTER RERANK (Ordered by relevance) ---
0. [Relevance: 0.989] Regular exercise can improve cardiovascular health and reduce the risk of hea...
2. [Relevance: 0.876] Exercise helps maintain healthy weight by burning calories and building muscl...
4. [Relevance: 0.823] Physical activity strengthens bones and muscles, reducing the risk of osteopo...

--- TOP RESULT FOR LLM CONTEXT ---
Relevance Score: 0.989
Document: Regular exercise can improve cardiovascular health and reduce the risk of heart disease.

============================================================
Demo complete

The output shows how reranking improves retrieval quality. The three exercise-related documents (indices 0, 2, 4) rank highest, each scoring above 0.82 in this example run. The irrelevant documents about the Eiffel Tower and Python programming do not appear, because numberOfResults limits the response to the top 3 results.

This reranking step helps ensure that the context you send to an LLM for generation contains the most semantically relevant information, which can improve answer quality and reduce hallucinations.
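
As an illustration of that last step, the sketch below turns reranked results into a context string for an LLM prompt. It is one possible approach, not part of the tutorial's script: `build_context` is a hypothetical helper, and the 0.5 score threshold is an arbitrary assumption that you would tune for your own data.

```python
from typing import Any, Dict, List


def build_context(
    documents: List[str],
    results: List[Dict[str, Any]],
    min_score: float = 0.5,  # arbitrary cut-off, chosen for illustration only
    max_docs: int = 3,
) -> str:
    """Join the highest-scoring documents into a numbered context block."""
    kept = [r for r in results if r["relevanceScore"] >= min_score]
    kept.sort(key=lambda r: r["relevanceScore"], reverse=True)
    lines = [
        f"[{n}] {documents[r['index']]}"
        for n, r in enumerate(kept[:max_docs], start=1)
    ]
    return "\n".join(lines)


docs = ["a", "b", "c"]
results = [
    {"index": 0, "relevanceScore": 0.9},
    {"index": 1, "relevanceScore": 0.2},
    {"index": 2, "relevanceScore": 0.7},
]
print(build_context(docs, results))
```

Dropping low-scoring results in addition to limiting the count keeps weakly related text out of the prompt even when fewer than max_docs documents are truly relevant.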

Cleanup

Clean up Konnect environment

If you created a new control plane and want to conserve your free trial credits or avoid unnecessary charges, delete the new control plane used in this tutorial.

Destroy the Kong Gateway container

curl -Ls https://get.konghq.com/quickstart | bash -s -- -d

FAQs

What is reranking and why is it useful?

Reranking takes a list of search results and reorders them by semantic relevance to a query. This improves retrieval quality in RAG pipelines by ensuring the most relevant documents are sent to the LLM for generation.

How many documents can I rerank at once?

AWS Bedrock’s Rerank API supports reranking up to 1,000 documents per request. The numberOfResults parameter controls how many of the highest-ranked results are returned.
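
If your candidate list is larger than that limit, one option is to split it into batches and send one request per batch. The sketch below shows only the splitting step; `batch_documents` is a hypothetical helper. Each response's index values are relative to its own batch, so add the yielded offset to map them back to the full list. Whether scores from separate requests can be merged into one ranking is an assumption you should verify for your model.

```python
from typing import Iterator, List, Tuple

MAX_SOURCES_PER_REQUEST = 1000  # per-request limit stated in this FAQ


def batch_documents(
    documents: List[str], batch_size: int = MAX_SOURCES_PER_REQUEST
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (offset, chunk) pairs, each chunk holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for offset in range(0, len(documents), batch_size):
        yield offset, documents[offset:offset + batch_size]


all_docs = [f"document {i}" for i in range(2500)]
for offset, chunk in batch_documents(all_docs):
    # A rerank result with index i in this batch refers to all_docs[offset + i].
    print(offset, len(chunk))
```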

What rerank models are available?

AWS Bedrock offers cohere.rerank-v3-5:0 and amazon.rerank-v1:0. Cohere Rerank 3.5 is available in most regions, while Amazon Rerank 1.0 is not available in us-east-1.
