Use Cohere rerank API for document-grounded chat with AI Proxy in Kong Gateway

Tags: #ai
Minimum Version: Kong Gateway 3.10
TL;DR

Configure the AI Proxy plugin with the Cohere provider and a chat model, then send queries with candidate documents to get generated answers that are grounded in the relevant documents and include citations.

Prerequisites

This is a Konnect tutorial and requires a Konnect personal access token.

  1. Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.

  2. Export your token to an environment variable:

     export KONNECT_TOKEN='YOUR_KONNECT_PAT'
    
  3. Run the quickstart script to automatically provision a Control Plane and Data Plane, and configure your environment:

     curl -Ls https://get.konghq.com/quickstart | bash -s -- -k $KONNECT_TOKEN --deck-output
    

    This sets up a Konnect Control Plane named quickstart, provisions a local Data Plane, and prints out the following environment variable exports:

     export DECK_KONNECT_TOKEN=$KONNECT_TOKEN
     export DECK_KONNECT_CONTROL_PLANE_NAME=quickstart
     export KONNECT_CONTROL_PLANE_URL=https://us.api.konghq.com
     export KONNECT_PROXY_URL='http://localhost:8000'
    

    Copy and paste these into your terminal to configure your session.

This tutorial requires Kong Gateway Enterprise. If you don’t have Kong Gateway set up yet, you can use the quickstart script with an enterprise license to get an instance of Kong Gateway running almost instantly.

  1. Export your license to an environment variable:

     export KONG_LICENSE_DATA='LICENSE-CONTENTS-GO-HERE'
    
  2. Run the quickstart script:

    curl -Ls https://get.konghq.com/quickstart | bash -s -- -e KONG_LICENSE_DATA 
    

    Once Kong Gateway is ready, you will see the following message:

     Kong Gateway Ready
    

decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial, install decK version 1.43 or later.

This guide uses deck gateway apply, which directly applies entity configuration to your Gateway instance. We recommend upgrading your decK installation to take advantage of this tool.

You can check your current decK version with deck version.

For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function but installing them isn’t the focus of this guide. Follow these steps to pre-configure them:

  1. Run the following command:

    echo '
    _format_version: "3.0"
    services:
      - name: rerank-service
        url: http://httpbin.konghq.com/rerank
    routes:
      - name: rerank-route
        paths:
        - "/rerank"
        service:
          name: rerank-service
    ' | deck gateway apply -
    

To learn more about entities, you can read our entities documentation.

Before you begin, you must get a Cohere API key:

  • Sign up at Cohere
  • Navigate to API Keys in your dashboard
  • Create a new API key

Export the API key as an environment variable:

export DECK_COHERE_API_KEY="<your-api-key>"

Install Python 3 and the requests library:

pip install requests

Configure the plugin

Configure AI Proxy to use Cohere’s document-grounded chat:

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    service: rerank-service
    config:
      llm_format: cohere
      route_type: llm/v1/chat
      logging:
        log_payloads: false
        log_statistics: true
      model:
        provider: cohere
        name: command-a-03-2025
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_COHERE_API_KEY" }}
' | deck gateway apply -

Use Cohere document-grounded chat

Cohere’s document-grounded chat filters candidate documents and generates answers in a single API call. Send a query with candidate documents. The model selects relevant documents, generates an answer using only those documents, and returns citations linking answer segments to sources. This replaces multi-step RAG pipelines with one request.

The following script sends a query and five candidate documents to Cohere’s chat endpoint. Three documents discuss the health benefits of green tea; the other two (about the Eiffel Tower and Python programming) are intentionally irrelevant.

The script shows which documents the model used by comparing the documents field in the response to the input documents. This demonstrates whether Cohere’s document-grounded chat filters out irrelevant documents automatically.

Create the script:

cat > grounded-chat-demo.py << 'EOF'
#!/usr/bin/env python3
"""Demonstrate document filtering in Cohere grounded chat"""

import requests

CHAT_URL = "http://localhost:8000/rerank"

print("Cohere Document Filtering Demo")
print("=" * 60)

query = "What are the health benefits of drinking green tea?"
documents = [
    {"text": "Green tea contains powerful antioxidants called catechins that may help reduce inflammation and protect cells from damage."},
    {"text": "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France, and is one of the most recognizable structures in the world."},
    {"text": "Studies suggest that regular green tea consumption may boost metabolism and support weight management."},
    {"text": "Python is a high-level programming language known for its simplicity and readability, widely used in data science and web development."},
    {"text": "Green tea has been associated with improved brain function and may reduce the risk of neurodegenerative diseases."}
]

print(f"\nQuery: {query}\n")

# Show input documents
print("--- INPUT: All Candidate Documents ---")
for idx, doc in enumerate(documents, 1):
    print(f"{idx}. {doc['text']}")

# Send request
response = requests.post(
    CHAT_URL,
    headers={"Content-Type": "application/json"},
    json={
        "model": "command-a-03-2025",
        "query": query,
        "documents": documents,
        "return_documents": True
    }
)
response.raise_for_status()

result = response.json()

# Extract document IDs that were used
used_doc_ids = set()
if 'documents' in result:
    for doc in result['documents']:
        # Map returned docs back to original indices
        for idx, orig_doc in enumerate(documents):
            if doc['text'] == orig_doc['text']:
                used_doc_ids.add(idx)

# Show relevant documents
print("\n--- OUTPUT: Relevant Documents (Used in answer) ---")
if 'documents' in result:
    for doc in result['documents']:
        print(f"✓ {doc['text']}")

# Show filtered documents
print("\n--- FILTERED OUT: Irrelevant Documents ---")
for idx, doc in enumerate(documents):
    if idx not in used_doc_ids:
        print(f"✗ {doc['text']}")

# Show answer with citations
print("\n--- GENERATED ANSWER ---")
print(result.get('text', ''))

if 'citations' in result:
    print("\n--- CITATIONS ---")
    for citation in result['citations']:
        print(f"- \"{citation['text']}\" → {citation['document_ids']}")

print("\n" + "=" * 60)
EOF

The return_documents parameter is expected to make the response include the filtered document subset. Verify this behavior against Cohere’s Chat API documentation or by running the script.
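If the response omits the documents field, the script above will list every candidate as filtered out. A small helper can handle both cases by falling back to all candidates; this is a minimal sketch assuming the {"text": ...} document shape used in this tutorial, and get_used_documents is an illustrative name, not part of any SDK:

```python
# Minimal helper: prefer the documents the model reports using,
# fall back to all candidates if the response omits the field.
def get_used_documents(result, candidates):
    """Return texts of documents used in the answer, or all
    candidate texts if the response has no 'documents' field."""
    used = result.get("documents") or candidates
    return [doc["text"] for doc in used]

# Example with a stubbed response:
stub_result = {"text": "Green tea is healthy.",
               "documents": [{"text": "doc A"}]}
print(get_used_documents(stub_result, [{"text": "doc A"}, {"text": "doc B"}]))
# ['doc A']
```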

Validate the configuration

Let’s run the script we created in the previous step:

python3 grounded-chat-demo.py

Example output:

Cohere Document Filtering Demo
============================================================

Query: What are the health benefits of drinking green tea?

--- INPUT: All Candidate Documents ---
1. Green tea contains powerful antioxidants called catechins that may help reduce inflammation and protect cells from damage.
2. The Eiffel Tower is a wrought-iron lattice tower located in Paris, France, and is one of the most recognizable structures in the world.
3. Studies suggest that regular green tea consumption may boost metabolism and support weight management.
4. Python is a high-level programming language known for its simplicity and readability, widely used in data science and web development.
5. Green tea has been associated with improved brain function and may reduce the risk of neurodegenerative diseases.

--- OUTPUT: Relevant Documents (Used in answer) ---
✓ Green tea contains powerful antioxidants called catechins that may help reduce inflammation and protect cells from damage.
✓ Green tea has been associated with improved brain function and may reduce the risk of neurodegenerative diseases.
✓ Studies suggest that regular green tea consumption may boost metabolism and support weight management.

--- FILTERED OUT: Irrelevant Documents ---
✗ The Eiffel Tower is a wrought-iron lattice tower located in Paris, France, and is one of the most recognizable structures in the world.
✗ Python is a high-level programming language known for its simplicity and readability, widely used in data science and web development.

--- GENERATED ANSWER ---
Green tea has powerful antioxidants called catechins that may reduce inflammation and protect cells from damage. It has also been associated with improved brain function and may reduce the risk of neurodegenerative diseases. Regular consumption may boost metabolism and support weight management.

--- CITATIONS ---
- "powerful antioxidants called catechins" → ['doc_0']
- "reduce inflammation" → ['doc_0']
- "protect cells from damage." → ['doc_0']
- "associated with improved brain function" → ['doc_4']
- "reduce the risk of neurodegenerative diseases." → ['doc_4']
- "Regular consumption" → ['doc_2']
- "boost metabolism" → ['doc_2']
- "support weight management." → ['doc_2']

============================================================

The output demonstrates three document-grounding behaviors:

  • Automatic filtering: The model used only the three green tea documents. It filtered out the Eiffel Tower and Python documents.
  • Source-restricted generation: The answer contains only information from the input documents.
  • Citation mapping: Each statement maps to specific source documents through the document_ids field.
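The document_ids in each citation can be mapped back to the original candidate list. The sketch below assumes the doc_<index> ID format shown in the example output above; resolve_citation is a hypothetical helper, not part of Cohere’s SDK:

```python
# Map citation document IDs (e.g. "doc_2") back to source texts.
# Assumes the "doc_<index>" ID format shown in the example output.
def resolve_citation(citation, documents):
    """Return the original texts a citation's document_ids point to."""
    sources = []
    for doc_id in citation["document_ids"]:
        idx = int(doc_id.rsplit("_", 1)[1])  # "doc_2" -> 2
        sources.append(documents[idx]["text"])
    return sources

documents = [{"text": "Catechins reduce inflammation."},
             {"text": "The Eiffel Tower is in Paris."},
             {"text": "Green tea may boost metabolism."}]
citation = {"text": "boost metabolism", "document_ids": ["doc_2"]}
print(resolve_citation(citation, documents))
# ['Green tea may boost metabolism.']
```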

Cleanup

If you created a new control plane and want to conserve your free trial credits or avoid unnecessary charges, delete the new control plane used in this tutorial.

curl -Ls https://get.konghq.com/quickstart | bash -s -- -d

FAQs

What is document-grounded chat?

Document-grounded chat generates answers based only on provided documents, automatically filtering for relevance and providing citations. This improves RAG pipelines by combining retrieval filtering and answer generation in a single step.

How many documents can I send per request?

Cohere’s Chat API supports multiple documents per request. The model automatically selects the most relevant documents for generating the answer.

Which models support document-grounded chat?

Cohere models including command-a-03-2025 support document-grounded chat. Refer to the Cohere documentation for the complete list of available models.
