Stream AWS Bedrock function calling responses with AI Proxy Advanced

Deployment Platform: Kong Gateway
Minimum Version: 3.10
TL;DR

Use the same AI Proxy Advanced configuration as the non-streaming variant, with llm_format: bedrock and llm/v1/chat route type. In your client code, call converse_stream instead of converse. The streamed response delivers text chunks incrementally and includes tool use requests that your application handles before sending results back for a final streamed response.

Prerequisites

This is a Konnect tutorial and requires a Konnect personal access token.

  1. Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.

  2. Export your token to an environment variable:

     export KONNECT_TOKEN='YOUR_KONNECT_PAT'
    
  3. Run the quickstart script to automatically provision a Control Plane and Data Plane, and configure your environment:

     curl -Ls https://get.konghq.com/quickstart | bash -s -- -k $KONNECT_TOKEN --deck-output
    

    This sets up a Konnect Control Plane named quickstart, provisions a local Data Plane, and prints out the following environment variable exports:

     export DECK_KONNECT_TOKEN=$KONNECT_TOKEN
     export DECK_KONNECT_CONTROL_PLANE_NAME=quickstart
     export KONNECT_CONTROL_PLANE_URL=https://us.api.konghq.com
     export KONNECT_PROXY_URL='http://localhost:8000'
    

    Copy and paste these into your terminal to configure your session.

This tutorial requires Kong Gateway Enterprise. If you don’t have Kong Gateway set up yet, you can use the quickstart script with an enterprise license to get an instance of Kong Gateway running almost instantly.

  1. Export your license to an environment variable:

     export KONG_LICENSE_DATA='LICENSE-CONTENTS-GO-HERE'
    
  2. Run the quickstart script:

    curl -Ls https://get.konghq.com/quickstart | bash -s -- -e KONG_LICENSE_DATA 
    

    Once Kong Gateway is ready, you will see the following message:

     Kong Gateway Ready
    

decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial, install decK version 1.43 or later.

This guide uses deck gateway apply, which directly applies entity configuration to your Gateway instance. We recommend upgrading your decK installation to take advantage of this tool.

You can check your current decK version with deck version.

For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function but installing them isn’t the focus of this guide. Follow these steps to pre-configure them:

  1. Run the following command:

    echo '
    _format_version: "3.0"
    services:
      - name: ai-proxy
        url: https://api.openai.com
    routes:
      - name: openai-chat
        paths:
        - "/"
        service:
          name: ai-proxy
    ' | deck gateway apply -
    

To learn more about entities, you can read our entities documentation.

You must have AWS credentials with Bedrock permissions:

  • AWS Access Key ID: Your AWS access key
  • AWS Secret Access Key: Your AWS secret key
  • Region: AWS region where Bedrock is available (for example, us-west-2)

  1. Enable the Cohere Command R model in the AWS Bedrock console: navigate to Bedrock > Model access and request access to cohere.command-r-v1:0.

  2. Export the required values as environment variables:

    export DECK_AWS_ACCESS_KEY_ID="<your-access-key-id>"
    export DECK_AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
    export DECK_AWS_REGION="us-west-2"
    

Install Python 3 and the Boto3 SDK:

pip install boto3

Configure the plugin

The plugin configuration for streaming is identical to non-streaming function calling. Configure AI Proxy Advanced to accept native AWS Bedrock API payloads. The llm_format: bedrock setting tells Kong to forward requests to the correct Bedrock endpoint, whether the client uses converse or converse_stream.

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      llm_format: bedrock
      targets:
      - route_type: llm/v1/chat
        auth:
          allow_override: false
          aws_access_key_id: "${{ env "DECK_AWS_ACCESS_KEY_ID" }}"
          aws_secret_access_key: "${{ env "DECK_AWS_SECRET_ACCESS_KEY" }}"
        model:
          provider: bedrock
          name: cohere.command-r-v1:0
          options:
            bedrock:
              aws_region: "${{ env "DECK_AWS_REGION" }}"
' | deck gateway apply -

The config.llm_format: bedrock setting enables Kong to accept native AWS Bedrock API requests. This configuration works for both converse and converse_stream calls without any changes.
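
Before moving on to streaming, you can sanity-check the configuration with a plain converse call through the gateway. This is a minimal sketch: the model ID matches the plugin configuration above, and the dummy credentials are placeholders that AI Proxy Advanced replaces with the real AWS keys.

import boto3

# Point the Bedrock runtime client at the Kong route instead of AWS.
# The credentials are placeholders; Kong injects the real keys from the plugin config.
client = boto3.client(
    "bedrock-runtime",
    region_name="us-west-2",
    endpoint_url="http://localhost:8000",
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)

response = client.converse(
    modelId="cohere.command-r-v1:0",
    messages=[{"role": "user", "content": [{"text": "Say hello in one sentence."}]}],
)
print(response["output"]["message"]["content"][0]["text"])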

Stream Bedrock function calling responses

The Bedrock ConverseStream API delivers model output as a sequence of events rather than a single complete response. This is particularly useful for function calling, where the interaction involves multiple round trips. Text appears in the terminal as it is generated, and tool use requests arrive as streamed chunks that your application reassembles.

The following script defines a top_song tool and uses converse_stream to interact with the model. When the model requests the tool, the script executes the function locally and then sends the result back through a second converse_stream call.

The stream delivers several event types: messageStart signals the beginning of a response, contentBlockStart and contentBlockDelta carry tool use or text data in fragments, contentBlockStop marks the end of a content block, and messageStop provides the stop reason.
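
To make those handler branches concrete, here is roughly what individual events look like as Boto3 yields them. The shapes below follow the ConverseStream event format; the toolUseId value is illustrative.

# Signals the start of the assistant's response.
{'messageStart': {'role': 'assistant'}}

# Opens a tool use block with the tool's name and an ID (illustrative value).
{'contentBlockStart': {'start': {'toolUse': {'toolUseId': 'tooluse_abc123', 'name': 'top_song'}}, 'contentBlockIndex': 0}}

# Tool input arrives as JSON string fragments that the script concatenates and parses.
{'contentBlockDelta': {'delta': {'toolUse': {'input': '{"sign": "WZPZ"}'}}, 'contentBlockIndex': 0}}

# Marks the end of the current content block.
{'contentBlockStop': {'contentBlockIndex': 0}}

# Carries the stop reason; "tool_use" tells the script to run the function.
{'messageStop': {'stopReason': 'tool_use'}}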

Create the script:

cat > bedrock-stream-tool-use-demo.py << 'EOF'
#!/usr/bin/env python3
"""Demonstrate streaming function calling through Kong's AI Gateway"""

import logging
import json
import boto3

from botocore.exceptions import ClientError

GATEWAY_URL = "http://localhost:8000"

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


class StationNotFoundError(Exception):
    """Raised when a radio station isn't found."""
    pass


def get_top_song(call_sign):
    """Returns the most popular song for the given radio station call sign."""
    if call_sign == 'WZPZ':
        return "Elemental Hotel", "8 Storey Hike"
    raise StationNotFoundError(f"Station {call_sign} not found.")


def stream_messages(bedrock_client, model_id, messages, tool_config):
    """Sends a message and processes the streamed response.

    Reassembles text and tool use content from stream events.
    Text chunks are printed to stdout as they arrive.

    Returns:
        stop_reason: The reason the model stopped generating.
        message: The fully reassembled response message.
    """

    logger.info("Streaming messages with model %s", model_id)

    response = bedrock_client.converse_stream(
        modelId=model_id,
        messages=messages,
        toolConfig=tool_config
    )

    stop_reason = ""
    message = {}
    content = []
    message['content'] = content
    text = ''
    tool_use = {}

    for chunk in response['stream']:
        if 'messageStart' in chunk:
            message['role'] = chunk['messageStart']['role']
        elif 'contentBlockStart' in chunk:
            tool = chunk['contentBlockStart']['start']['toolUse']
            tool_use['toolUseId'] = tool['toolUseId']
            tool_use['name'] = tool['name']
        elif 'contentBlockDelta' in chunk:
            delta = chunk['contentBlockDelta']['delta']
            if 'toolUse' in delta:
                if 'input' not in tool_use:
                    tool_use['input'] = ''
                tool_use['input'] += delta['toolUse']['input']
            elif 'text' in delta:
                text += delta['text']
                print(delta['text'], end='')
        elif 'contentBlockStop' in chunk:
            if 'input' in tool_use:
                tool_use['input'] = json.loads(tool_use['input'])
                content.append({'toolUse': tool_use})
                tool_use = {}
            else:
                content.append({'text': text})
                text = ''
        elif 'messageStop' in chunk:
            stop_reason = chunk['messageStop']['stopReason']

    return stop_reason, message


def main():
    model_id = "cohere.command-r-v1:0"
    input_text = "What is the most popular song on WZPZ?"

    try:
        bedrock_client = boto3.client(
            "bedrock-runtime",
            region_name="us-west-2",
            endpoint_url=GATEWAY_URL,
            aws_access_key_id="dummy",
            aws_secret_access_key="dummy",
        )

        messages = [{"role": "user", "content": [{"text": input_text}]}]

        tool_config = {
            "tools": [
                {
                    "toolSpec": {
                        "name": "top_song",
                        "description": "Get the most popular song played on a radio station.",
                        "inputSchema": {
                            "json": {
                                "type": "object",
                                "properties": {
                                    "sign": {
                                        "type": "string",
                                        "description": "The call sign for the radio station for which you want the most popular song. Example call signs are WZPZ and WKRP."
                                    }
                                },
                                "required": ["sign"]
                            }
                        }
                    }
                }
            ]
        }

        stop_reason, message = stream_messages(
            bedrock_client, model_id, messages, tool_config)
        messages.append(message)

        if stop_reason == "tool_use":
            for block in message['content']:
                if 'toolUse' in block:
                    tool = block['toolUse']

                    if tool['name'] == 'top_song':
                        try:
                            song, artist = get_top_song(tool['input']['sign'])
                            tool_result = {
                                "toolUseId": tool['toolUseId'],
                                "content": [{"json": {"song": song, "artist": artist}}]
                            }
                        except StationNotFoundError as err:
                            tool_result = {
                                "toolUseId": tool['toolUseId'],
                                "content": [{"text": err.args[0]}],
                                "status": 'error'
                            }

                        messages.append({
                            "role": "user",
                            "content": [{"toolResult": tool_result}]
                        })

            stop_reason, message = stream_messages(
                bedrock_client, model_id, messages, tool_config)

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
    else:
        print(f"\nFinished streaming messages with model {model_id}.")


if __name__ == "__main__":
    main()
EOF

The script points a Boto3 client at the AI Gateway route (http://localhost:8000) with dummy credentials. AI Gateway replaces these credentials with the real AWS keys from the plugin configuration before forwarding to Bedrock.

The interaction follows two streaming rounds:

  1. The first converse_stream call sends the user question and tool definition. The model responds with a stream that contains a tool use request, delivering the function name (top_song) and input arguments ({"sign": "WZPZ"}) across multiple contentBlockDelta events. The script reassembles these fragments into a complete tool call.
  2. The script executes get_top_song("WZPZ") locally and appends the result to the message history. A second converse_stream call sends the full conversation, including the tool result. The model streams its final answer, with each text chunk printed to the terminal as it arrives.
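
To make the two rounds concrete, here is roughly what the messages history looks like just before the second converse_stream call. The assistant content shown is an assumption for illustration; the actual blocks and toolUseId depend on the model's response.

messages = [
    # Round 1 input: the original user question.
    {"role": "user", "content": [{"text": "What is the most popular song on WZPZ?"}]},

    # Round 1 output: the reassembled assistant message requesting the tool.
    {"role": "assistant", "content": [
        {"toolUse": {"toolUseId": "tooluse_abc123",  # illustrative ID
                     "name": "top_song",
                     "input": {"sign": "WZPZ"}}},
    ]},

    # Appended by the script: the locally computed tool result.
    {"role": "user", "content": [
        {"toolResult": {"toolUseId": "tooluse_abc123",
                        "content": [{"json": {"song": "Elemental Hotel",
                                              "artist": "8 Storey Hike"}}]}},
    ]},
]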

Validate the configuration

Run the script:

python3 bedrock-stream-tool-use-demo.py

Expected output:

INFO:__main__:Streaming messages with model cohere.command-r-v1:0
INFO:__main__:Streaming messages with model cohere.command-r-v1:0
I will search for the most popular song on WZPZ and relay this information to the user.The most popular song on WZPZ is Elemental Hotel by 8 Storey Hike.
Finished streaming messages with model cohere.command-r-v1:0.

The INFO line appears twice because the script makes two converse_stream calls: one for the initial request (which results in a tool use), and one after sending the tool result back. The final text response streams to the terminal as it is generated.

If the request fails with authentication errors, confirm that the aws_access_key_id and aws_secret_access_key in your plugin configuration are valid and that the Cohere Command R model is enabled in your AWS Bedrock console.

Cleanup

If you created a new control plane and want to conserve your free trial credits or avoid unnecessary charges, delete the new control plane used in this tutorial.

curl -Ls https://get.konghq.com/quickstart | bash -s -- -d

FAQs

What is the difference between converse and converse_stream?

The converse method waits for the full model response before returning. The converse_stream method returns an event stream that delivers response chunks as they are generated. Streaming reduces perceived latency for the end user, since text appears incrementally rather than all at once. Both methods support function calling with the same tool configuration format.
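
In code, the difference is confined to the call and how you consume the result. A minimal sketch, assuming a client configured as in the tutorial:

# Blocking: the complete message is available once the call returns.
response = client.converse(modelId=model_id, messages=messages, toolConfig=tool_config)
print(response["output"]["message"]["content"][0]["text"])

# Streaming: iterate over events as they arrive.
response = client.converse_stream(modelId=model_id, messages=messages, toolConfig=tool_config)
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"].get("text", ""), end="")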

Which models support streaming function calling through the ConverseStream API?

Cohere Command R and Command R+, Anthropic Claude 3 and later, and Amazon Titan models support streaming function calling through the ConverseStream API. Check the AWS documentation for the full compatibility matrix.
