Use Gemini's thinkingConfig with AI Proxy Advanced in Kong AI Gateway

Tags: #ai
Minimum Version: Kong Gateway 3.13
TL;DR

Configure the AI Proxy Advanced plugin with the Gemini provider and gemini-3-pro-preview model, then pass thinkingConfig parameters via extra_body in your requests.

Prerequisites

This is a Konnect tutorial and requires a Konnect personal access token.

  1. Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.

  2. Export your token to an environment variable:

     export KONNECT_TOKEN='YOUR_KONNECT_PAT'
    
  3. Run the quickstart script to automatically provision a Control Plane and Data Plane, and configure your environment:

     curl -Ls https://get.konghq.com/quickstart | bash -s -- -k $KONNECT_TOKEN --deck-output
    

    This sets up a Konnect Control Plane named quickstart, provisions a local Data Plane, and prints out the following environment variable exports:

     export DECK_KONNECT_TOKEN=$KONNECT_TOKEN
     export DECK_KONNECT_CONTROL_PLANE_NAME=quickstart
     export KONNECT_CONTROL_PLANE_URL=https://us.api.konghq.com
     export KONNECT_PROXY_URL='http://localhost:8000'
    

    Copy and paste these into your terminal to configure your session.

This tutorial requires Kong Gateway Enterprise. If you want to run a self-managed Kong Gateway instance instead of using Konnect, you can use the quickstart script with an enterprise license to get an instance of Kong Gateway running almost instantly.

  1. Export your license to an environment variable:

     export KONG_LICENSE_DATA='LICENSE-CONTENTS-GO-HERE'
    
  2. Run the quickstart script:

    curl -Ls https://get.konghq.com/quickstart | bash -s -- -e KONG_LICENSE_DATA 
    

    Once Kong Gateway is ready, you will see the following message:

     Kong Gateway Ready
    

decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial, install decK version 1.43 or later.

This guide uses deck gateway apply, which directly applies entity configuration to your Gateway instance. We recommend upgrading your decK installation to take advantage of this tool.

You can check your current decK version with deck version.

For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function but installing them isn’t the focus of this guide. Follow these steps to pre-configure them:

  1. Run the following command:

    echo '
    _format_version: "3.0"
    services:
      - name: example-service
        url: http://httpbin.konghq.com/anything
    routes:
      - name: example-route
        paths:
        - "/anything"
        service:
          name: example-service
    ' | deck gateway apply -
    

To learn more about entities, you can read our entities documentation.

Before you begin, you must get the following credentials from Google Cloud:

  • Service Account Key: A JSON key file for a service account with Vertex AI permissions
  • Project ID: Your Google Cloud project identifier
  • API Endpoint: The global Vertex AI API endpoint https://aiplatform.googleapis.com

After creating the key, convert the contents of your service account key file into a single-line JSON string. Escape all necessary characters, such as quotes (") and newlines (\n), so that it remains a valid one-line JSON string.
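If you prefer to script the conversion, here's a minimal sketch in Python (the filename service-account-key.json is a placeholder; adjust it to match your downloaded key):

python3 - <<'EOF'
import json

# Load the downloaded key, re-serialize it onto a single line, and
# escape the double quotes so the string can be embedded in the decK config.
with open("service-account-key.json") as f:  # placeholder filename
    key = json.load(f)

print(json.dumps(key).replace('"', '\\"'))
EOF

Then export your credentials as environment variables: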

export DECK_GCP_SERVICE_ACCOUNT_JSON="<single-line-escaped-json>"
export DECK_GCP_SERVICE_ACCOUNT_JSON="your-service-account-json"
export DECK_GCP_PROJECT_ID="your-project-id"

To complete this tutorial, you’ll need Python (version 3.7 or later) and pip installed on your machine. You can verify both by running:

python3 --version
python3 -m pip --version

Install the OpenAI SDK:

pip install openai

Configure the plugin

First, let’s configure AI Proxy Advanced to use the gemini-3-pro-preview model via Vertex AI:

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      genai_category: text/generation
      targets:
      - route_type: llm/v1/chat
        model:
          provider: gemini
          name: gemini-3-pro-preview
          options:
            gemini:
              api_endpoint: aiplatform.googleapis.com
              project_id: "${{ env "DECK_GCP_PROJECT_ID" }}"
              location_id: global
        auth:
          allow_override: false
          gcp_use_service_account: true
          gcp_service_account_json: "${{ env "DECK_GCP_SERVICE_ACCOUNT_JSON" }}"
' | deck gateway apply -

Use the OpenAI SDK with thinkingConfig

Gemini 3 models support a thinkingConfig feature that returns detailed reasoning traces alongside the final response. This allows you to see how the model arrived at its answer. For more information, see Gemini Thinking Mode.

The thinkingConfig supports the following parameters:

  • include_thoughts (boolean): Set to true to include reasoning traces in the response.
  • thinking_budget (integer): Controls the depth and detail of reasoning. Higher values (up to 200) produce more detailed reasoning traces but may increase latency.

Create a Python script using the OpenAI SDK:

cat << 'EOF' > thinking-config.py
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8000/anything",
    api_key="ignored"
)
response = client.chat.completions.create(
    model="gemini-3-pro-preview",
    messages=[
        {
            "role": "user",
            "content": "Three logicians walk into a bar. The bartender asks 'Do all of you want a drink?' The first logician says 'I don't know.' The second logician says 'I don't know.' The third logician says 'Yes!' Explain why each logician answered the way they did."
        }
    ],
    extra_body={
        "generationConfig": {
            "thinkingConfig": {
                "include_thoughts": True,
                "thinking_budget": 200
            }
        }
    }
)
content = response.choices[0].message.content
if '<thought>' in content:
    print("✓ Thoughts found\n")
else:
    print("✗ No thoughts found\n")
print("=== Content ===")
print(content)
EOF

This script sends a logic puzzle that requires multi-step reasoning. Complex queries like this are more likely to produce visible reasoning traces showing how the model analyzes the problem, deduces information from each response, and reaches its conclusion. The thinking_budget of 200 allows for detailed reasoning traces.

The OpenAI SDK sends requests to Kong AI Gateway using the OpenAI chat completions format. The extra_body parameter passes Gemini-specific configuration through to the model. Kong AI Gateway transforms the OpenAI-format request into Gemini’s native format, forwards it to Vertex AI, and converts the response back to OpenAI format with reasoning traces wrapped in <thought> tags.
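For reference, the translated request that reaches Vertex AI looks roughly like the following. This is an illustrative sketch of the field mapping only, not the exact payload the plugin produces:

# Illustrative only: approximate Gemini-native payload after Kong's translation.
gemini_request = {
    "contents": [
        {"role": "user", "parts": [{"text": "Three logicians walk into a bar. ..."}]}
    ],
    "generationConfig": {
        "thinkingConfig": {
            "includeThoughts": True,  # Gemini's native API uses camelCase keys
            "thinkingBudget": 200
        }
    }
}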

Now, let’s run the script:

python3 thinking-config.py

Example output:

✓ Thoughts found

=== Content ===
<thought>**Dissecting the Riddle's Elements**

I'm focused on the riddle's core. The bartender's question sets the stage, and each logician's response is key. I'm noting how the information unfolds with each "I don't know," allowing the final "Yes!" to make logical sense. Each element in the question and answer is important.


</thought>
This is a classic logic puzzle disguised as a joke. To understand the answers, you have to look at the specific question asked: **"Do *all* of you want a drink?"**

Here is the breakdown of each logician’s thought process:

**The First Logician**
*   **The Situation:** The first logician wants a drink.
*   **The Logic:** If he *didn't* want a drink, the answer to "Do **all** of you want a drink?" would be "No" (because if one person doesn't want one, they don't *all* want one). However, simply knowing that *he* wants a drink isn't enough to answer "Yes," because he doesn't know what the other two want.
*   **The Answer:** Since he cannot say "No" (because he wants one) but cannot say "Yes" (because he doesn't know about the others), his only truthful logical answer is **"I don't know."**

**The Second Logician**
*   **The Situation:** The second logician also wants a drink.
*   **The Logic:** She hears the first logician say "I don't know." She deduces that the first logician *must* want a drink (otherwise he would have said "No"). Now she looks at her own desire. If *she* didn't want a drink, she would answer "No" (because the condition "all" would fail). But she *does* want a drink. However, like the first logician, she doesn't know what the third logician wants.
*   **The Answer:** Since she wants a drink but is unsure of the third person, she also must answer **"I don't know."**

**The Third Logician**
*   **The Situation:** The third logician wants a drink.
*   **The Logic:** He has heard the first two answer "I don't know."
    *   From the first answer, he deduces Logician #1 wants a drink.
    *   From the second answer, he deduces Logician #2 wants a drink.
*   **The Answer:** Since he knows he wants a drink himself, and he has deduced that the other two also want drinks, he now has complete information. Everyone wants a drink. Therefore, he can definitively answer **"Yes!"**

The response includes the model’s reasoning process in the <thought> section, followed by the final answer, which works through the puzzle step by step.

Cleanup

If you created a new control plane and want to conserve your free trial credits or avoid unnecessary charges, delete the new control plane used in this tutorial.

curl -Ls https://get.konghq.com/quickstart | bash -s -- -d

FAQs

Which Kong Gateway version is required?

The thinkingConfig feature requires Kong Gateway 3.13 or later.

How are reasoning traces returned?

Reasoning traces are returned as part of the text content, wrapped in <thought> tags for easy parsing. You can extract these sections programmatically or display them to end users.
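For example, a minimal Python sketch that separates the reasoning trace from the final answer, assuming the <thought>...</thought> wrapping shown above:

import re

def split_thoughts(content: str):
    # Collect every <thought>...</thought> block, then strip them out
    # to leave only the final answer.
    thoughts = re.findall(r"<thought>(.*?)</thought>", content, re.DOTALL)
    answer = re.sub(r"<thought>.*?</thought>", "", content, flags=re.DOTALL).strip()
    return thoughts, answer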

Why doesn’t my response include reasoning traces?

Complex queries are more likely to produce visible reasoning traces; simple questions may not trigger thinking mode. Try a more complex problem or increase the thinking_budget parameter.

How does thinking_budget affect performance?

Higher thinking_budget values (up to 200) increase response time but provide more detailed reasoning. Lower values produce faster responses with less detailed traces.
