Route Google Gemini CLI traffic through AI Gateway

Tags: #ai
Minimum Version: Kong Gateway 3.10

TL;DR

Configure the AI Proxy plugin to forward requests to Google Gemini, then enable the File Log plugin to inspect traffic, and point Gemini CLI to the local proxy endpoint so all LLM requests go through the Gateway for monitoring and control.

Prerequisites

This is a Konnect tutorial and requires a Konnect personal access token.

  1. Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.

  2. Export your token to an environment variable:

     export KONNECT_TOKEN='YOUR_KONNECT_PAT'
    
  3. Run the quickstart script to automatically provision a Control Plane and Data Plane, and configure your environment:

     curl -Ls https://get.konghq.com/quickstart | bash -s -- -k $KONNECT_TOKEN --deck-output
    

    This sets up a Konnect Control Plane named quickstart, provisions a local Data Plane, and prints out the following environment variable exports:

     export DECK_KONNECT_TOKEN=$KONNECT_TOKEN
     export DECK_KONNECT_CONTROL_PLANE_NAME=quickstart
     export KONNECT_CONTROL_PLANE_URL=https://us.api.konghq.com
     export KONNECT_PROXY_URL='http://localhost:8000'
    

    Copy and paste these into your terminal to configure your session.

If you're running Kong Gateway without Konnect, this tutorial requires Kong Gateway Enterprise. If you don’t have Kong Gateway set up yet, you can use the quickstart script with an enterprise license to get an instance of Kong Gateway running almost instantly.

  1. Export your license to an environment variable:

     export KONG_LICENSE_DATA='LICENSE-CONTENTS-GO-HERE'
    
  2. Run the quickstart script:

    curl -Ls https://get.konghq.com/quickstart | bash -s -- -e KONG_LICENSE_DATA 
    

    Once Kong Gateway is ready, you will see the following message:

     Kong Gateway Ready
    

decK is a CLI tool for managing Kong Gateway declaratively with state files. To complete this tutorial, install decK version 1.43 or later.

This guide uses deck gateway apply, which directly applies entity configuration to your Gateway instance. We recommend upgrading your decK installation to take advantage of this tool.

You can check your current decK version with deck version.
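
For example, to print the installed decK version:

deck version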

For this tutorial, you’ll need Kong Gateway entities, like Gateway Services and Routes, pre-configured. These entities are essential for Kong Gateway to function but installing them isn’t the focus of this guide. Follow these steps to pre-configure them:

  1. Run the following command:

    echo '
    _format_version: "3.0"
    services:
      - name: example-service
        url: http://httpbin.konghq.com/anything
    routes:
      - name: example-route
        paths:
        - "/anything"
        service:
          name: example-service
    ' | deck gateway apply -
    

To learn more about entities, you can read our entities documentation.
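
To confirm the Service and Route are reachable, you can send a test request through the proxy. This assumes the quickstart's default proxy address of http://localhost:8000; httpbin echoes the request details back as JSON:

curl -i http://localhost:8000/anything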

Before you begin, you must get a Gemini API key from Google Cloud:

  1. Go to the Google Cloud Console.
  2. Select or create a project.
  3. Enable the Generative Language API:
    • Navigate to APIs & Services > Library.
    • Search for “Generative Language API”.
    • Click Enable.
  4. Create an API key:
    • Navigate to APIs & Services > Credentials.
    • Click Create Credentials > API Key.
    • Copy the generated API key.

Export the API key as an environment variable:

export DECK_GEMINI_API_KEY="<your_gemini_api_key>"

This tutorial uses the Google Gemini CLI. Install Node.js 18 or later if needed (verify with node --version), then install the CLI.

  1. Run the following command in your terminal to install the Gemini CLI:

     npm install -g @google/gemini-cli
    
  2. Once the installation process is complete, verify the installation:

     gemini --version
    
     The CLI displays the installed version number.

Configure the AI Proxy plugin

First, let’s configure the AI Proxy plugin. The Gemini CLI communicates with Google’s Gemini API through the chat endpoint; the plugin handles authentication with a query parameter and forwards requests to the specified model. Without a gateway, CLI tools installed across multiple developer machines each need their own copy of the API key, which exposes credentials and makes rotation difficult.

Routing CLI tools through AI Gateway removes this requirement: developers authenticate against the gateway instead of directly against the AI provider. You can centralize authentication, enforce rate limits, track usage costs, enforce guardrails, and cache repeated requests.

echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      max_request_body_size: 4194304
      logging:
        log_statistics: true
        log_payloads: true
      route_type: llm/v1/chat
      llm_format: gemini
      auth:
        param_name: key
        param_value: "${{ env "DECK_GEMINI_API_KEY" }}"
        param_location: query
      model:
        provider: gemini
        name: gemini-2.5-flash
' | deck gateway apply -
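
Optionally, you can smoke-test the plugin with curl before starting the CLI. This is a sketch that assumes the quickstart proxy at http://localhost:8000 and uses the Gemini API's native generateContent request shape (the plugin is set to llm_format: gemini, and the gateway injects your API key, so the request needs none):

curl -s http://localhost:8000/anything/v1beta/models/gemini-2.5-flash:generateContent \
  --header "Content-Type: application/json" \
  --data '{"contents": [{"parts": [{"text": "Say hello in one short sentence."}]}]}'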

Configure the File Log plugin

Now, let’s configure the File Log plugin to inspect the traffic between the Gemini CLI and AI Gateway. This writes requests and responses to a local log file that you can examine as the Gemini CLI runs through Kong Gateway.

echo '
_format_version: "3.0"
plugins:
  - name: file-log
    config:
      path: "/tmp/gemini.json"
' | deck gateway apply -
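
To watch log entries arrive in real time while you exercise the CLI, tail the file inside the Data Plane container (the quickstart names it kong-quickstart-gateway):

docker exec kong-quickstart-gateway tail -f /tmp/gemini.json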

Export environment variables

Open a new terminal window and export the variables that the Gemini CLI will use. Point GOOGLE_GEMINI_BASE_URL at the local proxy endpoint so that the CLI’s LLM traffic routes through the gateway:

export GOOGLE_GEMINI_BASE_URL="http://localhost:8000/anything"
export GEMINI_API_KEY="YOUR-GEMINI-API-KEY"

If you’re using a different Konnect proxy URL, be sure to replace http://localhost:8000 with your proxy URL.

Validate the configuration

Now you can test the Gemini CLI setup.

  1. In the terminal where you exported your Gemini environment variables, run:

    gemini --model gemini-2.5-flash
    

    You should see the Gemini CLI interface start up.

  2. Run a command to test the connection:

    Tell me about the prisoner's dilemma.
    

    The CLI displays the model’s response to your prompt.

  3. In your other terminal window, check that LLM traffic went through AI Gateway:

    docker exec kong-quickstart-gateway cat /tmp/gemini.json | jq
    

    Look for entries similar to:

    {
      ...
      "ai": {
        "proxy": {
          "usage": {
            "prompt_tokens": 7795,
            "completion_tokens": 483,
            "total_tokens": 8278,
            "time_per_token": 10.513457556936,
            "time_to_first_token": 845
          },
          "meta": {
            "provider_name": "gemini",
            "request_model": "gemini-2.5-flash",
            "response_model": "gemini-2.5-flash",
            "llm_latency": 5078,
            "request_mode": "stream"
          }
        }
      }
      ...
    }
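
    Because the File Log plugin writes each entry as a single JSON object per line, you can filter for just the token usage with jq:

    docker exec kong-quickstart-gateway cat /tmp/gemini.json | jq '.ai.proxy.usage'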
    

Cleanup

If you created a new control plane and want to conserve your free trial credits or avoid unnecessary charges, delete the new control plane used in this tutorial.

curl -Ls https://get.konghq.com/quickstart | bash -s -- -d