Voice AI Pipeline Observability

Minimum version: Kong Gateway 3.14
Incompatible with: on-prem

Overview

A production voice AI system is a pipeline: speech-to-text (STT), LLM reasoning, and text-to-speech (TTS) execute in sequence for every conversational turn. Each hop carries its own latency budget, error modes, and cost profile. This recipe sets up Kong AI Gateway to govern all three hops through separate Routes, each with its own AI Proxy Advanced instance. The Key Auth Plugin identifies the calling voice agent on every hop, and a global OpenTelemetry Plugin exports gen_ai.* spans to Langfuse for per-hop latency, token usage, and cost visibility, with conversation-level trace grouping for full-turn analysis.

By the end, you will have three Kong endpoints (/stt, /llm, /tts) proxying a complete voice pipeline behind a single API key, with every hop producing OpenTelemetry traces that appear as a single conversation trace in Langfuse.

Prerequisites

This tutorial uses Kong Konnect. The quickstart script provisions a recipe-scoped Control Plane and local Data Plane.

  1. Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.
  2. Export your token. The same token is reused later for kongctl commands:

    export KONNECT_TOKEN='YOUR_KONNECT_PAT'
    
  3. Set the recipe-scoped control plane name and run the quickstart script. The two -e flags pass non-default tracing settings into the Data Plane container; the OpenTelemetry Plugin will not export traces unless the Data Plane is started with KONG_TRACING_INSTRUMENTATIONS enabled, and the default sampling rate (0.01) drops 99% of spans before the Plugin sees them:

    export KONNECT_CONTROL_PLANE_NAME='voice-ai-observability-recipe'
    curl -Ls https://get.konghq.com/quickstart | \
      bash -s -- -k $KONNECT_TOKEN \
        -e KONG_TRACING_INSTRUMENTATIONS=all \
        -e KONG_TRACING_SAMPLING_RATE=1.0 \
        --deck-output
    

    This provisions a Konnect Control Plane named voice-ai-observability-recipe and a local Data Plane connected to it with tracing enabled, then prints export lines for the remaining session variables. Paste those into your shell when prompted.

This tutorial uses kongctl and decK to manage Kong configuration.

  1. Install kongctl from developer.konghq.com/kongctl.
  2. Install decK version 1.43 or later from docs.konghq.com/deck.
  3. Verify both are installed:

    kongctl version
    deck version
    

Every provider tab uses OpenAI for the STT and TTS hops (Whisper and TTS-1), so an OpenAI key is required regardless of which LLM provider you pick. Configure the LLM provider you plan to use as well.

  1. Create an OpenAI account.
  2. Get an API key.
  3. Export the OpenAI key (used by all providers for STT and TTS):

    export DECK_OPENAI_TOKEN='Bearer sk-YOUR-KEY'
    

Then configure the LLM provider you plan to use; the per-provider variables are covered in the provider tabs of the apply section.

Langfuse is an open-source observability platform that receives OpenTelemetry traces and groups them into conversation-level sessions. This recipe exports Kong’s gen_ai.* spans to Langfuse for per-hop and conversation-level analysis.

Langfuse authenticates OTLP ingestion with HTTP Basic Auth, where the username is your project’s Public Key (pk-lf-...) and the password is its Secret Key (sk-lf-...).

  1. Sign up for Langfuse Cloud (free tier) in the region you want to use, or self-host Langfuse using Docker Compose; OTLP ingestion requires self-hosted Langfuse v3.22.0 or later. Each region is a fully separate deployment with its own UI, account, projects, and API keys. Keys issued in one region will not authenticate against another, so pick one and stick with it.

  2. Create a project and copy the Public Key and Secret Key from the project settings.
  3. Export both keys and the OTLP endpoint for the same region you signed up under, then derive the Basic Auth header from them:

    export DECK_LANGFUSE_PUBLIC_KEY='pk-lf-YOUR-PUBLIC-KEY'
    export DECK_LANGFUSE_SECRET_KEY='sk-lf-YOUR-SECRET-KEY'
    # Pick the endpoint that matches the region you signed up under:
    #   US:    https://us.cloud.langfuse.com/api/public/otel/v1/traces
    #   EU:    https://cloud.langfuse.com/api/public/otel/v1/traces
    #   HIPAA: https://hipaa.cloud.langfuse.com/api/public/otel/v1/traces
    export DECK_LANGFUSE_OTLP_ENDPOINT='https://us.cloud.langfuse.com/api/public/otel/v1/traces'
    export DECK_LANGFUSE_AUTH_HEADER="Basic $(printf '%s:%s' "$DECK_LANGFUSE_PUBLIC_KEY" "$DECK_LANGFUSE_SECRET_KEY" | base64 | tr -d '\n')"
    

    The tr -d '\n' strips the newlines that GNU base64 inserts when wrapping output at 76 columns, which would otherwise corrupt the Authorization header. If your endpoint and key region don’t match, Langfuse will return 401 Unauthorized on every trace export.

    For self-hosted Langfuse, set the endpoint to http://host.docker.internal:3000/api/public/otel/v1/traces so the Kong container can reach Langfuse on the host.
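
    To confirm the derived header before wiring it into Kong, a quick sanity check can decode it back into its key parts. This is an optional, illustrative Python snippet (it assumes the three variables above are exported), not part of the recipe itself:

    import base64
    import os

    # Decode DECK_LANGFUSE_AUTH_HEADER and confirm it round-trips to pk:sk.
    header = os.environ["DECK_LANGFUSE_AUTH_HEADER"]
    assert header.startswith("Basic "), "header must use the Basic scheme"

    decoded = base64.b64decode(header.removeprefix("Basic ")).decode()
    public_key, _, secret_key = decoded.partition(":")

    assert public_key == os.environ["DECK_LANGFUSE_PUBLIC_KEY"], "public key mismatch"
    assert secret_key == os.environ["DECK_LANGFUSE_SECRET_KEY"], "secret key mismatch"
    print("Basic auth header decodes cleanly for", public_key[:12] + "...")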

The demo script requires Python 3.11 or later. Set up an isolated environment:

python3 -m venv .venv
source .venv/bin/activate
pip install 'openai>=1.0.0' 'opentelemetry-api>=1.27.0' 'opentelemetry-sdk>=1.27.0' 'opentelemetry-exporter-otlp-proto-http>=1.27.0' 'opentelemetry-instrumentation-httpx>=0.48b0'

The demo uses the OpenTelemetry SDK plus the httpx auto-instrumentation to emit a voice-turn parent span per turn and to inject W3C traceparent into every outbound call so Kong’s per-hop spans nest correctly under it in Langfuse.

The problem

Voice AI systems present observability challenges that text-based LLM applications do not face. The core difficulty is that a single conversational turn requires multiple API calls in sequence, each with distinct providers, latency characteristics, and failure modes.

  • Three independent failure surfaces per turn. A cascading voice pipeline makes at least three API calls for every user interaction: STT (audio to text), LLM (text to response), and TTS (response to audio). Each hop can fail independently. The STT Service may return a low-confidence transcription. The LLM may time out. The TTS Service may rate-limit. When a user reports that “the call sounded broken,” you need to identify which hop failed, but each provider has its own dashboard and log format with no shared correlation identifier.

  • The observable unit is a conversation, not a request. Users do not experience individual API calls. They experience a phone call or voice interaction spanning dozens of turns. Perhaps turn 3 completed in 400ms while turn 7 hit a 3-second TTS timeout. Traditional API monitoring shows average latency across all requests; it does not show latency progression within a conversation. Debugging user-reported issues requires conversation-scoped traces that group all hops from all turns under a single identifier.

  • Latency budgets compound across hops. Natural-sounding voice interaction requires end-to-end turn latency under approximately 800ms. STT taking 350ms eats into the LLM’s budget. A 200ms LLM response leaves only 250ms for TTS. Monitoring each hop in isolation does not reveal the cascading impact. You need a waterfall view showing how latency distributes across the pipeline so you can identify which hop is consuming the budget.

  • Credential management scales with the pipeline. A text-only LLM application manages one provider’s credentials. A voice pipeline manages three: STT provider keys, LLM provider keys, and TTS provider keys. Each has its own rotation policy, billing dashboard, and rate limits. Switching STT providers (Deepgram to Whisper, or Whisper to a self-hosted model) means re-tooling authentication, monitoring, and cost tracking for that hop across every Service that calls it.

  • Cost attribution is per-model, but budgets are per-conversation. LLM providers charge per token. STT providers charge per audio-second. TTS providers charge per character. Building a cost-per-minute or cost-per-conversation view requires normalizing these different units and correlating charges across hops that run on separate billing systems; a minimal normalization sketch follows this list.
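
A minimal sketch of that normalization in Python: it collapses per-audio-second, per-token, and per-character charges into one per-turn dollar figure. Every rate below is a hypothetical placeholder, not a real provider price.

# All rates are hypothetical placeholders; substitute your providers' pricing.
STT_PER_AUDIO_SECOND = 0.0001    # STT billed per second of audio
LLM_PER_INPUT_TOKEN = 0.0000025  # LLM billed per token, split by direction
LLM_PER_OUTPUT_TOKEN = 0.00001
TTS_PER_CHARACTER = 0.000015     # TTS billed per character synthesized

def turn_cost(audio_seconds, input_tokens, output_tokens, tts_characters):
    """Collapse three billing units into one per-turn dollar figure."""
    return (
        audio_seconds * STT_PER_AUDIO_SECOND
        + input_tokens * LLM_PER_INPUT_TOKEN
        + output_tokens * LLM_PER_OUTPUT_TOKEN
        + tts_characters * TTS_PER_CHARACTER
    )

# A conversation is the sum over turns; divide by call minutes for cost/minute.
turns = [(4.2, 58, 44, 210), (1.8, 88, 16, 75)]  # illustrative measurements
print(f"conversation cost: ${sum(turn_cost(*t) for t in turns):.6f}")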

The alternative to the cascading pipeline is realtime speech-to-speech APIs (OpenAI Realtime, Gemini Live), which use a single WebSocket connection to a multimodal model that ingests and emits audio natively. Latency drops sharply, but per-hop observability disappears by design: there are no separate STT, LLM, or TTS stages to instrument. Regulated industries (finance, healthcare, legal) remain on cascading architectures because the text intermediary between STT and TTS provides an audit trail and a checkpoint for compliance checks before responses are spoken.

The solution

This recipe places Kong AI Gateway between the voice agent and all three providers. Each pipeline hop gets its own Kong Service, Route, and AI Proxy Advanced Plugin instance. The Key Auth Plugin identifies the calling voice agent on every hop, and a global OpenTelemetry Plugin exports gen_ai.* spans from every hop to Langfuse, where they appear as a single conversation trace.

  • voice-ai-stt Service: Routes audio to OpenAI Whisper for transcription (audio/v1/audio/transcriptions).
  • voice-ai-llm Service: Routes text to any supported LLM provider (llm/v1/chat); the provider varies per tab.
  • voice-ai-tts Service: Routes text to OpenAI TTS for speech synthesis (audio/v1/audio/speech).
  • AI Proxy Advanced (3 instances): Injects credentials, handles format translation, and emits per-hop telemetry.
  • Key Auth Plugin (global): Authenticates the voice agent with a shared apikey header on every Route.
  • OpenTelemetry Plugin (global): Exports gen_ai.* spans with provider, model, token usage, and latency to Langfuse.
  • Langfuse: Groups spans by W3C trace ID into conversation-level traces for full-turn visibility.

All three calls share a single W3C trace ID, which Langfuse uses to group the per-hop spans into one conversation-level trace.

 
sequenceDiagram
    participant V as Voice Agent
    participant K as Kong AI Gateway
    participant P as Provider (Whisper / LLM / TTS)
    participant Lf as Langfuse

    V->>K: POST /stt (apikey, audio)
    activate K
    K->>K: key-auth + ai-proxy-advanced (inject OpenAI creds)
    K->>P: Whisper transcription
    activate P
    P-->>K: Transcription
    deactivate P
    K->>Lf: gen_ai.* span (STT)
    K-->>V: Transcription
    deactivate K

    V->>K: POST /llm (apikey, text)
    activate K
    K->>K: key-auth + ai-proxy-advanced (translate format, inject creds)
    K->>P: LLM completion
    activate P
    P-->>K: Response
    deactivate P
    K->>Lf: gen_ai.* span (LLM)
    K-->>V: Text response
    deactivate K

    V->>K: POST /tts (apikey, text)
    activate K
    K->>K: key-auth + ai-proxy-advanced (inject OpenAI creds)
    K->>P: TTS synthesis
    activate P
    P-->>K: Audio
    deactivate P
    K->>Lf: gen_ai.* span (TTS)
    K-->>V: Audio
    deactivate K
  

Kong decouples the voice agent from individual providers. Swap the LLM from OpenAI to Anthropic by changing a tab and re-applying. Replace Whisper with a self-hosted STT model by updating one Service target. The observability contract (same gen_ai.* spans, same Prometheus labels, same Langfuse trace structure) stays identical regardless of which providers sit behind the gateway.

How it works

Tracing terminology

OpenTelemetry’s vocabulary shows up in several places below; here is what each term means in this recipe.

  • Span. One unit of work with a start, an end, and attributes. Examples here include the voice-turn parent span the demo opens, the kong.access.plugin.ai-proxy-advanced Plugin-phase span Kong emits, and the gen_ai.* generation span Kong populates from the provider response.
  • Trace. A tree of spans sharing one trace_id that represents a single logical operation. In this recipe, that means one voice turn (STT → LLM → TTS).
  • Root span. The top of the tree; everything else is a descendant. The demo’s voice-turn span is the root, and Kong’s per-hop spans nest below the demo’s httpx client spans.
  • traceparent header. The W3C-standard HTTP header that carries the active trace_id and parent span_id across services. The demo’s httpx instrumentation injects it on every outbound call; Kong’s OpenTelemetry Plugin extracts it on every inbound request. The sketch after this list pulls a sample value apart.
  • Propagation. The act of moving trace context from one process to another via headers like traceparent. Without it, each Service emits its own disconnected trace.
  • Exporter. The component that ships finished spans to a backend over OTLP. The demo SDK and Kong’s Plugin each run their own exporter, both pointed at Langfuse.
  • Session (Langfuse-specific). A grouping of multiple traces under one logical conversation, keyed by the langfuse.session.id attribute on the root span of each trace.
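
For orientation, here is what one traceparent value looks like when pulled apart. The value below is made up; the format is the W3C standard: version, trace_id, parent span_id, and flags, joined by hyphens.

# Anatomy of a W3C traceparent header (value is fabricated for illustration).
header = "00-a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6-00f067aa0ba902b7-01"

version, trace_id, parent_span_id, flags = header.split("-")
print(f"trace_id:    {trace_id}  (32 hex chars, shared by every span in the trace)")
print(f"parent span: {parent_span_id}  (16 hex chars, the caller's active span)")
print(f"sampled:     {flags == '01'}  (trace-flags, bit 0 = sampled)")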

Per-turn flow

When the demo processes a conversational turn, it makes three sequential requests through Kong. The full per-turn sequence:

  1. Authenticate at the gateway. Every request includes the apikey header. The Key Auth Plugin matches the value against the registered Consumer (voice-agent) before any AI Proxy Advanced logic runs. Unknown keys are rejected with 401 Unauthorized.

  2. STT hop. The agent sends an audio file to /voice-ai-observability/stt. The AI Proxy Advanced Plugin on this Route injects OpenAI credentials, forwards the audio to Whisper’s transcription endpoint, and logs the result. The OpenTelemetry Plugin emits a gen_ai.* span with the provider name (openai), model (whisper-1), and operation metadata.

  3. LLM hop. The agent sends the transcribed text to /voice-ai-observability/llm. The AI Proxy Advanced Plugin on this Route injects the configured LLM provider’s credentials, translates the request format if needed (for example, OpenAI format to Anthropic’s messages API), and forwards to the provider. The span includes gen_ai.usage.input_tokens and gen_ai.usage.output_tokens for cost attribution.

  4. TTS hop. The agent sends the LLM response text to /voice-ai-observability/tts. The AI Proxy Advanced Plugin injects OpenAI credentials and forwards to the TTS endpoint. The response is raw audio bytes returned to the agent.

  5. Trace and Session grouping. Before any HTTP call, the voice agent (the demo) opens a voice-turn parent span via the OpenTelemetry SDK and tags it with langfuse.session.id. The httpx auto-instrumentation in the demo wraps the OpenAI SDK’s underlying httpx client, so every STT/LLM/TTS call becomes a child span of voice-turn and a real W3C traceparent is injected into the outbound request. Kong’s OpenTelemetry Plugin extracts that traceparent and roots its per-hop server, Plugin, balancer, dns, and gen_ai.* spans as descendants of the demo’s httpx client span. Both exporters ship spans to Langfuse independently; Langfuse reassembles by trace_id. The langfuse.session.id attribute on the root span tells Langfuse to roll multiple per-turn traces up under one Session for cross-turn analysis.

Key Auth: Voice agent identification

The Key Auth Plugin authenticates the calling voice agent before any per-hop logic runs. It is configured at the global level so all three Routes (/stt, /llm, /tts) require the same apikey header. The recipe defines a single voice-agent Consumer with a static credential. In production, replace this with one Consumer per tenant or per voice client, rotated through Kong Vaults.

Configuration details

plugins:
  - name: key-auth
    config:
      key_names:
        - apikey
      hide_credentials: true

consumers:
  - username: voice-agent
    keyauth_credentials:
      - key: voice-demo-key
  • key_names: [apikey]. The header (or query parameter) the Plugin reads to identify the Consumer. Clients send apikey: voice-demo-key. See the Key Auth reference for the full list of recognized parameter sources.
  • hide_credentials: true. Strips the credential from the request before it reaches the upstream provider. Without this, the apikey header would be forwarded to OpenAI, Anthropic, etc., leaking the gateway-side credential into provider logs.
  • consumers[].keyauth_credentials[].key. The credential the Consumer presents. For non-trivial deployments, generate per-Consumer keys with kongctl create consumer-credential or rotate via Kong Vaults.
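
To probe the boundary once the configuration is applied, a minimal sketch (assuming the quickstart proxy on localhost:8000 and the OpenAI tab's gpt-4o model) exercises both the accept and reject paths:

import httpx

url = "http://localhost:8000/voice-ai-observability/llm/chat/completions"
body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]}

# Registered credential: key-auth admits the request before ai-proxy-advanced runs.
ok = httpx.post(url, json=body, headers={"apikey": "voice-demo-key"}, timeout=30)
print("valid key   ->", ok.status_code)   # expect 200

# Unknown credential: rejected at the gateway, never reaches the provider.
bad = httpx.post(url, json=body, headers={"apikey": "wrong-key"}, timeout=30)
print("unknown key ->", bad.status_code)  # expect 401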

For richer identity flows (JWT-based SSO, scoped audiences, IdP integration), swap Key Auth for the OpenID Connect Plugin. The Claude Code SSO recipe shows the pattern.

AI Proxy Advanced: Speech-to-text transcription

The STT Service uses the AI Proxy Advanced Plugin with genai_category: audio/transcription to route audio files to OpenAI’s Whisper model. This is the entry point of the cascading pipeline: raw audio goes in, transcribed text comes out. By routing STT through Kong instead of calling Whisper directly, you get credential injection, payload logging, and gen_ai.* telemetry on the transcription hop without instrumenting your application code.

Configuration details

plugins:
  - name: ai-proxy-advanced
    config:
      genai_category: audio/transcription
      max_request_body_size: 26214400
      response_streaming: deny
      targets:
        - route_type: audio/v1/audio/transcriptions
          auth:
            header_name: Authorization
            header_value: "Bearer <openai-key>"
          logging:
            log_payloads: true
          model:
            provider: openai
            name: whisper-1
  • genai_category: audio/transcription. Classifies this Plugin instance as an audio transcription operation. Kong uses this category for Prometheus metric labels (genai_category=audio/transcription) and OpenTelemetry span attributes, separating STT metrics from LLM and TTS traffic.
  • max_request_body_size: 26214400. Raises the request body limit to 25 MB. Audio files can be several megabytes, and the default limit rejects most audio uploads. Set this to at least three times the expected raw audio file size, per the AI Proxy Advanced documentation.
  • response_streaming: deny. Whisper returns the full transcript in one response. Streaming would add complexity for no benefit, so the Plugin is configured to refuse streaming requests on this Route.
  • route_type: audio/v1/audio/transcriptions. Selects the Whisper transcription endpoint. The Plugin supports several other audio operations on the same target type. See the AI Proxy Advanced route types for the current list.
  • logging.log_payloads. Includes request and response bodies in log output. For STT, this captures the transcription text. Disable in production if audio payloads contain sensitive content.
  • No log_statistics. Kong’s AI Proxy Advanced Plugin rejects log_statistics on audio/* route types because token-counting concepts don’t map onto audio operations. Statistics-style metrics (count, latency, request volume) for audio hops still come from Kong’s Prometheus exporter labelled with genai_category=audio/transcription and audio/speech; this option is reserved for llm/* route types only.
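
As a quick smoke test of this hop, a hedged httpx sketch (assuming the quickstart proxy on localhost:8000 and a local question.mp3 file) posts audio through the Kong STT Route the same way the demo's OpenAI client does later:

import httpx

# Multipart upload mirroring what the OpenAI SDK sends to {base_url}/audio/transcriptions.
with open("question.mp3", "rb") as f:
    resp = httpx.post(
        "http://localhost:8000/voice-ai-observability/stt/audio/transcriptions",
        headers={"apikey": "voice-demo-key"},
        files={"file": ("question.mp3", f, "audio/mpeg")},
        data={"model": "whisper-1"},
        timeout=60,
    )
print(resp.json()["text"])  # default response_format returns JSON with a text field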

AI Proxy Advanced: LLM chat completion

The LLM Service handles the reasoning hop of the pipeline. This is the only Service that varies by provider: the auth block, model name, and provider-specific options change depending on which tab you select. The route_type: llm/v1/chat target accepts OpenAI-format chat completion requests, and Kong translates them to the upstream provider’s native format when needed.

Configuration details

plugins:
  - name: ai-proxy-advanced
    config:
      max_request_body_size: 8388608
      response_streaming: allow
      targets:
        - route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: "Bearer <provider-key>"
          logging:
            log_statistics: true
            log_payloads: true
          model:
            provider: openai
            name: gpt-4o
  • max_request_body_size: 8388608. Allows up to 8 MB of request body. Long conversation histories, large system prompts, and tool-call payloads can exceed the default limit.
  • response_streaming: allow. Lets clients request server-sent events for token-by-token chat responses. The recipe demo does not stream, but production voice agents often do to start TTS earlier in the turn.
  • route_type: llm/v1/chat. Selects the chat completions translation path. The Plugin accepts OpenAI-format request bodies and translates them to the upstream provider’s native format. Responses are normalized back to OpenAI format. To pass requests through in a provider’s native format, set llm_format (for example anthropic, bedrock, gemini) on the Plugin config; see the AI Proxy Advanced documentation for the full route-type and llm_format support matrix.
  • auth. The auth block varies by provider. OpenAI and Mistral use Authorization: Bearer <key>, Anthropic uses x-api-key, Azure uses api-key, Bedrock uses AWS access key pairs, and Gemini uses GCP service account credentials. Kong injects these into every upstream request. Clients send a placeholder credential.
  • model.provider and model.name. Identify the upstream LLM. The model name resolves from the DECK_CHAT_MODEL environment variable at apply time, so you can switch models without editing the deck file.
  • logging.log_statistics and logging.log_payloads. Statistics capture prompt and completion token counts; payload logging captures the full prompt and reply text. The gen_ai.input.messages and gen_ai.output.messages span attributes in the OpenTelemetry trace also contain this data when payload logging is enabled.

AI Proxy Advanced: Text-to-speech synthesis

The TTS Service converts the LLM response to audio, completing the cascading pipeline. Like the STT Service, it is fixed to OpenAI (TTS-1 model) across all provider tabs. Routing TTS through Kong gives you the same telemetry contract as the other hops: credential injection, usage logging, and gen_ai.* span emission.

Configuration details

plugins:
  - name: ai-proxy-advanced
    config:
      genai_category: audio/speech
      max_request_body_size: 1048576
      response_streaming: allow
      targets:
        - route_type: audio/v1/audio/speech
          auth:
            header_name: Authorization
            header_value: "Bearer <openai-key>"
          logging:
            log_payloads: true
          model:
            provider: openai
            name: tts-1
  • genai_category: audio/speech. Classifies this as a text-to-speech operation. Prometheus metrics and OTel spans are labeled accordingly, so you can filter TTS latency and cost separately from STT and LLM traffic.
  • max_request_body_size: 1048576. TTS input is plain text, so 1 MB is generous. Set this lower if your voice agent never sends prompts above a few KB.
  • response_streaming: allow. Lets clients request streamed audio chunks instead of waiting for the entire synthesis to finish. Production voice agents use this to begin playback as soon as the first chunk arrives.
  • route_type: audio/v1/audio/speech. Selects the TTS endpoint. The response is raw audio bytes (MP3 by default). The client can request other formats via the response_format field in the request body. See the AI Proxy Advanced reference for supported audio formats and voices.
  • model.name: tts-1. OpenAI’s standard TTS model. Available voices and higher-quality model variants are listed in the OpenAI TTS documentation.
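
For example, a short sketch (assuming the quickstart proxy on localhost:8000) requests WAV output instead of the default MP3 via the response_format field mentioned above:

from openai import OpenAI

tts = OpenAI(
    base_url="http://localhost:8000/voice-ai-observability/tts",
    api_key="placeholder",  # Kong injects the real provider key server-side
    default_headers={"apikey": "voice-demo-key"},
)
audio = tts.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello from the Kong TTS route.",
    response_format="wav",  # default is mp3
)
with open("hello.wav", "wb") as f:
    f.write(audio.read())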

OpenTelemetry: Trace export to Langfuse

The OpenTelemetry Plugin runs as a global Plugin (not scoped to a single Service), so it captures traces from all three pipeline hops. It exports spans to Langfuse’s OTLP endpoint, where they are grouped by W3C trace ID into conversation-level traces.

Configuration details

plugins:
  - name: opentelemetry
    config:
      traces_endpoint: "https://us.cloud.langfuse.com/api/public/otel/v1/traces"
      headers:
        Authorization: "Basic <base64-encoded-credentials>"
        x-langfuse-ingestion-version: "4"
      sampling_rate: 1
      propagation:
        default_format: w3c
  • traces_endpoint. The OTLP/HTTP endpoint where Kong sends trace data. For Langfuse Cloud, use https://us.cloud.langfuse.com/api/public/otel/v1/traces (US) or https://cloud.langfuse.com/api/public/otel/v1/traces (EU). For self-hosted Langfuse, use http://host.docker.internal:3000/api/public/otel/v1/traces.
  • headers.Authorization. Basic auth header constructed from your Langfuse public key and secret key: Basic <base64(pk:sk)>. This authenticates trace export to your Langfuse project.
  • headers.x-langfuse-ingestion-version: "4". Enables Langfuse’s real-time Fast Preview display for incoming traces.
  • sampling_rate: 1. Samples 100% of requests. Reduce this in high-traffic production environments to control trace volume and cost.
  • propagation.default_format: w3c. Uses W3C Trace Context for trace ID propagation. When a client sends a traceparent header, the Plugin preserves that trace ID on the emitted span. This is how multiple requests (STT, LLM, TTS) get grouped under a single trace.
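
To see that propagation without the full demo, a hand-rolled sketch can reuse one trace_id across two hops. The parent span IDs below are synthetic and never exported, so a backend will show Kong's spans under missing parents; the demo avoids that by letting the OpenTelemetry SDK export real parent spans. Endpoint, model, and key assume the recipe defaults:

import secrets
import httpx

trace_id = secrets.token_hex(16)  # 32 hex chars, shared across both hops

def traceparent():
    # Fresh (synthetic) parent span id per hop, same trace id.
    return f"00-{trace_id}-{secrets.token_hex(8)}-01"

base = "http://localhost:8000/voice-ai-observability"
llm = httpx.post(
    f"{base}/llm/chat/completions",
    headers={"apikey": "voice-demo-key", "traceparent": traceparent()},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]},
    timeout=30,
)
tts = httpx.post(
    f"{base}/tts/audio/speech",
    headers={"apikey": "voice-demo-key", "traceparent": traceparent()},
    json={"model": "tts-1", "voice": "alloy", "input": "ping"},
    timeout=60,
)
print("both hops grouped under trace:", trace_id, llm.status_code, tts.status_code)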

Kong emits gen_ai.* span attributes on every AI Proxy Advanced request (v3.13+). These attributes follow the OpenTelemetry GenAI semantic conventions and include:

  • gen_ai.provider.name: Provider identifier (for example, openai, anthropic).
  • gen_ai.request.model: Model name from the request.
  • gen_ai.response.model: Model name from the provider response.
  • gen_ai.operation.name: Operation type (chat, embeddings).
  • gen_ai.usage.input_tokens: Input token count.
  • gen_ai.usage.output_tokens: Output token count.
  • gen_ai.input.messages: Full input messages (when payload logging is enabled).
  • gen_ai.output.messages: Full output messages (when payload logging is enabled).

Production considerations

In production, store credentials in Kong Vaults using {vault://backend/key} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store.

The gen_ai.input.messages and gen_ai.output.messages span attributes capture full prompt and response payloads. Review your data retention and access control policies before enabling payload logging in production, as these attributes may contain PII, sensitive business context, or credentials passed in prompts.

Apply the Kong configuration

This section configures the Control Plane in two parts. First, adopt the quickstart Control Plane into a kongctl namespace so the apply commands below can manage it. The recipe’s select_tags and the voice-ai-observability-recipe namespace scope every resource so teardown removes only this recipe’s configuration.

kongctl adopt control-plane "${KONNECT_CONTROL_PLANE_NAME}" \
  --namespace "${KONNECT_CONTROL_PLANE_NAME}" \
  --pat "${KONNECT_TOKEN}"

Adoption stamps the KONGCTL-namespace label on the Control Plane.

The configuration creates three Kong Services and Routes (/stt, /llm, /tts), each with an AI Proxy Advanced Plugin handling credential injection and telemetry, plus a global Key Auth Plugin for voice agent identification, a global OpenTelemetry Plugin exporting traces to Langfuse, and a voice-agent Consumer with the voice-demo-key API key.

Select your LLM provider below, export the per-tab environment variables, and apply. The STT and TTS hops use OpenAI regardless of which LLM provider you choose, so DECK_OPENAI_TOKEN, DECK_LANGFUSE_OTLP_ENDPOINT, and DECK_LANGFUSE_AUTH_HEADER are exported once during the prerequisites and reused across tabs.

Try it out

The demo script runs a short three-turn voice conversation through the recipe. It opens with a setup phase that synthesizes the question audio for each turn directly via OpenAI’s TTS API, bypassing Kong. These calls stand in for a microphone source a real voice agent would have, and are intentionally not traced. Before any traced turn runs, the demo also sends one request with an invalid apikey to confirm Kong returns 401. It then runs three traced turns. Each turn opens a voice-turn parent span via the OpenTelemetry SDK and executes the three production hops through Kong: STT → LLM → TTS. The httpx instrumentation injects W3C traceparent into every outbound request so Kong’s per-hop spans nest as descendants of the demo’s parent span. All three turns share a single Langfuse langfuse.session.id, so they roll up into one Session for cross-turn analysis.

The demo passes the API key via default_headers because the OpenAI SDK reserves api_key for the Authorization: Bearer header. To let clients pass the key through api_key directly, attach a pre-function Plugin that copies the Bearer token to the apikey header server-side. See Authenticate OpenAI SDK clients with Key Auth for the pattern.

Look for per-hop timing in the output and the trace ID printed at the end of each turn. The [LLM] line shows the upstream model and token counts read from the parsed completion.usage field; because Kong normalizes every provider’s response to OpenAI format, that field is populated regardless of which LLM tab you applied. After the script completes, open Langfuse, navigate to Sessions, and find the printed Session ID to see the three turns grouped under one conversation.

A single LLM hop request and response (with the Kong response headers visible to the client) looks like this:

POST /voice-ai-observability/llm
apikey: voice-demo-key

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful voice assistant..."},
    {"role": "user", "content": "What are the three laws of robotics?"}
  ]
}

Response (Kong normalizes any provider’s reply to OpenAI format):

HTTP/1.1 200 OK
X-Kong-LLM-Model: gpt-4o
X-Kong-Upstream-Latency: 821
X-Kong-Proxy-Latency: 18

{"choices": [{"message": {"role": "assistant", "content": "..."}}], "usage": {"prompt_tokens": 58, "completion_tokens": 44, "total_tokens": 102}}

Kong adds these response headers on every hop:

  • X-Kong-LLM-Model: Upstream model that served the request (LLM hop only).
  • X-Kong-Upstream-Latency: Time (ms) Kong spent waiting for the provider.
  • X-Kong-Proxy-Latency: Time (ms) Kong spent processing the request.

Create the demo script:

cat <<'EOF' > demo.py
"""Voice AI pipeline observability demo. See README for context."""

import os
import sys
import tempfile
import time
import uuid
from pathlib import Path

from openai import APIStatusError, OpenAI
from opentelemetry import context as otel_context
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

PROXY_URL = os.getenv("PROXY_URL", "http://localhost:8000")
CHAT_MODEL = os.getenv("CHAT_MODEL", "gpt-4o")
VOICE_API_KEY = os.getenv("VOICE_API_KEY", "voice-demo-key")
SESSION_ID = os.getenv("SESSION_ID", f"voice-demo-{uuid.uuid4().hex[:12]}")

# ANSI color codes. Disabled when stdout isn't a TTY or NO_COLOR is set.
_USE_COLOR = sys.stdout.isatty() and "NO_COLOR" not in os.environ
def _c(code: str, s: str) -> str:
    return f"\033[{code}m{s}\033[0m" if _USE_COLOR else s
BOLD   = lambda s: _c("1", s)
DIM    = lambda s: _c("2", s)
GREEN  = lambda s: _c("32", s)
BLUE   = lambda s: _c("34", s)
CYAN   = lambda s: _c("36", s)
RED    = lambda s: _c("31", s)
MAGENTA= lambda s: _c("35", s)

LANGFUSE_OTLP_ENDPOINT = os.environ["DECK_LANGFUSE_OTLP_ENDPOINT"]
LANGFUSE_AUTH_HEADER = os.environ["DECK_LANGFUSE_AUTH_HEADER"]

# DECK_OPENAI_TOKEN carries the "Bearer " prefix Kong needs; strip it for the
# direct OpenAI client used by the setup phase.
RAW_OPENAI_KEY = os.environ["DECK_OPENAI_TOKEN"].removeprefix("Bearer ").strip()


# ---------------------------------------------------------------------------
# OpenTelemetry SDK setup
# ---------------------------------------------------------------------------

provider = TracerProvider(resource=Resource.create({"service.name": "voice-ai-demo"}))
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint=LANGFUSE_OTLP_ENDPOINT,
            headers={
                "Authorization": LANGFUSE_AUTH_HEADER,
                "x-langfuse-ingestion-version": "4",
            },
        )
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("voice-ai-demo")

HTTPXClientInstrumentor().instrument()


# ---------------------------------------------------------------------------
# Kong-pointed clients (key-auth via apikey header in default_headers)
# ---------------------------------------------------------------------------
#
# The OpenAI SDK reserves api_key for the Authorization: Bearer header. Kong
# replaces that header server-side with the real provider credential, so we
# pass the recipe's key-auth credential through default_headers instead.

stt_client = OpenAI(
    base_url=f"{PROXY_URL}/voice-ai-observability/stt",
    api_key="placeholder",
    default_headers={"apikey": VOICE_API_KEY},
)

llm_client = OpenAI(
    base_url=f"{PROXY_URL}/voice-ai-observability/llm",
    api_key="placeholder",
    default_headers={"apikey": VOICE_API_KEY},
)

tts_client = OpenAI(
    base_url=f"{PROXY_URL}/voice-ai-observability/tts",
    api_key="placeholder",
    default_headers={"apikey": VOICE_API_KEY},
)


# ---------------------------------------------------------------------------
# Out-of-band setup: synthesize question audio directly via OpenAI
# ---------------------------------------------------------------------------

def synthesize_question_audio(text):
    """Synthesize question audio directly via OpenAI, bypassing Kong.

    This stands in for a microphone source. The call is suppressed from
    OpenTelemetry instrumentation so it does not appear in Langfuse, since
    the recipe traces what happens through Kong, and a real voice agent
    would receive raw audio from a hardware input.
    """
    direct_client = OpenAI(api_key=RAW_OPENAI_KEY, base_url="https://api.openai.com/v1")

    # Suppress httpx instrumentation for this call so the setup span is not
    # emitted to Langfuse (would otherwise surface as an unparented orphan).
    token = otel_context.attach(
        otel_context.set_value("suppress_instrumentation", True)
    )
    try:
        response = direct_client.audio.speech.create(
            model="tts-1", voice="alloy", input=text
        )
        return response.read()
    finally:
        otel_context.detach(token)


# ---------------------------------------------------------------------------
# Negative path: invalid key-auth credential
# ---------------------------------------------------------------------------

def check_auth_boundary():
    """Send a request with an invalid key, expecting Kong to return 401.

    Confirms key-auth is enforced on the recipe's Routes before any traced
    turn runs. Suppress instrumentation so this setup probe doesn't pollute
    Langfuse with an unauthenticated outlier.
    """
    bad_client = OpenAI(
        base_url=f"{PROXY_URL}/voice-ai-observability/llm",
        api_key="placeholder",
        default_headers={"apikey": "wrong-key"},
    )
    token = otel_context.attach(
        otel_context.set_value("suppress_instrumentation", True)
    )
    try:
        bad_client.chat.completions.create(
            model=CHAT_MODEL,
            messages=[{"role": "user", "content": "ping"}],
        )
    except APIStatusError as exc:
        print(f"  {GREEN(BOLD('[AUTH]'))} expected reject -> {RED(BOLD(str(exc.status_code)))} {exc.message[:80]}")
        return exc.status_code
    finally:
        otel_context.detach(token)
    print(f"  {RED(BOLD('[AUTH]'))} unexpected: invalid key was accepted")
    return None


# ---------------------------------------------------------------------------
# Pipeline stages: three traced hops through Kong
# ---------------------------------------------------------------------------

def speech_to_text(audio_bytes):
    """Transcribe audio via the Kong STT route."""
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
        tmp.write(audio_bytes)
        tmp_path = Path(tmp.name)

    try:
        start = time.perf_counter()
        with open(tmp_path, "rb") as f:
            transcript = stt_client.audio.transcriptions.create(
                model="whisper-1", file=f
            )
        elapsed = time.perf_counter() - start
    finally:
        tmp_path.unlink(missing_ok=True)

    print(f"  {CYAN(BOLD('[STT]'))} \"{transcript.text}\" {DIM(f'({elapsed:.3f}s)')}")
    return transcript.text, elapsed


def llm_chat(messages):
    """Send the conversation so far to the LLM via the Kong chat route."""
    start = time.perf_counter()
    raw = llm_client.chat.completions.with_raw_response.create(
        model=CHAT_MODEL, messages=messages
    )
    elapsed = time.perf_counter() - start
    completion = raw.parse()

    reply = completion.choices[0].message.content
    usage = completion.usage
    kong_model = raw.headers.get("x-kong-llm-model", CHAT_MODEL)
    upstream_ms = raw.headers.get("x-kong-upstream-latency", "-")

    print(f"  {CYAN(BOLD('[LLM]'))} \"{reply}\"")
    stats = (
        f"Model: {kong_model}  "
        f"Tokens: {usage.prompt_tokens} in / {usage.completion_tokens} out  "
        f"Upstream: {upstream_ms}ms ({elapsed:.3f}s wall)"
    )
    print(f"         {DIM(stats)}")
    return reply, elapsed


def text_to_speech(text):
    """Synthesize the LLM response to audio via the Kong TTS route."""
    start = time.perf_counter()
    response = tts_client.audio.speech.create(
        model="tts-1", voice="alloy", input=text
    )
    audio_bytes = response.read()
    elapsed = time.perf_counter() - start

    print(f"  {CYAN(BOLD('[TTS]'))} Generated {len(audio_bytes):,} bytes of audio {DIM(f'({elapsed:.3f}s)')}")
    return audio_bytes, elapsed


# ---------------------------------------------------------------------------
# Per-turn orchestration
# ---------------------------------------------------------------------------

def run_turn(question, audio_bytes, turn_number, history):
    """Execute one full cascading voice pipeline turn under a `voice-turn` span.

    `history` is the running OpenAI-format messages list for the conversation;
    this turn's user transcription and assistant reply are appended to it so
    later turns see prior context.
    """
    print(f"\n{BOLD(f'Turn {turn_number}:')} \"{question}\"")
    print("=" * 60)

    with tracer.start_as_current_span(
        "voice-turn",
        attributes={
            "langfuse.session.id": SESSION_ID,
            "langfuse.trace.name": f"voice-turn-{turn_number}",
            "langfuse.trace.tags": ["voice-ai", "cascading-pipeline"],
            "voice.turn.number": turn_number,
            "voice.question": question,
        },
    ) as span:
        timings = {}

        print("\n1. Speech -> Text (STT)")
        transcription, timings["stt"] = speech_to_text(audio_bytes)
        history.append({"role": "user", "content": transcription})

        print("\n2. Transcription -> LLM")
        response_text, timings["llm"] = llm_chat(history)
        history.append({"role": "assistant", "content": response_text})

        print("\n3. LLM Response -> Speech (TTS)")
        _, timings["tts"] = text_to_speech(response_text)

        span.set_attribute("voice.response", response_text)
        trace_id = f"{span.get_span_context().trace_id:032x}"

    total = sum(timings.values())
    breakdown = " + ".join(f"{k}: {v:.3f}s" for k, v in timings.items())
    print(f"\n{'-' * 60}")
    print(f"{BOLD('Turn complete:')} {total:.3f}s total {DIM(f'({breakdown})')}")
    # Trace ID is the link readers paste into Langfuse. That's the headline output.
    print(f"Trace ID: {MAGENTA(BOLD(trace_id))}")


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------

if __name__ == "__main__":
    print(BOLD("Voice AI Pipeline Observability Demo"))
    print("=" * 60)
    print(f"Kong Proxy:  {PROXY_URL}")
    print(f"Chat Model:  {CYAN(BOLD(CHAT_MODEL))}")
    # Session ID is what the reader uses to find this run in Langfuse.
    print(f"Session ID:  {BLUE(BOLD(SESSION_ID))}")

    print(f"\n{DIM('Checking key-auth boundary...')}")
    check_auth_boundary()

    questions = [
        "What are the three laws of robotics?",
        "Who wrote them?",
        "What year were they first published?",
    ]

    print(f"\n{DIM('Synthesizing question audio (out of band, not traced)...')}")
    audio_clips = []
    for q in questions:
        audio_clips.append(synthesize_question_audio(q))
        print(f"  {DIM(f'Synthesized: ' + chr(34) + q + chr(34))}")

    history = [
        {
            "role": "system",
            "content": (
                "You are a helpful voice assistant. Always answer in a single "
                "spoken sentence. Never use bullet points, numbered lists, or "
                "markdown. Spell out numbers and acronyms as words. Your reply "
                "will be read aloud by a text-to-speech engine."
            ),
        }
    ]

    for i, (question, audio) in enumerate(zip(questions, audio_clips), start=1):
        run_turn(question, audio, i, history)

    print()
    print("=" * 60)
    print(f"{BOLD('Session complete:')} {BLUE(BOLD(SESSION_ID))}")
    print(f"View this conversation in Langfuse -> Sessions -> {BLUE(BOLD(SESSION_ID))}")

    # BatchSpanProcessor flushes asynchronously; block until exporter drains.
    provider.shutdown()
    sys.exit(0)
EOF

Run it:

python demo.py

Example output:

Voice AI Pipeline Observability Demo
============================================================
Kong Proxy:  http://localhost:8000
Chat Model:  gpt-4o
Session ID:  voice-demo-7f3a2c1b9e8d

Checking key-auth boundary...
  [AUTH] expected reject -> 401 Error code: 401 - {'message': 'No API key found in request'}

Synthesizing question audio (out of band, not traced)...
  Synthesized: "What are the three laws of robotics?"
  Synthesized: "Who wrote them?"
  Synthesized: "What year were they first published?"

Turn 1: "What are the three laws of robotics?"
============================================================

1. Speech -> Text (STT)
  [STT] "What are the three laws of robotics?" (0.876s)

2. Transcription -> LLM
  [LLM] "Isaac Asimov's three laws state that a robot must not harm a human, must obey human orders unless they conflict with the first law, and must protect its own existence unless that conflicts with the first two."
         Model: gpt-4o  Tokens: 58 in / 44 out  Upstream: 821ms (0.842s wall)

3. LLM Response -> Speech (TTS)
  [TTS] Generated 218,400 bytes of audio (2.943s)

------------------------------------------------------------
Turn complete: 4.661s total (stt: 0.876s + llm: 0.842s + tts: 2.943s)
Trace ID: a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6

Turn 2: "Who wrote them?"
============================================================

1. Speech -> Text (STT)
  [STT] "Who wrote them?" (0.412s)

2. Transcription -> LLM
  [LLM] "Isaac Asimov, the science fiction author, wrote the three laws of robotics."
         Model: gpt-4o  Tokens: 88 in / 16 out  Upstream: 318ms (0.331s wall)

3. LLM Response -> Speech (TTS)
  [TTS] Generated 65,760 bytes of audio (0.624s)

------------------------------------------------------------
Turn complete: 1.367s total (stt: 0.412s + llm: 0.331s + tts: 0.624s)
Trace ID: b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7

Turn 3: "What year were they first published?"
============================================================

1. Speech -> Text (STT)
  [STT] "What year were they first published?" (0.487s)

2. Transcription -> LLM
  [LLM] "Asimov first introduced the three laws in his 1942 short story 'Runaround.'"
         Model: gpt-4o  Tokens: 116 in / 19 out  Upstream: 401ms (0.418s wall)

3. LLM Response -> Speech (TTS)
  [TTS] Generated 70,080 bytes of audio (0.892s)

------------------------------------------------------------
Turn complete: 1.797s total (stt: 0.487s + llm: 0.418s + tts: 0.892s)
Trace ID: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8

============================================================
Session complete: voice-demo-7f3a2c1b9e8d
View this conversation in Langfuse -> Sessions -> voice-demo-7f3a2c1b9e8d

What happened

  1. Auth boundary check. Before any traced traffic, the demo sent one request with apikey: wrong-key. The Key Auth Plugin rejected it with 401, confirming Kong is gating every Route. This call was suppressed from instrumentation so it does not appear in Langfuse.

  2. Setup (not traced). Before any spans opened, the demo made three TTS calls directly to OpenAI to synthesize question audio for each turn. These calls bypassed Kong entirely and stand in for the microphone source a production voice agent would have. They are explicitly suppressed from OpenTelemetry instrumentation, so they do not appear in your Langfuse traces.

  3. Per-turn parent span. For each turn, the demo opened a voice-turn parent span via the OpenTelemetry SDK before any HTTP call. The span carries langfuse.session.id, langfuse.trace.name, and langfuse.trace.tags attributes. These tell Langfuse to roll the per-turn traces up under one Session.

  4. Three traced hops through Kong. Each OpenAI(...).create(...) call (STT, LLM, TTS) ran while the voice-turn span was active. The httpx auto-instrumentation wrapped the OpenAI SDK’s underlying client, emitted a child span per call, and injected a real W3C traceparent header into every outbound request. Every call carried the apikey header, so the Key Auth Plugin admitted it before AI Proxy Advanced ran.

  5. Kong’s per-hop spans nest under the demo’s spans. Kong’s OpenTelemetry Plugin extracted the traceparent and rooted its server, Plugin (ai-proxy-advanced, opentelemetry, balancer, dns), and gen_ai.* generation spans as descendants of the demo’s httpx client span. The gen_ai.* spans on the LLM hop carry gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, the same numbers the CLI prints from the parsed completion.usage field that Kong normalizes to OpenAI format for every provider.

  6. Two exporters, one trace. The demo’s SDK and Kong’s Plugin export to the same Langfuse OTLP endpoint independently. They share trace_id because the traceparent header propagates the demo’s active context to Kong, and Langfuse reassembles the tree by trace_id.

  7. Session rollup. Because every turn’s root span carried the same langfuse.session.id, Langfuse groups all three turns under a single Session. This is how cross-turn analysis (latency progression, conversation-scoped cost, per-turn error correlation) works.

  8. Where latency actually came from, and why this is the recipe’s whole point. Look at the per-hop times printed by the demo and the matching durations in Langfuse. STT and the LLM combined typically come in under 1.5 seconds; TTS is the dominant cost, often 3–6 seconds, and it scales linearly with how long the LLM’s reply is. The Problem section frames natural-sounding voice interaction as a sub-800ms budget, and the demo runs nowhere near it. That gap is exactly why per-hop observability matters. Without spans on every hop, you would see one slow turn and have no idea whether STT mis-transcribed, the LLM was wordy, or TTS choked on a long reply. With them, the answer is one glance at the waterfall: a chatty LLM cascades into a slow TTS, and the production fix (stream TTS, cap response length, switch to a faster TTS model, swap to a realtime API) is selected from the data, not guessed at. Observability is what turns “the call sounded broken” into a specific, fixable hop.

Verify in Langfuse

Open your Langfuse project and navigate to Sessions in the left nav. Find the session ID printed at the end of the demo output (search by ID or scan the recent list).

You should see three traces grouped under the session, one per turn. Click into the first trace and confirm the tree:

  • Root: voice-turn (Service: voice-ai-demo), with langfuse.session.id, langfuse.trace.name, and langfuse.trace.tags attributes attached.
  • Three children (one per pipeline hop), each named after the outbound HTTP call: POST /voice-ai-observability/stt, POST /voice-ai-observability/llm, POST /voice-ai-observability/tts.
  • Under each httpx span: Kong’s server span plus the Plugin / balancer / dns descendants, and a gen_ai.* generation span (generate_content whisper-1, chat <chat-model>, generate_content tts-1).

Per-hop timings in Langfuse should be within a few tens of milliseconds of the times printed by the demo CLI. Any larger discrepancy points at network overhead between the demo, Kong, and the upstream provider.

Explore in Konnect

Sign in to Kong Konnect and navigate to API Gateway > Gateways > voice-ai-observability-recipe. From there:

  • Open the Gateway services tab to see the three Services (voice-ai-stt, voice-ai-llm, voice-ai-tts) and click into each to inspect their Routes (/voice-ai-observability/stt, /voice-ai-observability/llm, /voice-ai-observability/tts).
  • Open the Plugins tab to confirm the global Key Auth and OpenTelemetry Plugins, plus the three per-Service AI Proxy Advanced instances.
  • Open the Consumers tab to find the voice-agent Consumer and its voice-demo-key credential.
  • Open the Analytics tab on any Service for an at-a-glance view of recipe traffic (request count, latency, status codes).
  • For deeper analysis, open the Konnect Observability menu in the left nav for cross-Service dashboards and historical trends.

Variations and next steps

Swap the OTel backend. Replace the Langfuse endpoint with any OTLP-compatible backend. Change DECK_LANGFUSE_OTLP_ENDPOINT and DECK_LANGFUSE_AUTH_HEADER (and update the demo’s exporter the same way) to point at Jaeger, Grafana Tempo, Honeycomb, Datadog, or Dynatrace. Two things to know before you migrate:

  • Portable across backends. Kong’s per-hop spans, the demo’s voice-turn span tree, and the gen_ai.* semantic attributes (token counts, model names, provider) are all standard OpenTelemetry GenAI semantic conventions. They render correctly in any backend that supports OTLP ingestion and the GenAI semconv registry.
  • Langfuse-specific and won’t transfer. The langfuse.session.id, langfuse.trace.name, and langfuse.trace.tags attributes are how Langfuse drives its Sessions UI, conversation rollups, and LLM-aware cost panels. On a generic backend langfuse.session.id is just a string attribute. To recreate Sessions-style grouping you would filter traces by that attribute, or restructure to use the backend’s native session/grouping concept (Datadog’s session.id, Honeycomb’s parent-id chains, etc.).
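
On the demo side, the swap is one exporter change. A minimal sketch, assuming a local Jaeger all-in-one listening for OTLP/HTTP on port 4318 (check your backend's actual ingest address and auth scheme):

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Replace the Langfuse endpoint and Basic auth header with the new backend's
# OTLP/HTTP ingest address; most backends use an API token header instead.
exporter = OTLPSpanExporter(
    endpoint="http://localhost:4318/v1/traces",
    headers={},
)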

Propagate session attributes onto every span. Kong’s child spans share the trace_id of the demo’s voice-turn root, but they do not carry the langfuse.session.id attribute themselves. For trace-level Session grouping in Langfuse this is sufficient: Langfuse only needs the attribute on the root. To filter or aggregate individual observations by session (for example, a per-span cost panel grouped by session), use OpenTelemetry’s BaggageSpanProcessor on the demo side to copy session.id into baggage and have it propagated on outbound HTTP. Kong does not natively read OTel baggage onto its emitted spans, so a complete per-span propagation also requires either a Kong Plugin extension that reads the baggage header and stamps it as a span attribute, or a templating Plugin that maps a custom HTTP header onto the OTel span attributes.
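
A sketch of the demo-side half, assuming the opentelemetry-processor-baggage package is installed (pip install opentelemetry-processor-baggage):

from opentelemetry import baggage, context, trace
from opentelemetry.processor.baggage import (
    ALLOW_ALL_BAGGAGE_KEYS,
    BaggageSpanProcessor,
)
from opentelemetry.sdk.trace import TracerProvider

provider = TracerProvider()
# Stamps every baggage entry onto each span this SDK starts. Kong does not
# read baggage natively, so its spans still need a gateway-side mapping.
provider.add_span_processor(BaggageSpanProcessor(ALLOW_ALL_BAGGAGE_KEYS))
trace.set_tracer_provider(provider)

# Put the session id into baggage before opening the per-turn span; the
# default W3C baggage propagator then sends it as a `baggage` HTTP header.
token = context.attach(baggage.set_baggage("session.id", "voice-demo-7f3a2c1b9e8d"))
try:
    with trace.get_tracer("voice-ai-demo").start_as_current_span("voice-turn"):
        pass  # the STT/LLM/TTS calls would run here
finally:
    context.detach(token)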

Swap key-auth for SSO. Replace the Key Auth Plugin with OpenID Connect when voice agents are run by humans on managed devices and you need per-user identity, JWT claims, or short-lived tokens. The Claude Code SSO recipe is an end-to-end example with Okta. Drop the voice-agent Consumer, point Kong at your IdP’s issuer URL, and let consumer_claim map JWT subjects onto Kong Consumers automatically.

Add per-hop rate limiting. Attach AI Rate Limiting Advanced to individual Services to enforce separate token budgets for STT, LLM, and TTS. This prevents a runaway LLM prompt from consuming the entire pipeline’s token quota. Configure llm_providers per Service to track token usage against provider-specific limits.

Replace STT or TTS providers. Update the STT or TTS Service target to use a different provider without changing the LLM configuration or the observability pipeline. Switch from OpenAI Whisper to a self-hosted speech model by changing model.provider and model.options.upstream_url on the STT target. The gen_ai.* span attributes and Prometheus labels update automatically to reflect the new provider.

Add Prometheus metrics dashboards. Kong emits AI-specific Prometheus metrics (ai_llm_requests_total, ai_llm_cost_total, ai_llm_tokens_total, ai_llm_provider_latency) with a request_mode label that distinguishes oneshot, stream, and realtime traffic. Import the Kong AI Gateway Grafana dashboard for pre-built cost, latency, and throughput panels across all three pipeline hops.

Explore realtime speech-to-speech. For latency-sensitive applications where per-hop observability is less critical, the AI Proxy Advanced Plugin supports route_type: realtime/v1/realtime with genai_category: realtime/generation for OpenAI Realtime and Gemini Live WebSocket connections. Realtime mode collapses the three-hop pipeline into a single persistent WebSocket, trading the per-hop waterfall view for significantly lower turn latency. Kong tracks realtime traffic with the request_mode=realtime Prometheus label.

Cleanup

The recipe’s select_tags and kongctl namespace scoped all resources, so this teardown removes only this recipe’s configuration. Tear down the local data plane and delete the control plane from Konnect:

export KONNECT_CONTROL_PLANE_NAME='voice-ai-observability-recipe' && curl -Ls https://get.konghq.com/quickstart | bash -s -- -d -k $KONNECT_TOKEN
