Claude Code SSO

Minimum version: Kong Gateway 3.14
Incompatible with: on-prem

Overview

This recipe puts Kong AI Gateway in front of Anthropic Console API access (or AWS Bedrock) so every Claude Code request from your engineering team is authenticated via Okta SSO, routed to a model tier based on the developer’s Okta group membership, and subject to per-tier token rate limits. No provider credentials sit on developer machines.

The recipe uses three Kong Plugins working together: the OpenID Connect Plugin for Okta JWT validation, the AI Proxy Advanced Plugin for Consumer-Group-scoped model routing, and the AI Rate Limiting Advanced Plugin for per-tier token budgets.

Scope: this recipe governs Claude Code’s API-key authentication path. That’s the path it uses when configured with apiKeyHelper, ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, or cloud-provider credentials. Claude.ai subscription seats authenticated via interactive claude /login go directly to Anthropic, bill against the subscription seat, and are out of scope.

Prerequisites

This tutorial uses Kong Konnect. The quickstart script provisions a recipe-scoped Control Plane and local Data Plane.

  1. Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.
  2. Export your token. The same token is reused later for kongctl commands:

    export KONNECT_TOKEN='YOUR_KONNECT_PAT'
    
  3. Set the recipe-scoped Control Plane name and run the quickstart script. The -e flags raise the data plane’s nginx body buffer so Claude Code’s large request payloads (full conversation context, tool definitions, file contents) stay in memory instead of spilling to disk:

    export KONNECT_CONTROL_PLANE_NAME='claude-code-sso-recipe'
    curl -Ls https://get.konghq.com/quickstart | \
      bash -s -- -k $KONNECT_TOKEN \
        -e KONG_NGINX_HTTP_CLIENT_BODY_BUFFER_SIZE=16m \
        -e KONG_NGINX_HTTP_CLIENT_MAX_BODY_SIZE=16m \
        --deck-output
    

    This provisions a Kong Konnect Control Plane named claude-code-sso-recipe, a local Data Plane connected to it, and prints export lines for the rest of the session vars. Paste those into your shell when prompted.

This tutorial uses kongctl and decK to manage Kong configuration, plus jq for JSON processing in the apply and cleanup steps.

  1. Install kongctl from developer.konghq.com/kongctl.
  2. Install decK version 1.43 or later from docs.konghq.com/deck.
  3. Install jq from jqlang.org.

You can verify all three are installed:

kongctl version
deck version
jq --version

This recipe requires an Okta organization with admin access. The steps below create one Native OIDC Application for Claude Code, enable refresh tokens, configure a groups claim on Okta’s authorization server, and assign one test user to one group.

Create the Native Application

This is a public client representing Claude Code. Public clients use Authorization Code + PKCE without a client secret.

  1. In the Okta Admin Console, go to Applications → Create App Integration.
  2. Select OIDC - OpenID Connect and Native Application, then click Next.
  3. Name it Claude Code.
  4. Set the sign-in redirect URI to http://localhost:9876/callback. This is the local callback the helper script in the Claude Code prereq listens on.
  5. On the application’s General tab under Client Credentials, confirm that Proof Key for Code Exchange (PKCE) is checked. Okta enables it by default for Native app types.
  6. Under General Settings → Grant type, check Refresh Token (alongside the Authorization Code entry that’s already enabled). Without this, even when the helper script requests offline_access, Okta does not issue refresh tokens, and developers hit the browser PKCE flow on every token expiry.
  7. Copy the Client ID for use in the Claude Code prereq.

Confirm the authorization server scopes

The helper script requests openid, profile, email, and offline_access. Okta’s default authorization server enables these out of the box, so no change is usually required.

  1. Go to Security → API → Authorization Servers and select the default server (or your custom server).
  2. Open the Scopes tab and confirm the four scopes above are listed and enabled. If offline_access is missing, add it. Without it, refresh tokens are never issued.

Configure the groups claim

The OpenID Connect Plugin reads the groups claim out of the access token and maps each value to a Kong Consumer Group with the same name. Configure the authorization server to include the claim:

  1. On the same authorization server, open the Claims tab and add a new claim:
    • Name: groups
    • Include in token type: Access Token
    • Value type: Groups
    • Filter: Matches regex .* (or restrict to a prefix like claude-)

Set up an Okta group and a test user

Create one group and assign one user to start. Kong is configured with both claude-standard-users and claude-power-users Consumer Groups, but you only need to test one tier at a time and can swap the user between groups later to exercise the other tier:

  1. Go to Directory → Groups and create claude-standard-users.
  2. Go to Directory → People and either add a new person (for example, claude-test-user@example.com) or pick an existing user. Make sure the user has a password set and can sign in.
  3. Open the user, go to the Groups tab, and assign them to claude-standard-users.
  4. Open the Claude Code application created above, go to the Assignments tab, and assign the same user. Without this, Okta blocks the OAuth flow at sign-in.

To exercise the power tier later, also create a claude-power-users group. The Swap to the power tier section walks through swapping the user between groups to switch tiers.

Export Okta endpoints and audience

The audience is configured on the authorization server itself, not per request. View or edit it at Security → API → Authorization Servers → [server] → Settings tab → Audience. The built-in default server uses api://default.

export DECK_OKTA_ISSUER='https://your-org.okta.com/oauth2/default'
export DECK_OKTA_AUDIENCE='api://default'
# Salt for the openid-connect plugin's token cache key. Stable across syncs.
# Not a credential. For production, regenerate with `openssl rand -hex 16` and
# source from your secrets manager (or Kong Vaults).
export DECK_OIDC_CACHE_TOKENS_SALT='claude-code-sso-dev-salt'

If you use a custom authorization server, set DECK_OKTA_ISSUER to that server’s issuer URL and DECK_OKTA_AUDIENCE to its configured audience value.

This recipe routes Claude Code through Kong using the apiKeyHelper setting, which runs a script before each API call and uses its stdout as the credential.

  1. Install Claude Code from docs.claude.com/en/docs/claude-code/setup.
  2. Verify installation:

    claude --version
    

apiKeyHelper is bypassed if ANTHROPIC_API_KEY or ANTHROPIC_AUTH_TOKEN is set in your environment. Both env vars take precedence over the helper per Claude Code’s credential precedence rules. If either is set in your shell profile, unset it before running Claude Code with this recipe: unset ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN.
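You can demonstrate the precedence trap with a quick check. This sketch deliberately sets a dummy key so the detection fires; in real use you want the loop to print nothing before launching Claude Code:

```shell
# Deliberately set a dummy key to show the check catching it.
export ANTHROPIC_API_KEY='sk-ant-dummy-example'
found=0
for v in ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN; do
  if [ -n "${!v:-}" ]; then
    echo "unset $v before running claude"   # this var bypasses apiKeyHelper
    found=1
  fi
done
```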

Helper script: okta-claude-auth.sh

This script implements the OAuth 2.0 Authorization Code + PKCE flow against Okta, caches the token locally, and refreshes silently when possible. Save it to ~/.claude/okta-claude-auth.sh and make it executable:

cat <<'SCRIPT' > ~/.claude/okta-claude-auth.sh
#!/usr/bin/env bash
# okta-claude-auth.sh: apiKeyHelper for Claude Code + Okta PKCE
set -euo pipefail

OKTA_DOMAIN="${OKTA_DOMAIN:-"https://your-org.okta.com"}"
CLIENT_ID="${OKTA_CLIENT_ID:-"0oa1b2c3d4YourClientId"}"
REDIRECT_PORT="${OKTA_REDIRECT_PORT:-"9876"}"
REDIRECT_URI="http://localhost:${REDIRECT_PORT}/callback"
SCOPES="openid profile email offline_access"
AUDIENCE="${OKTA_AUDIENCE:-"api://default"}"
AUTH_SERVER="${OKTA_AUTH_SERVER:-"default"}"

CACHE_DIR="${HOME}/.claude/okta-cache"
TOKEN_CACHE="${CACHE_DIR}/tokens.json"
LOCK_FILE="${CACHE_DIR}/auth.lock"

log()  { echo "[okta-auth] $*" >&2; }
die()  { echo "[okta-auth] ERROR: $*" >&2; exit 1; }
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }
random_str() { openssl rand -hex "${1:-32}"; }
json_get() { echo "$1" | grep -o "\"$2\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | sed 's/.*: *"\([^"]*\)".*/\1/'; }
json_get_num() { echo "$1" | grep -o "\"$2\"[[:space:]]*:[[:space:]]*[0-9]*" | grep -o '[0-9]*$'; }

mkdir -p "$CACHE_DIR" && chmod 700 "$CACHE_DIR"

cache_read() { [[ -f "$TOKEN_CACHE" ]] && cat "$TOKEN_CACHE" || echo "{}"; }
cache_write() { echo "$1" > "$TOKEN_CACHE" && chmod 600 "$TOKEN_CACHE"; }

access_token_valid() {
  local cache exp now tok
  cache=$(cache_read)
  tok=$(json_get "$cache" "access_token")
  exp=$(json_get_num "$cache" "expires_at")
  now=$(date +%s)
  [[ -z "$tok" || -z "$exp" ]] && return 1
  (( now < exp - 60 )) && { echo "$tok"; return 0; }
  return 1
}

do_refresh() {
  local cache refresh_tok response new_access new_refresh expires_in expires_at now
  cache=$(cache_read)
  refresh_tok=$(json_get "$cache" "refresh_token")
  [[ -z "$refresh_tok" ]] && return 1
  log "Attempting silent refresh..."
  response=$(curl -sf -X POST "${OKTA_DOMAIN}/oauth2/${AUTH_SERVER}/v1/token" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=refresh_token&refresh_token=${refresh_tok}&client_id=${CLIENT_ID}&scope=${SCOPES// /%20}") || return 1
  new_access=$(json_get "$response" "access_token")
  new_refresh=$(json_get "$response" "refresh_token")
  expires_in=$(json_get_num "$response" "expires_in")
  [[ -z "$new_access" ]] && return 1
  now=$(date +%s); expires_at=$(( now + ${expires_in:-3600} ))
  cache_write "{\"access_token\":\"${new_access}\",\"refresh_token\":\"${new_refresh:-$refresh_tok}\",\"expires_at\":${expires_at}}"
  log "Token refreshed successfully."
  echo "$new_access"
}

start_callback_server() {
  local expected_state="$1"
  python3 - "$REDIRECT_PORT" "$expected_state" <<'PYEOF'
import sys, socket, urllib.parse
port, expected_state = int(sys.argv[1]), sys.argv[2]
HTML_OK = b"<html><body><h2>Authenticated! Close this tab.</h2><script>window.close();</script></body></html>"
HTML_ERR = b"<html><body><h2>Authentication failed.</h2></body></html>"
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("127.0.0.1", port)); sock.listen(1); sock.settimeout(120)
try:
    conn, _ = sock.accept(); data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk: break
        data += chunk
    path = data.decode(errors="replace").split(" ")[1] if b" " in data else "/"
    params = urllib.parse.parse_qs(urllib.parse.urlparse(path).query)
    code, state = params.get("code",[None])[0], params.get("state",[None])[0]
    error = params.get("error",[None])[0]
    ok = code and state == expected_state and not error
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n" + (HTML_OK if ok else HTML_ERR))
    conn.close()
    if not ok: sys.exit(1)
    print(code, end="")
except socket.timeout:
    sys.stderr.write("[okta-auth] Timed out waiting for callback.\n"); sys.exit(1)
finally: sock.close()
PYEOF
}

do_auth_code_flow() {
  local verifier challenge state auth_url code response access_tok refresh_tok expires_in expires_at now
  verifier=$(random_str 32)
  challenge=$(echo -n "$verifier" | openssl dgst -binary -sha256 | b64url)
  state=$(random_str 16)
  auth_url="${OKTA_DOMAIN}/oauth2/${AUTH_SERVER}/v1/authorize?response_type=code&client_id=${CLIENT_ID}&redirect_uri=${REDIRECT_URI}&scope=${SCOPES// /+}&audience=${AUDIENCE}&state=${state}&code_challenge=${challenge}&code_challenge_method=S256"
  log "Opening browser for Okta login..."
  log "If the browser doesn't open, visit: ${auth_url}"
  if command -v open >/dev/null 2>&1; then
    open "$auth_url" &
  elif command -v xdg-open >/dev/null 2>&1; then
    xdg-open "$auth_url" >/dev/null 2>&1 &
  fi
  log "Waiting for callback on http://localhost:${REDIRECT_PORT}/callback ..."
  code=$(start_callback_server "$state")
  [[ -z "$code" ]] && die "No authorization code received."
  log "Authorization code received. Exchanging for tokens..."
  response=$(curl -sf -X POST "${OKTA_DOMAIN}/oauth2/${AUTH_SERVER}/v1/token" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=authorization_code&code=${code}&redirect_uri=${REDIRECT_URI}&client_id=${CLIENT_ID}&code_verifier=${verifier}") || die "Token exchange failed."
  access_tok=$(json_get "$response" "access_token")
  refresh_tok=$(json_get "$response" "refresh_token")
  expires_in=$(json_get_num "$response" "expires_in")
  [[ -z "$access_tok" ]] && die "No access_token in response."
  now=$(date +%s); expires_at=$(( now + ${expires_in:-3600} ))
  cache_write "{\"access_token\":\"${access_tok}\",\"refresh_token\":\"${refresh_tok}\",\"expires_at\":${expires_at}}"
  log "Authentication successful. Token cached."
  echo "$access_tok"
}

acquire_lock() {
  local waited=0
  while ! mkdir "$LOCK_FILE" 2>/dev/null; do
    sleep 0.5; (( waited++ ))
    (( waited > 30 )) && { log "Lock timeout, removing stale lock"; rm -rf "$LOCK_FILE"; }
  done
  trap 'rm -rf "$LOCK_FILE"' EXIT INT TERM
}

main() {
  acquire_lock
  tok=$(access_token_valid) && { echo "$tok"; exit 0; }
  tok=$(do_refresh 2>/dev/null) && { echo "$tok"; exit 0; }
  tok=$(do_auth_code_flow)
  echo "$tok"
}

main "$@"
SCRIPT
chmod +x ~/.claude/okta-claude-auth.sh
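You can sanity-check the script's S256 derivation (the `openssl dgst` plus `b64url` pipeline) against the PKCE test vector from RFC 7636, Appendix B:

```shell
# RFC 7636 Appendix B: this verifier must hash and base64url-encode
# to the challenge below, using the same pipeline as the helper script.
verifier='dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk'
challenge=$(printf '%s' "$verifier" \
  | openssl dgst -binary -sha256 \
  | openssl base64 -A | tr '+/' '-_' | tr -d '=')
echo "$challenge"   # E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM
```

If the output differs, the local openssl or tr behavior differs from what the helper expects and Okta will reject the code exchange.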

For a production rollout, distribute okta-claude-auth.sh through your internal tooling (a Homebrew tap, package manager, or setup script) rather than having every developer run this heredoc.

Configure Claude Code

Create or update ~/.claude/settings.json. Use the variant that matches the provider you’ll apply Kong against:
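As a sketch, an Anthropic-backed settings.json could look like the following. Every value is a placeholder to substitute, and the ANTHROPIC_BASE_URL shown assumes the quickstart Data Plane proxying locally on port 8000 with the /claude-code-sso Route path used in this recipe:

```json
{
  "apiKeyHelper": "~/.claude/okta-claude-auth.sh",
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8000/claude-code-sso",
    "ANTHROPIC_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-7",
    "OKTA_DOMAIN": "https://your-org.okta.com",
    "OKTA_CLIENT_ID": "0oa1b2c3d4YourClientId",
    "OKTA_AUTH_SERVER": "default"
  }
}
```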

Replace https://your-org.okta.com and 0oa1b2c3d4YourClientId with values from your Okta setup. Claude Code runs apiKeyHelper in an isolated environment without access to your shell profile, so all Okta variables must be listed in the env block. ANTHROPIC_BASE_URL points Claude Code at Kong. OKTA_AUTH_SERVER defaults to default; set it to your custom server’s ID if you use one.

The model env vars pin which name Claude Code sends in the request body so it always matches a model_alias configured in Kong:

  • ANTHROPIC_MODEL sets the active model from session start. Without it, the picker boots on the system default and Claude Code sends the literal placeholder <default>, which has no matching alias in Kong.
  • ANTHROPIC_DEFAULT_SONNET_MODEL and ANTHROPIC_DEFAULT_OPUS_MODEL control what the sonnet and opus aliases resolve to when the user runs /model sonnet or /model opus mid-session.

These three values must match the DECK_SONNET_ALIAS and DECK_OPUS_ALIAS set in the apply step. That’s the alias contract Kong uses to route the request to the right tier. Whenever the platform team rolls out a new model version (for example, Sonnet 4.6 lands on Bedrock), they update the four DECK_* values in Kong and announce a corresponding bump for the three ANTHROPIC_*_MODEL values here.

On Windows, Claude Code does not expand ~ in the apiKeyHelper field (known issue). Substitute the absolute path, for example C:\\Users\\you\\.claude\\okta-claude-auth.sh. Tilde expansion works on macOS and Linux.

Token caching and refresh cadence

Claude Code caches the helper’s output and re-invokes okta-claude-auth.sh every 5 minutes by default, and immediately on any HTTP 401 from the upstream. The helper itself caches the access and refresh tokens in ~/.claude/okta-cache/tokens.json and silently exchanges the refresh token for a new access token when the cached one is within 60 seconds of expiry. On Okta’s default 1-hour access-token TTL, the helper refreshes well before Kong sees a stale JWT, and developers never see a browser or a 401 after the initial sign-in.

If your Okta authorization server is configured with a much shorter access-token TTL (for example, 2 minutes), set CLAUDE_CODE_API_KEY_HELPER_TTL_MS in the env block of ~/.claude/settings.json to match. For example, set "CLAUDE_CODE_API_KEY_HELPER_TTL_MS": "60000" for 60 seconds. This lowers Claude Code’s helper-output cache window so the helper’s refresh logic runs before each token expires, avoiding a transient 401-retry cycle on every TTL boundary. The env var has no effect on the helper’s refresh-token handling, which is decided inside the script based on the cached expires_at. Tightening it does not speed up Okta revocation detection. The Kong OpenID Connect Plugin re-validates the JWT signature and expiry on every request and is the real revocation guard.

The problem

Anthropic Console API keys (and AWS Bedrock provider credentials) give engineers programmatic access to LLMs through Claude Code, but the access is governed at the credential level, not at the organizational identity level. That gap creates several blind spots at scale:

Static keys are shared and untraceable. A single Console API key (or AWS access key) is typically distributed across a team via shell profiles, .env files, or CI secrets stores. Every developer’s requests are indistinguishable on the Anthropic bill. When a key leaks in shell history or a commit, rotation requires touching every machine and pipeline that uses it. There is no per-user audit trail, no way to revoke one person’s access without rotating the shared key, and no mechanism to assign different models or rate limits to different roles.

No coupling to organizational identity. Okta already defines who belongs to which team, what MFA policy applies, and when access should be revoked. The API key has no connection to Okta’s session state. A developer removed from an Okta group at 10 AM can still call the API at 3 PM because the cached key never re-checks anything. Anthropic’s Claude.ai SSO governs the parallel claude /login subscription path on Pro / Max / Team / Enterprise plans, not API keys. The two paths are separate billing relationships at Anthropic.

No model governance. Every developer with a valid key gets the same model access. There is no built-in way to say “interns get Haiku, senior engineers get Opus” or “this team’s monthly token budget is 500K.” Cost controls happen retroactively through billing alerts, not proactively through access policy.

The root issue is that trust is established at the developer machine, not at a server-side enforcement point your organization controls. That enforcement point is what this recipe builds.

The solution

This recipe moves the enforcement point to a Kong Gateway that your organization controls, with Okta as the single source of truth for identity. It applies to Claude Code’s API-key path, the path it uses when configured with apiKeyHelper, ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, or AWS provider credentials:

  • Developers never hold the provider key. Kong holds the Console API key (or Bedrock credentials) and injects it server-side after validating an Okta-issued JWT. Key rotation happens in one place (Kong’s config), not on every developer machine.
  • Okta drives identity. Claude Code presents the developer’s Okta JWT via apiKeyHelper. Kong validates the signature against Okta’s JWKS keys on every request. Removing a user from an Okta group or disabling their account immediately blocks API access on the next call.
  • Consumer-Group-based model routing maps Okta groups to Kong Consumer Groups, each with their own AI Proxy Advanced Plugin instance and model alias set. Standard users can request only the cost-efficient model; power users can request both that model and the most capable one. Changing a user’s tier is an Okta group assignment, not a config change.
  • Per-tier token budgets apply different rate limits to each Consumer Group so the gateway enforces cost ceilings at the boundary rather than reacting to billing alerts after the fact.
  • Explicit tier enforcement. A standard user requesting Opus is rejected at Kong with a 400 model not configured. There is no silent downgrade and no upstream provider call.
 
sequenceDiagram
    participant CC as Claude Code
    participant H as apiKeyHelper script
    participant O as Okta
    participant K as Kong Gateway
    participant L as LLM Provider

    CC->>H: Request bearer token
    activate H
    alt Cached token still valid
        H-->>CC: Cached JWT
    else Refresh or PKCE flow
        H->>O: PKCE flow (browser)
        activate O
        O-->>H: JWT (access_token + refresh_token)
        deactivate O
        H-->>CC: Fresh JWT
    end
    deactivate H

    CC->>K: POST /claude-code-sso (Authorization: Bearer JWT)
    activate K
    K->>K: openid-connect (JWKS validate, Okta group to Kong Consumer Group)
    K->>K: ai-proxy-advanced (alias check, inject provider key, route by tier)
    K->>K: ai-rate-limiting-advanced (per-tier token budget)
    K->>L: Forwarded request
    activate L
    L-->>K: Provider response
    deactivate L
    K-->>CC: Anthropic-format response
    deactivate K
  

Each component and its responsibility:

  • okta-claude-auth.sh: PKCE flow, token caching, silent refresh. Runs on the developer machine.
  • Okta: identity, MFA, group membership, JWT issuance.
  • Kong, OpenID Connect Plugin: JWT signature validation via JWKS, audience verification, Okta group to Kong Consumer Group mapping.
  • Kong, AI Proxy Advanced Plugin: LLM provider auth injection, model alias matching for tier enforcement, format translation.
  • Kong, AI Rate Limiting Advanced Plugin: per-Consumer-Group token rate limits with sliding windows.
  • LLM provider: model inference.

How it works

When a developer runs Claude Code, every API request flows through Kong before reaching the LLM provider. Here is the complete request lifecycle:

  1. Token acquisition. Claude Code invokes the apiKeyHelper script (okta-claude-auth.sh) before each API call. The script checks its local token cache for a valid access token. If the token has expired but a refresh token exists, it silently exchanges for a new access token via Okta’s token endpoint. If no valid tokens exist, it opens a browser to Okta’s authorization endpoint for a full PKCE authentication flow, captures the authorization code via a local callback server, and exchanges it for tokens. The resulting JWT is returned to Claude Code.

  2. Request to Kong. Claude Code sends POST /claude-code-sso/v1/messages with Authorization: Bearer <jwt> to Kong. The request body is in Anthropic’s native message format and pins the model name to one of Claude Code’s bare aliases (claude-sonnet-4-6, claude-opus-4-7) via the ANTHROPIC_DEFAULT_*_MODEL env vars in ~/.claude/settings.json.

  3. JWT validation and Consumer Group mapping. The OpenID Connect Plugin validates the JWT signature against Okta’s cached JWKS keys, checks expiry, and verifies the aud claim. It then reads the groups array from the JWT and resolves each value to a Kong Consumer Group with the same name. A user in claude-power-users is attached to the claude-power-users Consumer Group.

  4. Tier enforcement and credential injection. The Consumer-Group-scoped AI Proxy Advanced Plugin matches the request body’s model value against the model_alias on each of its targets. The standard tier registers only claude-sonnet-4-6; the power tier registers both claude-sonnet-4-6 and claude-opus-4-7. A standard user requesting Opus produces no match and is rejected with 400 model not configured. On match, Kong injects the provider API key server-side and forwards the request to the configured upstream model.

  5. Token rate limiting. The Consumer-Group-scoped AI Rate Limiting Advanced Plugin counts prompt and completion tokens against the tier’s per-window budget. Rate-limit headers (X-AI-RateLimit-Remaining-*) are added to the response. Exhaustion returns 429 Too Many Requests with a Retry-After header.

  6. Response. The provider’s response flows back through Kong to Claude Code. Subsequent requests reuse the cached Okta token silently. No browser flow unless the refresh token has expired.

OpenID Connect: JWT validation and Consumer Group mapping

The OpenID Connect Plugin is the authentication layer. It validates every incoming JWT against Okta’s JWKS keys, rejects tokens that have expired or were issued for a different audience, and attaches the developer’s Okta groups to the request as Kong Consumer Groups. This Consumer Group attachment is the bridge between your identity provider and Kong’s tier-scoped Plugins. It determines which AI Proxy Advanced and AI Rate Limiting Advanced instances run on the request.

Configuration details

plugins:
  - name: openid-connect
    config:
      issuer: ${{ env "DECK_OKTA_ISSUER" }}
      auth_methods:
        - bearer
      bearer_token_param_type:
        - header
      audience_claim:
        - aud
      audience_required:
        - ${{ env "DECK_OKTA_AUDIENCE" }}
      consumer_groups_claim:
        - groups
      consumer_optional: true
      consumer_groups_optional: false
      upstream_headers_claims:
        - sub
        - email
      upstream_headers_names:
        - X-Authenticated-User
        - X-User-Email
      ssl_verify: true
      hide_credentials: true
      cache_tokens_salt: ${{ env "DECK_OIDC_CACHE_TOKENS_SALT" }}

issuer. The Okta authorization server’s base URL. Kong appends /.well-known/openid-configuration to discover JWKS endpoints, signing keys, and token metadata automatically. Kong caches the JWKS keys to avoid hitting Okta on every request.

auth_methods: [bearer]. Tells Kong to look for a Bearer token in the Authorization header. Claude Code’s apiKeyHelper outputs a bare token, and Claude Code sends it as Authorization: Bearer <token>.

audience_required. The JWT’s aud claim must match this value exactly. This prevents tokens issued for other Okta applications from being accepted. Set it to the audience you configured on the Okta authorization server.

consumer_groups_claim: [groups] with consumer_optional: true and consumer_groups_optional: false. The Plugin reads the groups array from the validated JWT and resolves each value against existing Kong Consumer Group names. A JWT with "groups": ["claude-power-users"] attaches the request to the claude-power-users Consumer Group; downstream tier-scoped Plugins fire automatically. consumer_optional: true allows requests through without a per-user Consumer mapping. consumer_groups_optional: false blocks any authenticated user whose groups don’t match a configured Consumer Group, so an Okta user without a tier assignment is rejected at the gateway.
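What the Plugin extracts can be illustrated offline. This sketch builds an unsigned token with a fabricated payload and pulls out the groups claim the way any JWT consumer does: split on ".", base64url-decode the middle segment, read the claim (assumes GNU base64 and jq; real tokens are RS256-signed and validated against JWKS before any claim is trusted):

```shell
# Fabricated payload; in production Okta issues and signs this.
payload='{"sub":"00u1abcd","aud":"api://default","groups":["claude-power-users"]}'
seg=$(printf '%s' "$payload" | base64 | tr -d '\n' | tr '+/' '-_' | tr -d '=')
token="eyJhbGciOiJSUzI1NiJ9.${seg}.fake-signature"

mid=$(printf '%s' "$token" | cut -d. -f2)
while [ $(( ${#mid} % 4 )) -ne 0 ]; do mid="${mid}="; done   # restore padding
group=$(printf '%s' "$mid" | tr '_-' '/+' | base64 -d | jq -r '.groups[0]')
echo "$group"   # claude-power-users
```

The string printed at the end is exactly the Consumer Group name Kong resolves, which is why the Okta group names and the Kong Consumer Group names must match character for character.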

upstream_headers_claims and upstream_headers_names. Forward the JWT’s sub and email claims as X-Authenticated-User and X-User-Email upstream. Useful for audit logging. You can see which user made each request without decoding the JWT.

ssl_verify: true. Enables TLS certificate verification when Kong connects to Okta’s JWKS endpoint. In Kong Gateway 3.14+, this defaults to true as part of the Secure by Default initiative. Set explicitly here for clarity.

hide_credentials: true. Strips the Authorization header from the request before forwarding upstream. Since the AI Proxy Advanced Plugin injects its own provider credentials, the Okta JWT is not needed upstream. In Kong Gateway 3.14+, this defaults to true. Set explicitly here for clarity.

cache_tokens_salt. Salt used when deriving the cache key for token-endpoint responses. The Plugin requires this to be set explicitly to a stable value so cached entries survive deck gateway sync. The value is not a credential and grants no token-forgery capability, but treat it as mildly sensitive (it influences cache-key predictability). For production, regenerate with openssl rand -hex 16 and source it from a vault.

AI Proxy Advanced: model alias matching and tier enforcement

Each tier gets its own AI Proxy Advanced Plugin instance, scoped to one Consumer Group. The Plugin uses the model_alias field on each target to decide which target serves a request. Claude Code sends bare model names like claude-sonnet-4-6 in the request body; the Plugin matches that value against the configured aliases and routes to the matching target. A standard user picking Opus produces a body model of claude-opus-4-7, which has no matching target on the standard tier proxy, so AI Proxy Advanced returns 400 model not configured. The block is explicit and visible to the user.

The mapping from alias to actual provider model name is what insulates developers from the underlying provider. The platform team controls model.name (the real Anthropic or Bedrock model ID); developers always speak in stable bare aliases.

Configuration details

The standard tier configures one target. The power tier configures two:

plugins:
  - name: ai-proxy-advanced
    consumer_group: claude-standard-users
    config:
      llm_format: anthropic
      max_request_body_size: 10485760
      response_streaming: allow
      targets:
        - route_type: llm/v1/chat
          auth:
            header_name: x-api-key
            header_value: ${{ env "DECK_ANTHROPIC_TOKEN" }}
          logging:
            log_statistics: true
            log_payloads: true
          model:
            model_alias: ${{ env "DECK_SONNET_ALIAS" }}
            provider: anthropic
            name: ${{ env "DECK_CHAT_MODEL_1" }}
            options:
              anthropic_version: ${{ env "DECK_ANTHROPIC_VERSION" }}
  - name: ai-proxy-advanced
    consumer_group: claude-power-users
    config:
      llm_format: anthropic
      max_request_body_size: 10485760
      response_streaming: allow
      targets:
        - route_type: llm/v1/chat
          auth:
            header_name: x-api-key
            header_value: ${{ env "DECK_ANTHROPIC_TOKEN" }}
          model:
            model_alias: ${{ env "DECK_SONNET_ALIAS" }}
            provider: anthropic
            name: ${{ env "DECK_CHAT_MODEL_1" }}
            options:
              anthropic_version: ${{ env "DECK_ANTHROPIC_VERSION" }}
        - route_type: llm/v1/chat
          auth:
            header_name: x-api-key
            header_value: ${{ env "DECK_ANTHROPIC_TOKEN" }}
          model:
            model_alias: ${{ env "DECK_OPUS_ALIAS" }}
            provider: anthropic
            name: ${{ env "DECK_CHAT_MODEL_2" }}
            options:
              anthropic_version: ${{ env "DECK_ANTHROPIC_VERSION" }}

llm_format: anthropic. Claude Code sends requests in Anthropic’s native format (/v1/messages with messages array). For Anthropic, requests pass through natively. For AWS Bedrock (hosting Claude models), the Plugin translates from Anthropic format to Bedrock’s InvokeModel API automatically. Claude Code always speaks Anthropic; Kong handles the rest.

Do not use llm_format: openai with Claude Code. Claude Code sends Anthropic-native tool definitions in its requests. The OpenAI format translation path mangles these structures, causing 400 errors from the LLM provider (typically tools: Input should be a valid list). Always use llm_format: anthropic when proxying Claude Code traffic.

consumer_group: claude-standard-users (and claude-power-users). Scopes each Plugin instance to one Consumer Group. The OpenID Connect Plugin attaches the request to a Consumer Group based on the JWT’s groups claim, and the matching tier-scoped Plugin instance fires.

auth. Kong holds the provider API key and injects it on every upstream request. The developer’s Okta JWT is used only for authentication at the Kong layer. It never reaches the LLM provider. Credential values come from environment variables via decK’s ${{ env "..." }} syntax, resolved at apply time.

route_type: llm/v1/chat. Selects the chat-completions translation path. See the AI Proxy Advanced reference for the full list of supported Route types.

model.model_alias. The bare name Claude Code sends in the request body. The Plugin matches this string against the body’s model field and picks the corresponding target. With one alias per target on the standard tier, only requests for claude-sonnet-4-6 resolve.

model.name. The actual provider model ID Kong sends upstream. For Anthropic, this is the same bare name (claude-sonnet-4-6); for Bedrock, it’s the long form (anthropic.claude-sonnet-4-6-20250514-v1:0). When the platform team upgrades the underlying model, they change DECK_CHAT_MODEL_1 or DECK_CHAT_MODEL_2 and re-apply. Developers do not need to update anything on their side.
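As a sketch of this alias-to-model mapping, the power tier’s Plugin instance registers both aliases, one target each (auth, logging, and sizing fields omitted; the literal Sonnet alias stands in for whatever env var the full configuration uses):

```yaml
plugins:
  - name: ai-proxy-advanced
    consumer_group: claude-power-users
    config:
      llm_format: anthropic
      targets:
        - model:
            model_alias: claude-sonnet-4-6            # alias developers send
            provider: anthropic
            name: ${{ env "DECK_CHAT_MODEL_1" }}      # actual provider model ID
        - model:
            model_alias: ${{ env "DECK_OPUS_ALIAS" }}
            provider: anthropic
            name: ${{ env "DECK_CHAT_MODEL_2" }}
```

Swapping the env-var values re-points an alias at a new provider model without touching developer machines.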

logging.log_statistics, logging.log_payloads. Emit token-usage data and request/response bodies to any attached logging Plugin (for example, HTTP Log or File Log) for per-user cost attribution and audit; Konnect Analytics captures token usage independently of these fields.

max_request_body_size: 10485760. Sets the maximum allowed request body to 10 MB. Claude Code conversations accumulate large context windows (100 KB or more of conversation history, tool results, and file contents). The default body size limit rejects these requests.

response_streaming: allow. Permits the Plugin to pass Server-Sent Events streaming responses from the provider back to Claude Code. Claude Code uses streaming for interactive terminal output. Without this setting, streaming responses can be buffered or rejected.

AI Rate Limiting Advanced: per-tier token limits

Each tier gets its own AI Rate Limiting Advanced instance with its own token budget. Unlike request-count rate limiting, this Plugin counts tokens (prompt + completion), which is the correct unit for LLM cost control. Standard users get a smaller token budget per window; power users get a larger one. When a tier exhausts its budget, Kong returns 429 Too Many Requests until the window resets.

Configuration details

plugins:
  - name: ai-rate-limiting-advanced
    consumer_group: claude-standard-users
    config:
      policies:
        - limits:
            - limit: 20000
              window_size: 60
          window_type: sliding
      identifier: consumer-group
      tokens_count_strategy: total_tokens
      strategy: local
      llm_format: anthropic

policies. An array of rate-limiting policies. Each policy contains a limits array of limit/window pairs and an optional match array for targeting providers, models, or other dimensions. A policy without match conditions acts as a fallback that applies to all requests, which is what this recipe uses since each Plugin instance is already Consumer-Group-scoped. Standard users get 20,000 total tokens per 60-second window; power users get 100,000.
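The power-tier counterpart differs only in scope and budget; a sketch mirroring the configuration above:

```yaml
plugins:
  - name: ai-rate-limiting-advanced
    consumer_group: claude-power-users
    config:
      policies:
        - limits:
            - limit: 100000       # power tier: 5x the standard budget
              window_size: 60
          window_type: sliding
      identifier: consumer-group
      tokens_count_strategy: total_tokens
      strategy: local
      llm_format: anthropic
```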

window_type: sliding. Uses a sliding-window algorithm for smoother rate limiting compared to fixed windows. The fixed alternative uses strict time windows and resets all counters at the boundary.

tokens_count_strategy: total_tokens. Counts both prompt (input) and completion (output) tokens against the limit. Alternatives are prompt_tokens, completion_tokens, or cost.

identifier: consumer-group. Tracks token usage per Kong Consumer Group. Required when the Plugin instance is scoped to a Consumer Group, because a user can belong to multiple groups and the Plugin needs to know which group’s counter to increment.

strategy: local. Uses in-memory counters on each Kong node. Fine for single-node or development deployments. For multi-node production clusters, switch to strategy: redis with a shared Redis instance so counters stay consistent across nodes.
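A minimal sketch of the Redis variant (host and port are placeholders; field names follow the common Kong Redis block, so check the plugin reference for your Gateway version):

```yaml
config:
  strategy: redis
  redis:
    host: redis.internal.example   # shared instance reachable from every Data Plane node
    port: 6379
```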

llm_format: anthropic. Must match the llm_format on the AI Proxy Advanced Plugin so the rate limiting Plugin can correctly parse token counts from the response.

Kong returns token rate-limit headers with every response:

  • X-AI-RateLimit-Limit-{window}-{provider}: maximum tokens allowed in the window.
  • X-AI-RateLimit-Remaining-{window}-{provider}: tokens remaining in the current window.
  • RateLimit-Reset: seconds until the window resets.

When the token limit is exceeded, Kong returns 429 Too Many Requests with a Retry-After header.
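A client or wrapper script can honor that header before retrying. A minimal sketch that parses Retry-After out of a captured response (the response text here is illustrative, for example from `curl -i`):

```shell
# Illustrative 429 response headers, e.g. captured with `curl -i`.
response='HTTP/1.1 429 Too Many Requests
Retry-After: 17'

# Pull the Retry-After value and wait that long before the next attempt.
retry_after=$(printf '%s\n' "$response" | awk -F': ' '/^Retry-After:/ {print $2}')
echo "retrying in ${retry_after}s"   # prints: retrying in 17s
# sleep "$retry_after"               # uncomment in a real retry loop
```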

The 60-second windows here are intentionally aggressive for the demo so a few interactive prompts visibly exhaust the budget. Most teams enforce monthly or daily token budgets in production, for example limits: [{limit: 5000000, window_size: 2592000}]. Combine with Kong Vaults using {vault://backend/key} references for credentials in production rather than environment variables.
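With Kong’s env vault backend, for example, the provider-key injection might look like the following sketch (the vault path is a placeholder; the env backend resolves it against an environment variable on the Data Plane):

```yaml
auth:
  header_name: x-api-key
  header_value: "{vault://env/anthropic-token}"
```

Unlike decK’s ${{ env "..." }} syntax, which bakes the value in at apply time, a vault reference is resolved by the Data Plane at runtime, so the secret never appears in the stored configuration.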

Apply the Kong configuration

This section configures the Control Plane in two parts. First, adopt the quickstart Control Plane into a kongctl namespace so the apply commands below can manage it. The recipe’s select_tags and the claude-code-sso-recipe namespace scope every resource so teardown removes only this recipe’s configuration.

kongctl adopt control-plane "${KONNECT_CONTROL_PLANE_NAME}" \
  --namespace "${KONNECT_CONTROL_PLANE_NAME}" \
  --pat "${KONNECT_TOKEN}"

Adoption stamps the KONGCTL-namespace label on the Control Plane.

The provider tabs below create a Service and Route at /claude-code-sso, two Consumer Groups (claude-standard-users and claude-power-users), an OpenID Connect Plugin for Okta JWT validation, Consumer-Group-scoped AI Proxy Advanced Plugins for tier-based model alias routing, and Consumer-Group-scoped AI Rate Limiting Advanced Plugins for per-tier token limits. See the kongctl documentation for more on federated configuration management.

Select your provider below, export the required environment variables, and apply.

Create the Claude Code Usage dashboard

Create a custom dashboard at the org level, pre-filtered to this recipe’s Gateway Service. The dashboard surfaces cost, token usage, request volume, model mix, per-developer (Consumer) usage, and latency for traffic through the claude-code-sso Service. The dashboard JSON is in the code block below; if a labelled dashboard from a prior apply already exists, the block reuses it instead of creating a duplicate.

# Look up the Control Plane and Service IDs so the dashboard's gateway_service
# preset filter resolves to the scoped UUID Konnect expects.
CP_ID=$(kongctl get gateway control-plane "${KONNECT_CONTROL_PLANE_NAME}" \
  --pat "${KONNECT_TOKEN}" -o json --jq '.id' -r)
SERVICE_ID=$(kongctl get gateway control-plane service claude-code-sso \
  --control-plane-name "${KONNECT_CONTROL_PLANE_NAME}" \
  --pat "${KONNECT_TOKEN}" -o json --jq '.id' -r)

EXISTING_DASHBOARDS=$(kongctl api get "/v2/dashboards?filter%5Blabels.recipe%5D=claude-code-sso-recipe" \
  --pat "${KONNECT_TOKEN}" -o json --jq '.data | length')

if [ "${EXISTING_DASHBOARDS}" -gt 0 ]; then
  echo "Claude Code Usage dashboard already exists. Reusing."
else
  cat <<'EOF' | jq --arg ref "${CP_ID}:${SERVICE_ID}" '.definition.preset_filters[0].value = [$ref]' > claude-code-usage-dashboard.json
{
  "name": "Claude Code Usage",
  "definition": {
    "tiles": [
      {
        "id": "e3f9588a-71fd-4996-818f-33c93afe6eae",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 2,
            "rows": 1
          },
          "position": {
            "col": 0,
            "row": 0
          }
        },
        "definition": {
          "chart": {
            "type": "single_value",
            "chart_title": "Total cost ($)"
          },
          "query": {
            "filters": [
              {
                "field": "ai_provider",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "cost"
            ],
            "datasource": "llm_usage",
            "dimensions": []
          }
        }
      },
      {
        "id": "9edab295-d705-403f-a612-59847420a195",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 2,
            "rows": 1
          },
          "position": {
            "col": 2,
            "row": 0
          }
        },
        "definition": {
          "chart": {
            "type": "single_value",
            "chart_title": "Total Tokens"
          },
          "query": {
            "filters": [
              {
                "field": "ai_provider",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "total_tokens"
            ],
            "datasource": "llm_usage",
            "dimensions": []
          }
        }
      },
      {
        "id": "e5cc504e-c505-4f0b-8633-638a0e7b5833",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 2,
            "rows": 1
          },
          "position": {
            "col": 4,
            "row": 0
          }
        },
        "definition": {
          "chart": {
            "type": "single_value",
            "chart_title": "Total Claude Code requests"
          },
          "query": {
            "filters": [
              {
                "field": "ai_provider",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "ai_request_count"
            ],
            "datasource": "llm_usage",
            "dimensions": []
          }
        }
      },
      {
        "id": "4ab1fb73-d60b-4f77-b915-a0c88d427263",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 3,
            "rows": 2
          },
          "position": {
            "col": 0,
            "row": 1
          }
        },
        "definition": {
          "chart": {
            "type": "top_n",
            "chart_title": "Top Claude Models by Usage"
          },
          "query": {
            "limit": 10,
            "filters": [],
            "metrics": [
              "total_tokens",
              "ai_request_count"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "ai_request_model"
            ]
          }
        }
      },
      {
        "id": "e6c5fbe0-9dd2-46ac-af1c-3ba3af9ffd43",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 3,
            "rows": 2
          },
          "position": {
            "col": 3,
            "row": 1
          }
        },
        "definition": {
          "chart": {
            "type": "timeseries_line",
            "stacked": false,
            "chart_title": "Model usage trend (top 5)"
          },
          "query": {
            "limit": 5,
            "filters": [
              {
                "field": "ai_provider",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "total_tokens"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "ai_request_model",
              "time"
            ]
          }
        }
      },
      {
        "id": "fd14bf28-0cd2-409e-9a48-2cbbd7907409",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 2,
            "rows": 2
          },
          "position": {
            "col": 0,
            "row": 3
          }
        },
        "definition": {
          "chart": {
            "type": "donut",
            "chart_title": "Claude health check"
          },
          "query": {
            "filters": [
              {
                "field": "gateway_service",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "ai_request_count"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "status_code_grouped"
            ]
          }
        }
      },
      {
        "id": "f26edaa8-56d5-49f8-8f04-ade9451c76ec",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 2,
            "rows": 2
          },
          "position": {
            "col": 2,
            "row": 3
          }
        },
        "definition": {
          "chart": {
            "type": "donut",
            "chart_title": "Claude provider usage"
          },
          "query": {
            "filters": [
              {
                "field": "gateway_service",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "ai_request_count"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "ai_provider"
            ]
          }
        }
      },
      {
        "id": "d27037a3-7ace-49cc-affb-3b34a71ab9b1",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 2,
            "rows": 2
          },
          "position": {
            "col": 4,
            "row": 3
          }
        },
        "definition": {
          "chart": {
            "type": "timeseries_bar",
            "stacked": true,
            "chart_title": "LLM latency (avg)"
          },
          "query": {
            "filters": [
              {
                "field": "ai_request_model",
                "operator": "not_empty"
              },
              {
                "field": "ai_request_model",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "llm_latency_average"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "ai_request_model",
              "time"
            ]
          }
        }
      },
      {
        "id": "5f88c24b-80de-4fe1-8948-ef0bde29f948",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 3,
            "rows": 2
          },
          "position": {
            "col": 0,
            "row": 9
          }
        },
        "definition": {
          "chart": {
            "type": "horizontal_bar",
            "stacked": true,
            "chart_title": "Claude model usage (requests)"
          },
          "query": {
            "filters": [
              {
                "field": "ai_request_model",
                "operator": "not_empty"
              },
              {
                "field": "ai_request_model",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "ai_request_count"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "ai_request_model"
            ]
          }
        }
      },
      {
        "id": "702f6291-e02d-47eb-b38b-85438c643712",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 3,
            "rows": 2
          },
          "position": {
            "col": 3,
            "row": 9
          }
        },
        "definition": {
          "chart": {
            "type": "vertical_bar",
            "stacked": true,
            "chart_title": "Claude model usage count (tokens)"
          },
          "query": {
            "filters": [
              {
                "field": "ai_provider",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "total_tokens"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "ai_request_model"
            ]
          }
        }
      },
      {
        "id": "2242b427-2999-4cfe-b17e-9e4ca61be362",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 2,
            "rows": 2
          },
          "position": {
            "col": 0,
            "row": 11
          }
        },
        "definition": {
          "chart": {
            "type": "vertical_bar",
            "stacked": true,
            "chart_title": "AI security report"
          },
          "query": {
            "filters": [
              {
                "field": "status_code_grouped",
                "value": [
                  "4XX"
                ],
                "operator": "in"
              },
              {
                "field": "ai_provider",
                "operator": "not_empty"
              },
              {
                "field": "ai_provider",
                "value": [
                  "UNSPECIFIED"
                ],
                "operator": "not_in"
              }
            ],
            "metrics": [
              "ai_request_count"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "route",
              "status_code"
            ]
          }
        }
      },
      {
        "id": "df1b0486-2c04-430e-a8ff-7e26c7821bf7",
        "type": "chart",
        "layout": {
          "size": {
            "cols": 3,
            "rows": 2
          },
          "position": {
            "col": 2,
            "row": 11
          }
        },
        "definition": {
          "chart": {
            "type": "timeseries_bar",
            "stacked": true,
            "chart_title": "Monthly spend trends"
          },
          "query": {
            "limit": 10,
            "filters": [],
            "metrics": [
              "cost"
            ],
            "datasource": "llm_usage",
            "dimensions": [
              "ai_request_model",
              "time"
            ],
            "time_range": {
              "type": "relative",
              "time_range": "30d"
            },
            "granularity": "daily"
          }
        }
      }
    ],
    "template_id": "AI_GATEWAY",
    "preset_filters": [
      {
        "field": "gateway_service",
        "value": [],
        "operator": "in"
      }
    ]
  },
  "labels": {
    "recipe": "claude-code-sso-recipe"
  }
}
EOF
  DASHBOARD_ID=$(kongctl api post /v2/dashboards \
    -f claude-code-usage-dashboard.json \
    --pat "${KONNECT_TOKEN}" -o json --jq '.id' -r)
  rm -f claude-code-usage-dashboard.json
  echo "Created Claude Code Usage dashboard (id: ${DASHBOARD_ID}). Open it in Konnect at Observability → Custom dashboards → 'Claude Code Usage'."
fi

Try it out

With the configuration applied and Okta configured, Claude Code requests now flow through Kong for authentication, model alias matching, tier enforcement, and rate limiting. Verify the recipe by running Claude Code itself, the tool the recipe enables.

Launch Claude Code

claude

On the first invocation (or when the cached token expires), a browser window opens to Okta for authentication. After authenticating, the terminal continues automatically:

[okta-auth] Opening browser for Okta login...
[okta-auth] Waiting for callback on http://localhost:9876/callback ...
[okta-auth] Authorization code received. Exchanging for tokens...
[okta-auth] Authentication successful. Token cached.

╭─────────────────────────────────╮
│ ✻  Welcome to Claude Code!      │
╰─────────────────────────────────╯

Ask a question. Claude Code sends the request through Kong, which validates your Okta JWT, attaches the matching Consumer Group, runs the alias check, injects the provider API key, and forwards to the configured upstream model.

Subsequent invocations reuse the cached token silently. No browser flow unless the refresh token has expired.

Hit the tier enforcement boundary

Standard users have only the claude-sonnet-4-6 model alias configured. Inside Claude Code, switch to Opus and send a prompt:

> /model opus
> Hello

Claude Code sends the configured Opus alias (whatever value you set as ANTHROPIC_DEFAULT_OPUS_MODEL) in the request body. The standard tier’s AI Proxy Advanced Plugin instance has no matching model_alias, so the request is rejected at Kong before any provider call:

API Error: 400 model not configured

This is the explicit block: a standard user cannot use Opus, regardless of what Claude Code’s UI offers, and they see exactly why. Switch back to Sonnet (/model sonnet) and the request succeeds again.

Hit the rate limit

Standard tier is configured at 20,000 tokens per 60-second sliding window. A few interactive prompts that include conversation history and file context exhaust the budget quickly. Send three or four prompts in rapid succession that each carry meaningful context (for example, ask Claude Code to summarize a file in your repo). Once the budget is exhausted, Kong returns:

API Error: 429 Too Many Requests

The response includes a Retry-After header indicating how many seconds remain in the window.

Swap to the power tier

The IdP-issued groups claim drives the Consumer Group attachment, so changing the user’s Okta group is all that’s needed to switch tiers. Kong already has the claude-power-users Consumer Group, AI Proxy Advanced Plugin instance (with both Sonnet and Opus aliases), and AI Rate Limiting Advanced Plugin instance in place from the apply step.

  1. In Okta, go to Directory → Groups and create claude-power-users if it does not already exist.
  2. Open your test user, go to the Groups tab, remove claude-standard-users, and add claude-power-users.
  3. Force a fresh token by clearing the helper script’s cache so it does not silently reuse the old JWT (which still carries the previous groups claim):

    rm -rf ~/.claude/okta-cache/
    
  4. Re-launch Claude Code. The browser opens for a fresh PKCE flow, the new JWT carries groups: ["claude-power-users"], and Kong now attaches the request to the power-tier Consumer Group. Run /model opus followed by a prompt; the request succeeds. The token budget jumps to 100,000 per 60-second window.

Swap back the same way: move the user between groups in Okta, clear the cache, and re-authenticate.

Explore in Konnect

Open Kong Konnect to see the recipe’s resources in place.

Claude Code Usage dashboard

Navigate to Observability → Custom dashboards → Claude Code Usage. The dashboard is pre-filtered to the claude-code-sso Gateway Service and surfaces:

  • Total cost, total tokens, and request count for the recipe’s traffic.
  • Top Claude models by token and request volume, plus a model-usage trend.
  • Health-check, provider-share, and average-latency breakdowns.
  • Per-developer (Consumer) usage, cost, and token trends, so the per-tier ceilings configured by the recipe are directly visible here.
  • An AI security report scoped to 4XX responses on the recipe’s Route.

Gateway resources

Navigate to API Gateway → Gateways → claude-code-sso-recipe. This is the Control Plane the quickstart provisioned and kongctl adopt attached to the recipe namespace; it surfaces:

  • Gateway services → claude-code-sso. The Service the apply block registered. Its detail page has tabs for Configuration, Routes, Plugins, and Analytics.
    • Routes tab: the /claude-code-sso Route.
    • Plugins tab: the OpenID Connect Plugin on the Service, plus the Consumer-Group-scoped AI Proxy Advanced and AI Rate Limiting Advanced Plugins.
  • Consumer Groups. The claude-standard-users and claude-power-users groups the OpenID Connect Plugin maps each Okta group to.

The Gateway Service’s Analytics tab and the top-level Observability menu remain available for deeper exploration beyond the curated dashboard above.

Variations and next steps

  • Switch the underlying model. Update DECK_CHAT_MODEL_1 or DECK_CHAT_MODEL_2 to a different provider model ID and re-apply. Developers see no change because the alias they send (claude-sonnet-4-6 or claude-opus-4-7) stays the same.
  • Add more tiers. Create additional Okta groups, additional Kong Consumer Groups with matching names, and additional Consumer-Group-scoped AI Proxy Advanced and AI Rate Limiting Advanced instances. For example, an intern tier with only Haiku registered, or an ml-team tier with access to a specialized model alias.
  • Adjust token budgets. Most teams enforce monthly or daily windows in production. For example, limits: [{limit: 5000000, window_size: 2592000}] for a 5 million token monthly budget per tier. Combine windows with multiple limits entries to enforce both burst and sustained budgets simultaneously.
  • Restrict access by IP. Add an IP Restriction Plugin to the Service or Route with allow set to your egress ranges (corporate VPN, CI runners). The Plugin runs before Okta JWT validation and refuses connections from unknown sources, narrowing the attack surface to authenticated traffic from approved networks.
  • Multi-node rate limiting with Redis. The recipe uses strategy: local, which keeps counters in memory on each Kong node. For multi-node production clusters, switch to strategy: redis and point to a shared Redis instance so counters stay consistent across nodes.
  • Use a different IdP. The OpenID Connect Plugin works with any OIDC-compliant identity provider (Microsoft Entra ID, Auth0, Keycloak, PingIdentity, others). Update the issuer URL and adjust the claim names to match your IdP’s token format. The PKCE helper script works with any OIDC provider that supports Authorization Code + PKCE.
  • Cover non-Claude clients. This recipe uses llm_format: anthropic because Claude Code sends requests in Anthropic’s native format. If your team needs broader provider support (OpenAI, Google Gemini, Mistral), see Basic LLM Routing with llm_format: openai.
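The burst-plus-sustained combination mentioned in the token-budget variation above can be sketched as a single policy with two limits entries (numbers illustrative):

```yaml
policies:
  - limits:
      - limit: 50000          # burst ceiling: 50k tokens per minute
        window_size: 60
      - limit: 5000000        # sustained ceiling: 5M tokens per 30 days
        window_size: 2592000
    window_type: sliding
```

A request is rejected as soon as either counter is exceeded, so short spikes and long-run overspend are capped independently.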

Cleanup

The recipe’s select_tags and kongctl namespace scoped all resources, so this teardown removes only this recipe’s configuration.

Delete the Claude Code Usage custom dashboard. The dashboard is an org-level resource and outlives the Control Plane, so remove it before tearing down Kong:

DASHBOARD_IDS=$(kongctl api get "/v2/dashboards?filter%5Blabels.recipe%5D=claude-code-sso-recipe" \
  --pat "${KONNECT_TOKEN}" -o json --jq '.data[].id' -r)

if [ -z "${DASHBOARD_IDS}" ]; then
  echo "No Claude Code Usage dashboard found. Skipping."
else
  for id in ${DASHBOARD_IDS}; do
    if kongctl api delete "/v2/dashboards/${id}" --pat "${KONNECT_TOKEN}"; then
      echo "Deleted Claude Code Usage dashboard ${id}."
    else
      echo "Failed to delete dashboard ${id}."
    fi
  done
fi

Tear down Kong by deleting the local data plane and the Kong Konnect Control Plane:

export KONNECT_CONTROL_PLANE_NAME='claude-code-sso-recipe' && curl -Ls https://get.konghq.com/quickstart | bash -s -- -d -k $KONNECT_TOKEN

Remove the helper script and cached tokens:

rm -f ~/.claude/okta-claude-auth.sh
rm -rf ~/.claude/okta-cache/

Revert ~/.claude/settings.json. The following jq command removes only the keys this recipe added and preserves anything else you have in the file:

tmp=$(mktemp) && jq '
  del(.apiKeyHelper)
  | del(
      .env.ANTHROPIC_BASE_URL,
      .env.OKTA_DOMAIN,
      .env.OKTA_CLIENT_ID,
      .env.OKTA_AUDIENCE,
      .env.OKTA_AUTH_SERVER,
      .env.ANTHROPIC_MODEL,
      .env.ANTHROPIC_DEFAULT_SONNET_MODEL,
      .env.ANTHROPIC_DEFAULT_OPUS_MODEL,
      .env.CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS
    )
  | if (.env // {}) == {} then del(.env) else . end
' ~/.claude/settings.json > "$tmp" && mv "$tmp" ~/.claude/settings.json

If you previously ran unset ANTHROPIC_API_KEY or unset ANTHROPIC_AUTH_TOKEN to use this recipe, re-export those values from your shell profile to return to your prior Claude Code authentication setup.

Subscription seat users (Pro, Max, Team, or Enterprise): Claude Code attempts to fall back to your cached subscription credentials once apiKeyHelper is removed from settings.json. Run claude /status to confirm the session resumed, and run claude /login again if it didn’t.
