AI Semantic Response Guard

AI License Required
Made by
Kong Inc.
Supported Gateway Topologies
hybrid db-less traditional
Supported Konnect Deployments
hybrid cloud-gateways serverless
Compatible Protocols
grpc grpcs http https
Minimum Version
Kong Gateway - 3.12
Tags
#ai
AI Gateway Enterprise: This plugin is only available as part of our AI Gateway Enterprise offering.

The AI Semantic Response Guard plugin extends the AI Prompt Guard plugin by filtering LLM responses based on semantic similarity to predefined rules. It helps prevent unwanted or unsafe responses when serving llm/v1/chat, llm/v1/completions, or llm/v1/embeddings requests through AI Gateway.

You can use a combination of allow and deny response rules to maintain integrity and compliance when returning responses from an LLM service.

How it works

The plugin analyzes the semantic content of the full LLM response before it is returned to the client. The matching behavior is as follows:

  • If any deny_responses are set and the response matches a pattern in the deny list, the response is blocked with a 403 Forbidden.
  • If any allow_responses are set, but the response matches none of the allowed patterns, the response is also blocked with a 403 Forbidden.
  • If any allow_responses are set and the response matches one of the allowed patterns, the response is permitted.
  • If both deny_responses and allow_responses are set, the deny condition takes precedence. A response that matches a deny pattern will be blocked, even if it also matches an allow pattern. If the response does not match any deny pattern, it must still match an allow pattern to be permitted.
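As a sketch, a declarative configuration combining both rule types might look like the following. The plugin name, the `rules` nesting, and the provider/model values are assumptions modeled on the companion AI Semantic Prompt Guard plugin; check this plugin's configuration reference for the exact schema.

```yaml
plugins:
  - name: ai-semantic-response-guard   # assumed plugin name
    config:
      embeddings:
        model:
          provider: openai             # example provider
          name: text-embedding-3-small # example model
      vectordb:
        strategy: redis
        redis:
          host: localhost              # placeholder
          port: 6379
      rules:                           # nesting under "rules" is assumed
        allow_responses:
          - "Answers about our product catalog"
        deny_responses:
          - "Medical or legal advice"
```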

Response processing

To enforce these rules, the plugin:

  1. Disables streaming (stream=false) to ensure the full response body is buffered before analysis.
  2. Intercepts the response body using the guard-response filter.
  3. Extracts response text, supporting JSON parsing of multiple LLM formats and gzipped content.
  4. Generates embeddings for the extracted text.
  5. Searches the vector database (Redis, Pgvector, or other) against configured allow_responses or deny_responses.
  6. Applies the decision rules described above.
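The decision rules applied in the final step can be sketched as follows. This is a minimal illustration assuming cosine similarity against precomputed rule embeddings and a single fixed threshold; the plugin's actual scoring, thresholds, and vector search are handled by the configured vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def evaluate(response_vec, deny_vecs, allow_vecs, threshold=0.8):
    """Return 'block' or 'allow' per the precedence rules above:
    a deny match always blocks; if an allow list is set, a match
    against it is required; otherwise the response passes."""
    if any(cosine(response_vec, d) >= threshold for d in deny_vecs):
        return "block"   # deny takes precedence
    if allow_vecs:
        if any(cosine(response_vec, a) >= threshold for a in allow_vecs):
            return "allow"
        return "block"   # allow list set, but nothing matched
    return "allow"       # no deny match and no allow list configured
```

A "block" result corresponds to the 403 Forbidden described above.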

If a response is blocked or if a system error occurs during evaluation, the plugin returns a 403 Forbidden to the client without exposing that the Semantic Response Guard blocked it.

Partials v3.13+

This plugin supports vectordb and embeddings Partials, which let you define shared vector database and embeddings configuration once and reuse it across multiple AI Gateway plugins. This is useful when running this plugin alongside others that use the same vector database and embeddings model, such as AI Semantic Cache, AI RAG Injector, AI Semantic Prompt Guard, and AI Proxy Advanced.

Partial type   Fields covered
vectordb       config.vectordb
embeddings     config.embeddings

For setup instructions, see AI plugin Partials.
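For illustration, a partial-based setup might look like the following rough sketch. The partial names are hypothetical and the declarative schema is an assumption; consult AI plugin Partials for the authoritative format.

```yaml
partials:
  - name: shared-vectordb        # hypothetical partial name
    type: vectordb
    config:                      # maps to config.vectordb
      strategy: redis
      redis:
        host: localhost          # placeholder
        port: 6379
  - name: shared-embeddings      # hypothetical partial name
    type: embeddings
    config:                      # maps to config.embeddings
      model:
        provider: openai         # example provider
        name: text-embedding-3-small
```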

Vector databases

A vector database can be used to store vector embeddings, or numerical representations, of data items. For this plugin, each allow or deny rule is converted to a numerical representation and stored in the vector database, so that incoming LLM responses can be compared against the stored vectors to find semantically similar rules.

The AI Semantic Response Guard plugin supports the following vector databases:

  • Using config.vectordb.strategy: redis and parameters in config.vectordb.redis:
    • Redis with Vector Similarity Search (VSS)
    • AWS MemoryDB for Redis v3.12+
    • Valkey v3.14+: When you configure vectordb.strategy: redis, Kong Gateway queries the server and checks the server name field. If it detects a Valkey server, it automatically uses the Valkey-specific driver.
  • Using config.vectordb.strategy: pgvector and parameters in config.vectordb.pgvector:
    • Pgvector
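As a sketch, the two strategies map to configuration fragments like the following. Hosts, ports, and database names are placeholders, and additional fields (such as embedding dimensions or a distance metric) may be required; see the configuration reference for the full schema.

```yaml
# Redis-based strategies (Redis VSS, AWS MemoryDB, Valkey)
config:
  vectordb:
    strategy: redis
    redis:
      host: redis.example.com    # placeholder
      port: 6379

# Pgvector strategy
config:
  vectordb:
    strategy: pgvector
    pgvector:
      host: pg.example.com       # placeholder
      port: 5432
      database: kong             # placeholder
```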

To learn more about vector databases in AI Gateway, see Embedding-based similarity matching in Kong AI gateway plugins.

Using cloud authentication with Redis v3.13+

If your plugin uses a Redis datastore, you can authenticate to it with a cloud Redis provider. This allows you to seamlessly rotate credentials without relying on static passwords.

The following providers are supported:

  • AWS ElastiCache
  • Azure Managed Redis
  • Google Cloud Memorystore (with or without Valkey)
