You can choose from the following cache sizes:
- micro: ~0.5 GiB capacity
- small: ~1 GiB capacity
- medium: ~3 GiB capacity
- large: ~6 GiB capacity
- xlarge: ~12 GiB capacity
- 2xlarge: ~25 GiB capacity
- 4xlarge: ~52 GiB capacity
- 8xlarge: ~100 GiB capacity
- 12xlarge: ~150 GiB capacity
- 16xlarge: ~200 GiB capacity
- 24xlarge: ~300 GiB capacity
Contact Kong to enable cache tiers
Specific cache sizes must be enabled on your account. Contact your Kong support team to enable the size you need before you create or upgrade a Dedicated Cloud Gateway that uses it.
When sizing workloads, plan for approximately 70–75% of total managed cache memory to be available for cache data.
The platform reserves around 25% of each managed cache instance for operational needs, such as replication, failover, and memory management, so the usable cache capacity will be less than the total provisioned size.
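As a rough capacity check, the reserve overhead can be folded into a small calculation. The sizes and the conservative 70% usable fraction come from the figures above; the function name is illustrative, not part of any Kong API:

```python
# Rough usable-capacity estimate: the platform reserves ~25% of each managed
# cache instance for replication, failover, and memory management, so plan
# for roughly 70-75% of the provisioned size to hold cache data.
PROVISIONED_GIB = {
    "micro": 0.5, "small": 1, "medium": 3, "large": 6,
    "xlarge": 12, "2xlarge": 25, "4xlarge": 52, "8xlarge": 100,
    "12xlarge": 150, "16xlarge": 200, "24xlarge": 300,
}

def usable_gib(size: str, usable_fraction: float = 0.70) -> float:
    """Conservative usable capacity for a cache size (70% lower bound)."""
    return PROVISIONED_GIB[size] * usable_fraction

print(f"xlarge usable:  ~{usable_gib('xlarge'):.1f} GiB of 12")
print(f"4xlarge usable: ~{usable_gib('4xlarge'):.1f} GiB of 52")
```

For example, an xlarge instance provisions ~12 GiB but should be planned as roughly 8.4–9 GiB of usable cache data.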
To choose the right cache size, you need to know your Redis key count, which determines your cache pressure.
This is driven by the following equation:

Consumers × Routes × Rate limit windows = Theoretical key space (counters per window cycle)
For example, if you have 5,000 Consumers, 3,000 Routes, and 3 windows, this produces a theoretical key space of 45 million counters per window cycle, each needing a periodic sync to Redis.
The sync rate determines how aggressively these counters are pushed, and the cache instance must absorb both the read (fetch counters) and write (push diffs) load.
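The arithmetic above can be sketched as a quick back-of-the-envelope calculation. The function names are illustrative; the sync rate is treated as the interval in seconds between counter sync passes, so a lower sync rate means more frequent syncs:

```python
# Theoretical rate-limit counter key space: one counter per
# Consumer x Route x rate-limit-window combination.
def key_space(consumers: int, routes: int, windows: int) -> int:
    return consumers * routes * windows

# The example from the text: 5,000 Consumers x 3,000 Routes x 3 windows.
counters = key_space(5_000, 3_000, 3)
print(f"{counters:,} counters per window cycle")  # 45,000,000

# Sync rate is the interval in seconds between sync passes, so each
# data plane node performs 1 / sync_rate passes per second.
def syncs_per_second(sync_rate: float) -> float:
    return 1.0 / sync_rate

print(syncs_per_second(0.5))  # 2.0 passes per second
```

Each sync pass both reads current counters and writes diffs, which is why the cache instance must absorb load in both directions.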
The following table describes which cache size you should use based on your entity count (Consumers and Routes), rate limit windows, and target number of requests per second (RPS):
| Deployment profile | Entities (Consumers × Routes × Windows) | Target RPS | Recommended minimum instance | Recommended sync rate | Notes |
|---|---|---|---|---|---|
| Small/Dev/Test | ≤100 × ≤100 × 1 window | ≤1,000 | cache.t3.small | 0.5 | Micro fails at 10K RPS. Small handles a 1K RPS baseline cleanly. |
| Standard enterprise | ≤1,000 × ≤100 × 3 windows | ≤10,000 | cache.t3.medium | 0.5 | – |
| Large enterprise | ≤5,000 × ≤3,000 × 3 windows | ≤10,000 | cache.m5.xlarge | 0.5–1.0 | Large instances are overwhelmed at a 0.1 sync rate with this entity count; xlarge provides headroom. |
| High-scale enterprise | ≤5,000 × ≤3,000 × 3 windows | ≤20,000 | cache.m5.2xlarge | 0.5–1.0 | – |
| Ultra-high-scale | 5,000 × >3,000 × 3 windows | ≤65,000 | cache.m5.4xlarge | 0.5 | At this tier, it's critical that the base RPS configured for the Dedicated Cloud Gateway matches your production traffic. |
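The sizing table can be mirrored in a small lookup helper. The thresholds are copied directly from the table; the function itself is a hypothetical sketch, not a Kong API:

```python
# Illustrative helper mirroring the sizing table: given entity counts and
# target RPS, return the recommended minimum cache instance.
def recommend_instance(consumers: int, routes: int, windows: int,
                       target_rps: int) -> str:
    if consumers <= 100 and routes <= 100 and windows <= 1 and target_rps <= 1_000:
        return "cache.t3.small"
    if consumers <= 1_000 and routes <= 100 and windows <= 3 and target_rps <= 10_000:
        return "cache.t3.medium"
    if consumers <= 5_000 and routes <= 3_000 and windows <= 3:
        if target_rps <= 10_000:
            return "cache.m5.xlarge"
        if target_rps <= 20_000:
            return "cache.m5.2xlarge"
    if target_rps <= 65_000:
        return "cache.m5.4xlarge"
    raise ValueError("beyond the profiles covered by the sizing table")

print(recommend_instance(5_000, 3_000, 3, 10_000))  # cache.m5.xlarge
```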
The sync rate is the most impactful tuning lever and interacts directly with cache sizing:
| Sync rate | Syncs per second | Notes |
|---|---|---|
| 0.1 | 10 | Highest Redis command load. Only viable on cache.m5.xlarge or larger when entity counts exceed 1,000 Consumers; on smaller instances, it causes cache CPU saturation, Redis timeout cascades, and data plane node restarts. Use this only when sub-second rate limiting accuracy is business-critical, and size up the cache by at least one tier beyond what the entity count alone would suggest. If you can tolerate a 0.5 sync rate, you can use a smaller cache instance. |
| 0.5 | 2 | Recommended default for production. Best balance of accuracy and resource efficiency, and stable across all instance types for standard workloads. For high-entity deployments, this works well on cache.m5.large and above. |
| 1.0 | 1 | Lowest Redis load, but degrades rate limiting accuracy. At high entity counts, the rate limited percentage drops to 57–60% (expected: ~99%), which allows requests through that should be blocked. Use only for non-critical or approximate rate limiting at very low entity counts. |
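The guidance above can be condensed into a simple decision sketch. This is a hypothetical helper that restates the table's recommendations, not part of any Kong API:

```python
# Illustrative sync-rate selection following the table above:
# 0.1 only when sub-second accuracy is business-critical (and the cache
# is sized up at least one tier), 0.5 as the production default, and
# 1.0 only for approximate limiting at very low entity counts.
def choose_sync_rate(subsecond_accuracy_critical: bool,
                     approximate_ok: bool) -> float:
    if subsecond_accuracy_critical:
        return 0.1  # highest Redis load; size the cache up a tier
    if approximate_ok:
        return 1.0  # lowest Redis load, degraded accuracy
    return 0.5      # recommended production default

print(choose_sync_rate(False, False))  # 0.5
```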