Each run covers failover and recovery for each region. Use real production APIs rather than mocks.
Before starting, establish a requests-per-second baseline and let monitoring normalize. Keep observability plugins like OpenTelemetry, Datadog, and HTTP Log running alongside control plane analytics throughout.
Recommended transaction sets:
| Transaction set | Requests | Duration |
|-----------------|----------|----------|
| Set 1 | 50,000 | 30 minutes |
| Set 2 | 100,000 | 30 minutes |
| Set 3 | 250,000 | 30 minutes |
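To size your load generator, it can help to translate each set into an average request rate. The sketch below assumes the request counts are totals spread evenly over the full duration, which the table does not state explicitly; the function name `average_rps` is illustrative only.

```python
# Transaction sets from the table above: (name, total requests, duration in minutes).
# Assumption: "Requests" is the total sent over the whole duration.
TRANSACTION_SETS = [
    ("Set 1", 50_000, 30),
    ("Set 2", 100_000, 30),
    ("Set 3", 250_000, 30),
]


def average_rps(total_requests: int, duration_minutes: float) -> float:
    """Average requests per second if the load is spread evenly over the duration."""
    return total_requests / (duration_minutes * 60)


for name, requests, minutes in TRANSACTION_SETS:
    print(f"{name}: {average_rps(requests, minutes):.1f} RPS")
# Set 1: 27.8 RPS
# Set 2: 55.6 RPS
# Set 3: 138.9 RPS
```

If your real traffic is bursty rather than even, treat these numbers as a floor for the baseline rather than a target.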
For example, running set 1 against the US region would look like this:
- Start sending requests for the 50,000-request set at your baseline RPS. Let things normalize.
- Enable the Pre-Function plugin. The US region starts returning 400 and Route 53 marks it as unhealthy. Traffic shifts to EU.
- Keep the load running. Disable the plugin, which will cause US health checks to pass again. Route 53 gradually routes traffic back and RPS returns to baseline across both regions.
Start load generation at your RPS baseline. After things are stable, enable the Pre-Function plugin on the health check route.
Health check requests will start returning 400, and the DNS health checker will eventually mark the region as unhealthy.
DNS should shift traffic to the alternate region automatically.
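One way to force the health check route to fail is a Pre-Function plugin whose access-phase Lua ends the request with a 400. The sketch below only builds the plugin configuration as a Python dictionary; it assumes the plugin's `config.access` field takes a list of Lua snippets and that `kong.response.exit` is used to short-circuit the request. The helper name `failure_plugin_payload` is hypothetical, and how you apply the configuration (Admin API, declarative config, or control plane UI) depends on your deployment.

```python
import json


def failure_plugin_payload(status_code: int = 400) -> dict:
    """Build a Pre-Function plugin body that ends requests early with the given status."""
    return {
        "name": "pre-function",
        "config": {
            # Lua executed in the access phase; exits before the request is proxied.
            "access": [f"kong.response.exit({status_code})"],
        },
    }


# Print the configuration so it can be reviewed before it is applied to the route.
print(json.dumps(failure_plugin_payload(), indent=2))
```

Scope the plugin to the health check route only, so regular API traffic in the region is not rejected while the DNS health check fails.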
Let the test run for the full transaction set duration.
Watch for the health check status to change in your DNS provider and confirm traffic is shifting to the alternate region.
Capture observability output and analytics as you go.
How quickly the DNS provider marks the endpoint unhealthy depends on your health check interval and failure threshold settings.
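As a rough guide, the time to mark an endpoint unhealthy is about the check interval multiplied by the failure threshold, and clients may keep using cached DNS answers until the record TTL expires. The sketch below is an estimate under those assumptions, not a provider guarantee; the function name and the example values (30-second interval, threshold of 3, 60-second TTL) are illustrative.

```python
def estimated_failover_seconds(
    interval_seconds: float,
    failure_threshold: int,
    dns_ttl_seconds: float = 0.0,
) -> float:
    """Rough upper estimate of the time until clients stop hitting the failed region.

    Detection takes roughly interval * threshold consecutive failed checks;
    cached DNS answers can add up to one TTL on top of that.
    """
    return interval_seconds * failure_threshold + dns_ttl_seconds


# Example values only: 30 s checks, 3 consecutive failures, 60 s record TTL.
print(estimated_failover_seconds(30, 3))       # 90.0 (detection only)
print(estimated_failover_seconds(30, 3, 60))   # 150.0 (detection plus cached DNS)
```

Comparing this estimate with the shift you actually observe in analytics is a quick check that the health check settings behave as intended.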
Test recovery while the target region is still unhealthy and load generation is active.
Disable the plugin to let health checks pass again, and the DNS provider will start routing traffic back to the recovered region.
Traffic should gradually split across both regions as health checks pass.
Use analytics to confirm RPS returns to your baseline.
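The baseline check can be made explicit by summing per-region RPS from your analytics and comparing the total with the baseline. This is a minimal sketch: the 5% tolerance, the region keys, and the function name `back_at_baseline` are assumptions, and the per-region numbers are made-up sample values rather than real measurements.

```python
def back_at_baseline(
    rps_by_region: dict[str, float],
    baseline_rps: float,
    tolerance: float = 0.05,
) -> bool:
    """Return True if the combined RPS across regions is within tolerance of the baseline."""
    total = sum(rps_by_region.values())
    return abs(total - baseline_rps) <= tolerance * baseline_rps


# Sample values only: traffic split across both regions after recovery.
print(back_at_baseline({"us": 14.0, "eu": 13.5}, baseline_rps=27.8))  # True
# Sample values only: US still drained and total throughput below baseline.
print(back_at_baseline({"us": 0.0, "eu": 20.0}, baseline_rps=27.8))   # False
```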