Report Date: 2024-04-23
Incident Date(s): 2024-04-11
On 2024-04-11 at 12:45 UTC, the F5 Distributed Cloud support team used internal monitoring to detect that some load balancers were failing to resolve hostnames. The ensuing investigation revealed that this issue was confined to load balancers that were either newly created or updated during this incident window, while the functionality of existing ones remained unaffected.
Further examination pinpointed the onset of the hostname resolution failure to the moment when the number of DNS records per zone reached a set threshold, which in turn disrupted the creation of DNS A records for load balancers.
To address the issue, the F5 Distributed Cloud team raised the threshold limit and restarted the DNS A record creation process on 2024-04-11 at 19:04 UTC. The affected load balancers then recovered in sequence, and at 19:40 UTC the team confirmed that the hostname resolution failures had ceased.
The service disruption lasted for 6 hours and 55 minutes. Throughout this period, no customer reports regarding the issue were received.
Start time of Service Event | 2024-04-11 12:45 UTC |
Conclusion of Service Event | 2024-04-11 19:40 UTC |
Event duration | 6 hours 55 minutes |
Impact | Newly created or recently updated load balancers exhibited hostname resolution failures. Customers might have experienced failures while accessing web applications. |
Root cause | The DNS records-per-zone threshold was exhausted, which blocked the creation of DNS A records and resulted in hostname resolution failures on load balancers. |
DATE | TIME (UTC) | ACTION |
---|---|---|
2024-04-11 | 12:45 | Internal monitoring detected a hostname resolution issue on load balancers. |
2024-04-11 | 13:09 | The F5 Distributed Cloud team started to investigate and identified that new and/or recently modified load balancers were affected. |
2024-04-11 | 17:43 | The team identified that DNS A records were not being created because the DNS records-per-zone threshold was exhausted. |
2024-04-11 | 19:04 | The F5 Distributed Cloud team increased the threshold and reinitiated DNS A record creation. |
2024-04-11 | 19:40 | The F5 Distributed Cloud team validated and confirmed that name resolution on the affected load balancers was restored and that no further issues were observed. End of service event. |
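As a cross-check on the timeline, the short Python sketch below computes how long each phase took and the total event duration. The timestamps are copied from the table above; the phase labels and function names are illustrative and do not come from F5.

```python
from datetime import datetime

# Timestamps (UTC) copied from the timeline table; labels are illustrative.
TIMELINE = [
    ("detected", "2024-04-11 12:45"),
    ("investigation started", "2024-04-11 13:09"),
    ("root cause identified", "2024-04-11 17:43"),
    ("threshold raised", "2024-04-11 19:04"),
    ("recovery confirmed", "2024-04-11 19:40"),
]


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM' timestamp into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def phase_durations(timeline):
    """Return (from_label, to_label, minutes) for each consecutive pair of events."""
    phases = []
    for (a_label, a_ts), (b_label, b_ts) in zip(timeline, timeline[1:]):
        minutes = int((parse(b_ts) - parse(a_ts)).total_seconds() // 60)
        phases.append((a_label, b_label, minutes))
    return phases


def total_minutes(timeline):
    """Minutes between the first and last event in the timeline."""
    delta = parse(timeline[-1][1]) - parse(timeline[0][1])
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    for start, end, minutes in phase_durations(TIMELINE):
        print(f"{start} -> {end}: {minutes // 60}h {minutes % 60}m")
    total = total_minutes(TIMELINE)
    print(f"total: {total // 60}h {total % 60}m")
```

The phases sum to 415 minutes, matching the reported event duration of 6 hours 55 minutes. The longest phase, 4 hours 34 minutes, is the gap between the start of the investigation and the identification of the root cause.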
Yes, the hostname resolution issue with load balancers is resolved.
When a new load balancer of any type is created, a virtual host DNS object is also created. An internal service monitors the creation and deletion of this object and attempts to create a DNS A record on the DNS infrastructure. This A record points to the IP address configured on the load balancer. Over time, the number of records reached the DNS records-per-zone limit, which prevented further DNS A record creation and broke name resolution for the affected load balancers. Because no alert tracked usage against this limit, the exhaustion went undetected until it triggered the service event.
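The failure mode described above, a hard per-zone limit reached without any early warning, can be illustrated with a minimal Python sketch. This is not a description of F5's internal service: the `Zone` class, the `create_a_record` function, the `warn_ratio` value, and the alert strings are all hypothetical, chosen only to show how a usage-ratio alert would fire before the limit is exhausted.

```python
from dataclasses import dataclass, field


class ZoneQuotaExceeded(Exception):
    """Raised when a zone has no room for another record."""


@dataclass
class Zone:
    # All fields are hypothetical; they model a zone with a record cap.
    name: str
    record_limit: int            # maximum number of records allowed in the zone
    warn_ratio: float = 0.8      # assumed alerting threshold (80% of the limit)
    records: dict = field(default_factory=dict)  # hostname -> IP address

    def usage_ratio(self):
        """Fraction of the record limit currently in use."""
        return len(self.records) / self.record_limit


def create_a_record(zone, hostname, ip, alerts):
    """Create or update an A record, appending an alert as the limit nears."""
    is_new = hostname not in zone.records
    if is_new and len(zone.records) >= zone.record_limit:
        # Without this alert, the failure would be silent until users notice.
        alerts.append(f"CRITICAL: zone {zone.name} is at its record limit")
        raise ZoneQuotaExceeded(zone.name)
    zone.records[hostname] = ip
    if zone.usage_ratio() >= zone.warn_ratio:
        alerts.append(
            f"WARNING: zone {zone.name} at {zone.usage_ratio():.0%} of record limit"
        )


if __name__ == "__main__":
    zone = Zone("example.test", record_limit=5)
    alerts = []
    for i in range(6):
        try:
            create_a_record(zone, f"lb{i}.example.test", "192.0.2.10", alerts)
        except ZoneQuotaExceeded:
            print(f"lb{i}: A record not created, zone is full")
    print("\n".join(alerts))
```

In this sketch the warning fires once usage reaches 80% of the limit, leaving time to raise the limit before record creation starts failing.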
The F5 Distributed Cloud team increased the DNS records-per-zone threshold, which allowed the pending DNS A records to be created. This restored hostname resolution for the affected load balancers.
We will take several measures to prevent this service event from recurring and to ensure that we are better prepared to react to and recover from similar scenarios more quickly.
F5® understands how important reliability of the Distributed Cloud Platform is for customers. F5 will ensure the recommended changes in this document are canonized into our operational Methods of Procedure (MoP) moving forward. We are grateful you have chosen to partner with F5® for critical service delivery and are committed to evolving our platform and tooling to better anticipate and mitigate disruptions to Distributed Cloud Platform services.