Report Date: 2024-05-01
Incident Date(s): 2024-04-18
On 2024-04-18 at 18:30 UTC, the F5 Distributed Cloud support team detected 503 Service Unavailable errors on Customer Edges (CE) through internal monitoring. The ensuing investigation revealed that the issue was confined to the registration of new Customer Edges and software upgrades on existing ones. Traffic processing was not impacted.
Further examination traced the 503 errors to an incorrect CE configuration, which caused the service responsible for CE configuration management to malfunction.
To address the issue, the F5 Distributed Cloud team manually removed the incorrect configuration and restarted the CE configuration management service. Post-fix validation confirmed that the CE management issue was resolved by 21:20 UTC on 2024-04-18, which marked the end of the service event.
The service event lasted 2 hours and 50 minutes.
Start time of Service Event | 2024-04-18 18:30 UTC |
Conclusion of Service Event | 2024-04-18 21:20 UTC |
Event duration | 2 hours 50 minutes |
Impact | Distributed Cloud customers may have experienced issues upgrading software on existing Customer Edges or registering new ones. |
Root cause | A new Customer Edge site was created with an incorrect configuration, which caused Customer Edge management to fail. |
DATE | TIME (UTC) | ACTION |
---|---|---|
2024-04-18 | 18:30 | F5 Distributed Cloud support team identified 503 errors in Customer Edge (CE) management via proactive monitoring. |
2024-04-18 | 18:30 – 20:24 | F5 Distributed Cloud team investigated and determined that registration of new CE sites and software upgrades on existing ones were affected. No impact to traffic processing. |
2024-04-18 | 20:30 – 21:00 | An incorrect configuration on a newly created CE was identified as the cause of the 503 errors. |
2024-04-18 | 21:20 | The F5 Distributed Cloud team manually rectified the incorrect configuration and restarted the CE configuration management service, after which no further errors were observed with CE management. This is the end of the service event. |
Yes, the service event is resolved, and Customer Edge services are fully operational.
A particular service oversees configuration management when a new CE is registered or when an existing one undergoes a software upgrade. Because of an under-optimized code base, this service could not adequately handle an incorrect CE configuration and failed. Consequently, users experienced 503 errors when attempting to create new CE sites or upgrade existing ones.
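As an illustration only, the failure mode described above can be contrasted with a defensive pattern in which each site configuration is validated before it is applied, and an invalid one is set aside rather than allowed to stop the service. The sketch below is not F5's implementation: the `SiteConfig` fields, the `ConfigManager` class, and the validation rules are all hypothetical, since this report does not describe the CE configuration schema or the service internals.

```python
from dataclasses import dataclass, field


class InvalidSiteConfig(ValueError):
    """Raised when a (hypothetical) CE site configuration fails validation."""


@dataclass
class SiteConfig:
    # Hypothetical fields; the real CE configuration schema is not given in this report.
    name: str
    software_version: str


@dataclass
class ConfigManager:
    # Configurations that passed validation, keyed by site name.
    applied: dict = field(default_factory=dict)
    # (site name, reason) pairs for configurations that were rejected.
    quarantined: list = field(default_factory=list)

    def validate(self, cfg: SiteConfig) -> None:
        # Assumed rules, for illustration only.
        if not cfg.name:
            raise InvalidSiteConfig("site name is empty")
        if not cfg.software_version:
            raise InvalidSiteConfig(f"site {cfg.name!r} has no software version")

    def process(self, configs) -> None:
        """Apply each valid configuration; set invalid ones aside instead of failing."""
        for cfg in configs:
            try:
                self.validate(cfg)
            except InvalidSiteConfig as exc:
                # One bad site is quarantined; the remaining sites are still processed.
                self.quarantined.append((cfg.name, str(exc)))
                continue
            self.applied[cfg.name] = cfg.software_version
```

With this pattern, a single incorrect configuration ends up in `quarantined` for manual review while registrations and upgrades for other sites continue to be served.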
The F5 Distributed Cloud team rectified the incorrect configuration manually and subsequently restarted the affected service.
We will take the following measures to prevent this service event from recurring and to ensure that we are better prepared to react to and recover from similar scenarios more quickly.
F5® understands how important reliability of the Distributed Cloud Platform is for customers. F5 will ensure the recommended changes in this document are canonized into our operational Methods of Procedure (MoP) moving forward. We are grateful you have chosen to partner with F5® for critical service delivery and are committed to evolving our platform and tooling to better anticipate and mitigate disruptions to Distributed Cloud Platform services.