Issue with Email and SMS Alerts
Incident Report for F5 Distributed Cloud
Postmortem

F5® Distributed Cloud Services - SMS Alerting Service

Root Cause Analysis for SMS Alerting Issue for US-Based Customers

Report Date: 2024-03-28

Incident Date: 2024-02-15

EVENT SUMMARY

On 2024-02-15 at 14:44 UTC, the F5 Distributed Cloud support team received an initial report about a failure of the SMS alerting system. Upon receiving the report, the F5 Distributed Cloud team started an investigation and isolated the issue to the US region only. Over the next few days, the F5 Distributed Cloud support team received a few more customer reports of SMS alert failures.

Detailed analysis revealed that the SMS alerts were failing at the messaging service provider’s end. The F5 sender ID (From Number) was not registered with the messaging service provider, which caused the SMS delivery failures. The provider had recently begun enforcing this registration as a compliance measure required for all US telecom carriers, and the F5 Distributed Cloud team had inadvertently missed registering the sender ID.

Because the existing messaging service provider was taking an extended amount of time to register the sender ID, the F5 Distributed Cloud team decided to move to another provider instead.

On 2024-03-18 at 06:15 UTC, the F5 Distributed Cloud support team deployed a configuration change to fully resolve the SMS alerting issue. After the fix was implemented, the teams confirmed that SMS alerting was restored and fully functional.

The total duration of the service event was 31 days, 15 hours and 31 minutes.
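The stated duration can be checked against the start and end timestamps given in this report. The following is a minimal Python sketch, not part of the original analysis; the function name `format_duration` is illustrative only.

```python
from datetime import datetime, timezone

# Start and end of the service event, as stated in this report (UTC).
start = datetime(2024, 2, 15, 14, 44, tzinfo=timezone.utc)
end = datetime(2024, 3, 18, 6, 15, tzinfo=timezone.utc)


def format_duration(start: datetime, end: datetime) -> str:
    """Return the elapsed time between two datetimes as days, hours and minutes."""
    delta = end - start
    # timedelta stores whole days separately from the leftover seconds.
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return f"{delta.days} days, {hours} hours and {minutes} minutes"


print(format_duration(start, end))  # prints "31 days, 15 hours and 31 minutes"
```

Because 2024 is a leap year, February has 29 days, which is why the interval comes to 31 full days plus 15 hours and 31 minutes.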

WHAT HAPPENED?

INCIDENT DETAILS:

Start time of Service Event: 2024-02-15 14:44 UTC
Conclusion of Service Event: 2024-03-18 06:15 UTC
Event duration: 31 days, 15 hours and 31 minutes
Impact: Distributed Cloud customers in the US region were unable to receive SMS from the platform alerting system
Root cause: F5 Distributed Cloud inadvertently missed registering the sender ID (From Number) with the messaging provider, which caused the SMS delivery failures

TIMELINE OF EVENTS:

DATE TIME UTC ACTION
2024-02-15 14:44 F5 Distributed Cloud team received the initial customer report informing us about the SMS alert failure.
2024-02-15 14:57 Customer re-confirmed that there was no progress when they tried to receive the verification code through SMS.
2024-02-17 12:24 After further investigation, support reached out to the engineering team for additional assistance.
2024-02-23 17:56 Support contacted the customer to confirm whether the issue had been resolved, since the engineering team had reported that an error on the F5 side appeared to be fixed.
2024-02-27 13:23 The customer informed us that they were still facing the issue and had seen no progress.
2024-02-29 07:48 F5 Distributed Cloud team discovered that the F5 number had been unregistered by the messaging service provider. Re-registration efforts were started.
2024-03-12 00:02 Due to delays in the registration process with the existing vendor, the F5 team successfully registered with a new messaging service vendor.
2024-03-12 14:21 Support followed up with the customer to inform them that the engineering team had implemented the hotfix and that the issue should be resolved, and asked the customer to test from their end.
2024-03-12 14:37 Customer confirmed that the issue persisted for country code +1. One of their Solution Architects, based in Mexico, was able to receive the verification SMS, which led the customer to believe the issue affected only American numbers.
2024-03-18 09:50 F5 Distributed Cloud team announced through the status page that the issue had been resolved and SMS alerting had been restored for all customers.

IS THE SERVICE EVENT FULLY RESOLVED?

Yes, the SMS alerting service is fully operational.

ROOT CAUSE

Application-to-Person (A2P) messaging is SMS/MMS traffic in which a person receives messages from an application rather than from another individual. US telecom carriers consider any message sent from a messaging provider to be an A2P message. In this scenario, the F5 alert manager was sending alerts to individuals, which required the sender ID (From Number) to be registered with the A2P vendor. Because the number registration was inadvertently missed, SMS delivery was failing.

RESOLUTION

The F5 Distributed Cloud team switched to another messaging provider, which is FED compliant, and completed the registration process for the F5 sender ID (From Number). A platform configuration change was deployed to reflect the switch, after which the SMS alerting service became operational.

NEXT STEPS: FUTURE EVENT PREVENTION

We will take the following measure to prevent this service event from recurring and to ensure that we are better prepared to react to, and recover more quickly from, similar scenarios.

  • The F5 Distributed Cloud support team will revisit the messaging provider’s documentation on a recurring basis to ensure all compliance requirements are met.

CLOSING

F5® understands how important reliability of the Distributed Cloud Platform is for customers. F5 will ensure the recommended changes in this document are canonized into our operational Methods of Procedure (MoP) moving forward. We are grateful you have chosen to partner with F5® for critical service delivery and are committed to evolving our platform and tooling to better anticipate and mitigate disruptions to Distributed Cloud Platform services.

APPENDICES

F5 Glossary

https://www.f5.com/services/resources/glossary

Posted Mar 28, 2024 - 01:29 UTC

Resolved
The F5 Distributed Cloud team confirmed that SMS alerting is restored. This incident has been resolved.
Posted Mar 18, 2024 - 09:50 UTC
Monitoring
The F5 Distributed Cloud team completed the activity and deployed the configuration change. The issue has been mitigated. We are currently monitoring the platform.
Posted Mar 18, 2024 - 07:12 UTC
Update
The F5 Distributed Cloud team will deploy a configuration change and restart an alert service at the following time. Alert-related errors are expected until the configuration has fully loaded. We apologize for any confusion caused by this activity.

Time:
Start: 05:00AM Mar 18 2024 UTC
Expected end time: 09:00AM Mar 18 2024 UTC
Posted Mar 17, 2024 - 05:01 UTC
Update
F5 Distributed Cloud support team will deploy a platform change to fully address the SMS alerting issue. The change is currently under testing and is expected to be completed and deployed in the week of March 18th. More updates will be provided soon.
Posted Mar 15, 2024 - 10:33 UTC
Identified
The F5 Distributed Cloud team is actively addressing the SMS alerting issue. Traffic processing and core services are operating normally; however, customers in the US region cannot receive SMS notifications in real time. The F5 Distributed Cloud team continues to work on restoring services as a priority. More updates will be provided soon.
Posted Mar 12, 2024 - 16:50 UTC
Monitoring
The Distributed Cloud console issue has been resolved. We are continuing to monitor the system.
Posted Mar 12, 2024 - 06:32 UTC
Investigating
The F5 Distributed Cloud team has identified an issue with the Distributed Cloud alert console where some of our customers are receiving a 500 server error when accessing it from the browser. We are actively addressing and resolving the issue.
Posted Mar 12, 2024 - 03:02 UTC
Update
Email alerts have been restored. We are continuing to monitor the system.
Posted Mar 12, 2024 - 01:23 UTC
Monitoring
A fix has been implemented and deployed. We are currently monitoring the system.
Posted Mar 12, 2024 - 00:02 UTC
Update
We have identified the issue and are working on a fix.
Posted Mar 11, 2024 - 22:54 UTC
Identified
The F5 Distributed Cloud team has identified an issue with email and SMS alerts that is currently preventing some of our customers from receiving their configured notifications in real time. We are actively addressing and resolving the issue.
Posted Mar 11, 2024 - 19:52 UTC