At approximately 00:17 UTC on April 25, 2025, we observed elevated server errors and increased latency impacting multiple API endpoints, most notably the Presence service. Our engineering team immediately began investigating the issue. We identified that the root cause involved resource contention within a specific component of our backend infrastructure responsible for managing presence state. We implemented targeted configuration changes to better distribute this traffic and alleviate the resource contention. The issue was fully resolved by 02:32 UTC on April 25, 2025.
To prevent a similar issue from occurring in the future, we have already implemented specific configuration changes to ensure the responsible backend component can more effectively handle the type of traffic pattern encountered. Furthermore, we are actively working to enhance our monitoring and alerting systems.