Experiencing Data Latency in Azure portal for Activity Log Alerts - 10/28 - Resolved

Published Oct 28 2019 11:20 AM
Final Update: Tuesday, 29 October 2019 01:26 UTC

We've confirmed that all systems are back to normal, with no customer impact, as of 10/29, 1:00 UTC. Our logs show the incident started on 10/27, 1:00 UTC. During the impact window, customers in South Central US, West US 2, West US, East US 2, and Central US would have experienced latencies of up to approximately 8 hours both for their Activity Logs of category Policy and Administrative to become available and for the corresponding alerts against those Activity Logs to fire.
  • Root Cause: The failure was due to a dependent service reaching an operational threshold; even after being scaled out, it took several hours to process the backlog of activity logs.
  • Incident Timeline: 48 Hours & 0 minutes - 10/27, 1:00 UTC through 10/29, 1:00 UTC
We understand that customers rely on Activity Log Alerts as a critical service and apologize for any impact this incident caused.
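For customers assessing whether their logs were affected, delivery latency can be estimated by comparing each event's timestamp with the time it became queryable. The sketch below uses illustrative field names (`eventTimestamp`, `ingestedAt`), not the exact Activity Log schema:

```python
from datetime import datetime, timedelta

def delayed_events(events, threshold=timedelta(hours=1)):
    """Return events whose delivery lagged the event time by more than threshold.

    Each event is assumed to carry ISO-8601 'eventTimestamp' (when it occurred)
    and 'ingestedAt' (when it became queryable) -- illustrative field names.
    """
    late = []
    for e in events:
        occurred = datetime.fromisoformat(e["eventTimestamp"])
        ingested = datetime.fromisoformat(e["ingestedAt"])
        if ingested - occurred > threshold:
            late.append(e)
    return late

sample = [
    {"eventTimestamp": "2019-10-28T01:00:00+00:00",
     "ingestedAt": "2019-10-28T08:30:00+00:00"},   # ~7.5 h late
    {"eventTimestamp": "2019-10-28T02:00:00+00:00",
     "ingestedAt": "2019-10-28T02:05:00+00:00"},   # delivered promptly
]
print(len(delayed_events(sample)))  # 1
```

Events delayed past the threshold during the 10/27-10/29 window would indicate the impact described above.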

-Matt

Update: Monday, 28 October 2019 23:50 UTC

The number of undelivered activity logs continues to decrease. However, customers may continue to see activity log alerts with several hours of latency. Our current ETA for when the queue will be completely cleared is approximately 4 hours, but engineers will continue working to increase the speed of processing.
  • Work Around: None.
  • Next Update: Before 10/29 02:00 UTC
-Matt

Update: Monday, 28 October 2019 21:31 UTC

Root cause has been isolated to throttling issues in a dependent service, which allowed a backlog of undelivered activity logs to form. Engineers have scaled out the service in paired regions to process the backlog, and we are seeing a gradual reduction in latency. We do not currently have an estimate for when the queue will be fully cleared.
  • Work Around: None.
  • Next Update: Before 10/29 00:00 UTC
-Matt

Update: Monday, 28 October 2019 18:59 UTC

We continue to investigate issues within Activity Log Alerts. At this time, we understand customer impact to be limited to Activity Log Alerts configured to alert against Activity Logs originating in South Central US, West US 2, West US, East US 2, and Central US where the category of the Activity Log is Policy or Administrative. We currently have no estimate for resolution.
  • Work Around: None.
  • Next Update: Before 10/28 21:00 UTC
-Matt
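To check whether a given alert rule fell within the impact scope described above, one can filter rules by source region and event category. This is a sketch under assumed field names; real rule definitions (retrievable, for example, via the Azure CLI) have a richer shape:

```python
# Regions and categories named in the impact statement above.
AFFECTED_REGIONS = {"southcentralus", "westus2", "westus", "eastus2", "centralus"}
AFFECTED_CATEGORIES = {"Policy", "Administrative"}

def in_impact_scope(rule):
    """True if an alert rule (illustrative dict shape) targets an affected
    region and one of the impacted Activity Log categories."""
    return (rule.get("region", "").lower() in AFFECTED_REGIONS
            and rule.get("category") in AFFECTED_CATEGORIES)

rules = [
    {"name": "policy-alert", "region": "westus2", "category": "Policy"},
    {"name": "health-alert", "region": "eastus", "category": "ServiceHealth"},
]
affected = [r["name"] for r in rules if in_impact_scope(r)]
print(affected)  # ['policy-alert']
```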

Initial Update: Monday, 28 October 2019 18:17 UTC

We are aware of issues within Activity Log Alerts and are actively investigating. Some customers may experience latency of up to 8 hours for ARM-based Activity Log Alerts across all US regions.
  • Work Around: None
  • Next Update: Before 10/28 20:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Matt
