Downloading large volumes of Unified Audit Log (UAL) data


Hi,

I work in data forensics. We often need to download months of UAL data from customers' Office 365 environments to analyze incidents. For example, I recently had to download 3 months of data, which came to 30 GB (CSV with embedded JSON). We do not want to filter on RecordType, UserIds, Operations, Workload, etc. We need everything.

We currently download the data in 15-minute slices using the Search-UnifiedAuditLog PowerShell cmdlet. If we use bigger intervals, we get errors (e.g., missing/empty data, timeouts, crashes). Even with 15-minute slices, we still hit errors now and then, and we can never be sure that our data is 100% complete.
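The 15-minute slicing described above can be sketched as a small window generator. This is a minimal Python sketch, not the poster's script: `time_slices` is a hypothetical helper, and each yielded pair would be passed as the StartDate/EndDate of one Search-UnifiedAuditLog call (the call itself is not shown). The windows are treated as half-open; whether the cmdlet treats the end boundary as inclusive is an assumption to verify, so deduplicating records afterwards is prudent.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def time_slices(start: datetime, end: datetime,
                step: timedelta = timedelta(minutes=15)
                ) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open windows [slice_start, slice_end) covering [start, end).

    The last window is shortened if the range is not a multiple of `step`.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        slice_end = min(cursor + step, end)
        yield cursor, slice_end
        cursor = slice_end


if __name__ == "__main__":
    start = datetime(2021, 1, 1, tzinfo=timezone.utc)
    windows = list(time_slices(start, start + timedelta(days=90)))
    print(len(windows))  # 8640 (90 days x 96 slices per day)
```

Three months at 15-minute granularity means roughly 8,640 separate searches (90 days × 96 slices per day), which helps explain why a full download can take days.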

All in all, it can take us up to 4 working days to download a full set of 3 months / 30 GB of UAL data. And that is only the first step before we can start analyzing the data (e.g., importing it into a database, adding indexes, augmenting it with other sources of information, running queries, and building new queries for the specific incident). The process is slow and painful. I have even started catching exceptions and sending them by SMS to my personal cell phone.
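As an illustration of the importing step, here is a hedged Python sketch that loads an exported CSV into SQLite, parses the embedded JSON, and adds indexes. It assumes the export has an `AuditData` column holding one JSON record per row, and that those records carry `Id`, `CreationTime`, `Operation`, `UserId`, and `Workload` fields; `load_ual_csv` and the `ual` table layout are hypothetical. Keying on the record `Id` with `INSERT OR IGNORE` also absorbs duplicates from overlapping or re-fetched slices.

```python
import csv
import io
import json
import sqlite3
from typing import TextIO


def load_ual_csv(conn: sqlite3.Connection, csv_file: TextIO) -> int:
    """Load a UAL CSV export into a `ual` table and return the number of new rows.

    Assumes an `AuditData` column holding one JSON record per row.
    """
    conn.execute("""CREATE TABLE IF NOT EXISTS ual (
        id TEXT PRIMARY KEY,  -- AuditData "Id"; duplicate records are skipped
        creation_time TEXT,
        operation TEXT,
        user_id TEXT,
        workload TEXT,
        audit_data TEXT       -- raw JSON kept for later queries
    )""")
    conn.execute("CREATE INDEX IF NOT EXISTS ix_ual_time ON ual(creation_time)")
    conn.execute("CREATE INDEX IF NOT EXISTS ix_ual_user ON ual(user_id)")
    before = conn.total_changes  # rows changed so far on this connection
    for row in csv.DictReader(csv_file):
        raw = row["AuditData"]
        record = json.loads(raw)
        conn.execute(
            "INSERT OR IGNORE INTO ual VALUES (?, ?, ?, ?, ?, ?)",
            (record.get("Id"), record.get("CreationTime"), record.get("Operation"),
             record.get("UserId"), record.get("Workload"), raw))
    conn.commit()
    # Ignored duplicates change no rows, so this counts only new inserts.
    return conn.total_changes - before


if __name__ == "__main__":
    rec = {"Id": "a1", "CreationTime": "2021-01-01T00:00:00",
           "Operation": "FileAccessed", "UserId": "user@example.com",
           "Workload": "SharePoint"}
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["CreationDate", "UserIds", "Operations", "AuditData"])
    for _ in range(2):  # the same record fetched twice, e.g. by overlapping slices
        writer.writerow([rec["CreationTime"], rec["UserId"], rec["Operation"], json.dumps(rec)])
    buf.seek(0)
    print(load_ual_csv(sqlite3.connect(":memory:"), buf))  # 1
```

When reading a real export from disk, open it with `newline=""` so the `csv` module handles quoted line breaks correctly.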

Does anyone know of a better way to gather large volumes of UAL data? Note: not downloading a full copy of the data (e.g., running queries manually through the Security & Compliance Center instead) is out of the question, partly for preservation/legal reasons.

Thanks! :)

Regards,

Simon

Exchange Remote PowerShell is definitely not the best tool for working with such amounts of data.

Take a look at the Office 365 Management Activity API instead: https://docs.microsoft.com/en-us/office/office-365-management-api/office-365-management-activity-api...

@Vasil Michev This will not work for us: it would require too much setup, and we cannot predict which of our customers have Active Directory (LDAP), Azure Active Directory (AAD), or neither.

Vasil, the Management Activity APIs are generally not suited to forensics because they can only retrieve data that is no more than 7 days old. Per the documentation linked above, the start time and end time query parameters must conform to the following:

"Both must be specified (or both omitted) and they must be no more than 24 hours apart, with the start time no more than 7 days in the past."
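To make the quoted constraint concrete, the following hedged Python sketch splits a requested range into windows of at most 24 hours and drops anything older than 7 days. `api_windows` is a hypothetical helper that encodes only the two rules quoted above; it makes no API calls, and a real client would likely keep a safety margin at the 7-day edge, since the clock moves between computing a window and sending the request.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

MAX_WINDOW = timedelta(hours=24)  # "no more than 24 hours apart"
MAX_AGE = timedelta(days=7)       # "start time no more than 7 days in the past"


def api_windows(start: datetime, end: datetime,
                now: Optional[datetime] = None) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into windows that satisfy the two quoted rules.

    Anything older than now - MAX_AGE is silently dropped, because the
    quoted rule says the API will not accept such a start time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cursor = max(start, now - MAX_AGE)
    windows = []
    while cursor < end:
        window_end = min(cursor + MAX_WINDOW, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows
```

For a 90-day investigation ending now, only the last 7 of the 90 days produce any windows, which is exactly the gap this reply points out.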

@scharest I'd be interested to know whether you've made any progress or found a solution to this issue. This is a constant thorn in the side of the forensic investigators at my company as well.

We developed our own fetching script in PowerShell and compared it against other tools by comparing their output data and statistics. So far, the best tool we have found is PwC's Office 365 Extractor (https://github.com/PwC-IR/Office-365-Extractor), because it handles errors, timeouts, and retries to some extent. We now mostly use it for UAL, and then use our own tools for data processing and analysis. For other types of data, we use our own PowerShell scripts. Only UAL remains a big problem: there is far too much throttling, along with random errors and/or unexplainable empty recordsets.
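The reply credits the Office 365 Extractor with handling errors, timeouts, and retries. The general technique, retry with exponential backoff and jitter, can be sketched in Python as below. This is not the Extractor's actual logic: `fetch_with_retry` and the `fetch` callable it wraps are hypothetical, catching every `Exception` is a simplification, and re-fetching empty results a couple of times before accepting them is an assumed heuristic for the unexplainable empty recordsets mentioned above.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[], List[T]],
                     max_attempts: int = 5,
                     base_delay: float = 2.0,
                     empty_retries: int = 2,
                     sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Call `fetch` until it returns records, backing off exponentially between tries.

    Errors are retried until `max_attempts` is used up. An empty result is
    re-fetched up to `empty_retries` extra times and then accepted as genuine.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    empties = 0
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            records = fetch()
        except Exception as exc:  # simplification: treat every error as transient
            if last_attempt:
                raise RuntimeError(f"giving up after {max_attempts} attempts") from exc
        else:
            if records or empties >= empty_retries or last_attempt:
                return records
            empties += 1
        # Exponential backoff (base_delay * 2**attempt) plus up to 1 s of random jitter.
        sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable: the last attempt always returns or raises")
```

The `sleep` parameter exists so tests can skip the real waits; in a long-running download, logging each retry per time slice would also make incomplete slices easier to spot afterwards.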