We are using the API to retrieve audit log events from multiple channels (Azure AD, SharePoint, etc.) for a very large tenant, meaning that we need to retrieve potentially millions of events over a relatively short time span.
O365 gathers audit events into a series of "blobs", each of which contains a number of individual events (JSON messages). To my understanding, which comes partly from correspondence with the API's dev team and partly from reading the docs, these blobs should contain a "considerable" number of events, so that they function as a sort of batch mechanism for the actual web requests.
In our approach, we request the blob URLs for a one-hour interval, and then issue a request for each individual blob.
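For reference, a minimal sketch of that two-step flow. It is not our production fetcher: the helper names and the injected `http_get` callable are hypothetical, and the endpoint path, the `contentUri` field and the `NextPageUri` paging header are as I understand them from the Management Activity API docs.

```python
from datetime import datetime
from urllib.parse import urlencode

# Base URL as I understand it from the docs; treat it as an assumption.
BASE = "https://manage.office.com/api/v1.0"


def content_listing_url(tenant_id, content_type, start, end, publisher_id=None):
    """Build the URL that lists the available blobs for one time window."""
    params = {
        "contentType": content_type,
        "startTime": start.strftime("%Y-%m-%dT%H:%M:%S"),
        "endTime": end.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    if publisher_id:
        params["PublisherIdentifier"] = publisher_id
    return f"{BASE}/{tenant_id}/activity/feed/subscriptions/content?{urlencode(params)}"


def iter_blob_uris(http_get, first_url):
    """Yield every blob URI in the window, following the paging header.

    http_get is a hypothetical callable: http_get(url) -> (json_body, headers).
    A real client should read the header case-insensitively.
    """
    url = first_url
    while url:
        body, headers = http_get(url)
        for entry in body:
            yield entry["contentUri"]
        url = headers.get("NextPageUri")


def fetch_events(http_get, first_url):
    """Step 2: one request per blob, yielding the individual events."""
    for uri in iter_blob_uris(http_get, first_url):
        events, _ = http_get(uri)
        yield from events
```

Note that every blob costs one extra request on top of the listing requests, so the number of events per blob directly determines the request rate.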
We have tested with a number of different tenants and different PublisherIdentifiers, but we only seem to get around 2.5 messages per blob on average, no matter the total number of events "waiting" to be fetched.
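For anyone who wants to compare numbers, this is roughly how such an average can be computed. It is only a sketch: `blob_stats` is a hypothetical helper, and it assumes each blob has already been downloaded and parsed into a list of event dicts.

```python
from statistics import mean


def blob_stats(blobs):
    """Summarise how many events each downloaded blob contained.

    blobs: an iterable of parsed blobs, each a list of event dicts.
    """
    sizes = [len(events) for events in blobs]
    if not sizes:
        return {"blobs": 0, "events": 0, "mean": 0.0, "max": 0}
    return {
        "blobs": len(sizes),
        "events": sum(sizes),
        "mean": mean(sizes),
        "max": max(sizes),
    }
```

Logging the "max" alongside the mean shows whether any blob ever gets close to a real batch size, or whether all of them are small.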
This becomes a major issue for the larger tenants, as it puts a strain on the SIEM solution running the fetcher logic (a Python service) due to the number of requests per second, and it also causes throttling issues with the API itself. In effect, we simply cannot fetch the audit events fast enough to keep up within the retention period.
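To put the request rate in perspective, here is a back-of-the-envelope calculation. The event volume and time window are purely illustrative assumptions; only the 2.5 events per blob comes from our tests.

```python
import math


def blob_requests_needed(total_events, avg_events_per_blob):
    """Number of blob downloads needed to retrieve total_events."""
    return math.ceil(total_events / avg_events_per_blob)


def required_request_rate(total_events, avg_events_per_blob, window_seconds):
    """Sustained blob requests per second needed to keep up."""
    return blob_requests_needed(total_events, avg_events_per_blob) / window_seconds


# Illustrative only: one million events produced per hour.
requests_needed = blob_requests_needed(1_000_000, 2.5)
rate = required_request_rate(1_000_000, 2.5, 3600)
print(requests_needed, round(rate, 1))  # 400000 111.1
```

With blobs holding, say, 250 events each, the same volume would need only about one request per second, which is why the blob size matters so much to us.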
A "funny" thing is that if we use the visual query tool within the Admin Center of the tenant, it searches and retrieves the log messages very fast.
Has anyone had any experience with this issue, or perhaps seen better "batch performance"?
As mentioned, we have been in direct contact with the dev team and the program manager in Redmond. They have been very helpful with other issues we had, but they referred us to support for this specific issue, who in turn referred us to the forums / community. We currently do not have access to premium support...