1. Filter by SnapshotDate
When gathering SharePoint data through Microsoft Graph Data Connect (MGDC), you must choose the date to query. This is expressed as a required filter on the SnapshotDate column (see picture below for an example using the Synapse Copy Data tool). You typically want the "Start time" and "End time" to be the same date (we don't look at the time portion), so you capture the data for that specific day.
2. The latest data
You typically want the latest data available, which for SharePoint on MGDC is from two days ago. So, if today is 2022-10-29, you probably want to set both the "Start time" and the "End time" to 2022-10-27. However, you can also go back up to 21 days before that date. That means you can query any date from today's date minus 2 to today's date minus 23. For instance, if today is 2022-10-29, you can query any date from 2022-10-06 to 2022-10-27.
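To make the date arithmetic concrete, here is a minimal Python sketch that computes the queryable range for a given day. The function name `query_window` and the constants are hypothetical, chosen for illustration. The sketch assumes the 2-day latency and the 21-day look-back described above, and it ignores the sign-up date covered later in this post.

```python
from datetime import date, timedelta

# Values taken from the text: the latest snapshot is 2 days old, and you can
# look back 21 more days from there (today - 2 through today - 23).
LATENCY_DAYS = 2
LOOKBACK_DAYS = 21


def query_window(today: date) -> tuple[date, date]:
    """Return (earliest, latest) SnapshotDate you can query on `today`."""
    latest = today - timedelta(days=LATENCY_DAYS)
    earliest = latest - timedelta(days=LOOKBACK_DAYS)
    return earliest, latest


earliest, latest = query_window(date(2022, 10, 29))
print(earliest, latest)  # 2022-10-06 2022-10-27
```

For a daily "latest data" run, you would set both "Start time" and "End time" to `latest`.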
3. State datasets
This 21-day range applies to many of the SharePoint datasets on MGDC, including:
- BasicDataSet_v0.SharePointSites_v1
- BasicDataSet_v0.SharePointGroups_v1
- BasicDataSet_v0.SharePointPermissions_v1
- BasicDataSet_v0.SharePointFiles_v1
These are state datasets, which means they include all objects of that type in SharePoint at the time of the request. So, if you query for Sites on a specific day (as shown in the picture), you’ll get a full list of all the Sites in your tenant as of that date, not just the ones created or updated on that day.
4. Looking back
Why look back, then? Well, you might want to keep regular snapshots of the data to track how things grow over time, and you might miss a recent day in your capture process due to operational issues. Looking back lets you go a few days into the past and "complete your collection".
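As an illustration of "completing your collection", the sketch below lists the snapshot dates that are still inside the query window but missing from your captures. It assumes you already track which SnapshotDate values you have captured as a set of dates (how you record that, for example from folder names in your storage account, is up to you). The function name `recoverable_gaps` is hypothetical.

```python
from datetime import date, timedelta

LATENCY_DAYS = 2    # latest snapshot is 2 days old
LOOKBACK_DAYS = 21  # window extends 21 days back from the latest snapshot


def recoverable_gaps(captured: set[date], today: date) -> list[date]:
    """List snapshot dates still inside the query window that are not yet captured."""
    latest = today - timedelta(days=LATENCY_DAYS)
    earliest = latest - timedelta(days=LOOKBACK_DAYS)
    gaps = []
    day = earliest
    while day <= latest:
        if day not in captured:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Example: daily captures from 2022-10-06 to 2022-10-27, except 2022-10-20 was missed.
start = date(2022, 10, 6)
captured = {start + timedelta(days=i) for i in range(22)} - {date(2022, 10, 20)}
print(recoverable_gaps(captured, date(2022, 10, 29)))  # [datetime.date(2022, 10, 20)]
```

Any date returned can be requested as a full pull (same start and end date) before it ages out of the window.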
5. There is a limit
Why have that limit at all? Well, if we kept the data for longer than a month, there would be additional compliance requirements, and we need to make sure we do not break any data retention rules. We also give ourselves a week to clean up older data, which is why we use 21 days instead of a full month.
By the way, we do encourage you to also check your compliance requirements if you intend to keep the data from MGDC in your Azure Storage account for a long time. You might have similar restrictions.
6. The sign-up date
There is another detail to keep in mind. SharePoint on MGDC does not do backfills, which means you can only query dates on or after the date you enabled the collection of data in the Admin Center (see picture below).
If you signed up for MGDC with SharePoint on 2022-10-15, it will take 48 hours to set up your first collection, which means the first date you can query is 2022-10-17. Since the data is always 48 hours behind, you will need to wait until 2022-10-19 to query that date. Effectively, you need to wait 96 hours after you check the box before you can run your first request. After that, you can query daily, always looking back at least 2 days.
Also, if you signed up for MGDC with SharePoint on 2022-10-15 and today is 2022-10-29, you can only query from 2022-10-17 (the date of the first collection) to 2022-10-27 (two days ago). Once 21 days have passed since you could run your first request, this is no longer an issue.
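Putting the sign-up rule together with the 21-day window, the effective range is the later of "today minus 23" and "first collection date", up to "today minus 2". The Python sketch below encodes that rule under the assumptions stated in this post (48 hours to the first collection, 2-day latency, 21-day look-back). The name `effective_window` is hypothetical, and it returns `None` while you are still inside the initial 96-hour wait.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

LATENCY_DAYS = 2    # latest snapshot is 2 days old
LOOKBACK_DAYS = 21  # look back 21 days from the latest snapshot
SETUP_DAYS = 2      # first collection happens 48 hours after you enable MGDC


def effective_window(enabled_on: date, today: date) -> Optional[Tuple[date, date]]:
    """Return (earliest, latest) queryable SnapshotDate, or None if nothing is queryable yet."""
    latest = today - timedelta(days=LATENCY_DAYS)
    first_collection = enabled_on + timedelta(days=SETUP_DAYS)
    # No backfills: never go earlier than the first collection.
    earliest = max(latest - timedelta(days=LOOKBACK_DAYS), first_collection)
    if earliest > latest:
        return None  # the first collection is not queryable yet
    return earliest, latest


print(effective_window(date(2022, 10, 15), date(2022, 10, 29)))
# (datetime.date(2022, 10, 17), datetime.date(2022, 10, 27))
print(effective_window(date(2022, 10, 15), date(2022, 10, 18)))  # None
```

With the example dates from the text, the first day that returns a window is 2022-10-19, matching the 96-hour wait.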
7. Error messages
You might be wondering what happens if you query a date outside of these boundaries. Well, your request to MGDC will fail ☹.
Here’s what the error messages (for the Sites dataset) look like when you monitor your Copy Data activity in Azure Data Factory or Azure Synapse. If the SharePoint data extract fails because the requested date is outside of the valid date range, your error message will look like this:
"ErrorCode=UserErrorOffice365DataLoaderError,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Office365 data loading failed to execute.
office365LoadErrorType: PermanentError.
Table [BasicDataSet_v0.SharePointSites_v1] only support data for the past [21]days.
Please rerun the job with valid start and end dates,
Source=Microsoft.DataTransfer.ClientLibrary,'"
If the SharePoint data extract fails because the data is not available for the date you requested (for example, because you only just enabled MGDC), your error message will look like this:
"ErrorCode=UserErrorOffice365DataLoaderError,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Office365 data loading failed to execute.
office365LoaderErrorType: PermanentError.
Your dataset request failed. Please verify the dates you used in the request. You might also want to review our documentation at https://aka.ms/mgdcdocs. If you just enabled the MGDC, try your request again in 48 hours. Please reach out to dataconnect@microsoft.com for further support.
Source=Microsoft.DataTransfer.ClientLibrary,'"
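If you automate these pipelines, you may want to tell the two failures apart when inspecting a failed run. The sketch below is a hypothetical helper (`classify_mgdc_error` is not part of any Microsoft SDK) that looks for substrings copied from the two messages above. It assumes the wording stays as shown, which is not guaranteed, so treat it as a starting point only.

```python
def classify_mgdc_error(message: str) -> str:
    """Map a Copy Data error message to one of the two date-related causes (hypothetical helper)."""
    if "only support data for the past" in message:
        # First message: the requested date is outside the valid date range.
        return "outside_valid_range"
    if "Please verify the dates you used in the request" in message:
        # Second message: no data for that date (e.g. MGDC was only just enabled).
        return "data_not_available"
    return "unknown"


msg = "Table [BasicDataSet_v0.SharePointSites_v1] only support data for the past [21]days."
print(classify_mgdc_error(msg))  # outside_valid_range
```

Both messages report a PermanentError, so rerunning the same request will not help; you need to change the dates.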
8. Delta State Datasets
As mentioned earlier, you should specify the same date for both start date and end date to get a complete set of objects for that specific date, also called a "full pull". If you specify different dates, you are asking for only the objects that changed between those two dates, also called a "delta pull". For more details about deltas, read this blog: How to Use Delta State Datasets.
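The full-versus-delta rule is simple enough to express as a small check, which can be handy as a guard before you submit a request. This is a minimal sketch: the name `pull_type` is hypothetical, and rejecting a start date that is later than the end date is an assumption added for safety, not a documented MGDC rule.

```python
from datetime import date


def pull_type(start: date, end: date) -> str:
    """Classify an MGDC SharePoint request by its SnapshotDate filter."""
    if start > end:
        # Assumption: treat a reversed range as a mistake in the request.
        raise ValueError("start date must not be after end date")
    # Same date: full pull (all objects as of that date).
    # Different dates: delta pull (only objects that changed between them).
    return "full" if start == end else "delta"


print(pull_type(date(2022, 10, 27), date(2022, 10, 27)))  # full
print(pull_type(date(2022, 10, 20), date(2022, 10, 27)))  # delta
```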
9. Summary
In summary, you can query dates from [Today – 2 days] to [Today – 23 days] (or the day of your first collection, after you enabled MGDC for SharePoint, whichever is more recent). The latest date you can query is 2 days ago and that is probably what you are looking for. You should always use the same start date and end date to get a complete set of objects, unless you want only the objects that changed between the two dates.
I hope this blog post helped you better understand which dates you can query for SharePoint data in Microsoft Graph Data Connect. For more information about SharePoint data in MGDC, please visit the collection of links I keep at https://aka.ms/SharePointData.