0. Introduction
When you use Microsoft Graph Data Connect, you will commonly refer to the dataset schemas to understand what is provided with each dataset. For SharePoint datasets, you can see the schemas using this link: https://aka.ms/SharePointDatasets.
Microsoft Graph Data Connect schemas do not change often. However, on rare occasions, the schemas will change to offer additional capabilities or improve existing ones. Even when that happens, the goal is to avoid disruptions to your existing pipelines.
This blog post covers a few details about how schemas change and how to deal with some of the rare side effects.
1. Non-breaking schema changes
If a dataset schema changes in a way that causes your pipelines to stop working or behave incorrectly, we call that a breaking change. That is the case if the data type or the meaning of a column changes, or if an existing column is renamed or removed. See changes marked with a red X in the diagram below. Breaking schema changes should never happen.
If this type of schema change is ever needed, the procedure is to create a new version of the schema (essentially a new schema with a new name) and deprecate (and eventually remove) the old schema.
However, it’s possible to make changes that will not break existing pipelines. To qualify, a schema change can only add new columns. For nested columns, it is OK to add more properties to the nested object as long as the parent column and its existing children are not changed. See changes marked with a blue check in the diagram below.
Your pipeline will typically be unaffected by these non-breaking changes in the schema. Everything should continue to run after the changes happen without any modifications to your pipelines.
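To make the rules above concrete, here is a minimal Python sketch that compares two versions of a schema and reports only the breaking differences. The schema representation (a dictionary mapping column names to a type name or to a nested dictionary) and the function name are assumptions made for illustration; this is not how MGDC stores or validates schemas. Note that a change in the meaning of a column cannot be detected this way.

```python
from typing import Any, Dict, List

# Hypothetical representation: column name -> type name (str) or nested schema (dict).
Schema = Dict[str, Any]


def find_breaking_changes(old: Schema, new: Schema, prefix: str = "") -> List[str]:
    """Return a description of every breaking difference between two schema versions."""
    problems: List[str] = []
    for name, old_type in old.items():
        path = prefix + name
        if name not in new:
            # A removed column and a renamed column look the same from here.
            problems.append(f"{path}: removed or renamed")
            continue
        new_type = new[name]
        if isinstance(old_type, dict) and isinstance(new_type, dict):
            # Nested object: the existing children must be unchanged.
            problems.extend(find_breaking_changes(old_type, new_type, path + "."))
        elif old_type != new_type:
            problems.append(f"{path}: type changed from {old_type!r} to {new_type!r}")
    # Columns or properties that exist only in `new` are additions and are not reported.
    return problems


old_schema = {"Id": "string", "owner": {"name": "string"}}

# Non-breaking: a new top-level column and a new property on a nested object.
added = {"Id": "string", "owner": {"name": "string", "email": "string"}, "Privacy": "string"}

# Breaking: the data type of an existing column changed.
changed = {"Id": "int", "owner": {"name": "string"}}

print(find_breaking_changes(old_schema, added))    # []
print(find_breaking_changes(old_schema, changed))  # ["Id: type changed from 'string' to 'int'"]
```

An empty list means the new version only adds columns or nested properties, which is the kind of change an existing pipeline should tolerate.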
2. Pipeline change after consent
There is a situation that could cause your pipeline to fail due to a non-breaking schema change. This happens when you create the MGDC Application (consent request) before the schema change and the pipeline after the schema change.
Imagine that you created the MGDC application, got the proper consent for the specific dataset, but did not immediately create the pipeline to consume that dataset. A schema change then added a new column after the consent was granted but before the pipeline was created. When you got around to creating the pipeline, the new column was included, but you did not have consent for that column.
In that case, when you run the pipeline, you could see an error like this:
ErrorCode=UserErrorOffice365DataLoaderError,
Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Office365 data loading failed to execute.
office365LoadErrorType: PermanentError.
Invalid requested columns.
The following requested columns were not consent to
for dataset [BasicDataSet_v0.SharePointSites_v1]
under application [00000000-444e-45d0-aa5a-39f318fae21c]: [Privacy].
This is by design and the pipeline fails because that new column is in the pipeline definition but not in the consent. The error message might mention invalid columns, lack of consent, or it might just say that it “cannot resolve” the column.
If you do not need the new column, a simple workaround is to edit the pipeline (specifically the data source configuration) and remove the newly added column that lacks consent. Without a reference to that column, the pipeline will run fine with the existing consent.
If you are indeed interested in the new column, then the solution is to update the MGDC Application request to include it. You will need your Microsoft 365 Global Administrator to approve that updated application consent.
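If you go with the first option (removing the column), the Python sketch below shows one way to find out which columns to remove: it extracts the column names from an error message shaped like the sample above and filters them out of a column list. The regular expression is based only on that one sample, the comma-separated handling of multiple columns is an assumption, and the list of `{"name": ...}` entries is a hypothetical stand-in for your source column configuration, so check it against your actual pipeline definition.

```python
import re
from typing import Dict, List

# Pattern derived from the sample error text; other variants of the message
# (for example one that says "cannot resolve") would need their own handling.
_UNCONSENTED = re.compile(
    r"for dataset \[(?P<dataset>[^\]]+)\]\s+"
    r"under application \[(?P<app>[^\]]+)\]:\s*\[(?P<columns>[^\]]*)\]"
)


def unconsented_columns(error_text: str) -> List[str]:
    """Return the column names listed in the error, or an empty list if none are found."""
    match = _UNCONSENTED.search(error_text)
    if match is None:
        return []
    # Assumption: multiple columns, if present, are separated by commas.
    return [c.strip() for c in match.group("columns").split(",") if c.strip()]


def drop_columns(output_columns: List[Dict[str, str]], to_drop: List[str]) -> List[Dict[str, str]]:
    """Return a copy of the column list without the named columns."""
    blocked = set(to_drop)
    return [col for col in output_columns if col["name"] not in blocked]


sample_error = (
    "Invalid requested columns.\n"
    "The following requested columns were not consent to\n"
    "for dataset [BasicDataSet_v0.SharePointSites_v1]\n"
    "under application [00000000-444e-45d0-aa5a-39f318fae21c]: [Privacy]."
)

# Hypothetical column list for illustration only.
pipeline_columns = [{"name": "Id"}, {"name": "Url"}, {"name": "Privacy"}]

missing = unconsented_columns(sample_error)
print(missing)                                  # ['Privacy']
print(drop_columns(pipeline_columns, missing))  # [{'name': 'Id'}, {'name': 'Url'}]
```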
3. Delta changes
There are potential side effects of a schema change when it comes to Delta pulls. If you don't change anything, the schema change will not have any impact on your Delta pulls. If you decide you are interested in the new columns, you will need to update the pipeline and do a new Full pull before you resume your regular Delta pulls.
There is also the specific case where a new property is added to a nested column. For instance, this would happen in the earlier example in the diagram where a new “email” property is added to the “owner” object.
An existing pipeline will have no problem pulling the updated column. The only additional work might be to investigate scenarios where the new nested property would be useful and make changes to your reports or dashboards to leverage it.
From a Delta standpoint, you will only see the new nested property in objects that were created or changed after the schema update. If you want the new nested property to be populated for all objects, you will need to do a new Full pull.
As I mentioned, there will be no errors. However, the Delta behavior is something to be aware of in case you are wondering why the new nested property appears in some but not all objects.
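If your reports read the nested object directly, it helps to treat the new property as optional. Here is a minimal Python sketch, assuming rows have already been loaded as dictionaries and reusing the “owner” and “email” names from the example; the “id” field and both function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def owner_email(row: Dict[str, Any]) -> Optional[str]:
    """Return owner.email, or None when the row predates the schema update."""
    owner = row.get("owner") or {}
    return owner.get("email")


def rows_missing_email(rows: Iterable[Dict[str, Any]]) -> List[Any]:
    """Return the ids of rows that do not carry the new nested property yet."""
    return [row.get("id") for row in rows if owner_email(row) is None]


rows = [
    # Not changed since before the schema update: no "email" under "owner".
    {"id": "1", "owner": {"name": "A"}},
    # Created or changed after the schema update.
    {"id": "2", "owner": {"name": "B", "email": "b@contoso.com"}},
]

print(rows_missing_email(rows))  # ['1']
```

If that list is long and you need the property everywhere, that is the signal to schedule a new Full pull rather than wait for Delta pulls to catch up.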
4. Conclusion
Microsoft Graph Data Connect schema changes don’t happen often and they are usually uneventful. However, I hope this post helps you address some uncommon issues related to these schema changes. For more information about MGDC for SharePoint, check the overview post at https://aka.ms/SharePointData.