MGDC for SharePoint FAQ: How to deal with schema changes

Jose Barreto · May 01 2024

0. Introduction

When you use Microsoft Graph Data Connect, you will commonly refer to the dataset schemas to understand what is provided with each dataset. For SharePoint datasets, you can see the schemas using this link: https://aka.ms/SharePointDatasets.

Microsoft Graph Data Connect schemas do not change often. However, on rare occasions, the schemas will change to offer additional capabilities or improve existing ones. Even when that happens, the goal is to avoid disruptions to your existing pipelines.

This blog post covers a few details about how schemas change and how to deal with some of the rare side effects.

1. Non-breaking schema changes

If a dataset schema changes in a way that causes your pipelines to stop working or behave incorrectly, we call that a breaking change. That would be the case if the data type or the meaning of a column changes, or if an existing column is renamed or removed. See changes marked with a red X in the diagram below. Breaking schema changes should never happen.

If a change of this type is needed, the procedure is to create a new version of the schema (essentially a new schema with a new name) and to deprecate (and eventually remove) the old schema.

However, it’s possible to make changes that do not break existing pipelines. To be non-breaking, a schema change can only add new columns. For nested columns, it is OK to add properties to the nested object, as long as the parent column and its existing children do not change. See changes marked with a blue check in the diagram below.
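To make these rules concrete, here is a minimal Python sketch that compares an old and a new version of a schema and sorts the differences into breaking and non-breaking changes. The schema representation (a dictionary mapping column names to type strings, with nested dictionaries for object columns) and the `classify_changes` function are hypothetical illustrations, not an MGDC format or API.

```python
from typing import Any

# Hypothetical schema representation (not an MGDC format): column name -> type
# string, or a nested dict for an object column with child properties.
Schema = dict[str, Any]


def classify_changes(old: Schema, new: Schema, path: str = "") -> tuple[list[str], list[str]]:
    """Return (breaking, non_breaking) change descriptions between two schemas."""
    breaking: list[str] = []
    non_breaking: list[str] = []
    for name, old_type in old.items():
        full = f"{path}{name}"
        if name not in new:
            # A removed column is breaking; a rename shows up as removed + added.
            breaking.append(f"removed: {full}")
            continue
        new_type = new[name]
        if isinstance(old_type, dict) and isinstance(new_type, dict):
            # Recurse into nested objects: existing children must stay unchanged.
            b, nb = classify_changes(old_type, new_type, f"{full}.")
            breaking.extend(b)
            non_breaking.extend(nb)
        elif old_type != new_type:
            # Changing a column's data type is breaking.
            breaking.append(f"type changed: {full}")
    # Columns or nested properties that exist only in the new schema are additive.
    for name in sorted(new.keys() - old.keys()):
        non_breaking.append(f"added: {path}{name}")
    return breaking, non_breaking
```

For example, adding a top-level `Privacy` column and an `Email` property under a nested `Owner` object yields no breaking changes and the two additions `added: Owner.Email` and `added: Privacy`.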

Your pipeline will typically be unaffected by these non-breaking changes in the schema. Everything should continue to run after the changes happen without any modifications to your pipelines.

2. Pipeline change after consent

There is one situation in which a non-breaking schema change can cause your pipeline to fail: when you create the MGDC application (consent request) before the schema change and create the pipeline after it.

Imagine that you created the MGDC application, got the proper consent for the specific dataset, but did not immediately create the pipeline to consume that dataset. A schema change then added a new column after the consent was granted but before the pipeline was created. When you got around to creating the pipeline, the new column was included, but you did not have consent for that column.

In that case, when you run the pipeline, you could see an error like this:

ErrorCode=UserErrorOffice365DataLoaderError, 
Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException, 
Message=Office365 data loading failed to execute. 
office365LoadErrorType: PermanentError.  
Invalid requested columns.  
The following requested columns were not consent to  
for dataset [BasicDataSet_v0.SharePointSites_v1]  
under application [00000000-444e-45d0-aa5a-39f318fae21c]: [Privacy].

This is by design: the pipeline fails because the new column is in the pipeline definition but not in the consent. The error message might mention invalid columns or a lack of consent, or it might just say that it “cannot resolve” the column.
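If you want to detect this error automatically in pipeline run logs, a small parser can extract the dataset, the application ID, and the list of columns lacking consent. This is a sketch that assumes the message wording matches the sample above; the regular expression and the `parse_consent_error` function are illustrative, and splitting multiple columns on commas is a guess, since the sample lists only one column.

```python
import re
from typing import Optional

# Pattern based on the sample error text above; the exact wording may vary.
_CONSENT_ERROR = re.compile(
    r"for dataset \[(?P<dataset>[^\]]+)\]\s+"
    r"under application \[(?P<app>[^\]]+)\]:\s*\[(?P<columns>[^\]]*)\]"
)


def parse_consent_error(message: str) -> Optional[dict]:
    """Extract dataset, application ID, and unconsented columns, or return None."""
    match = _CONSENT_ERROR.search(message)
    if match is None:
        return None
    # Assumption: multiple columns are comma-separated inside the brackets.
    columns = [c.strip() for c in match.group("columns").split(",") if c.strip()]
    return {
        "dataset": match.group("dataset"),
        "application": match.group("app"),
        "columns": columns,
    }
```

Applied to the sample error, this returns the dataset `BasicDataSet_v0.SharePointSites_v1`, the application ID shown, and the column list `["Privacy"]`.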

If you do not need the new column, a simple workaround is to edit the pipeline (specifically, the data source configuration) to remove the newly added column that lacks consent. Without a reference to that column, the pipeline will run fine with the existing consent.
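As a sketch of this workaround, the function below removes named columns from a pipeline definition loaded from JSON. It assumes the layout used by the Office 365 connector in Azure Data Factory and Azure Synapse, where a copy activity's source has `"type": "Office365Source"` and an `outputColumns` array of `{"name": ...}` objects; verify this against your own pipeline JSON before relying on it. `drop_output_columns` is a hypothetical helper.

```python
import copy
from typing import Any


def drop_output_columns(pipeline: dict[str, Any], columns: set[str]) -> dict[str, Any]:
    """Return a copy of the pipeline with `columns` removed from every
    Office365Source copy activity's outputColumns list.

    Assumed layout: properties.activities[].typeProperties.source.outputColumns.
    """
    updated = copy.deepcopy(pipeline)  # leave the caller's definition untouched
    for activity in updated.get("properties", {}).get("activities", []):
        source = activity.get("typeProperties", {}).get("source", {})
        if source.get("type") != "Office365Source":
            continue  # only MGDC (Office 365) sources are edited
        if "outputColumns" in source:
            source["outputColumns"] = [
                col for col in source["outputColumns"] if col.get("name") not in columns
            ]
    return updated
```

Returning a modified copy, rather than editing in place, makes it easy to diff the old and new definitions before publishing the change.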

If you are indeed interested in the new column, then the solution is to update the MGDC Application request to include it. You will need your Microsoft 365 Global Administrator to approve that updated application consent.

3. Delta changes

Another potential side effect of a schema change is related to Delta pulls. That happens in the specific case where a new property is added to a nested column. For instance, this would happen in the earlier example in the diagram where a new “email” property is added to the “owner” object.

An existing pipeline will pull the updated column without any issues. The only additional work might be to investigate scenarios where the new nested property would be useful and to update your reports or dashboards to leverage it.

However, from a Delta standpoint, every object in the dataset will appear updated. Effectively, when the pipeline runs right after the schema change, it will pull all the objects, and that one Delta pull will behave like a full pull.
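The sketch below illustrates why this happens, assuming a Delta comparison based on each object's full contents. It is not how MGDC computes Delta internally; it only shows that adding a nested property changes the contents of every object, even when the new property is empty. The `Id` and `Owner` fields and the `changed_ids` helper are hypothetical.

```python
import hashlib
import json
from typing import Any


def fingerprint(obj: dict[str, Any]) -> str:
    """Stable hash of an object's full contents (illustrative only)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_ids(before: list[dict[str, Any]], after: list[dict[str, Any]]) -> list[str]:
    """IDs of objects that are new or whose contents differ between two snapshots."""
    old = {o["Id"]: fingerprint(o) for o in before}
    return [o["Id"] for o in after if old.get(o["Id"]) != fingerprint(o)]


before = [{"Id": "a", "Owner": {"Name": "X"}}, {"Id": "b", "Owner": {"Name": "Y"}}]
# Same data, but the schema now includes an (empty) Owner.Email property.
after = [{"Id": "a", "Owner": {"Name": "X", "Email": None}},
         {"Id": "b", "Owner": {"Name": "Y", "Email": None}}]
print(changed_ids(before, after))  # ['a', 'b']
```

No site actually changed, yet every object is reported, which is the full-pull effect described above.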

Unfortunately, there is no workaround for this one. MGDC can only select top-level columns in a dataset, so there is no way to filter out just the newly introduced “owner.email” property. Excluding the entire “owner” nested column will not work either, since that would also create a Delta change, in addition to dropping a potentially useful column.

As I mentioned, there will be no errors. This is just something to be aware of in case you are wondering why a particular Delta pull is showing more objects than expected on a specific date.
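If you track Delta pull sizes, a simple heuristic can flag runs that behaved like full pulls so they are not mistaken for a data problem. This is a hypothetical sketch: the `PullStats` record, the 0.9 threshold, and comparing against the object count of your latest full snapshot are assumptions, not MGDC features.

```python
from dataclasses import dataclass


@dataclass
class PullStats:
    run_date: str    # date of the Delta pull, e.g. "2024-05-01"
    delta_rows: int  # objects returned by the Delta pull
    total_rows: int  # objects in your latest full snapshot


def looks_like_full_pull(stats: PullStats, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: flag a Delta pull that returned at least
    `threshold` of all objects, as can happen after a nested-column change."""
    if stats.total_rows <= 0:
        return False  # nothing to compare against
    return stats.delta_rows / stats.total_rows >= threshold
```

A flagged date is a good prompt to check whether a schema change landed just before that run.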

4. Conclusion

Microsoft Graph Data Connect schema changes don’t happen often, and they are usually uneventful. Still, I hope this post helps you address the uncommon issues they can cause. For more information about MGDC for SharePoint, check the overview post at https://aka.ms/SharePointData.
