MGDC for SharePoint: New, Updated and Upcoming Datasets
Published May 28 2024 09:22 AM
Microsoft

In this post, I’ll cover some exciting news on Microsoft Graph Data Connect for SharePoint as of May 2024. This feature delivers rich data assets to SharePoint and OneDrive tenants. If you're new to MGDC for SharePoint, start by reviewing this post: https://aka.ms/SharePointData.

 


 

 

TL;DR

 

We have been busy updating our existing SharePoint datasets in MGDC and adding new ones. You can see the full list at https://aka.ms/SharePointDatasets.

 

We have updated our 3 publicly available datasets, just published 1 new dataset and will deliver 3 new datasets in the next few months. Here are some details…

New and Upcoming Datasets

 

The new SharePoint File Actions dataset was released in May 2024. This dataset delivers one object for each file accessed, deleted, downloaded, modified, moved, renamed, or uploaded. This helps you understand how documents are being used in detail. This dataset is now publicly available, billed through Azure at the regular MGDC rate.

 

The new OneDrive Sync Health datasets include information on devices running OneDrive for Business. This includes a dataset with one object for every Sync-enabled device in the tenant and a dataset with details on errors faced by these devices. They were announced by the Sync team at the Microsoft 365 Conference. Sync Health and Sync Errors are in private preview. They will be publicly available by the end of June. This was a joint project between SharePoint, OneDrive Sync and MGDC.

 

The SharePoint Files dataset includes information about files in SharePoint and OneDrive. It delivers one object for every file in the tenant stored in a SharePoint document library, including OneDrives. The Files dataset is in private preview, with public availability expected in a few months.

 

Updated datasets

 

We also added columns to the existing Sites, Groups and Permissions datasets.

 

For SharePoint Sites, our most popular dataset, we added several new properties. Here’s the list:

 

  • ArchiveState: The archive state of the site: None, Archiving, Archived, or Reactivating
  • RootWeb.Configuration: Root web template configuration id
  • RecycleBinItemCount: Number of items in the recycle bin
  • RecycleBinItemSize: Size of the items in the recycle bin
  • SecondStageRecycleBinStorageUsage: Size of the items in the second stage recycle bin
  • IsCommunicationSite: Indicates that the site is a communication site
  • IsOneDrive: Indicates that the site is a OneDrive
  • IsExternalSharingEnabled: Indicates if the site is configured to enable external sharing
  • SiteConnectedToPrivateGroup: Indicates if the site is connected to a private group
  • Privacy: Privacy of the site: Private or Public. Applies only to team sites
  • Owner.UPN: User Principal Name for the owner of the site
  • SecondaryContact.UPN: User Principal Name for the secondary contact for the site
  • LastUserAccessDate: Last access by a real user for the site (in UTC)

 

That last column is very useful to identify sites that have been inactive for a long time.
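
As a rough illustration, here is a minimal Python sketch of that kind of check. It assumes the Sites dataset has already been downloaded and parsed into dictionaries, and that LastUserAccessDate arrives as an ISO 8601 UTC string. The sample URLs, the 180-day threshold, and the choice to treat a blank date as inactive are all assumptions made for this example, not part of the dataset definition.

```python
from datetime import datetime, timedelta, timezone

def find_inactive_sites(sites, now, max_idle_days=180):
    """Return sites whose LastUserAccessDate is older than the cutoff or blank."""
    cutoff = now - timedelta(days=max_idle_days)
    inactive = []
    for site in sites:
        raw = site.get("LastUserAccessDate")
        if not raw:
            # No recorded user access: treated as inactive here (an assumption).
            inactive.append(site)
            continue
        # Assumes an ISO 8601 UTC timestamp such as "2024-05-01T12:00:00Z".
        last_access = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if last_access < cutoff:
            inactive.append(site)
    return inactive

# Hypothetical sample rows; a real run would read the downloaded dataset files.
sample_sites = [
    {"Url": "https://contoso.sharepoint.com/sites/a", "LastUserAccessDate": "2024-05-20T08:00:00Z"},
    {"Url": "https://contoso.sharepoint.com/sites/b", "LastUserAccessDate": "2023-01-15T08:00:00Z"},
    {"Url": "https://contoso.sharepoint.com/sites/c", "LastUserAccessDate": None},
]
now = datetime(2024, 5, 28, tzinfo=timezone.utc)
inactive_urls = [s["Url"] for s in find_inactive_sites(sample_sites, now)]
print(inactive_urls)
```

Passing the reference date in as a parameter keeps the check repeatable, which helps when comparing snapshots taken on different days.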

For Groups, we introduced a new TypeV2 property for owners and members, to specify the type of user. The old Type property can contain User, SecurityGroup or SharePointGroup, while the new TypeV2 can be InternalUser, ExternalUser, B2BUser, SecurityGroup or SharePointGroup.
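
To give a sense of how TypeV2 could be used, here is a small Python sketch that counts group members by type. It assumes each group object carries a Members list whose entries include the TypeV2 property. The Members name, the DisplayName field, and the sample rows are assumptions for illustration, so check the schema docs for the actual layout.

```python
from collections import Counter

def count_members_by_type(groups):
    """Tally group members by their TypeV2 value across all groups."""
    counts = Counter()
    for group in groups:
        # "Members" is an assumed property name; a missing or empty list is skipped.
        for member in group.get("Members") or []:
            counts[member.get("TypeV2") or "Unknown"] += 1
    return counts

# Hypothetical sample rows standing in for the SharePoint Groups dataset.
sample_groups = [
    {"DisplayName": "Project X Members", "Members": [
        {"TypeV2": "InternalUser"},
        {"TypeV2": "ExternalUser"},
        {"TypeV2": "SecurityGroup"},
    ]},
    {"DisplayName": "Project Y Members", "Members": [
        {"TypeV2": "InternalUser"},
        {"TypeV2": "B2BUser"},
    ]},
]
type_counts = count_members_by_type(sample_groups)
print(dict(type_counts))
```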

 

For the SharePoint Permissions dataset, we added the following columns:

 

  • SharedWith.TypeV2: Expands User types to InternalUser, ExternalUser and B2BUser, as described in the Groups section above
  • SharedWith.UPN: User Principal Name of sharing recipient
  • SharedWith.AadObjectId: AAD Object Id of sharing recipient. Blank if this is not an AAD object.
  • SharedWith.UserCount: Unique user count for this sharing recipient. For groups, this is the number of users in the group, including nested groups. For users, this is always 1. It will be blank if the group is empty or if the count is unavailable
  • TotalUserCount: Unique user count for this entire permission. This will be blank if the count is zero or unavailable
  • ShareCreatedBy.UPN: User Principal Name of user who created the sharing link
  • ShareLastModifiedBy.UPN: User Principal Name of user who modified the sharing link

 

The two new user count columns are a major improvement here. They do group expansion, so you can have the total number of users impacted by the permissions, including nested groups, without having to pull the SharePoint Groups and AAD Groups datasets. You can now detect oversharing using only the Permissions dataset.
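
Here is a minimal Python sketch of such an oversharing check, using only TotalUserCount. It assumes the Permissions rows are already parsed into dictionaries. The ItemURL field name, the sample rows, and the threshold of 100 users are assumptions picked for this example. Blank counts are skipped because, per the column description above, a blank means the count is zero or unavailable.

```python
def find_broad_permissions(permissions, threshold=100):
    """Return permission rows whose TotalUserCount exceeds the threshold."""
    flagged = []
    for perm in permissions:
        total = perm.get("TotalUserCount")
        if total is None or total == "":
            # Blank means zero or unavailable, so there is nothing to compare.
            continue
        if int(total) > threshold:
            flagged.append(perm)
    return flagged

# Hypothetical sample rows standing in for the SharePoint Permissions dataset.
sample_permissions = [
    {"ItemURL": "sites/hr/Shared Documents/salaries.xlsx", "TotalUserCount": 4500},
    {"ItemURL": "sites/hr/Shared Documents/handbook.pdf", "TotalUserCount": 12},
    {"ItemURL": "sites/hr/Shared Documents/draft.docx", "TotalUserCount": None},
]
broad = find_broad_permissions(sample_permissions)
print([p["ItemURL"] for p in broad])
```

What counts as "too many users" depends on the tenant and the sensitivity of the content, so treat the threshold as a tuning knob rather than a recommendation.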

 

General improvements

 

MGDC for SharePoint also improved the overall infrastructure for analytics, including:

 

  • Filtering datasets: Download only the rows that match specific site ids or template ids. See details at How can I filter rows on a dataset?
  • Dataset sampling: Get a small sample of the dataset and a full object count without pulling the entire dataset. See details at How can I sample or estimate the number of objects in a dataset?
  • Improved messages: Better error messages, including when dates are out of range, or when a region has no SharePoint data.
  • Guidance: Improved documentation, including updated step-by-step guides and schema docs. We also have a new Official MGDC for SharePoint blog in Tech Community with information like useful links and frequently asked questions. Since you’re reading this on the blog, I imagine you already knew about that one :-).

 

Conclusion

 

These are the main improvements to Microsoft Graph Data Connect for SharePoint in the last few months. I hope these changes will improve the feature for your analytics scenarios. We are busy cooking up more improvements and will share them here in the blog as they become available.

Last update: May 28 2024 04:10 PM