Introduction
Microsoft Graph Data Connect for SharePoint delivers rich data assets to OneDrive and SharePoint tenants, so they can run their own analytics, derive insights from their data and understand how they use these products. The data is transferred to an Azure account owned by the tenant, where they can use tools like Azure Synapse, Power BI or Microsoft Fabric to transform it into insightful reports and dashboards. Microsoft Graph Data Connect for SharePoint is also known by its early codename "Project Archimedes".
The three main scenarios are Security (information oversharing, external sharing), Capacity (understanding site lifecycle and storage) and Sync Health (OneDrive Sync, device health, folder backup). SharePoint currently offers 3 datasets via Microsoft Graph Data Connect: Sites, Groups and Permissions. This solution is available in 15 Microsoft 365 regions.
Security Scenario
The Security Scenario focuses on understanding the permissions in your SharePoint and OneDrive tenant. By looking at the Sites, Permissions and Groups datasets, you can understand if your content is properly protected.
Here are a few common questions that you will be able to answer:
- Is oversharing happening?
- Is external sharing happening?
- Is sensitive data being shared?
- How much sharing is there per sensitivity label?
- Is sensitive data shared with external users?
- What external domains are being shared with?
- Which sites have been shared the most?
- What roles (levels of sharing) are being used?
- What permissions does a specific user have?
- What file extensions are most shared?
- How much sharing happens at the Web, Folder, List or File level?
- A combination of any of the questions above…
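As an illustration of the "What external domains are being shared with?" question, here is a minimal Python sketch. It assumes the Permissions data has already been flattened into one record per sharing entry; the field name `shared_with_email` and the sample rows are hypothetical and are not taken from the actual dataset schema, so check the schema document before adapting it.

```python
from collections import Counter


def external_domain_counts(permissions, internal_domains):
    """Count sharing entries per external email domain.

    `permissions` is an iterable of dicts. The key "shared_with_email"
    is a hypothetical field name used only for this sketch.
    """
    internal = {d.lower() for d in internal_domains}
    counts = Counter()
    for row in permissions:
        email = row.get("shared_with_email") or ""
        if "@" not in email:
            continue  # skip entries without an email address (for example, groups)
        domain = email.rsplit("@", 1)[1].lower()
        if domain not in internal:
            counts[domain] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    {"site_id": "s1", "shared_with_email": "ana@contoso.com"},
    {"site_id": "s1", "shared_with_email": "bob@fabrikam.com"},
    {"site_id": "s2", "shared_with_email": "eve@fabrikam.com"},
    {"site_id": "s2", "shared_with_email": None},
]

result = external_domain_counts(sample, ["contoso.com"])
print(result.most_common())
```

The same pattern (filter, then count by a key) extends to most of the questions above, such as counting by role or by sensitivity label.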
Capacity Scenario
The Capacity Scenario focuses on understanding site lifecycle, ownership and storage used by your SharePoint sites and OneDrives. By looking at the Sites and Groups datasets, you can understand how much content you have and how it is being used.
Here are a few common questions that you will be able to answer:
- What SharePoint sites are the largest?
- What type of site uses the most storage?
- What’s the current storage for sensitive sites?
- How much is used by previous versions?
- Which sites were updated in the last few months?
- Which sites have just one owner?
- How many sites were created over 2 years ago?
- How many sites haven’t changed in 1 year?
- How many sites have over 1TB of files?
- A combination of any of the questions above…
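A few of these capacity questions can be sketched in Python as simple counts over site records. This is only an illustrative sketch: the field names `storage_used_bytes`, `last_modified` and `owner_count` are hypothetical stand-ins rather than the real Sites schema, and 1 TB is assumed here to mean 1024^4 bytes.

```python
from datetime import datetime, timedelta

TB = 1024 ** 4  # assumption: binary terabyte


def capacity_summary(sites, as_of):
    """Answer three capacity questions over a list of site records.

    All dictionary keys are hypothetical field names for this sketch.
    """
    stale_cutoff = as_of - timedelta(days=365)
    return {
        # How many sites have over 1 TB of files?
        "over_1tb": sum(1 for s in sites if s["storage_used_bytes"] > TB),
        # How many sites haven't changed in 1 year?
        "unchanged_1y": sum(1 for s in sites if s["last_modified"] < stale_cutoff),
        # Which sites have just one owner? (counted here)
        "single_owner": sum(1 for s in sites if s["owner_count"] == 1),
    }


# Made-up sample rows, for illustration only.
sites = [
    {"id": "a", "storage_used_bytes": 2 * TB,
     "last_modified": datetime(2022, 6, 1), "owner_count": 1},
    {"id": "b", "storage_used_bytes": 10 * 1024 ** 3,
     "last_modified": datetime(2023, 12, 1), "owner_count": 3},
    {"id": "c", "storage_used_bytes": 5 * 1024 ** 3,
     "last_modified": datetime(2021, 3, 15), "owner_count": 1},
]

summary = capacity_summary(sites, as_of=datetime(2024, 1, 1))
print(summary)
```

Passing the reference date in as `as_of` keeps the result reproducible, which helps when comparing snapshots taken on different days.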
Sync Health Scenario
The OneDrive Sync Health scenario is about understanding if users are properly using OneDrive for Business to protect their files by synchronizing them with the cloud. It includes details about the devices configured with a OneDrive Sync client, including whether they are backing up important folders and if there are any errors.
Here are a few common questions that you will be able to answer:
- How many devices are healthy?
- How many devices have opted in for Folder Backup?
- Which Folders are most selected for Folder Backup?
- What is the breakdown of unhealthy devices by OS version?
- What is the breakdown of unhealthy devices by OneDrive Sync client version?
- Is the device for user X reporting as healthy?
- How many devices are showing errors?
- Which types of errors are making most devices unhealthy?
- Which devices are showing a specific error?
- What are the errors occurring on a specific device?
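The breakdown questions above can be approximated with a short Python sketch. It assumes one record per device and one record per reported error; the field names (`os_version`, `is_healthy`, `device_id`, `error_type`) and the error names in the sample are hypothetical and are not taken from the Sync Health or Sync Errors schemas.

```python
from collections import Counter


def unhealthy_by_os(devices):
    """Count unhealthy devices per OS version (hypothetical field names)."""
    return Counter(d["os_version"] for d in devices if not d["is_healthy"])


def top_error_types(errors, limit=3):
    """Rank error types by the number of distinct devices reporting them."""
    devices_per_error = {}
    for e in errors:
        devices_per_error.setdefault(e["error_type"], set()).add(e["device_id"])
    # Sort by device count (descending), then by name for a stable order.
    ranked = sorted(devices_per_error.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(name, len(ids)) for name, ids in ranked[:limit]]


# Made-up sample rows, for illustration only.
devices = [
    {"device_id": "d1", "os_version": "Windows 11", "is_healthy": True},
    {"device_id": "d2", "os_version": "Windows 11", "is_healthy": False},
    {"device_id": "d3", "os_version": "Windows 10", "is_healthy": False},
    {"device_id": "d4", "os_version": "Windows 10", "is_healthy": False},
]
errors = [
    {"device_id": "d2", "error_type": "FileLocked"},
    {"device_id": "d3", "error_type": "FileLocked"},
    {"device_id": "d3", "error_type": "FileLocked"},  # repeat on the same device
    {"device_id": "d4", "error_type": "PathTooLong"},
]

by_os = unhealthy_by_os(devices)
top_errors = top_error_types(errors)
print(by_os, top_errors)
```

Counting distinct devices rather than raw error rows keeps one noisy device from dominating the ranking.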
Links to resources
Here are some useful links related to the OneDrive and SharePoint data available via Microsoft Graph Data Connect. This is meant as a convenient set of pointers to specific and relevant content, which you can bookmark for future reference.
Step-by-step guides
- Step-by-step: OneDrive Sync Health
- Step-by-Step: How to find OneDrive users not running Sync
- Step-by-step: Gather a detailed dataset on SharePoint Sites using MGDC and Fabric
- Information Oversharing Template Setup
https://go.microsoft.com/fwlink/?linkid=2207816 (follow the instructions)
- Power BI sample that you can use with the data coming from the pipeline
https://go.microsoft.com/fwlink/?linkid=2211101 (click on the download button)
- Overview video on YouTube about the Oversharing Scenario.
https://www.youtube.com/watch?v=lw4Oud4abvE
Note: This uses the old consent model, but everything else is current.
SharePoint Datasets and Schemas
- Dataset description: SharePoint Sites
Dataset name: BasicDataSet_v0.SharePointSites_v1
Link to dataset schema: data-connect-dataset-sharepointsites.md
Old dataset name (deprecated): SharePointSitesDataset_v0_Preview
- Dataset description: SharePoint Groups
Dataset name: BasicDataSet_v0.SharePointGroups_v1
Link to dataset schema: data-connect-dataset-sharepointgroups.md
Old dataset name (deprecated): SharePointGroupsDataset_v0_Preview
- Dataset description: SharePoint Sharing Permissions
Dataset name: BasicDataSet_v0.SharePointPermissions_v1
Link to dataset schema: data-connect-dataset-sharepointpermissions.md
Old dataset name (deprecated): DocumentSharingDataset_v0_Preview
- Dataset description: SharePoint Files
Dataset name: BasicDataSet_v0.SharePointFiles_v1
Link to dataset schema: data-connect-dataset-sharepointfiles.md
Note: Coming soon (not yet available publicly)
- Dataset description: SharePoint File Actions
Dataset name: BasicDataSet_v0.SharePointFileActions_v1
Link to dataset schema: data-connect-dataset-sharepointfileactions.md
Note: Coming soon (not yet available publicly)
- Dataset description: SharePoint Sync Health
Dataset name: BasicDataSet_v0.OneDriveSyncHealth_v1
Link to dataset schema: data-connect-dataset-onedrivesynchealth.md
Note: Coming soon (not yet available publicly)
- Dataset description: SharePoint Sync Errors
Dataset name: BasicDataSet_v0.OneDriveSyncErrors_v1
Link to dataset schema: data-connect-dataset-onedrivesyncerrors.md
Note: Coming soon (not yet available publicly)
You can see the official list of datasets at https://aka.ms/SharePointDatasets
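If you maintain pipeline configurations that still refer to the deprecated dataset names listed above, a small lookup can help normalize them. The following Python sketch uses only the old and current names given in this article; the helper name `current_dataset_name` is a hypothetical choice and is not part of any Microsoft API.

```python
# Deprecated preview names mapped to the current dataset names from this article.
DEPRECATED_TO_CURRENT = {
    "SharePointSitesDataset_v0_Preview": "BasicDataSet_v0.SharePointSites_v1",
    "SharePointGroupsDataset_v0_Preview": "BasicDataSet_v0.SharePointGroups_v1",
    "DocumentSharingDataset_v0_Preview": "BasicDataSet_v0.SharePointPermissions_v1",
}


def current_dataset_name(name):
    """Return the current name for a deprecated dataset name.

    Names that are not in the mapping are returned unchanged.
    """
    return DEPRECATED_TO_CURRENT.get(name, name)


print(current_dataset_name("DocumentSharingDataset_v0_Preview"))
print(current_dataset_name("BasicDataSet_v0.SharePointSites_v1"))
```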
Frequently Asked Questions
Links to the Microsoft Graph Data Connect for SharePoint FAQ blog series:
- MGDC for SharePoint FAQ: Which dates should I use to query?
- MGDC for SharePoint FAQ: Is OneDrive included?
- MGDC for SharePoint FAQ: What counts as an object?
- MGDC for SharePoint FAQ: How can I estimate my Azure bill?
- MGDC for SharePoint FAQ: What is in the Permissions dataset?
- MGDC for SharePoint FAQ: What is the size of my sites?
- MGDC for SharePoint FAQ: Which regions are supported?
- MGDC for SharePoint FAQ: How can I use Delta State Datasets?
- MGDC for SharePoint FAQ: How do I process Deltas?
- MGDC for SharePoint FAQ: How can I sample or estimate the number of objects in a dataset?
- MGDC for SharePoint FAQ: How can I filter rows on a dataset?
- MGDC for SharePoint FAQ: How to create custom columns in Power BI?
- MGDC for SharePoint FAQ: How to deal with schema changes
- MGDC for SharePoint FAQ: How to restrict public access to storage accounts?
- MGDC for SharePoint FAQ: How can I track the lifecycle of a SharePoint site?
- MGDC for SharePoint FAQ: Dataset types and features
- MGDC for SharePoint FAQ: How to gather insights from a large Files dataset?
- MGDC for SharePoint FAQ: How are SharePoint Groups and Security Groups used together?
- MGDC for SharePoint FAQ: How do I join File Actions with Files?
Microsoft Graph Data Connect for SharePoint – Official Announcements
- Official blog post announcing new SharePoint datasets – Blog
- Official blog post announcing the public preview and dataset renames – Blog
- Official blog post announcing the pricing update for SharePoint datasets – Blog
Microsoft Graph Data Connect main links
- Microsoft Graph Data Connect main link – https://aka.ms/mgdcdocs
- All Microsoft Graph Data Connect datasets – https://learn.microsoft.com/en-us/graph/data-connect-datasets
- Microsoft Graph Data Connect pricing – https://azure.microsoft.com/en-us/pricing/details/graph-data-connect
- Video: Microsoft Graph Data Connect Overview (includes demo) – https://youtu.be/DiTYBWtzw2o
- Video: Microsoft Graph Data Connect at Microsoft Mechanics – https://youtu.be/cWg_EeB8q9s
- Video: Microsoft Graph Data Connect with SharePoint Demo – https://youtu.be/AJWBNiTMsOk?t=1850
Part of the Microsoft 365 Community Call on 2022-11-28
Partner Content on Microsoft Graph Data Connect for SharePoint
- Title: Setup Prerequisites in Microsoft 365 & Azure for Project Archimedes & Microsoft Graph Data Connect
By: Antonio Maio, Microsoft MVP, Managing Director and Senior Enterprise Architect with Protiviti.
Link: https://youtu.be/ym3BGYkbXhM
- Title: Configure and Run an Azure Synapse Data Pipeline for Project Archimedes & Microsoft Graph Data Connect
By: Antonio Maio, Microsoft MVP, Managing Director and Senior Enterprise Architect with Protiviti.
Link: https://youtu.be/g4-kx5KVVNA
- Title: Big Data Analytics for Microsoft 365 using Microsoft Graph Data Connect and Microsoft Fabric
By: Rakesh Chenchery, CTO at Proventeq
Link: https://youtu.be/H2D6HxHLekQ
Other Blogs about SharePoint on Data Connect
- Oversharing for Very Large Tenants
Tips for Microsoft Graph Data Connect for SharePoint
- Four Options for SharePoint Site Analytics
(MGDC shown as option 4)
Microsoft Graph Data Connect for SharePoint
- Ignite 2022: Transforming collaboration with low and pro code dev tools
(search for “SharePoint datasets”)
Microsoft 365 Developer Blog
- Scale access to Microsoft 365 data with Microsoft Graph Data Connect
(search for “SharePoint”)
Microsoft 365 Developer Blog
- Unlimited collaboration insights with Microsoft Graph and Azure Synapse Analytics
(covers Data Connect with Synapse in general, includes video)
Microsoft Mechanics Blogs
Presentations
- Build 2022
Unlocking the power of your Microsoft 365 data with Microsoft Graph Data Connect | OD09
(Good overview of Data Connect, with SharePoint datasets mentioned at minute 5:08)
- Ignite 2022
Graham Sheldon: From low code to pro code: building and buying collaborative apps to power an evolving workplace
(Synapse template described at minute 26:25)
- Ignite 2022
Rohan Kumar: Innovate faster and achieve greater agility with the Microsoft Intelligent Data Platform
(Data Connect and analytics with Synapse mentioned at minute 16:52)
- Microsoft Mechanics
Unlimited collaboration insights with Microsoft Graph & Azure Synapse Analytics
- Microsoft 365 Community Call
Microsoft 365 Platform Community Call – 29th of November, 2022
(Includes Data Connect Overview and Demo)
- Microsoft 365 Conference in May 2023
Demo: Microsoft Graph Data Connect for SharePoint Demo
https://youtu.be/hdPKM835ICY