sql

22 Topics

Using Copilot (GPT-5) to turn schema into SQL and write queries
This article shows a few ways you can use Copilot to help with Microsoft Graph Data Connect for SharePoint. This includes turning a schema into a SQL CREATE TABLE statement and summarizing data from that. Make sure to use the Copilot option to use GPT-5 for best results. Scenario 1 This first scenario just builds a CREATE TABLE statement from the dataset schema definition in markdown notation. I provide a few instructions on how to handle the flattening of the objects. Me: Given the schema at this URL: https://github.com/microsoftgraph/dataconnect-solutions/blob/main/Datasets/data-connect-dataset-sharepointsites.md, give me a CREATE TABLE statement for a SQL Server. Just include the single SQL command in the output. Each column should have a comment with a short description. Don't include any information about keys or constraints. This dataset is flattened. One column in the table for every row, except those of type object. Column names to be flattened with a prefix. Use an underline to separate the prefix. Copilot: CREATE TABLE SharePointSites ( ptenant NVARCHAR(255) -- Id of the tenant ,Id NVARCHAR(255) -- GUID of the site ,Url NVARCHAR(2048) -- URL for the site ,ArchiveState NVARCHAR(50) -- Archive state of the site: None, Archiving, Archived, or Reactivating ,SiteArchivedBy NVARCHAR(255) -- Who archived the site, null if not archived ,SiteArchivedTime DATETIME -- When the site was archived, null if not archived ,RootWeb_Id NVARCHAR(255) -- Root web id ,RootWeb_Title NVARCHAR(512) -- Root web title ,RootWeb_WebTemplate NVARCHAR(255) -- Root web template name ,RootWeb_WebTemplateId INT -- Root web template id ,RootWeb_Configuration INT -- Root web template configuration id ,RootWeb_LastItemModifiedDate DATETIME -- Date when an item in the root web was last modified ,WebCount BIGINT -- Number of webs (subsites) in the site ,StorageQuota BIGINT -- Total storage in bytes allowed for this site ,StorageUsed BIGINT -- Total storage in bytes used by this site ,StorageMetrics_MetadataSize BIGINT -- Total metadata size for the site in bytes ,StorageMetrics_TotalFileCount BIGINT -- Total number of files for the site ,StorageMetrics_TotalFileStreamSize BIGINT -- Total size of latest version of files in bytes ,StorageMetrics_AdditionalFileStreamSize BIGINT -- Total size of additional file streams in bytes ,StorageMetrics_TotalSize BIGINT -- Total size of all files for the site in bytes ,GroupId NVARCHAR(255) -- Id of the group associated with this site ,GeoLocation NVARCHAR(50) -- Geographic region where the data is stored ,IsInRecycleBin BIT -- Indicates if the site is in the recycle bin ,RecycleBinItemCount BIGINT -- Number of items in the recycle bin ,RecycleBinItemSize BIGINT -- Size of items in the recycle bin ,SecondStageRecycleBinStorageUsage BIGINT -- Size of items in the second stage recycle bin ,IsTeamsConnectedSite BIT -- Indicates if the site is connected to Teams ,IsTeamsChannelSite BIT -- Indicates if the site is a channel site ,TeamsChannelType NVARCHAR(50) -- Type of channel if site is a Teams channel site ,IsHubSite BIT -- Indicates if the site is associated with a hub site ,HubSiteId NVARCHAR(255) -- Id of the hub site for this site ,IsCommunicationSite BIT -- Indicates if the site is a communication site ,IsOneDrive BIT -- Indicates if the site is a OneDrive ,BlockAccessFromUnmanagedDevices BIT -- Site blocks access from unmanaged devices ,BlockDownloadOfAllFilesOnUnmanagedDevices BIT -- Site blocks download of all files from unmanaged devices ,BlockDownloadOfViewableFilesOnUnmanagedDevices BIT -- Site blocks download of viewable files from unmanaged devices ,ShareByEmailEnabled BIT -- Site enables share by e-mail ,ShareByLinkEnabled BIT -- Site enables share by link ,IsExternalSharingEnabled BIT -- Indicates if external sharing is enabled ,SiteConnectedToPrivateGroup BIT -- Indicates if site is connected to a Private Group ,Privacy NVARCHAR(50) -- Privacy of the site: Private or Public ,SensitivityLabelInfo_Id NVARCHAR(255) -- Id of the Sensitivity Label for the site ,SensitivityLabelInfo_DisplayName NVARCHAR(255) -- Display name of the Sensitivity Label for the site ,Classification NVARCHAR(255) -- Classification of the site ,IBMode NVARCHAR(50) -- Information Barriers Mode ,IBSegments NVARCHAR(MAX) -- List of organization segments if IB mode is Explicit ,Owner_AadObjectId NVARCHAR(255) -- AAD Object Id of the owner ,Owner_Email NVARCHAR(320) -- Email of the owner ,Owner_UPN NVARCHAR(320) -- User Principal Name of the owner ,Owner_Name NVARCHAR(255) -- Name of the owner ,SecondaryContact_AadObjectId NVARCHAR(255) -- AAD Object Id of the secondary contact ,SecondaryContact_Email NVARCHAR(320) -- Email of the secondary contact ,SecondaryContact_UPN NVARCHAR(320) -- User Principal Name of the secondary contact ,SecondaryContact_Name NVARCHAR(255) -- Name of the secondary contact ,ReadLocked BIT -- Whether the site is locked for read access ,ReadOnly BIT -- Whether the site is in read-only mode ,CreatedTime DATETIME -- When the site was created (UTC) ,LastSecurityModifiedDate DATETIME -- When security on the site was last changed (UTC) ,LastUserAccessDate DATETIME -- Last access by a real user (UTC) ,LastContentChange DATETIME -- When the site contents were last changed (UTC) ,SnapshotDate DATETIME -- When this site information was captured (UTC) ,Operation NVARCHAR(50) -- Extraction mode of this row ); Scenario 2 Now I use the CREATE TABLE as the base to ask for a SELECT statement. Me: Give me a list of sites that have more than 5GB of files. Give me a single SQL statement using the tables described below. <create table statements included here> Copilot: SELECT Id AS SiteId, Url AS SiteUrl, RootWeb_Title AS SiteTitle, StorageMetrics_TotalSize AS TotalSizeInBytes, CAST(StorageMetrics_TotalSize / (1024.0 * 1024 * 1024) AS DECIMAL(10,2)) AS TotalSizeInGB FROM SharePointSites WHERE StorageMetrics_TotalSize > (5 * 1024 * 1024 * 1024) ORDER BY StorageMetrics_TotalSize DESC; Note: Interesting choice of columns and order, without any specific prompting to that effect. Scenario 3 Now for a more challenging scenario using two tables and a more complex requirement. Me: Given the tables below, write a SELECT statement to give me a list of sites with the name of the site, the total storage size and the top 3 file extensions by size. <create table statements for Sites and Files included here> Copilot: WITH ExtensionTotals AS ( SELECT f.SiteId, f.Extension, SUM(f.SizeInBytes) AS TotalSize FROM SharePointFiles f WHERE f.Extension IS NOT NULL GROUP BY f.SiteId, f.Extension ), RankedExtensions AS ( SELECT et.SiteId, et.Extension, et.TotalSize, ROW_NUMBER() OVER (PARTITION BY et.SiteId ORDER BY et.TotalSize DESC) AS rn FROM ExtensionTotals et ), TopExtensions AS ( SELECT SiteId, STRING_AGG(Extension + ' (' + CAST(TotalSize AS VARCHAR(20)) + ')', ', ') WITHIN GROUP (ORDER BY TotalSize DESC) AS Top3Extensions FROM RankedExtensions WHERE rn <= 3 GROUP BY SiteId ) SELECT s.RootWeb_Title AS SiteName, s.StorageMetrics_TotalSize AS TotalStorageSize, te.Top3Extensions FROM SharePointSites s LEFT JOIN TopExtensions te ON s.Id = te.SiteId ORDER BY s.StorageMetrics_TotalSize DESC; Notes: Scenario is nicely handled, using both the Files and Sites tables. Complex aggregation is coded correctly. Scenario 4 In this one, I skip the step of using a SQL CREATE TABLE statement and go directly from the markdown for the two schemas (Sites and Files) to a SQL SELECT statement. I also let GPT-5 add his comments and notes this time, to make sure it's thinking correctly... Me: Consider a SQL table for SharePoint Sites based on the schema at https://github.com/microsoftgraph/dataconnect-solutions/blob/main/Datasets/data-connect-dataset-sharepointsites.md - This dataset is flattened. There is one column in the SQL table for every row, except those of type object. Column names are flattened with a prefix. Column names use an underline to separate the prefix. Also consider a SQL table for SharePoint Files based on the schema at https://github.com/microsoftgraph/dataconnect-solutions/blob/main/Datasets/data-connect-dataset-sharepointfiles.md - Use the same assumptions as the previous one. Now write a SELECT statement to give me a list of sites with the name of the site, the total storage size and the top 3 file extensions by size. Copilot: Got it — you want to join the flattened SharePoint Sites table with the flattened SharePoint Files table, aggregate storage usage, and then find the top 3 file extensions by size for each site. Here’s a SQL example that works with the flattened column naming convention you described: WITH FileSizeByExtension AS ( SELECT f.SiteId, f.Extension, SUM(f.SizeInBytes) AS TotalExtensionSize FROM SharePointFiles f GROUP BY f.SiteId, f.Extension ), RankedExtensions AS ( SELECT SiteId, Extension, TotalExtensionSize, ROW_NUMBER() OVER (PARTITION BY SiteId ORDER BY TotalExtensionSize DESC) AS rn FROM FileSizeByExtension ), Top3Extensions AS ( SELECT SiteId, STRING_AGG(Extension, ', ' ORDER BY TotalExtensionSize DESC) AS TopExtensions FROM RankedExtensions WHERE rn <= 3 GROUP BY SiteId ) SELECT s.RootWeb_Title AS SiteName, s.StorageMetrics_TotalSize AS TotalStorageBytes, t.TopExtensions FROM SharePointSites s LEFT JOIN Top3Extensions t ON s.Id = t.SiteId ORDER BY s.StorageMetrics_TotalSize DESC; How it works FileSizeByExtension: Groups files by SiteId and Extension to sum their sizes. RankedExtensions: Uses ROW_NUMBER() to rank extensions by size within each site. Top3Extensions: Filters to the top 3 per site and concatenates them into a comma-separated list. Final SELECT: Joins back to the Sites table to get the site name and total storage size. Notes RootWeb_Title comes from flattening the RootWeb object in the Sites dataset. StorageMetrics_TotalSize is the flattened column for total site storage (excluding recycle bin). STRING_AGG works in SQL Server 2017+, Azure SQL, and some other RDBMS. If your SQL dialect doesn’t support it, you’d need a different aggregation method. This query orders sites by total storage size so the largest sites appear first. Note: Nicely done in one prompt with the proper explanations. Conclusion I hope these examples help you see the potential of getting Copilot's help with your queries. This is one more reason to load your MGDC for SharePoint tables directly into Microsoft Fabric, where you can run SQL queries against tables in the OneLake...
Jose_Barreto
Sep 04, 2025 Place Microsoft Graph Data Connect for SharePoint Blog
372Views
0likes
0Comments
MGDC for SharePoint FAQ: How to flatten datasets for SQL or Fabric
When you get your data from Microsoft Graph Data Connect (MGDC), you will typically get that data as a collection of JSON objects in an Azure Data Lake Storage (ADLS) Gen2 storage account. For those handling large datasets, it might be useful to move the data to a SQL Server or to OneLake (lakehouse). In those cases, you might need to flatten the datasets. This post describes how to do that. If you’re not familiar with MGDC for SharePoint, start with https://aka.ms/SharePointData. 1. Flattening Most of the MGDC for SharePoint datasets come with nested objects. That means that a certain object has other objects inside it. For instance, if you have a SharePoint Groups object, it might have multiple Group Members inside. If you have a SharePoint Permissions object, you could have many Permissions Recipients (also known as Sharees). For each SharePoint File object, you will have a single Author object inside. When you convert the datasets from JSON to other formats, it is possible that these other formats require (or perform better) if you don’t have any objects inside objects. To overcome that, you can turn those child objects into properties of the parent object. For instance, instead of having the File object with an Author object inside, you can have multiple author-related columns. For instance, you could have Author.Name and Author.Email as properties of the flattened File object. 2. Nested Objects You can get the full list of SharePoint datasets in MGDC at https://aka.ms/SharePointDatasets. Here is a table with a list of objects and their nested objects: Object How many? Primary Key Nested Object How many? Add to Primary Key Sites 1 per Site Id RootWeb 1 per Site Sites 1 per Site Id StorageMetrics 1 per Site Sites 1 per Site Id SensitivityLabelInfo 1 per Site Sites 1 per Site Id Owner 1 per Site Sites 1 per Site Id SecondaryContact 1 per Site Groups 1 per Group SiteId + GroupId Owner 1 per Group Groups 1 per Group SiteId + GroupId Members 1 per Member COALESCE(AADObjectId, Email, Name) Permissions 1 per Permission SiteId + ScopeId + RoleDefintion + LinkId SharedWithCount 1 per Recipient Type Type Permissions 1 per Permission SiteId + ScopeId + RoleDefintion + LinkId SharedWith 1 per Recipient or Sharee COALESCE(AADObjectId, Email, Name) Files 1 per File SiteId + WebId + ListId + ItemId Author 1 per File Files 1 per File SiteId + WebId + ListId + ItemId ModifiedBy 1 per File When you flatten a dataset and there is an object with multiple objects inside (like Group Members or Permission Recipients), the number of rows will increase. You also need to add to primary key to keep it unique. Also note that the File Actions, Sync Health and Sync Errors datasets do not have any nested objects. 3. One Object per Parent When the nested object has only one instance, things are simple. As we described for the Author nested object inside the File object, you promote the properties of the nested object to be properties of the parent object. This is because the Author is defined as the user that initially created the file. There is always one and only one Author. This can happen even happen multiple times for the same object. The File also has a ModifiedBy property. That is the single user that last changed the file. In that case, there is also only one ModifiedBy per File. The Site object also includes several properties in this style, like RootWeb, StorageMetrics, SensitivityLabelInfo, Owner and SecondaryContact. Note that, in the context of the Site object, there is only one owner. Actually two, but that second one is tracked in a separate object called SecondaryContact which is effectively the secondary owner. 4. Multiple Objects per Parent The SharePoint Permissions dataset has a special condition that might create trouble for flattening. There are two sets of nested objects with multiple objects each: SharedWith and SharedWithCount. SharedWith has the list of Recipients and SharedWithCount has a list of Recipient Types. If you just let the tools flatten it, you will end up a cross join of the two. As an example, if you have 4 recipients in an object and 2 types of recipients (internal users and external users, for instance) you will end up with 20 objects in the flattened dataset instead of the expected 10 objects (one per recipient). To avoid this, in this specific condition, I would recommend just excluding the SharedWithCount column from the object before flattening. 5. Conclusion I hope this clarifies how you can flatten the MGDC for SharePoint datasets, particularly SharePoint Permissions dataset. For further details about the MGDC for SharePoint, https://aka.ms/SharePointData.
Jose_Barreto
Nov 29, 2024 Place Microsoft Graph Data Connect for SharePoint Blog
507Views
0likes
2Comments
MGDC for SharePoint FAQ: How do I join File Actions with Files?
This post is about how you can join the SharePoint File Actions dataset with the SharePoint Files dataset, including the most common issues.
Jose_Barreto
Nov 27, 2024 Place Microsoft Graph Data Connect for SharePoint Blog
1KViews
1like
0Comments
MGDC for SharePoint FAQ: Why does the file count not match?
I am frequently asked why the number of files in the Sites dataset does not match the count of the Files dataset. It is true that they sometimes don't. Here are the counts we are talking about: Sites: SELECT Id as SiteId, StorageMetrics.TotalFileCount AS FilesInSite FROM Sites Files: SELECT SiteId, COUNT(*) AS FilesInSite FROM Files GROUP BY SiteId The main reasons for the discrepancy are: The Files dataset is collected weekly, while the Sites dataset is collected daily, making it difficult to capture the exact same state. The Files dataset includes only items inside Document Libraries, whereas the Sites dataset counts all files, including those in other list types. The Files dataset does not include pages (files ending in .ASPX), while the Sites dataset counts all files. The Files dataset does not include items in the primary and secondary recycle bins, whereas the Sites dataset counts all files. Here are a few examples: A new site was created, and a few files were uploaded to a document library in this new site. Two days later, you get the SharePoint Sites dataset and find the new site with the right count of files. However, you cannot find these files in the SharePoint Files dataset. This is because the Files dataset may take one week to refresh. Wait a week and try pulling the Files dataset again. A team site was created with a few lists, where some of the items have file attachments. The SharePoint Files dataset does not show these file attachments. This is because the SharePoint Files dataset will only show files in document libraries. You deleted files from a document library in a SharePoint site a few weeks ago. Now the count of files in the SharePoint Sites is smaller than the count of files in the SharePoint Files dataset. That is because files in the recycle bins are excluded from the Files dataset. Once these deleted files go through the first and second level recycle bins, the count in the Sites dataset will also reflect that reduction.
Jose_Barreto
Nov 27, 2024 Place Microsoft Graph Data Connect for SharePoint Blog
344Views
0likes
0Comments
MGDC for SharePoint FAQ: How do I process Deltas?
More details on how to process a Delta State dataset in Microsoft Graph Data Connect for SharePoint.
Jose_Barreto
Sep 11, 2024 Place Microsoft Graph Data Connect for SharePoint Blog
3KViews
0likes
0Comments
MGDC for SharePoint FAQ: How to gather insights from a large Files dataset?
In this post, we’ll cover how to gather insights from the MGDC SharePoint Files dataset that is very large.
Jose_Barreto
Aug 08, 2024 Place Microsoft Graph Data Connect for SharePoint Blog
3.7KViews
1like
0Comments
'OpenApiConnection' is not valid
Hi everyone, I am new to power automate, l am trying to migrate data from a SQL table to a SharePoint list using power automate, one of the fields keeps throwing an openApiconnection error. it is a lookup field in SharePoint however in Microsoft flow it's expecting a lookup ID of the field. how do l go round solving it, please..... The 'inputs.parameters' of workflow operation 'Create_item' of type 'OpenApiConnection' is not valid. Error details: Input parameter 'item/Project/Id' is required to be of type 'Integer/int64'. The runtime value '"743: Carefirst replacement / Eclipse - Phase 1: Adults"' to be converted doesn't have the expected format 'Integer/int64'.
KeithMontana
Mar 28, 2022 Place SharePoint
21KViews
0likes
1Comment
Use SQL database values in Sharepoint Custom Forms
Hi everyone, I managed to use a PowerApps custom form to create dynamic dropdowns with my SQL data and the results appear in the dropdowns but when I submit the form only those SQL values show up as invisible (all other non-SQL fields show up fine). I'm using [DB_Name].Customers or Sort(Distinct([DB_Name], Customers), Result) to get the SQL information, but it's just not submitting properly. Is this a problem related to the form's connection to the server? Is there a solution to this issue? Thanks in advance,
CodeFox
Oct 20, 2021 Place SharePoint
726Views
0likes
0Comments
Automatically pull external files into Sharepoint folder
My company has a process that produces reports pulled from MySQL / Sql Server databases and the files are written to a local drive of a server. We need to pull those reports into a SharePoint Folder. I know we can't use a mapped drive, as Microsoft does not support this and also the account won't stay logged in. What are my best options to automate this process?
dpimental
Aug 19, 2021 Place Power Apps and Power Automate in SharePoint
6.1KViews
0likes
1Comment
What is the best way to use SQL on-premise data within SharePoint Online?
We would like to create a solution in SharePoint Online that can perform CRUD operations on data in an on-premise SQL database. What is the best way to accomplish this? What are the different options? Is it possible to do this from a SharePoint Framework web part?
kevinmckeown8
May 11, 2021 Place SharePoint
1.2KViews
0likes
2Comments