MGDC for SharePoint FAQ: What counts as an object?
Published Mar 04 2024 06:55 AM
Microsoft

1. Overview

 

When gathering SharePoint data through Microsoft Graph Data Connect (MGDC), you are billed through Azure. You can find the official MGDC pricing information at Pricing – Microsoft Graph Data Connect. As I write this blog, the price to pull 1,000 objects from the available MGDC for SharePoint datasets (Sites, Groups and Permissions) in the US is $0.75, plus the cost for infrastructure like Azure Storage, Azure Data Factory or Azure Synapse.

 

Note: The SharePoint Files dataset is the only one with a different billing rate. Because of its typical high volume, the SharePoint Files dataset is billed at $0.75 per 50,000 objects.
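To make the arithmetic concrete, here is a minimal sketch of a cost estimate using the two rates quoted above. The function name and the dataset labels are my own for illustration, and I am assuming the charge scales linearly with the object count. Always check the official pricing page for current rates in your region.

```python
# Rates quoted in this post (US pricing at the time of writing).
PRICE_PER_BATCH_USD = 0.75
# Number of objects covered by one $0.75 charge, per dataset.
BATCH_SIZE = {"default": 1_000, "Files": 50_000}


def estimate_mgdc_cost(dataset: str, object_count: int) -> float:
    """Estimate the MGDC extraction charge in USD (Azure infrastructure not included)."""
    batch = BATCH_SIZE.get(dataset, BATCH_SIZE["default"])
    # Assumption: cost is proportional to object count, with no rounding to whole batches.
    return object_count / batch * PRICE_PER_BATCH_USD


print(estimate_mgdc_cost("Permissions", 300_000))  # 225.0
print(estimate_mgdc_cost("Files", 2_000_000))      # 30.0
```

Note that this covers only the MGDC object charge. Storage and pipeline costs (Azure Storage, Azure Data Factory or Azure Synapse) are billed separately.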

 

With all that said, there is still a question that comes up frequently. What counts as an object? Well, that’s what we will cover in this blog post.

 

2. What MGDC provides

 

What MGDC delivers to you are datasets. After you run a pipeline with a copy data action, you end up with a collection of files in your Azure Storage account. Each of these files will contain objects. It’s an interesting file format that contains text using JavaScript Object Notation, also known as JSON.

 

Here is what the contents of the file would look like:

 

{"property1":"valuea","property2":"valueb","property3":"valuec"}
{"property1":"valued","property2":"valuee","property3":"valuef"}
{"property1":"valueg","property2":"valueh","property3":"valuei"}
{"property1":"valuej","property2":"valuek","property3":"valuel"}

 

In the example above, you have a file with 4 JSON objects, each with 3 properties. The file contains one line per object and these lines can get quite long.
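If you want to process this kind of content in code, a small sketch like the following works: it treats each non-empty line as one JSON object. The `parse_objects` helper is a name I made up for this example, and the sample text is the same 4 objects shown above.

```python
import json

sample = """{"property1":"valuea","property2":"valueb","property3":"valuec"}
{"property1":"valued","property2":"valuee","property3":"valuef"}
{"property1":"valueg","property2":"valueh","property3":"valuei"}
{"property1":"valuej","property2":"valuek","property3":"valuel"}
"""


def parse_objects(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


objects = parse_objects(sample)
print(len(objects))             # 4
print(objects[0]["property2"])  # valueb
```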

 

3. Multiple JSON objects per file

 

Even though the files have a “json” extension, the files you get are not standard single-document JSON files. First, JSON content that people need to read is usually not written as one long line. A more readable formatting would be something like this:

 

{
    "property1": "valuea",
    "property2": "valueb",
    "property3": "valuec"
}
{
    "property1": "valued",
    "property2": "valuee",
    "property3": "valuef"
}
{
    "property1": "valueg",
    "property2": "valueh",
    "property3": "valuei"
}
{
    "property1": "valuej",
    "property2": "valuek",
    "property3": "valuel"
}

 

That’s more readable, but this is still not a proper JSON file. A proper JSON file would contain a single top-level value (one object or one array), not multiple objects one after another. But if you have lots of objects, having one file for each object would make the data far less efficient to process. That’s why MGDC packs lots of JSON objects into a single file with the “json” extension, one object per line. This layout is commonly known as JSON Lines or newline-delimited JSON.

 

Also, if you have lots and lots of objects, MGDC will pack the results as multiple “json” files, each containing many JSON objects packed together.
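Since each line is one object, counting the objects in a delivered dataset comes down to counting the non-empty lines across all the files. Here is a rough sketch, assuming you have copied the files to a local folder. The `count_objects` name and the folder layout are assumptions for this example, not something MGDC prescribes.

```python
from pathlib import Path


def count_objects(folder: str) -> int:
    """Count objects (non-empty lines) across every *.json file under a folder."""
    total = 0
    for path in Path(folder).rglob("*.json"):
        with path.open(encoding="utf-8") as handle:
            # One object per line, so each non-empty line counts once.
            total += sum(1 for line in handle if line.strip())
    return total
```

You would call it with something like `count_objects("downloads/permissions")`, where the path is whatever local folder holds your copy of the files.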

 

Most data tools have no problem loading this kind of file. Power BI, for example, not only can load files with multiple JSON objects, but it can also load multiple files in a single Power Query. Here’s an example:

 

Figure: Importing JSON files in Power BI

 

 

4. What constitutes a SharePoint object in MGDC

 

With all that information, we’re ready to state what counts as an object for MGDC. Each (long) line in those “json” files is an object, matching the schema published at Datasets, regions, and sinks supported by Microsoft Graph Data Connect.

 

For the SharePoint datasets currently available, you have:

 

  • Sites: One object is one site collection. This includes Team sites and OneDrive sites.
  • Groups: One object is one group. These groups could have multiple members, but those members are all included in that single group object.
  • Permissions: One object is a permission granted to a specific scope (site, web, library, folder or file). This single object includes a set of users granted a permission in that scope.
  • Files: One object per file.
  • File Action: One object for each action on a file.
  • Sync Health: One object for each device running the OneDrive Sync client.
  • Sync Errors: One object for each type of error and each device running the OneDrive Sync client.

 

5. Sharing Permission Objects

 

The SharePoint datasets above are easy to grasp, but Sharing Permissions needs further explanation. Permissions captures a more complex concept commonly referred to as an Access Control List (ACL). That is how SharePoint stores permissions granted to users.

 

Permissions includes different types of permissions (Full Control, Contribute, Read, etc.) that are granted at different scopes (site, web, library, folder or file). This covers permissions granted directly to users and groups, plus those permissions granted using sharing links. Each unique scope and permission combination gets its own Permission object, which could include multiple users and groups in the “shared with” list.

 

For instance, if you grant full control on a file to 10 users, that is a single Permission object where the “Full Control” role definition for the scope of that file is granted to a set of 10 users. All of that information is captured in one Permission object.

 

If you want to grant permission to a file with 5 users having read/write permission and another 5 users having read-only permissions, then you need 2 Permission objects. One with the “Contribute” role definition for the file being granted to 5 users and another with the “Read” role for that file being granted to the other 5 users.
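The grouping rule in the two examples above can be sketched in a few lines of code: individual grants collapse into one object per unique scope and role combination. Keep in mind that the tuple layout and the `scope`, `role` and `sharedWith` keys below are hypothetical names for illustration only. They are not the actual schema of the Permissions dataset.

```python
from collections import defaultdict

# Hypothetical input: one (scope, role, user) tuple per individual grant.
grants = (
    [("file-A", "Contribute", f"user{i}") for i in range(1, 6)]
    + [("file-A", "Read", f"user{i}") for i in range(6, 11)]
)


def to_permission_objects(grants):
    """Group individual grants into one object per (scope, role) pair."""
    grouped = defaultdict(list)
    for scope, role, user in grants:
        grouped[(scope, role)].append(user)
    return [
        {"scope": scope, "role": role, "sharedWith": users}
        for (scope, role), users in grouped.items()
    ]


permission_objects = to_permission_objects(grants)
print(len(permission_objects))  # 2
```

Ten grants on one file produce 2 objects here, because there are only 2 distinct roles for that scope.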

 

You can read more about the Sharing Permissions dataset in the blog at SharePoint on MGDC FAQ: What is in the Permissions dataset?

 

6. Can I predict how many Sharing Permission objects?

 

The exact number of Sharing Permission objects for a given SharePoint tenant is hard to predict.

If you have 100 sites, for instance, you could reasonably assume that there will be at the very least 300 Permission objects. That’s because each site, by default, gets an Owners, Members and Visitors group. Each of these 3 SharePoint groups is granted a specific permission on the site. So that’s 3 Permission objects per site, even if you don’t grant any other permissions after creating the site.

 

In addition to those, you could grant further permissions at other levels. There is also the common scenario of using sharing links. The permissions for each of those links are captured in another Permission object.

 

Obviously, the more sharing happens in your company, the more Sharing Permission objects you will have. For a sample of tenants with at least 5K sites, I see an average of 53 Permission objects per site, with a median of 35 Permission objects per site. I have seen 100 Permission objects per site in a few companies with heavy usage of SharePoint and its collaboration capabilities. I have also seen companies with less collaboration activity that have 10 or 20 Permission objects per site on average.
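As a back-of-the-envelope aid, the per-site figures above can be turned into a rough range for a given number of sites. This is only a sketch: the scenario labels and function name are mine, and your tenant could fall well outside these numbers.

```python
# Per-site figures quoted in this post; real tenants vary widely.
DEFAULT_GROUPS_PER_SITE = 3  # Owners, Members and Visitors

PER_SITE_SCENARIOS = {
    "minimum (default groups only)": DEFAULT_GROUPS_PER_SITE,
    "light collaboration": 10,
    "median": 35,
    "average": 53,
    "heavy collaboration": 100,
}


def estimate_permission_objects(site_count: int) -> dict[str, int]:
    """Return a rough Permission object count for each scenario."""
    return {
        label: site_count * per_site
        for label, per_site in PER_SITE_SCENARIOS.items()
    }


for label, total in estimate_permission_objects(5_000).items():
    print(f"{label}: {total:,}")
```

For 5,000 sites, this gives a range from 15,000 objects (default groups only) up to 500,000 objects (heavy collaboration).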

 

If you want to estimate the number of SharePoint Permissions objects more precisely, there is an option to sample the dataset and get a total object count without pulling the entire dataset. For more information, see MGDC for SharePoint FAQ: How can I sample or estimate the number of objects in a dataset?

 

7. Objects in Delta datasets

 

An important topic for those concerned with a high number of objects is Delta Datasets. The idea is simple: instead of pulling all objects every day or every week, you can ask SharePoint on MGDC to deliver just what has changed. This mechanism will drastically reduce the number of objects delivered, by providing only objects that were created, updated, or deleted.
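To illustrate the idea, here is a minimal sketch of how a set of changes could be merged into a previous full snapshot. The `Id` and `Operation` field names and the `"Deleted"` value are assumptions made for this example. Check the dataset schema and the Delta blog post for the actual properties before relying on them.

```python
def apply_delta(snapshot: dict, delta: list[dict]) -> dict:
    """Merge created/updated/deleted objects into a snapshot keyed by object id.

    Assumes each delta object carries hypothetical "Id" and "Operation" fields.
    """
    updated = dict(snapshot)  # leave the previous snapshot untouched
    for obj in delta:
        if obj["Operation"] == "Deleted":
            updated.pop(obj["Id"], None)
        else:  # treat anything else as created or updated
            updated[obj["Id"]] = obj
    return updated


previous = {
    "site-1": {"Id": "site-1", "Title": "HR"},
    "site-2": {"Id": "site-2", "Title": "Finance"},
}
changes = [
    {"Id": "site-1", "Title": "People", "Operation": "Updated"},
    {"Id": "site-2", "Operation": "Deleted"},
    {"Id": "site-3", "Title": "Legal", "Operation": "Created"},
]
current = apply_delta(previous, changes)
print(sorted(current))  # ['site-1', 'site-3']
```

In this example only 3 objects are delivered in the delta, even though the result reflects the full current state.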

 

For more details about Delta Datasets, read the blog at SharePoint on MGDC FAQ: How can I use Delta State Datasets?

 

8. Summary

 

In summary, SharePoint on MGDC delivers data to you as JSON objects, packed into files that are pulled into your Azure Storage account. MGDC objects transferred will show in your Azure bill. The number of SharePoint objects depends on your number of sites/groups/files, as well as the amount of collaboration in your tenant.

 

I hope this blog post helped you understand what constitutes an object in SharePoint on MGDC. For more information about SharePoint Data in MGDC, please visit the collection of links I keep at https://aka.ms/SharePointData.

Last update: Aug 21 2024 01:43 PM