Blog Post

Azure Data Factory Blog
3 MIN READ

SharePoint Online Multiple Files (Folder) Copy with Http Connector

RoshnaNazir's avatar
RoshnaNazir
Icon for Microsoft rankMicrosoft
Jul 01, 2021

ADF does not directly support copying a folder/multiple files from SharePoint Online, but there are workarounds to achieve this. Two additional steps needed here as compared to single file copy are:

  1. Get the list of files:
    • User can maintain the file names in a text file manually, OR
    • Use Web Activity to call SharePoint Rest API to get the list of files.
  2. ForEach Activity to loop the list of relative file names and pass the file name to Copy Activity (Base URL changes a bit as compared to single file copy)

 

Below is how the pipeline flow would look like:

 

Web1 – Get the access token from SPO

Web2 – Get the list of files from SPO folder

ForEach1 – Loop the list of file names

Copy1 – Copy data with HTTP connector as source

 

 

 

Step1:

Grab Access token from SPO

Copy file from SharePoint Online leverages AAD/service principal authentication and SharePoint API to retrieve files.

 

  1. Register SharePoint Application and Grant permission - https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad-app?tabs=dotnet#register-your-application-with-an-azure-ad-tenant

         a) Register AAD Application

      1. On Azure Portal, go to AAD app registration page: https://portal.azure.com/#blade/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/RegisteredApps 
      2. New Registration à Enter your App name
      3. Go to "Certificates & secrets", create new client secret, you can set the expire to 1Y/2Y/Never

       b) Grant SharePoint site permission to your registered App (need site owner permission on SharePoint)

 

 

Full details on how to register app and also granting permissions is mentioned in prerequisites here - https://docs.microsoft.com/en-us/azure/data-factory/connector-sharepoint-online-list#prerequisites

 

     c) Create an ADF Pipeline. Start with creating a Web Activity to get the access token

Headers:

  • Content-Type: application/x-www-form-urlencoded
  • Body: grant_type=client_credentials&client_id=[Client-ID]@[Tenant-ID]&client_secret=[Client-Secret]&resource=00000003-0000-0ff1-ce00-000000000000/[Tenant-Name].sharepoint.com@[Tenant-ID

 

Debug run to check if the activity succeeds and also check the activity output to see if it returns the access token in the payload. You can also verify the same using Postman client to check if the token is valid.

 

 

 

Step 2:

Get the list of Files

 

  1. Create another Web Activity to get the list of files

Headers:

  • Authorization: @{concat('Bearer ', activity('WebActivity1Name').output.access_token)}
  • Accept: application/json

          

 Debug run to see if the activity succeeds, and check it shows the list of files under the folder in the output.

 

 

 

 

Step 3:

Loop the list of relative file names

 

  1. Create a ForEach Activity with inner Copy activity
  • Items: @activity('WebActivity2Name').output.value

 

 

 

 

Step 4:

Create Copy activity

  1. New dataset -> HTTP -> Binary type:

    a) HTTP linked service

 

    b) Configure copy activity HTTP source

         Dataset properties:

  • Name: RelativeURL (Any name)
  • Value: @{item().ServerRelativeUrl}
  • Request method: GET
  • Additional header: “Authorization: Bearer <accessToken>” (accessToken is generated in Step1)

 

Tip: You can test with a static access token gotten from the previous Web activity output first. You can also use expression (add dynamic content): @{concat('Authorization: Bearer ',activity('WebActivityName').output.access_token)}

 

 

     c) Configure Linked Service properties

  • Name: FileName (Any Name)
  • Value: @dataset().RelativeURL

 

 

2. Create Copy sink as below

 

 

 

 Successful pipeline run as follows:              

 

 

Thanks to @Jijo Puthooran for helping me in authoring this blog.

Published Jul 01, 2021
Version 1.0

36 Comments

  • Badri_Sridharan's avatar
    Badri_Sridharan
    Copper Contributor

    Hi RoshnaNazir Thanks much for this post. I had a few questions while trying to implement this. Appreciate your inputs. Thanks

    1. Does this solution work for scenario wherein we files at the have SharePoint SubSite level. (not Site)?

    2. Right now after registration of SharePoint SubSite on Azure Portal, the output of SPOFileList is coming out as hexadecimal values. Is that expected?

    3. Getting this error while setting up Step 3/Step4 - 

    • Items: @activity('WebActivity2Name').output

    "The function 'length' expects its parameter to be an array or a string. The provided value is of type 'Object'."

    4. Also, in our case, the Sink is Azure Blob Storage - set u pas binary2

     

    CC: shisyu_gaku 

  • andriimazur's avatar
    andriimazur
    Copper Contributor

    hello hps2022, i have found the correct links using REST API in python, the debugging is much faster than when you start ADF pipelines. The goal is to download a file with Python, when it works you can use the same data for ADF

  • hps2022's avatar
    hps2022
    Copper Contributor

    gpoggi  I am getting the same error.

    If you have found the solution, please help me to resolve the issue.

  • shisyu_gaku's avatar
    shisyu_gaku
    Copper Contributor

    Thanks RoshnaNazir , I am one of the people who was helped by your article.

     

    Hi gpoggi , I think your URL is wrong.
    Try adding a slash to the head of the path like this.
    I solved the error by working on this.

    https://{site_url}/_api/web/GetFolderByServerRelativeUrl('/Shared Documents')

    https://docs.microsoft.com/en-us/sharepoint/dev/sp-add-ins/working-with-folders-and-files-with-rest

  • gpoggi's avatar
    gpoggi
    Copper Contributor

    Thanks RoshnaNazir! I have one question, I was able to Grab Access token from SPO but I have a problem in Step 2 when trying to Get the list of Files 😞

     

    My Web Activity URL: https://mydomain.sharepoint.com/sites/PowerBISSASTabularsite/_api/web/GetFolderByServerRelativeUrl('/Shared Documents')/Files

    Error Message: {"error_description":"Exception of type 'Microsoft.IdentityModel.Tokens.AudienceUriValidationFailedException' was thrown."}

     

    I could confirm that I'm getting the token correctly:

     

     

    However for some reason next web activity is not working 😞

     

    Do you have any idea about why I'm getting this error? 

     

    Any suggestion is really appreciated