Forum Discussion

marshal.tito01
Copper Contributor
Sep 27, 2018

How can I use NiFi to ingest data from/to ADLS?

I would like to use NiFi to connect with ADLS. My scenario is like this: NiFi is installed and running on a Windows machine. Now I want to move data from my Windows local directory to ADLS. I am not usi...
  • Bruce Nelson
    Mar 09, 2019

    No Hadoop installation is needed. For ADLS Gen1 and Gen2 you need a couple of JAR files and a simplified core-site.xml. I am currently working with NiFi 1.9.0 (released Feb 2019).

    For ADLS Gen1 I am using:

    • azure-data-lake-store-sdk-2.3.1.jar
    • hadoop-azure-datalake-3.1.1.jar
    • Both JARs are available in the Maven Central repository.
    • My core-site.xml (replace the $< >$ with your values):

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>adl://$<adls storage account name>$.azuredatalakestore.net</value>
      </property>
      <property>
        <name>dfs.adls.oauth2.access.token.provider.type</name>
        <value>ClientCredential</value>
      </property>
      <property>
        <name>dfs.adls.oauth2.refresh.url</name>
        <value>https://login.microsoftonline.com/$<tenant id>$/oauth2/token</value>
      </property>
      <property>
        <name>dfs.adls.oauth2.client.id</name>
        <value>$<client id>$</value>
      </property>
      <property>
        <name>dfs.adls.oauth2.credential</name>
        <value>$<key>$</value>
      </property>
    </configuration>
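
    If you want to sanity-check the Gen1 configuration outside NiFi first, here is a minimal Java sketch using the Hadoop FileSystem API (the class name and core-site.xml path are just examples, and you will also need hadoop-common on the classpath alongside the two JARs above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AdlsGen1Check {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Load the same core-site.xml NiFi will point at (example path)
            conf.addResource(new Path("C:/nifi/conf/adls/core-site.xml"));
            // fs.defaultFS in that file resolves to adl://<account>.azuredatalakestore.net
            try (FileSystem fs = FileSystem.get(conf)) {
                for (FileStatus s : fs.listStatus(new Path("/"))) {
                    System.out.println(s.getPath());
                }
            }
        }
    }

    If this prints the contents of your store's root, the same configuration will work from the NiFi processors.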
    
    

    For ADLS Gen2 I am using:

    • hadoop-azure-3.2.0.jar
    • wildfly-openssl-1.0.4.Final.jar
    • My core-site.xml (replace the $< >$ with your values):

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>abfss://$<container>$@$<account>$.dfs.core.windows.net</value>
      </property>
      <property>
        <name>fs.azure.account.key.$<account>$.dfs.core.windows.net</name>
        <value>$<storage key>$</value>
      </property>
      <property>
        <name>fs.adlsGen2.impl</name>
        <value>org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem</value>
      </property>
      <property>
        <name>fs.abfss.impl</name>
        <value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value>
      </property>
      <property>
        <name>fs.AbstractFileSystem.adlsGen2.impl</name>
        <value>org.apache.hadoop.fs.azurebfs.Abfs</value>
      </property>
      <property>
        <name>fs.AbstractFileSystem.abfss.impl</name>
        <value>org.apache.hadoop.fs.azurebfs.Abfss</value>
      </property>
      <property>
        <name>fs.azure.check.block.md5</name>
        <value>false</value>
      </property>
      <property>
        <name>fs.azure.store.blob.md5</name>
        <value>false</value>
      </property>
      <property>
        <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
        <value>true</value>
      </property>
    </configuration>
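
    The same kind of standalone check works for Gen2. This sketch writes a small test file, addressing the filesystem by its abfss URI directly instead of relying on fs.defaultFS (the container and account names, paths, and class name are placeholders):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AdlsGen2Check {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Same core-site.xml as above (example path)
            conf.addResource(new Path("C:/nifi/conf/adls/core-site.xml"));
            // Name the container explicitly; the account key is looked up from the config
            URI uri = new URI("abfss://mycontainer@myaccount.dfs.core.windows.net/");
            try (FileSystem fs = FileSystem.get(uri, conf);
                 FSDataOutputStream out = fs.create(new Path("/nifi-smoke-test.txt"))) {
                out.writeBytes("hello from the NiFi host\n");
            }
        }
    }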

    For the PutHDFS / ListHDFS / FetchHDFS flows, you just need to fill in Hadoop Configuration Resources with the path to your core-site.xml and Additional Classpath Resources with the folder that holds the Azure JAR files (see the example below). Enjoy!
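
    For example, a minimal PutHDFS configuration might look like this (the paths are illustrative; point them at wherever the files live on your NiFi host):

    • Hadoop Configuration Resources: C:\nifi\conf\adls\core-site.xml
    • Additional Classpath Resources: C:\nifi\lib\azure
    • Directory: /incoming

    Upstream, a GetFile (or ListFile + FetchFile) processor pointed at your Windows source directory feeds flow files into PutHDFS.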

     
