Forum Discussion
How can I use NiFi to ingest data from/to ADLS?
- Mar 09, 2019
No Hadoop is needed .. for ADLS Gen1 and Gen2 you need a couple of JAR files and a simplified core-site.xml. I am currently working with NiFi 1.9.0 (released Feb 2019).
For ADLS Gen1 I am using :
- azure-data-lake-store-sdk-2.3.1.jar
- hadoop-azure-datalake-3.1.1.jar
- These JARs are available in the Maven Central repository
- My core-site.xml (replace the $< >$ placeholders with your values):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>adl://$<adls storage account name>$.azuredatalakestore.net</value>
</property>
<property>
<name>dfs.adls.oauth2.access.token.provider.type</name>
<value>ClientCredential</value>
</property>
<property>
<name>dfs.adls.oauth2.refresh.url</name>
<value>https://login.microsoftonline.com/$<tenant id>$/oauth2/token</value>
</property>
<property>
<name>dfs.adls.oauth2.client.id</name>
<value>$<client id>$</value>
</property>
<property>
<name>dfs.adls.oauth2.credential</name>
<value>$<key>$</value>
</property>
</configuration>
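If you want to sanity-check the Gen1 JARs and core-site.xml outside of NiFi first, a small standalone program against the Hadoop FileSystem API is enough. This is only a rough sketch: the core-site.xml path and the class name are placeholders, and it assumes the two JARs above plus the standard Hadoop client libraries are on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AdlsGen1Check {
    public static void main(String[] args) throws Exception {
        // Load the same core-site.xml that the NiFi processors will point at.
        Configuration conf = new Configuration();
        conf.addResource(new Path("/path/to/core-site.xml"));

        // fs.defaultFS is adl://<account>.azuredatalakestore.net, so this returns
        // the ADLS Gen1 filesystem provided by the hadoop-azure-datalake JAR.
        FileSystem fs = FileSystem.get(conf);

        // List the root folder (the same call ListHDFS makes). An auth failure
        // here points at the client id / key / refresh URL, not at NiFi.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}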
For ADLS Gen2 I am using :
- hadoop-azure-3.2.0.jar
- wildfly-openssl-1.0.4.Final.jar
- My core-site.xml (replace the $< >$ placeholders with your values):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>abfss://$<container>$@$<account>$.dfs.core.windows.net</value>
</property>
<property>
<name>fs.azure.account.key.adbstorgen2.dfs.core.windows.net</name>
<value>$<storage key>$</value>
</property>
<property>
<name>fs.adlsGen2.impl</name>
<value>org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem</value>
</property>
<property>
<name>fs.abfss.impl</name>
<value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.adlsGen2.impl</name>
<value>org.apache.hadoop.fs.azurebfs.Abfs</value>
</property>
<property>
<name>fs.AbstractFileSystem.abfss.impl</name>
<value>org.apache.hadoop.fs.azurebfs.Abfss</value>
</property>
<property>
<name>fs.azure.check.block.md5</name>
<value>false</value>
</property>
<property>
<name>fs.azure.store.blob.md5</name>
<value>false</value>
</property>
<property>
<name>fs.azure.createRemoteFileSystemDuringInitialization</name>
<value>true</value>
</property>
</configuration>
For the Put/List/FetchHDFS flows, you just need to fill in Hadoop Configuration Resources with the path to your core-site.xml and Additional Classpath Resources with the folder that has the Azure JAR files. Enjoy!
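To check the Gen2 side the same way before building the flow, the sketch below writes a small test file over abfss, which is roughly what PutHDFS does per flowfile. Again a rough sketch: the paths and class name are placeholders, and it assumes hadoop-azure, wildfly-openssl and the Hadoop client libraries are on the classpath.
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AdlsGen2Check {
    public static void main(String[] args) throws Exception {
        // Load the ADLS Gen2 core-site.xml shown above.
        Configuration conf = new Configuration();
        conf.addResource(new Path("/path/to/core-site.xml"));

        // fs.defaultFS is abfss://<container>@<account>.dfs.core.windows.net, so this
        // resolves to the SecureAzureBlobFileSystem from the hadoop-azure JAR.
        FileSystem fs = FileSystem.get(conf);

        // Write a small test file, roughly what PutHDFS does for each flowfile.
        Path target = new Path("/nifi-test/hello.txt");
        try (FSDataOutputStream out = fs.create(target, true)) {
            out.write("hello from the NiFi setup check".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Wrote " + target + " (" + fs.getFileStatus(target).getLen() + " bytes)");
    }
}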
Ok, I figured out where the problem was by reading http://docs.wandisco.com/bigdata/wdfusion/adls/.
In Bruce's answer there is a small mistake in the ADLS Gen2 core-site.xml.
The correct one should be (the difference is in the fs.azure.account.key property name):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>abfss://$<container>$@$<account>$.dfs.core.windows.net</value>
</property>
<property>
<name>fs.azure.account.key.$<account>$.dfs.core.windows.net</name>
<value>$<storage key>$</value>
</property>
<property>
<name>fs.adlsGen2.impl</name>
<value>org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem</value>
</property>
<property>
<name>fs.abfss.impl</name>
<value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.adlsGen2.impl</name>
<value>org.apache.hadoop.fs.azurebfs.Abfs</value>
</property>
<property>
<name>fs.AbstractFileSystem.abfss.impl</name>
<value>org.apache.hadoop.fs.azurebfs.Abfss</value>
</property>
<property>
<name>fs.azure.check.block.md5</name>
<value>false</value>
</property>
<property>
<name>fs.azure.store.blob.md5</name>
<value>false</value>
</property>
<property>
<name>fs.azure.createRemoteFileSystemDuringInitialization</name>
<value>true</value>
</property>
</configuration>
Tested using:
- hadoop-azure-3.2.0.jar
- wildfly-openssl-1.0.4.Final.jar
Apologies .. I missed the account name when cleaning up the core-site.xml to send.
I also have the NiFi *HDFS processors working with RBAC and ACLs for ADLS Gen2 as well .. working on a write-up.
- rarity1210, Jan 14, 2020
Hello. I saw the question and answer and tried to connect from NiFi to Gen2, but it does not work.
Can you show me your NiFi processor flow and detailed configuration?
I think NiFi cannot recognize the Azure JAR files.
Please help me.
I am using NiFi 1.10.0.