Forum Discussion
How can I use NiFi to ingest data from/to ADLS?
- Mar 09, 2019
No Hadoop is needed .. For ADLS Gen1 and Gen1 you need a couple of JAR files and a simplified core-site.xml. I am currently working with Nifi 1.9.0 (released feb 2019).
For ADLS Gen1 I am using :
- azure-data-lake-store-sdk-2.3.1.jar
- hadoop-azure-datalake-3.1.1.jar
- These jars are available in the Maven central repository
- My core-site.xml : (replace the $< >$ with your values.
<configuration> <property> <name>fs.defaultFS</name> <value>adl://$<adls storage account name>$.azuredatalakestore.net</value> </property> <property> <name>dfs.adls.oauth2.access.token.provider.type</name> <value>ClientCredential</value> </property> <property> <name>dfs.adls.oauth2.refresh.url</name> <value>https://login.microsoftonline.com/$<tenant id>$/oauth2/token</value> </property> <property> <name>dfs.adls.oauth2.client.id</name> <value>$<client id>$</value> </property> <property> <name>dfs.adls.oauth2.credential</name> <value>$<key>$</value> </property> </configuration>
For ADLS Gen2 I am using :
- hadoop-azure-3.2.0.jar
- wildfly-openssl-1.0.4.Final.jar
- my core-site.xml (replace the $< >$ with your values.
<configuration> <property> <name>fs.defaultFS</name> <value>abfss://$<cotainer>$@$<account>$.dfs.core.windows.net</value> </property> <property> <name>fs.azure.account.key.adbstorgen2.dfs.core.windows.net</name> <value>$<storage key>$</value> </property> <property> <name>fs.adlsGen2.impl</name> <value>org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem</value> </property> <property> <name>fs.abfss.impl</name> <value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value> </property> <property> <name>fs.AbstractFileSystem.adlsGen2.impl</name> <value>org.apache.hadoop.fs.azurebfs.Abfs</value> </property> <property> <name>fs.AbstractFileSystem.abfss.impl</name> <value>org.apache.hadoop.fs.azurebfs.Abfss</value> </property> <property> <name>fs.azure.check.block.md5</name> <value>false</value> </property> <property> <name>fs.azure.store.blob.md5</name> <value>false</value> </property> <property> <name>fs.azure.createRemoteFileSystemDuringInitialization</name> <value>true</value> </property> </configuration>
for the put list fetch HDFS flows .. you just need to fill in the Hadoop Configuration Resources with the path to your core-site.xml and Additional Classpath Resources with the folder that has the azure jar files. Enjoy !
Hi Bruce Nelson ,
thanks for the tips.
I tried to reproduce the steps you mentioned, but NiFi doesn't load properly the PutHDFS flow (it's the only one I'm testing at the moment).
The error I get is:
o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=...] HDFS Configuration error - Configuration property [account name].dfs.core.windows.net not found.: Configuration property [account name].dfs.core.windows.net not found.
org.apache.hadoop.fs.azurebfs.contracts.exceptions.ConfigurationPropertyNotFoundException: Configuration property [account name].dfs.core.windows.net not found.
at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:342)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:812)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:149)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:108)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:473)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:225)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:434)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:431)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getFileSystemAsUser(AbstractHadoopProcessor.java:431)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.resetHDFSResources(AbstractHadoopProcessor.java:393)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.abstractOnScheduled(AbstractHadoopProcessor.java:251)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:142)
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:130)
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:75)
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:52)
at org.apache.nifi.controller.StandardProcessorNode.lambda$initiateStart$4(StandardProcessorNode.java:1515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:844)
I tried to check around, but I have no idea what it can refer to.
I'm currently using Nifi 1.9.1, and the jars with the version you specified on Ubuntu 18.04 Server.
Could you help me?
Thanks in advance,
Michele
Ok, I figured out where the problem was by reading http://docs.wandisco.com/bigdata/wdfusion/adls/.
In Bruce answer there is a small mistake in the ADLS Gen2 core-site.xml.
The correct one should be (differences marked in bold):
<configuration> <property> <name>fs.defaultFS</name> <value>abfss://$<container>$@$<account>$.dfs.core.windows.net</value> </property> <property> <name>fs.azure.account.key.$<account>$.dfs.core.windows.net</name> <value>$<storage key>$</value> </property> <property> <name>fs.adlsGen2.impl</name> <value>org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem</value> </property> <property> <name>fs.abfss.impl</name> <value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value> </property> <property> <name>fs.AbstractFileSystem.adlsGen2.impl</name> <value>org.apache.hadoop.fs.azurebfs.Abfs</value> </property> <property> <name>fs.AbstractFileSystem.abfss.impl</name> <value>org.apache.hadoop.fs.azurebfs.Abfss</value> </property> <property> <name>fs.azure.check.block.md5</name> <value>false</value> </property> <property> <name>fs.azure.store.blob.md5</name> <value>false</value> </property> <property> <name>fs.azure.createRemoteFileSystemDuringInitialization</name> <value>true</value> </property> </configuration>
Tested using:
- hadoop-azure-3.2.0.jar
- wildfly-openssl-1.0.4.Final.jar
- Bruce NelsonMar 25, 2019Microsoft
apologies .. I missed the account name when cleaning up the core-site.xml to send.
I also have Nifi *HDFS working with RBAC and ACLs for ADL gen2 as well .. working on a write up
- rarity1210Jan 14, 2020Copper Contributorhello. I saw question and answer, i tried to connect from nifi to gen2, but it does not work.
can you show me your nifi processor flow and detailed configuration?
I think nifi can not recognize azure jar files.
plz help me.
I use nifi 1.10.0
- rarity1210Jan 14, 2020Copper Contributorhello. I saw question and answer, i tried to connect from nifi to gen2, but it does not work.
can you show me your nifi processor flow and detailed configuration?
I think nifi can not recognize azure jar files.
plz help me
I use nifi 1.10.0.- rarity1210Jan 15, 2020Copper Contributor
rarity1210 The Error message is below.
ERROR [Timer-Driven Process Thread-5] o.apache.nifi.processors.hadoop.ListHDFS ListHDFS[id=8420db79-016f-1000-0be1-bfa5c8c82247] Failed to properly initialize Processor. If still scheduled to run, NiFi will attempt to initialize and run the Processor again after the 'Administrative Yield Duration' has elapsed. Failure is due to java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FileSystem: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FileSystem
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FileSystem
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$ExtendedConfiguration.getClassByNameOrNull(AbstractHadoopProcessor.java:603)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2497)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2593)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3269)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:434)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:431)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getFileSystemAsUser(AbstractHadoopProcessor.java:431)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.resetHDFSResources(AbstractHadoopProcessor.java:393)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.abstractOnScheduled(AbstractHadoopProcessor.java:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:142)I use List/Fetch/Put HDFS Processor
- rarity1210Jan 15, 2020Copper Contributor