Custom Script in Azure Data Factory & Databricks

Hi,

I have a requirement to parse a large number of small files and load them into a database in a flattened structure. I would prefer to use ADF v2 and SQL Database for this. The file parsing is already implemented as a Python script, and I want to orchestrate it in ADF. I can see an option to use the Python Notebook connector to Azure Databricks in ADF v2. Can I run a plain Python script in Azure Databricks through ADF? If I do, will the script run only on the Databricks cluster's driver and therefore not use the cluster's full capacity? I am also considering calling Azure Functions instead. Please advise which option is more appropriate in this case.
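
To illustrate what I mean by a plain Python script, here is a minimal sketch of the kind of flattening involved. It is not my actual script: the JSON input format and the names `flatten_record` and `parse_file_text` are assumptions made only for illustration. Code like this is ordinary single-process Python with no Spark calls, which is why I suspect it would execute on the driver node only.

```python
import json
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single-level dict of column -> value.

    Hypothetical helper: nested keys are joined with `sep`, so
    {"meta": {"src": "a"}} becomes {"meta_src": "a"}.
    Non-dict values (including lists) are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def parse_file_text(text: str) -> Dict[str, Any]:
    """Parse the contents of one small file (assumed here to be JSON) into a flat row."""
    return flatten_record(json.loads(text))
```

Each small file would yield one flat row ready to be inserted into a SQL Database table. Nothing here is distributed, so running it on a multi-node cluster would presumably leave the worker nodes idle.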