load last modified files from subfolders in dataflow

I have the following directory structure on an Azure container:

-Main_Folder
   -2021-01
     -file1.parquet
   -2021-02
     -file2.parquet
     -file3.parquet

The data is partitioned by year and month into subfolders, and the data files sit inside those subfolders. I want my data flow to load only the latest files, meaning those added within one day before the data flow pipeline runs.

In the source options, under 'Filter by last modified', I set End Time to currentUTC() and Start Time to that value minus one day, AddDays(currentUTC(), -1), but it didn't work.
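
The intended one-day window can be sketched in plain Python as follows. This only illustrates the filter semantics (keep files whose last-modified time falls between "now minus one day" and "now"); it is not Data Flow expression syntax. The function name `modified_within_last_day` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def modified_within_last_day(files, now):
    """Return paths whose last-modified time falls in [now - 1 day, now]."""
    start = now - timedelta(days=1)
    return [path for path, modified in files if start <= modified <= now]

# Hypothetical listing of (path, last-modified time in UTC).
now = datetime(2021, 2, 15, 6, 0, tzinfo=timezone.utc)
files = [
    ("Main_Folder/2021-01/file1.parquet", datetime(2021, 1, 31, 5, 0, tzinfo=timezone.utc)),
    ("Main_Folder/2021-02/file2.parquet", datetime(2021, 2, 14, 5, 0, tzinfo=timezone.utc)),
    ("Main_Folder/2021-02/file3.parquet", datetime(2021, 2, 15, 5, 0, tzinfo=timezone.utc)),
]

recent = modified_within_last_day(files, now)
print(recent)
```

With these sample values only file3.parquet falls inside the window, because file2.parquet was modified slightly more than 24 hours before the run.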

I also tried using currentTimestamp() instead, but to no avail.

The subfolders are populated by a pipeline that is triggered daily, and the data is partitioned by year and month. I do want to target the most recent subfolder, but within it I want to load only the files created by the last pipeline run. This matters all the more because the subfolders are created dynamically as we move from month to month and year to year.
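
As a rough illustration of that requirement, the sketch below picks the newest year-month subfolder and keeps only the files modified after the previous run. It is plain Python, not a Data Flow feature. The helper names, the `previous_run` timestamp, and the sample listing are all assumptions made for the example. It relies on the zero-padded YYYY-MM folder names sorting chronologically when compared as strings.

```python
from datetime import datetime, timezone

def latest_partition(folders):
    """Pick the newest 'YYYY-MM' folder; zero-padded names sort chronologically."""
    return max(folders)

def files_from_last_run(listing, previous_run):
    """Return files in the newest folder modified after the previous run.

    listing maps a folder name to a list of (file name, last-modified time).
    """
    folder = latest_partition(listing.keys())
    return [
        f"{folder}/{name}"
        for name, modified in listing[folder]
        if modified > previous_run
    ]

# Hypothetical container listing and previous run time (UTC).
listing = {
    "2021-01": [("file1.parquet", datetime(2021, 1, 31, 5, 0, tzinfo=timezone.utc))],
    "2021-02": [
        ("file2.parquet", datetime(2021, 2, 14, 5, 0, tzinfo=timezone.utc)),
        ("file3.parquet", datetime(2021, 2, 15, 5, 0, tzinfo=timezone.utc)),
    ],
}
previous_run = datetime(2021, 2, 14, 6, 0, tzinfo=timezone.utc)

selected = files_from_last_run(listing, previous_run)
print(selected)
```

Because the newest folder is looked up at run time, a subfolder created for a new month is picked up without any change to the configuration.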

[Screenshots: stackoverflow1.JPG, stackoverflow2.JPG]

How do I go about solving this?

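1 Reply

Resolved. Check this link if you are interested in the solution.
https://stackoverflow.com/questions/67903842/how-to-load-data-from-last-modified-files-within-one-day-from-subfolders-azure-d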