How do I go about solving this?
Load last modified files from subfolders in data flow
I have the following directory structure on an Azure container:
-Main_Folder
  -2021-01
    -file1.parquet
  -2021-02
    -file2.parquet
    -file3.parquet
where the data is partitioned by year and month to create subfolders. Within these subfolders, I have my data files. I want to load into my data flow only the latest files, that is, those added within one day of running my data flow pipeline.
In the 'Filter by last modified' option provided in the source options, I tried setting End Time to currentUTC() and Start Time to one day earlier, AddDays(currentUTC(), -1), but it didn't work.
I also tried using currentTimestamp() instead but to no avail.
The subfolders are populated by a pipeline that is triggered daily, and the data is partitioned based on year and month. So it is true that I want to target the last folder that I have, but within that folder, I want to load only the files created by the last pipeline run, especially since these subfolders are created dynamically as we go from month to month and year to year.
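To make the intended selection concrete, here is a minimal Python sketch of the logic I am after. It is not data flow code and not the linked solution. The function name recent_files, the (path, last_modified) tuples, and the timestamps are all made up for illustration, and it assumes folder names of the form YYYY-MM so that the newest folder sorts last.

```python
from datetime import datetime, timedelta, timezone


def recent_files(files, now, window=timedelta(days=1)):
    """Return paths in the newest YYYY-MM folder modified within `window` of `now`.

    `files` is a list of (path, last_modified) tuples; this shape is an
    assumption for the sketch, not something the data flow source exposes.
    """
    if not files:
        return []
    start = now - window
    # Folder names like "2021-02" sort chronologically as plain strings.
    newest = max(path.split("/")[-2] for path, _ in files)
    return [
        path
        for path, modified in files
        if path.split("/")[-2] == newest and start <= modified <= now
    ]


# Hypothetical listing that mirrors the directory structure in the question.
now = datetime(2021, 2, 15, 6, 0, tzinfo=timezone.utc)
files = [
    ("Main_Folder/2021-01/file1.parquet", datetime(2021, 1, 31, 5, 0, tzinfo=timezone.utc)),
    ("Main_Folder/2021-02/file2.parquet", datetime(2021, 2, 13, 5, 0, tzinfo=timezone.utc)),
    ("Main_Folder/2021-02/file3.parquet", datetime(2021, 2, 15, 5, 0, tzinfo=timezone.utc)),
]
print(recent_files(files, now))
```

With these made-up timestamps, only file3.parquet falls inside the one-day window, which is the behavior I expected from the Start Time and End Time settings.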
1 Reply
- amroghoneim (Copper Contributor): Resolved. Check this link if you are interested in the solution.
https://stackoverflow.com/questions/67903842/how-to-load-data-from-last-modified-files-within-one-day-from-subfolders-azure-d