Issues accessing files in a time-series folder on ADLS Gen2 from Azure Synapse

Microsoft

Hi,

I have mounted the container in a Synapse workspace, and I need to list all the files present in its subfolders from a Synapse notebook. The same code works on Windows but not on Synapse. For example, on my Windows machine, when I run the command below:

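from pathlib import Path

# Date components used to build the glob pattern (values match the output shown)
year, month, day = "2022", "08", "16"

list(Path("C:/Users/sutripathi/Documents/PySpark/archive").rglob(f'{year}/{month}/{day}/*.csv'))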

Output:
[WindowsPath('C:/Users/sutripathi/Documents/PySpark/archive/2022/08/16/20220804000000_availabilityzones_v0.csv'),
WindowsPath('C:/Users/sutripathi/Documents/PySpark/archive/2022/08/16/20220804010000_availabilityzones_v0.csv')]

But when I use the same approach to get the files in Synapse, it returns an empty list:

list(Path("synfs:/8/mnt/qoscontainer/publish/xxxx/yyyy/zzzzz").rglob('2022/08/16/*.parquet'))

Output: []

Please check and help.

1 Reply

Hi @sutripathi,

Have you tried the following?

mssparkutils.fs.ls('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>')
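Note that mssparkutils.fs.ls lists only one directory level, so covering subfolders needs a small walk on top of it. Below is a minimal sketch, not a definitive implementation. The helper name list_files is hypothetical, and it assumes each entry returned by the listing call exposes name, path, and isDir, which is the shape mssparkutils.fs.ls is documented to return. The listing function is passed in as an argument so the same helper can be tried outside Synapse with a stand-in.

```python
from fnmatch import fnmatch


def list_files(ls, root, pattern="*"):
    """Recursively collect file paths under `root` whose names match `pattern`.

    `ls` is any callable that takes a path and returns entries exposing
    .name, .path and .isDir (assumed to match what mssparkutils.fs.ls returns).
    """
    matches = []
    stack = [root]  # directories still to be listed
    while stack:
        current = stack.pop()
        for entry in ls(current):
            if entry.isDir:
                # Descend into the subfolder on a later iteration
                stack.append(entry.path)
            elif fnmatch(entry.name, pattern):
                matches.append(entry.path)
    return sorted(matches)


# In a Synapse notebook (assumption: mssparkutils is available there):
# list_files(
#     mssparkutils.fs.ls,
#     'abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>',
#     '*.parquet',
# )
```

Starting the walk at the day folder (for example a path ending in 2022/08/16) keeps the listing small instead of scanning the whole container.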

For details, see https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=prog...
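As for why the pathlib call returns an empty list: pathlib only understands local file system paths, not URI schemes, so a string starting with synfs:/ is read as a relative path whose first folder is literally named "synfs:". The short sketch below illustrates this with PurePosixPath, which only parses the string and does not touch the disk. The local-style path /synfs/8/mnt/... is an assumption based on how the Synapse documentation describes the local view of a mount point; as far as I know, mssparkutils.fs.getMountPath is the documented way to get the exact value, so please verify it in your workspace.

```python
from pathlib import PurePosixPath

# The scheme-style string from the question, parsed the way pathlib sees it
uri_style = PurePosixPath("synfs:/8/mnt/qoscontainer/publish")

# Assumed local file system form of the same mount (verify in your workspace)
local_style = PurePosixPath("/synfs/8/mnt/qoscontainer/publish")

print(uri_style.is_absolute())    # False: treated as a relative path
print(uri_style.parts[0])         # synfs:  (just a folder name to pathlib)
print(local_style.is_absolute())  # True
```

If the local form is correct for your job, the original rglob call should behave as it does on Windows once it is given that path instead of the synfs:/ one.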