Issues accessing files in a time-series folder on ADLS Gen2 from Azure Synapse
Hi,
I have mounted the container in my Synapse workspace and need to list all the files in its subfolders from a Synapse notebook. The same code works on Windows but not on Synapse. For example, when I run the command below on my Windows machine:
from pathlib import Path
year, month, day = "2022", "08", "16"
list(Path("C:/Users/sutripathi/Documents/PySpark/archive").rglob(f'{year}/{month}/{day}/*.csv'))
Output:
[WindowsPath('C:/Users/sutripathi/Documents/PySpark/archive/2022/08/16/20220804000000_availabilityzones_v0.csv'),
WindowsPath('C:/Users/sutripathi/Documents/PySpark/archive/2022/08/16/20220804010000_availabilityzones_v0.csv')]
But when I try to get the files the same way in Synapse, it returns an empty list:
list(Path("synfs:/8/mnt/qoscontainer/publish/xxxx/yyyy/zzzzz").rglob('2022/08/16/*.parquet'))
Output: []
Could someone please take a look and help?
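A likely reason for the empty list: pathlib only works against the local file system and does not understand the synfs: scheme. Path("synfs:/8/mnt/...") is parsed as a relative path whose first directory is literally named "synfs:", which does not exist, so rglob quietly yields nothing. As I understand the Synapse mount documentation, the synfs:/{jobId}/<mount point> form is meant for mssparkutils and Spark APIs, while local file APIs such as pathlib are expected to use /synfs/{jobId}/<mount point>. The sketch below assumes that convention; synfs_to_local and list_mounted are hypothetical helper names, and the job ID (8) is taken from the path in the post.

```python
import re
from pathlib import Path, PurePosixPath
from typing import List


def synfs_to_local(spark_path: str) -> str:
    """Convert a Spark-style mount path 'synfs:/<jobId>/<rest>' into the
    local-file-API form '/synfs/<jobId>/<rest>' (assumed Synapse convention)."""
    match = re.fullmatch(r"synfs:/([^/]+)/(.+)", spark_path)
    if match is None:
        raise ValueError(f"not a synfs:/<jobId>/... path: {spark_path!r}")
    job_id, rest = match.groups()
    return f"/synfs/{job_id}/{rest}"


def list_mounted(spark_path: str, pattern: str) -> List[Path]:
    """Run pathlib's rglob against the local view of a mounted folder."""
    return sorted(Path(synfs_to_local(spark_path)).rglob(pattern))


# pathlib treats 'synfs:' as an ordinary directory name, not a URI scheme:
print(PurePosixPath("synfs:/8/mnt/qoscontainer").parts)  # ('synfs:', '8', 'mnt', 'qoscontainer')

# In a Synapse notebook (the job ID is assumed to come from mssparkutils.env.getJobId()):
# files = list_mounted("synfs:/8/mnt/qoscontainer/publish/xxxx/yyyy/zzzzz", "2022/08/16/*.parquet")
```

If the local path form works in your runtime, the rglob pattern from the Windows example can be reused unchanged; otherwise listing through mssparkutils avoids the local mount entirely.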
- _MartinB (Iron Contributor)
Hi sutripathi,
Have you tried listing the files with mssparkutils?
mssparkutils.fs.ls('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>')
For details, see https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python#list-files
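mssparkutils.fs.ls lists a single directory and, as far as I know, does not recurse, so reproducing the rglob('2022/08/16/*.parquet') behaviour takes a small walk. The sketch below takes the listing function as a parameter so it can be tried without Synapse; in a notebook you would pass mssparkutils.fs.ls. It assumes each entry has path and name attributes plus an isDir flag (handled whether that is a property or a method); rglob_ls is a hypothetical helper name, not part of mssparkutils.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List


def _is_dir(entry) -> bool:
    # Assumption: entries expose isDir either as a bool attribute or as a method.
    flag = entry.isDir
    return bool(flag() if callable(flag) else flag)


def rglob_ls(ls: Callable[[str], Iterable], root: str, pattern: str) -> List[str]:
    """Walk `root` with `ls` and return file paths whose trailing path
    components (relative to root) match `pattern`, similar to Path.rglob."""
    wanted = pattern.strip("/").split("/")
    matches: List[str] = []
    stack = [(root, [])]  # (directory to list, its components relative to root)
    while stack:
        directory, rel = stack.pop()
        for entry in ls(directory):
            # Strip a trailing slash defensively in case directory names carry one.
            parts = rel + [entry.name.rstrip("/")]
            if _is_dir(entry):
                stack.append((entry.path, parts))
            elif len(parts) >= len(wanted) and all(
                fnmatchcase(part, pat) for part, pat in zip(parts[-len(wanted):], wanted)
            ):
                matches.append(entry.path)
    return sorted(matches)


# In a Synapse notebook (placeholders left as in the reply above):
# from notebookutils import mssparkutils
# files = rglob_ls(
#     mssparkutils.fs.ls,
#     "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>",
#     "2022/08/16/*.parquet",
# )
```

Injecting ls keeps the walk testable with a fake listing. If the date folder is already known, calling mssparkutils.fs.ls on that folder directly and keeping entries whose name ends with ".parquet" is simpler and avoids the recursion.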