We recently announced the COPY statement for Azure Synapse Analytics (formerly Azure SQL Data Warehouse). Since then, customers have been taking advantage of the many flexible features in the COPY statement that enable a seamless loading experience. One feature that is becoming wildly popular is the ability to specify wildcards and multiple file locations in a single COPY statement.
Before the COPY statement, the loading experience was limited to a single LOCATION parameter that pointed to either a file or a folder. If a folder was specified, the load retrieved all the files from that folder and all its subfolders. This all-or-nothing behavior made exporting data to Azure and orchestrating ELT pipelines cumbersome to manage. Customers had to ensure that specific source systems exported data to different folder structures (which was not always feasible), that different LOCATION paths and External Table definitions were created dynamically during the load process, and that the storage location path was cleaned up after each load, with the files archived to separate locations.
With the flexibility of wildcards in the COPY statement, customers can simply export to a single folder structure and target specific files by writing filtering logic with wildcards in their COPY statement. In addition, when different source systems export data to different storage container paths, multi-file locations can be used to load from all of those paths in one statement. Here are a few common usage examples of wildcards and multi-file locations:
Load all Parquet files in the customers folder and its subfolders, ignoring any other file formats that may reside there:
COPY INTO customer_table FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/customers/*.parquet';
Load and backfill sales data for the month of December 2019:
COPY INTO sales_table FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/sales/12*2019.parquet';
Load files only from a known set of customers:
COPY INTO top_customers_adhoc_V1 FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/customers/customer1', 'https://myaccount.blob.core.windows.net/myblobcontainer/customers/customer2', 'https://myaccount.blob.core.windows.net/myblobcontainer/customers/customer3';
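The examples above show only the FROM clause. In practice, a load would usually also state the file format and how to authenticate to storage in a WITH clause. The following is a minimal sketch of the first example with those options added. It assumes the workspace has a managed identity with read access to the storage account, so adjust the CREDENTIAL value to whatever authentication method your environment actually uses.

```sql
-- Sketch only: the managed identity credential is an assumption about your
-- environment and is not part of the original examples.
COPY INTO customer_table
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/customers/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',                        -- files matched by the wildcard are Parquet
    CREDENTIAL = (IDENTITY = 'Managed Identity')  -- assumed authentication method
);
```

The same WITH clause can follow a comma-separated list of locations, as in the multi-file example.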
Learn more about the COPY statement here. Please send any feedback or issues to the following distribution list: firstname.lastname@example.org.