Soft delete with blob version enabled at the storage account could provide a version recovery file option, this will only work with a non-hierarchical namespace.
Another potential option was Delta files, which better suited our scenario and have the time travel functionality.
What is Delta?
Delta lake is an open-source storage framework that enables building a Lakehouse Architecture. Delta Lake is simple: Information about which objects are part of a Delta table is maintained in an ACID manner, using a write-ahead log that is itself stored in the cloud object store. The objects themselves are encoded in Parquet. Delta lake is becoming increasingly popular and you can use it inside of your Synapse Workspace.
FactCurrencyRate_join contains the following columns - Fig. 1 - Metadata:
Fig. 1 - Metadata
Suppose I increased by "mistake" a few times the AverageRate values and now I need to travel back in time to bring back the last valid value before I changed. Following the code to travel in time based on the checkpoint where the information was in the state before the changes, which in my example is "2022-09-16 08:41:00". If you want to list the last checkpoint and versions, follow the code example. Fig. 2 - shows the results and Fig. 3 - checkpoints illustration:
I am working here with files in a Delta format, you can also do the same with Spark Delta tables if you prefer.
How can I make Delta Lake accessible for BI tools?
You can make Delta lake files accessible to BI tools using a Serverless SQL pool. Serverless SQL Pool supports Delta files and tables in your Spark pool. It enables you to easily query them or create views on top of them. You can even build a logical Data Warehouse or just explore the data. Below is a sample query of our persisted Delta table after time travel was applied, and the query results, Fig. 4 - Results:
TOP 100 *
FORMAT = 'Delta'
) AS [result]
Fig. 4 - Results
Creating Delta Tables
<Tables based on Delta format integrated to Serverless SQL Pool are currently in Public Preview>
The Serverless SQL pool has a service that ensures metadata stays synchronized with Spark. This enables external tables created and managed on the Lake Database to be available as external tables with the same name in the corresponding synchronized database on your Serverless SQL pool.
For example, I am reading the files from my folder which contains a set parquet with information of FactCurrencyRate - called FactCurrencyRate_Parquet, and then I save them as Delta Table on the Lake Database Default:
Fig. 5 - Shows the Delta table created accessible from Synapse Studio - > Data:
Fig. 5 - Shows the Delta table
Follow Fig. 6 - Results from Serverless SQL Pool:
Fig. 6 - Results
Delta time travel can be used in Apache Spark for Synapse as an option to do a point-in-time recovery while building a Lakehouse architecture. This is very useful as it enables you to roll back changes, create snapshot scenarios, and manage the file versions and changes on top of your Data lake architecture. As Serverless SQL pools support Delta format and it is integrated with Spark you can use it to explore further as an end-to-end solution.