As data engineers, we grapple with numerous challenges daily. Data is often scattered across various sources, residing in a multitude of file types with varying data quality. The time spent locating specific files—figuring out which tenant they belong to and deciphering access rights—can be exasperating. This is where OneLake steps in.
OneLake streamlines data management, breaks down silos, and ensures that your data resides in one unified home—just like OneDrive for files!
A basic setup would be:
What is OneLake?
OneLake is essentially the OneDrive for data within the Fabric ecosystem. Just like OneDrive, it’s automatically provisioned for every Fabric tenant, requiring no infrastructure management.
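The OneDrive analogy extends to addressing: every workspace and item in the tenant is reachable under a single OneLake endpoint. The sketch below builds the documented ABFS URI shape for a file inside a Fabric item; the workspace and lakehouse names are hypothetical placeholders.

```python
# OneLake exposes one tenant-wide DFS endpoint; items are addressed as
# abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<ItemType>/<path>
ONELAKE_ENDPOINT = "onelake.dfs.fabric.microsoft.com"

def onelake_abfs_path(workspace: str, item: str, item_type: str, relative_path: str) -> str:
    """Build the ABFS URI for a file or folder inside a Fabric item."""
    return f"abfss://{workspace}@{ONELAKE_ENDPOINT}/{item}.{item_type}/{relative_path}"

# Example: a file in the Files section of a (hypothetical) lakehouse named
# 'Sales' in a workspace named 'Contoso'.
path = onelake_abfs_path("Contoso", "Sales", "Lakehouse", "Files/raw/orders.csv")
print(path)
# abfss://Contoso@onelake.dfs.fabric.microsoft.com/Sales.Lakehouse/Files/raw/orders.csv
```

Because the endpoint is fixed per tenant, any ADLS Gen2-compatible tool that accepts an `abfss://` URI can read this path without knowing which engine produced the data.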
Key benefits of OneLake include:
How does OneLake work?
The architecture of OneLake allows seamless connectivity to multiple cloud providers. Let’s explore the basics:
Managed Data: Tables
Tables play a crucial role in managing and organizing data within the lakehouse architecture. Once set up in the managed section of the lakehouse, you have several options:
Connecting External Data to Microsoft Fabric OneLake
Now that you've grasped how OneLake works, let's get some data from an external source into OneLake. For this, we will be using the Data Engineering experience, but feel free to choose any other experience.
2. Set up a lakehouse
Create a lakehouse item within your workspace: select Lakehouse from the drop-down menu and give it a name.
3. Ingest the data from an external source into the lakehouse
Use any of the following options to create a shortcut, which allows you to point to other storage locations, either internal or external to OneLake.
This launches the shortcut wizard; select the source you want to pull your data from. For this demo, select OneLake to create an internal shortcut.
Find and connect to the data you want to use with your shortcut, then click Next. Your data will be loaded in the Files section of your lakehouse.
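The wizard above can also be scripted: Fabric exposes a REST API for creating shortcuts. This sketch only builds the request payload for an internal (OneLake-to-OneLake) shortcut; the workspace and item IDs are hypothetical placeholders, and actually sending the request would additionally require a Microsoft Entra ID bearer token.

```python
# Payload shape for POST /v1/workspaces/{workspaceId}/items/{itemId}/shortcuts
# on the Fabric REST API (IDs below are placeholders, not real resources).
def build_onelake_shortcut_payload(name: str, path: str,
                                   src_workspace_id: str, src_item_id: str,
                                   src_path: str) -> dict:
    """Build the body for creating an internal OneLake shortcut."""
    return {
        "name": name,   # shortcut name as it appears in the lakehouse
        "path": path,   # where the shortcut is created, e.g. "Files"
        "target": {
            "oneLake": {
                "workspaceId": src_workspace_id,
                "itemId": src_item_id,
                "path": src_path,  # folder in the source item to point at
            }
        },
    }

payload = build_onelake_shortcut_payload(
    "orders", "Files",
    "11111111-1111-1111-1111-111111111111",
    "22222222-2222-2222-2222-222222222222",
    "Files/raw",
)
```

Because a shortcut is just a pointer, no data is copied: the target folder shows up under the lakehouse's Files section while the bytes stay in the source item.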
Preview the loaded data by clicking on the Files section.
4. Transform the Data into Delta Tables
Once your data is in the lakehouse, create a new notebook and attach it to the lakehouse you created. Drag and drop the file into the notebook.
Transform it into Delta tables using Spark within the Fabric notebook. Delta tables provide efficient change tracking and management.
5. Build Reports and Analyze the Data
From the table view, click on Lakehouse and select SQL analytics endpoint.
From the SQL analytics endpoint view, select New visual to create a simple visual.
You can create the visuals manually, or let Copilot do the magic for you.
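Beyond the visual builder, the SQL analytics endpoint is a standard TDS endpoint, so any T-SQL client can query the Delta tables. This sketch only assembles a pyodbc-style connection string and a sample query; the server name and table are hypothetical placeholders, and actually connecting would require the pyodbc package plus Microsoft Entra ID authentication.

```python
# Build a connection string for the ODBC Driver for SQL Server pointed at a
# (hypothetical) Fabric SQL analytics endpoint. No connection is made here.
def sql_endpoint_conn_str(server: str, database: str) -> str:
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};Database={database};"
        "Authentication=ActiveDirectoryInteractive;Encrypt=yes;"
    )

conn_str = sql_endpoint_conn_str(
    "myendpoint.datawarehouse.fabric.microsoft.com",  # placeholder endpoint
    "Sales",                                          # placeholder lakehouse
)

# A plain T-SQL query against a hypothetical 'products' Delta table.
query = "SELECT TOP 10 product, SUM(price) AS total FROM products GROUP BY product;"
```

With pyodbc installed, `pyodbc.connect(conn_str)` would open the session and the query could feed any reporting tool, not just Power BI.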
Clean Up Resources: After completing the task, remember to clean up any temporary or test data.
Conclusion
OneLake aims to give you the most value possible out of a single copy of data without data movement or duplication. You no longer need to copy data just to use it with another engine or to break down silos so you can analyze the data with data from other sources.
Further Guides
Signup for the Microsoft Fabric Global AI Hack, a virtual event where you can learn, experiment, and hack together with the new Copilot and AI features in Microsoft Fabric!
Sign up for the Fabric Cloud Skills Challenge at https://aka.ms/fabric30dtli and complete all the modules to become eligible for a 50% discount on the DP-600 exam.
Learn how to use copilot in Microsoft Fabric, your data insights AI assistant.
Join the Fabric Community to stay updated on the latest about Microsoft Fabric.
Consider joining the Fabric Career Hub so you won't miss out on any careers in Microsoft Fabric.