OneLake for all your data in Microsoft Fabric, let's create one!
Published Sep 07 2023 04:09 AM
Microsoft

Fabric OneLake, a single unified SaaS data lake

 

With OneLake, you get a data lake as a service instead of having to build one yourself. You have long had OneDrive for all of your documents; now you have OneLake for all of your data. OneLake gives your entire organization a single data lake.

Every Fabric tenant has exactly one OneLake: never zero, never two. There is no infrastructure to set up or maintain.

 
 


 

One particular advantage of a SaaS service is the concept of a tenant. It lets OneLake automatically establish a single management and governance boundary for the whole organization, ultimately under the control of a tenant admin, who sets this initial boundary. Any data that lands in OneLake automatically participates in out-of-the-box data governance, including data lineage, data protection, certification, and catalog integration. Although the tenant admin is ultimately in charge of all data, different business groups must still be able to operate autonomously without going through a central gatekeeper.

 

Through workspaces, OneLake makes distributed ownership possible. Workspaces let different parts of the organization work independently while still contributing to the same data lake. Each workspace can have its own administrator, access control, region, and billing capacity. Setting up a workspace is simple: it inherits the policies set by the tenant admin, so there is no need to implement the same governance again or spend time getting separate resources to work with one another.

 

Data in OneLake can logically span the world

You might think your company cannot have a single lake because it operates in several countries and is subject to laws that require data to stay in-country. OneLake addresses this by spanning the globe. Different workspaces can reside in different regions, and all data in a workspace is stored in that workspace's region. OneLake is built on Azure Data Lake Storage (ADLS) Gen2. Under the hood it may use multiple storage accounts in different regions, but OneLake virtualizes them into a single logical lake.

 

Open access to data in OneLake 

A tenant appears as one big storage account, with each workspace appearing as a container and the data inside organized into folders. Because OneLake supports the ADLS Gen2 DFS APIs and SDKs, it is compatible with existing ADLS applications.
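
As a rough illustration of that compatibility, the sketch below lists the files in a lakehouse with the ADLS Gen2 Python SDK (azure-storage-file-datalake) together with azure-identity. The endpoint https://onelake.dfs.fabric.microsoft.com and the "<item>.<item type>" folder naming are not stated in this article; they are assumptions based on the OneLake documentation, and the workspace and lakehouse names are hypothetical.

```python
# Assumed OneLake DFS endpoint: the whole tenant behaves like one storage account.
ONELAKE_ACCOUNT_URL = "https://onelake.dfs.fabric.microsoft.com"


def item_folder(item_name, item_type="Lakehouse"):
    """Folder name of a Fabric item inside its workspace (assumed naming)."""
    return f"{item_name}.{item_type}"


def list_item_paths(workspace, item_name, sub_path="Files"):
    """List paths under an item, treating the workspace as an ADLS container.

    The SDK imports are local so the helpers above work without the packages.
    """
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        ONELAKE_ACCOUNT_URL, credential=DefaultAzureCredential()
    )
    # The workspace plays the role of the container (file system).
    file_system = service.get_file_system_client(workspace)
    start = f"{item_folder(item_name)}/{sub_path}"
    return [p.name for p in file_system.get_paths(path=start)]


# Hypothetical usage; needs both packages and a signed-in identity:
# print(list_item_paths("MyWorkspace", "MyLakehouse"))
```

Apart from the account URL, this is the same code you would write against a regular ADLS Gen2 account.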

 

The core of every Fabric data item 

Most Fabric data items are prewired to store their data in OneLake using open file formats, and all data in OneLake belongs to a Fabric data item. Fabric introduces a number of new data items, each with an experience tailored to a particular persona: for instance, a lakehouse for data engineers and a fully transactional data warehouse for T-SQL developers. For someone used to working with storage today, the lakehouse offers the closest experience to a lake, but it also offers much more. Whichever item you start with, it stores its data in OneLake, much as Word, Excel, and PowerPoint save files to OneDrive. If you look at how this data is actually stored in OneLake, you won't see data items and workspaces as such. You will see files and folders, much as in a data lake today. Every workspace is a folder, as is every data item. All tabular data is stored in Delta Lake format.
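
To make the "everything is a folder" idea concrete, here is a small sketch that splits a OneLake-style path into its workspace folder, item folder, and the path inside the item. The "<item>.<item type>" folder naming and the example names (MyWorkspace, Sales, Tables/orders) are assumptions for illustration and do not come from this article.

```python
from typing import NamedTuple


class OneLakeLocation(NamedTuple):
    workspace: str   # top-level folder
    item: str        # data item name, e.g. a lakehouse
    item_type: str   # suffix of the item folder (assumed convention)
    inner_path: str  # files and folders inside the item


def parse_onelake_path(path: str) -> OneLakeLocation:
    """Split '<workspace>/<item>.<type>/<inner path>' into its parts."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected at least '<workspace>/<item>.<type>'")
    workspace, folder = parts[0], parts[1]
    item, dot, item_type = folder.rpartition(".")
    if not dot or not item:
        raise ValueError(f"item folder {folder!r} has no '<item>.<type>' form")
    return OneLakeLocation(workspace, item, item_type, "/".join(parts[2:]))


loc = parse_onelake_path("MyWorkspace/Sales.Lakehouse/Tables/orders")
print(loc.workspace, loc.item, loc.item_type, loc.inner_path)
```

Nothing here is specific to a Fabric API; it only shows that a data item can be addressed like any other folder in a lake.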

 

Create a shortcut to connect your data 

Shortcuts let you connect data across business domains without data movement. With shortcuts, your company can easily share data between users and applications without having to move or duplicate it. When teams work independently in separate workspaces, shortcuts make it possible to combine data from different business groups and domains into a virtual data product that suits a user's particular needs. A shortcut is a reference to data stored in another file location. That location can be inside OneLake, in the same workspace or a different one, or outside OneLake in ADLS or Amazon S3. Wherever it points, the reference makes the files and folders look as if they were stored locally.

 

One Security 

Data can be secured at the workspace or item level. For example, a user who accesses a warehouse through OneLake can see either the entire warehouse or none of it. Once the data is secured, you can use it anywhere, and only users with access to all of the data in that warehouse can access the item directly in the lake. Additional engine-specific security can be defined on top of this.

 

One copy of data 

OneLake aims to give you the most value from a single copy of data, without data movement or duplication. You no longer need to copy data just to use it with another engine, or to break down data silos so that it can be analyzed together with other data.

 

OneLake for all domains 

A domain is a way to logically group all the data in an organization that is relevant to a particular area or field. Domain admins and contributors can logically group workspaces within a domain.

Domains add a management boundary between the workspace and the tenant, giving domain admins finer-grained control over a set of workspaces. Different business groups can then work freely within the same data lake without having to manage separate storage resources.

 

Let's create a lakehouse

  1. Sign in to Microsoft Fabric. 
  2. Switch to the Data Engineering experience using the experience switcher icon at the left corner of your homepage. 
  3. Select Workspaces from the left-hand menu. 
  4. To open your workspace, enter its name in the search textbox located at the top and select it from the search results. 
  5. In the upper left corner of the workspace home page, select New and then choose Lakehouse. 
  6. Give your lakehouse a name and select Create.
  7. A new lakehouse is created and, if this is your first OneLake item, OneLake is provisioned behind the scenes. At this point, you have a lakehouse running on top of OneLake. Next, add some data and start organizing your lake.

     

    Load data to a lakehouse 
  8. In the file browser on the left, select Files and then select New subfolder. Name your subfolder and select Create.

     

  9. You can repeat this step to add more subfolders as needed. 
  10. Select a folder and then select Upload files from the list.
  11. Choose the file you want from your local machine and then select Upload.

     

  12. You’ve now added data to OneLake. To add data in bulk or schedule data loads into OneLake, use the Get data button to create pipelines. See more details about options for getting data here.
  13. Select the More icon for the file you uploaded and select Properties from the menu.
  14. The Properties screen shows the details for the file, including the URL and the Azure Blob File System (ABFS) path for use with notebooks. You can copy the ABFS path into a Fabric notebook to query the data using Spark. To learn more about notebooks in Fabric, see Explore the data in your Lakehouse with a notebook.
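
As a minimal sketch of that last step, the snippet below assembles an ABFS path of the kind the Properties screen is expected to show and reads a CSV file from it with Spark. The path layout (abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/...) is an assumption based on the OneLake documentation, and the workspace, lakehouse, and file names are hypothetical. In practice, paste the path you copied from Properties instead of building one.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"  # assumed OneLake endpoint


def abfs_path(workspace, lakehouse, relative_path):
    """Assemble an ABFS path for a file in a lakehouse (assumed layout)."""
    rel = relative_path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{rel}"


def read_csv_with_spark(spark, path):
    """Read a CSV file with a header row into a Spark DataFrame."""
    return spark.read.option("header", "true").csv(path)


# Hypothetical names, for illustration only.
path = abfs_path("MyWorkspace", "MyLakehouse", "Files/raw/sales.csv")
print(path)

# In a Fabric notebook a Spark session is normally available as `spark`:
# df = read_csv_with_spark(spark, path)
# df.show(5)
```

The Spark call is left commented out because it only works where a Spark session and access to the workspace are available.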

At the time of writing, Microsoft Fabric is in preview, and you can try it out in your organization with a 60-day free trial.

 

Thanks for reading!

 

 

 

 

 

Last update: Jan 26 2024 10:31 AM