Jupyter provides the basis of the Azure Notebooks user experience. There are many ways to get data into your notebooks, ranging from using curl to leveraging the Azure package, all while working from a Jupyter Notebook.
Here are some of the most popular ways:
Using curl to retrieve a file from GitHub
We can call bash commands by starting a line with !. In this way we can curl a file down from the internet, like this CSV about oil prices.
In [1]:
!curl https://raw.githubusercontent.com/petroleum101/figures/db46e7f48b8aab67a0dfe31696f6071fb7a84f1e/oil_... -o oil_price.csv
Then, if we wanted to do something with it, we might choose to load it into pandas.
In [2]:
import pandas
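As a minimal sketch of the loading step, assuming pandas is installed, the snippet below reads CSV text with pd.read_csv. The two-row sample and its column names (date, price) are hypothetical stand-ins, because the layout of oil_price.csv is not shown in this post. In the notebook you would pass the file name "oil_price.csv" instead of the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical stand-in for oil_price.csv; the real file's columns may differ.
sample_csv = io.StringIO("date,price\n2016-01-01,37.2\n2016-02-01,33.9\n")

# parse_dates converts the named column to datetime values while loading.
oil = pd.read_csv(sample_csv, parse_dates=["date"])

print(oil.shape)   # (rows, columns)
print(oil.head())  # first few rows
```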
Interacting with Azure Blobs
We can also use Azure Storage to store our data, which makes it straightforward to keep that data private or public. The code below uses private keys first. Then, in the shared access section, a shared access signature for read-only access is created.
Before we can do anything, though, we need an Azure Storage account. Read the documentation article on creating storage accounts, or create a storage account using the Azure SDK.
You can put content into blobs using AzCopy or by using the Python Azure SDK as shown in the example below.
Once you have your account name and key, you can enter them below. This code will create a container and blob in the Azure Storage account you provide. Then we will read that blob back.
Within your Jupyter Notebook you now need to define the connection parameters. In a code block, create the following variables, taking the details from your Azure account.
Example of Notebooks setup
This code block is where we define the connection:
blob_account_name = "" # fill in your blob account name
blob_account_key = "" # fill in your blob account key
mycontainer = "" # fill in the container name
myblobname = "" # fill in the blob name
mydatafile = "" # fill in the output file name
The Azure storage account provides a unique namespace to store and access your Azure Storage data objects. All objects in a storage account are billed together as a group. By default, the data in your account is available only to you, the account owner.
There are two types of storage accounts: general-purpose storage accounts and Blob storage accounts.
In a new code block, create your connection and query strings:
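import os  # operating-system dependent functionality
import pandas as pd  # data analysis library
# BlobService is the class name in azure-storage releases before 0.30;
# later releases of this legacy package call it BlockBlobService.
from azure.storage.blob import BlobService
dirname = os.getcwd()
# Connect with the account name and key defined in the previous code block
blob_service = BlobService(account_name=blob_account_name,
                           account_key=blob_account_key)
# Download the blob to a local file, load it into pandas, then delete the local copy
blob_service.get_blob_to_path(mycontainer, myblobname, mydatafile)
mydata = pd.read_csv(mydatafile, header=0)
os.remove(os.path.join(dirname, mydatafile))
print(mydata.shape)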
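The create-then-read-back flow described earlier can be sketched as follows. This is a rough illustration, assuming the legacy azure-storage 0.32.0 package, where the blob client class is BlockBlobService. The helper names round_trip and make_service are hypothetical; the service methods they call (create_container, create_blob_from_text, get_blob_to_text) belong to that legacy SDK and are not available under these names in the newer azure-storage-blob package.

```python
def round_trip(service, container, blob_name, text):
    """Create a container, upload text as a blob, and return what is read back."""
    service.create_container(container)
    service.create_blob_from_text(container, blob_name, text)
    # get_blob_to_text returns a Blob object; its .content holds the text.
    return service.get_blob_to_text(container, blob_name).content


def make_service(account_name, account_key):
    """Build a legacy BlockBlobService client (hypothetical helper)."""
    # Imported lazily so round_trip can be exercised without the SDK installed.
    from azure.storage.blob import BlockBlobService
    return BlockBlobService(account_name=account_name, account_key=account_key)


# Usage in a notebook, with the variables from the connection cell:
# service = make_service(blob_account_name, blob_account_key)
# print(round_trip(service, mycontainer, myblobname, "hello from Azure Notebooks"))
```

Passing the service in as a parameter keeps the account key out of the helper and makes the flow easy to try against a stand-in object before touching a real account.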
Another way is as follows:
In [3]:
azure_storage_account_name = None
In [4]:
!pip install azure-storage==0.32.0
In [6]:
from azure.storage.blob import BlockBlobService
Azure Table Storage can be used in much the same way as Blob Storage. Below we create a table in a storage account, add rows, remove rows, and query for data.
In [7]:
from azure.storage.table import TableService
Sometimes you want to share your data without giving the recipient the ability to edit the dataset. Shared Access Signatures allow you to share your data and provide whatever level of control you want to the receiver. A common use case is to provide read-only access to a user so they can read your data but not edit it.
Below, we create a shared access signature for our table (this also works with blobs) with read permissions. We show that we can read the table but cannot write to it. With tables you also need to provide permission to query.
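A shared access signature is an ordinary URL query string, so its fields can be inspected with the standard library. The sketch below parses the sample token that appears in this post's Out[8] output and appends it to a blob URL. The account, container, and blob names passed to blob_url are hypothetical placeholders, and blob_url itself is an illustrative helper, not part of the Azure SDK.

```python
from urllib.parse import parse_qs

# Sample token from the Out[8] cell of this post (long expired).
sas_token = (
    "se=2016-05-20T18%3A25%3A13Z"
    "&sig=rskxaKrEtnWcvVzfjW2rdofv5gWV9NVLgixH6HbkrK4%3D"
    "&sp=r&sv=2015-07-08&sr=b"
)

# parse_qs URL-decodes each value and returns lists; keep the single value.
fields = {key: values[0] for key, values in parse_qs(sas_token).items()}

print("expires:    ", fields["se"])  # expiry time (UTC)
print("permissions:", fields["sp"])  # r = read
print("resource:   ", fields["sr"])  # b = blob
print("version:    ", fields["sv"])  # storage service version


def blob_url(account, container, blob, sas):
    """Build a blob URL that carries the SAS token (hypothetical helper)."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob}?{sas}"


# Placeholder names; substitute your own account, container, and blob.
url = blob_url("myaccount", "mycontainer", "myblob.csv", sas_token)
print(url)
```

A URL built this way can be handed to curl or to pd.read_csv for as long as the token is valid, without sharing the account key.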
Creating a Shared Access Signature
In [8]:
from azure.storage.blob.models import BlobPermissions
Out[8]:
'se=2016-05-20T18%3A25%3A13Z&sig=rskxaKrEtnWcvVzfjW2rdofv5gWV9NVLgixH6HbkrK4%3D&sp=r&sv=2015-07-08&sr=b'
Using a Shared Access Signature
In [9]:
# Create a service and use the SAS
Out[9]:
'your text file content would go here'
Cleaning up our blobs and tables
In [10]:
# Finally, let's clean up the resources created.
Out[10]:
True
Using SQL
With the assistance of the pyodbc library we can access our SQL Servers in Microsoft Azure. To create a SQL Server, see the Creating and Using Azure SQL documentation.
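A minimal sketch of connecting with pyodbc, assuming the package is installed and that the Microsoft ODBC driver named below is available on the machine. The helper names, the default driver string, and the server, database, and credential values are all assumptions for illustration; adjust them to your own Azure SQL setup.

```python
def azure_sql_connection_string(server, database, username, password,
                                driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC connection string for an Azure SQL database."""
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER=tcp:{server}.database.windows.net,1433;"
        f"DATABASE={database};UID={username};PWD={password};"
        "Encrypt=yes;TrustServerCertificate=no;"
    )


def run_query(connection_string, sql):
    """Open a connection, run one query, and return all rows."""
    import pyodbc  # imported lazily so the string helper works without it
    connection = pyodbc.connect(connection_string)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()


# Placeholder values; replace with your own server and credentials.
conn_str = azure_sql_connection_string("myserver", "mydb", "myuser", "mypassword")
# rows = run_query(conn_str, "SELECT TOP 5 name FROM sys.tables")
```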
In [ ]:
!pip install pyodbc
In [ ]:
import pyodbc
In [ ]:
#PYMSSQL --> NOTE the connection parameter settings for pymssql are different from pyodbc above.
You can download files from OneDrive by viewing the file in the web UI. You can get the download id and authkey from the 'embed' code. Change the link slightly to go to 'download' instead of 'embed' and use the requests library.
The HTML provided by embed contains a source link that just needs to be changed to use /download instead of /embed.
<iframe src="https://onedrive.live.com/embed?cid=72087E967DE94E66&resid=72087E967DE94E66%21107&authkey=AB..." width="98" height="120" frameborder="0" scrolling="no"></iframe>
https://onedrive.live.com/download?cid=72087E967DE94E66&resid=72087E967DE94E66%21107&authkey=AB1cjNa...
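The /embed to /download rewrite can be sketched as a small helper. The function names embed_to_download and fetch are hypothetical, and AUTHKEY is a placeholder for the value in your own embed code. The fetch function assumes the requests package is installed.

```python
from urllib.parse import urlsplit


def embed_to_download(embed_url):
    """Turn a OneDrive /embed link into the matching /download link."""
    parts = urlsplit(embed_url)
    if parts.path != "/embed":
        raise ValueError("expected a OneDrive embed URL")
    # Only the path changes; the query string (cid, resid, authkey) is kept as-is.
    return parts._replace(path="/download").geturl()


def fetch(url, filename):
    """Stream the file at url to disk (requires the requests package)."""
    import requests  # imported lazily so the URL helper works without it
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(filename, "wb") as handle:
            for chunk in response.iter_content(chunk_size=8192):
                handle.write(chunk)


# AUTHKEY is a placeholder; use the value from your own embed code.
embed = ("https://onedrive.live.com/embed"
         "?cid=72087E967DE94E66&resid=72087E967DE94E66%21107&authkey=AUTHKEY")
download_url = embed_to_download(embed)
print(download_url)
# fetch(download_url, "myfile.dat")
```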
You can then use requests or curl to get the file.
In [ ]:
# Option 1: curl
In [9]:
# Option 2: requests in Python 3
You can download files from Dropbox by clicking the 'Share' button in the Dropbox UI to get a link.
You can use that link to download the file with curl, or upload the file using Data –> Choose from Dropbox.
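Dropbox share links end in dl=0 by default, which opens a preview page in a browser. Dropbox's help documentation describes changing this to dl=1 to force a direct download. A minimal sketch of that rewrite, where force_download is a hypothetical helper name:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_download(share_url):
    """Return the share link with its dl parameter set to 1."""
    parts = urlsplit(share_url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # dl=1 asks Dropbox to send the file itself
    return urlunsplit(parts._replace(query=urlencode(query)))


share = ("https://www.dropbox.com/s/lvn3qoz8o03a5a1/"
         "Python-3-vs-Python-2-Converging.png?dl=0")
direct = force_download(share)
print(direct)
```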
Or use curl:
In [ ]:
!curl "https://www.dropbox.com/s/lvn3qoz8o03a5a1/Python-3-vs-Python-2-Converging.png?dl=0" -L -o Py3-vs-Py2.png
To use local files, simply select Data –> Upload and upload the necessary data files to the notebook.
Other Resources
Copy Wizard for Azure Data Factory
Using External Storage Data https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/07/20/using-external-data-with-azur...
Using Local Data https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/07/04/how-to-implement-the-backprop...