Finally, you are ready to download your Kaggle dataset via the command line in the terminal. The API command to do so is available on the Kaggle dataset page itself. Click on the three dots next to New Notebook and select ‘copy API command’.
Note: This might take a while; as you can see, the file is approximately 4 GB in size.
Voilà! Your dataset is downloaded (as a zip file) into the current working directory of your Azure workspace.
Alternatively, you can specify a folder where the files should be downloaded using optional arguments in the API call (for more info, see the Kaggle documentation here). For example:
kaggle datasets download -d <owner>/<dataset-name> -p images/train/
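If you would rather trigger the download from a notebook cell than from the terminal, the same command can be wrapped in Python. This is only a minimal sketch: the helper names `build_download_command` and `download_dataset` are made up for illustration, the dataset slug is a placeholder, and it assumes the `kaggle` CLI is installed and your API token is already configured.

```python
import shutil
import subprocess


def build_download_command(dataset, dest_dir, unzip=False):
    """Build the kaggle CLI call as an argument list (hypothetical helper)."""
    cmd = ["kaggle", "datasets", "download", "-d", dataset, "-p", dest_dir]
    if unzip:
        # --unzip asks the CLI to extract the archive after downloading
        cmd.append("--unzip")
    return cmd


def download_dataset(dataset, dest_dir, unzip=False):
    """Run the download, failing early if the kaggle CLI is not available."""
    if shutil.which("kaggle") is None:
        raise RuntimeError("kaggle CLI not found on PATH")
    subprocess.run(build_download_command(dataset, dest_dir, unzip), check=True)


# Example call; replace the placeholder with the slug from 'copy API command':
# download_dataset("<owner>/<dataset-name>", "images/train/")
```

Passing the command as a list avoids shell quoting problems if your folder name contains spaces.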
To unzip the downloaded file, the following code goes in your notebook.
import os
import zipfile

# name of the zip file you want to unzip
local_zip = 'amex-data-integer-dtypes-parquet-format.zip'

# opening a file with mode parameter 'r' : read existing file
zip_ref = zipfile.ZipFile(local_zip, 'r')

# extract all contents of the zip file
zip_ref.extractall('')

# close the file
zip_ref.close()
And there you have it. All your data will be unzipped into your current working directory on Azure, inside a new folder if the archive contains one.
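If you want the extracted files to land in a folder of your choosing, a small variation on the unzip step could look like the sketch below. The helper name `unzip_to` and the target folder `amex-data` are assumptions for illustration, not part of the original steps; the `with` block closes the zip file for you, even if extraction fails partway.

```python
import zipfile
from pathlib import Path


def unzip_to(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the archive's member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(dest)
        return zip_ref.namelist()


# Zip file name from the step above; 'amex-data' is a hypothetical folder name
local_zip = Path("amex-data-integer-dtypes-parquet-format.zip")
if local_zip.exists():
    names = unzip_to(local_zip, "amex-data")
    print(f"Extracted {len(names)} entries into amex-data/")
```

The returned list of names is a quick way to check what actually came out of the archive before you start loading files.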