Blobfuse uses the libfuse open source library to communicate with the Linux FUSE kernel module, and implements the filesystem operations using the Azure Storage Blob REST APIs.
You can install blobfuse from the Linux Software Repository for Microsoft products. The process is explained on the blobfuse installation page. Alternatively, you can clone the blobfuse repository, install the dependencies (fuse, libcurl, gcrypt and GnuTLS) and build from source. See details in the wiki and the GitHub repo.
Blobfuse and Data Science Virtual Machine
Blobfuse is already installed on the Ubuntu DSVM. To use it, create a configuration file /opt/blobfuse.cfg as described in either of these pages:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux
https://github.com/Azure/azure-storage-fuse/tree/43e82df5d85a4c082dc67af8131bcf05f4d9270a
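As a rough sketch, the configuration file could be drafted like this. The key names (accountName, accountKey, containerName) follow the blobfuse configuration template as documented on the pages linked above; the values are placeholders, and drafting in a temporary file first is just one possible approach, not a required step.

```shell
# Draft the config in a temporary file first (placeholder values, not real credentials).
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
accountName myaccountname
accountKey myaccountkey
containerName mycontainername
EOF
chmod 600 "$cfg"   # the file holds a secret, so keep it private

# Then copy it into place, owned by the user who will run blobfuse, e.g.:
#   sudo install -m 600 -o "$(id -un)" "$cfg" /opt/blobfuse.cfg
echo "Draft written to $cfg"
```

Restricting the file to mode 600 matters because the account key grants full access to the storage account.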
Once you have installed blobfuse, configure your account credentials either in the template provided in the blobfuse folder (connection.cfg) or in environment variables. For brevity, let's use environment variables:
export AZURE_STORAGE_ACCOUNT=myaccountname
export AZURE_STORAGE_ACCESS_KEY=myaccountkey
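Before mounting, it can help to confirm that both variables are actually set in the current shell. The helper below is a minimal sketch; the function name check_storage_env is hypothetical and is not part of blobfuse.

```shell
# Report whether the two variables used above are set and non-empty.
check_storage_env() {
    missing=0
    for name in AZURE_STORAGE_ACCOUNT AZURE_STORAGE_ACCESS_KEY; do
        # eval-based indirect lookup keeps this portable to plain POSIX sh
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_storage_env && echo "credentials are set"` prints the message only when both variables are present.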
Then mount your blob storage on the VM:
Using a high-performance disk or a ramdisk for the local cache is recommended. On Azure VMs, this is the ephemeral disk, which is mounted on /mnt in Ubuntu and /mnt/resource in RHEL. Make sure that your user has write access to this location. If not, create the directory and chown it to your user.
sudo mkdir /images
sudo mkdir /mnt/blobfusecache
sudo chown -R <your-user-account> /images
sudo chown -R <your-user-account> /mnt/blobfusecache/
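A quick way to confirm the ownership change took effect is to test that both directories are writable by the current user. This is only a sketch; dirs_writable is a hypothetical helper name.

```shell
# Succeeds only if every given directory exists and is writable by the current user.
dirs_writable() {
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "not writable: $dir" >&2
            return 1
        fi
    done
}
```

With the directories used here, the call would be `dirs_writable /images /mnt/blobfusecache`.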
Create your mountpoint (mkdir /path/to/mount) and mount a Blob container (which must already exist) with blobfuse:
blobfuse /images --tmp-path=/mnt/blobfusecache -o big_writes -o max_read=131072 -o max_write=131072 -o attr_timeout=240 -o fsname=blobfuse -o entry_timeout=240 -o negative_timeout=120 --config-file=/opt/blobfuse.cfg
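After running the mount command, you may want to verify that the mount succeeded and know how to undo it. The sketch below assumes the mountpoint utility from util-linux is available and uses the /images path from the command above; is_mounted is a hypothetical helper name.

```shell
# True if the given directory is currently a mountpoint (uses mountpoint from util-linux).
is_mounted() {
    mountpoint -q "$1"
}

if is_mounted /images; then
    ls /images | head        # blobs in the container appear as files
else
    echo "/images is not mounted" >&2
fi

# To unmount when finished (the standard FUSE unmount command):
#   fusermount -u /images
```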
NOTE: Use absolute paths for directories in the command. Relative paths and shortcut paths (~/) do not work. Blobfuse does not support multiple writers to a single blob, so you need to guarantee that the file names generated during the extraction part are unique.
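If you script the mount, a small guard can catch relative or "~/" paths before they reach blobfuse. This is a minimal sketch; is_absolute_path is a hypothetical helper, not a blobfuse feature.

```shell
# Succeeds only for absolute paths; rejects relative paths and a literal "~/".
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

For instance, `is_absolute_path /mnt/blobfusecache` succeeds, while `is_absolute_path '~/cache'` and `is_absolute_path cache` both fail.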
For more information, see the wiki.
Interested in Data Engineering? Check out the Data Engineering learning resources at Microsoft Learn.