You can install blobfuse from the Linux Software Repository for Microsoft products. The process is explained on the blobfuse installation page. Alternatively, you can clone the blobfuse repository, install the dependencies (fuse, libcurl, gcrypt and GnuTLS) and build from source. See the wiki and the GitHub repo for details.
Blobfuse and Data Science Virtual Machine
Blobfuse is already installed on the Data Science Virtual Machine. To use it, create a configuration file at /opt/blobfuse.cfg.
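As a minimal sketch, the configuration file can be written with a shell here-document. The three keys (accountName, accountKey, containerName) follow the blobfuse v1 configuration file format; the angle-bracket values are placeholders, and drafting the file in the working directory before moving it is an assumption made so the snippet runs without root.

```shell
# Write a draft config in the current directory (placeholder values are hypothetical).
cat > blobfuse.cfg <<'EOF'
accountName <your-storage-account-name>
accountKey <your-storage-account-key>
containerName <your-container-name>
EOF

# The file holds a secret, so restrict it to the owner.
chmod 600 blobfuse.cfg

# Then move it into place, for example:
#   sudo mv blobfuse.cfg /opt/blobfuse.cfg
```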
Once you have installed blobfuse, configure your account credentials either in the template provided in the blobfuse folder (connection.cfg) or in environment variables. For brevity, let's use environment variables:
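A minimal sketch of that step, assuming the variable names AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY that blobfuse v1 reads; the values shown are placeholders to replace with your own.

```shell
# Placeholder values: replace with your own storage account name and key.
export AZURE_STORAGE_ACCOUNT="<your-storage-account-name>"
export AZURE_STORAGE_ACCESS_KEY="<your-storage-account-key>"
```

The mount command later in this section passes --config-file instead, so use one approach or the other.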
Then mount your blob storage on the VM:
A high-performance disk or ramdisk is recommended for the local cache. In Azure VMs, this is the ephemeral disk, which is mounted on /mnt in Ubuntu and on /mnt/resource in RHEL. Make sure that your user has write access to this location. If not, create the directories and chown them to your user.
sudo mkdir -p /images
sudo mkdir -p /mnt/blobfusecache
sudo chown -R <your-user-account> /images
sudo chown -R <your-user-account> /mnt/blobfusecache/
With your mountpoint (/images) created, mount a Blob container (which must already exist) with blobfuse:
blobfuse /images --tmp-path=/mnt/blobfusecache -o big_writes -o max_read=131072 -o max_write=131072 -o attr_timeout=240 -o fsname=blobfuse -o entry_timeout=240 -o negative_timeout=120 --config-file=/opt/blobfuse.cfg
NOTE: Use absolute paths for the directories in the command. Relative paths and shortcuts such as ~/ do not work. Blobfuse does not support multiple writers to a single blob, so you need to guarantee that the file names generated during the extraction part are unique.
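After mounting, one way to confirm that the container is attached is to test the mount point. The sketch below assumes the /images mount point from the command above and relies on the standard mountpoint and fusermount utilities; the is_mounted helper is a hypothetical name introduced here for illustration.

```shell
# Return success if the given absolute path is currently a mount point.
is_mounted() {
    mountpoint -q "$1"
}

MOUNT_DIR=/images   # mount point used in the blobfuse command

if is_mounted "$MOUNT_DIR"; then
    echo "$MOUNT_DIR is mounted"
    ls "$MOUNT_DIR" | head   # list a few entries from the container
else
    echo "$MOUNT_DIR is not mounted"
fi

# To unmount when finished:
#   fusermount -u /images
```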
For more information, see the wiki.
Interested in Data Engineering?
Check out the Data Engineering learning resources at Microsoft Learn.