First published on MSDN on Nov 09, 2018
Written by Kanchan Mehrotra, Tony Wu, and Rakesh Patil from AzureCAT. Reviewed by Solliance. Edited by Nanette Ray.
This article is also available as an eBook:
Find Part 1 and Part 2 of this series here:
The GlusterFS file system is a free, scalable, open-source distributed file system specifically optimized for cloud storage, and it works great for applications hosted on Azure. Unlike Lustre, it does not have separate metadata servers. Instead, metadata is integrated into the file storage.
The GlusterFS architecture includes a server node and a client node in a cluster. You can designate a single node for both storage and metadata—separate servers are not required. GlusterFS stores and locates data using an elastic hash algorithm and doesn’t need any metadata servers, so I/O bottlenecks are removed, which improves performance and parallel access. Servers can be added or removed whenever required.
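To make the idea of hash-based placement concrete, here is a minimal sketch of how a file name alone can determine where a file lives, with no metadata lookup. It is a toy illustration and not GlusterFS's actual elastic hash algorithm: the `brick_for` function, the use of `cksum`, the modulo mapping, and the file names are all assumptions chosen for simplicity.

```shell
#!/usr/bin/env bash
# Toy illustration only: GlusterFS uses its own hash function and layout;
# this sketch just shows placement computed from the file name alone.

# brick_for FILE_NAME BRICK_COUNT
# Prints the zero-based index of the brick that a file name maps to.
brick_for() {
  local name=$1 bricks=$2 sum
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
  echo $(( sum % bricks ))
}

# Hypothetical file names placed across four bricks.
for f in results.dat input.cfg checkpoint-001.bin; do
  echo "$f -> brick $(brick_for "$f" 4)"
done
```

Because every client can compute the same answer independently, no central server has to be consulted to locate a file, which is the property the paragraph above describes.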
For our tests, we used the following node configuration:
With this cluster topology, we captured performance statistics for throughput and IOPs. The test environment used Open MPI (included in CentOS 7.2) as the MPI interface.
To get started, you need at least two Linux nodes to serve as server nodes (each acting as both metadata server and storage server). GlusterFS can be installed on any Linux distribution. We used an OpenLogic CentOS 7.3 image from Azure Marketplace for our tests. We chose a virtual machine size of DS14v2 and attached 10 additional P30 data disks of 1 TB each.
sudo yum update
sudo reboot
sudo yum install microsoft-hyper-v
NOTE: As with our Lustre deployment, we performed the GlusterFS tests without the LIS packages. Since we tested, SR-IOV drivers have become available for accelerated networking.
yum install centos-release-gluster
yum install glusterfs-cli glusterfs-geo-replication glusterfs-fuse glusterfs-server glusterfs
systemctl enable glusterd.service
systemctl enable glusterfsd.service
systemctl start glusterd.service
systemctl start glusterfsd.service
systemctl status glusterfsd.service
systemctl status glusterd.service
yum clean all && yum update
yum install mdadm -y
ls -l /dev | grep sd
mdadm --examine /dev/sd[c-l]
mdadm -C /dev/md0 -l raid0 -n 10 /dev/sd[c-l]
cat /proc/mdstat
mdadm --examine --scan | sudo tee /etc/mdadm.conf
pvcreate --dataalignment 1024K /dev/md/md0
vgcreate --physicalextentsize 256K rhgs-data /dev/md/md0
lvcreate -L 9216g -T rhgs-data/brickpool -c 256K
lvchange --zero n rhgs-data/brickpool
lvcreate -V 9216g -T rhgs-data/brickpool -n brick1
mkfs.xfs -f -K -i size=512 -n size=8192 /dev/rhgs-data/brick1
mkdir -p /rhs/brick1
echo -e "/dev/rhgs-data/brick1\t/rhs/brick1\txfs\tdefaults,inode64,nobarrier,noatime,nouuid 0 2" | sudo tee -a /etc/fstab
mount -a && mount
df -h
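The sizes used in the storage steps above can be sanity-checked with a little arithmetic. The sketch below assumes each P30 disk provides 1,024 GiB (the article says only "1 TB each") and compares the raw striped capacity with the 9216g thin pool passed to lvcreate. The variable names are illustrative.

```shell
#!/usr/bin/env bash
disks=10         # data disks per server (from the article)
disk_gib=1024    # assumed usable size of one P30 disk, in GiB
pool_gib=9216    # thin pool size passed to lvcreate -L

raw_gib=$(( disks * disk_gib ))
headroom_gib=$(( raw_gib - pool_gib ))

echo "raw striped capacity: ${raw_gib} GiB"
echo "thin pool:            ${pool_gib} GiB"
echo "left unallocated:     ${headroom_gib} GiB"
```

Under that assumption, the RAID 0 set holds 10,240 GiB and the thin pool leaves 1,024 GiB of the volume group unallocated.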
gluster peer probe server2
Then, on the second server ("server2"), run:
gluster peer probe server1
After a trusted pool has been established, we can create a GlusterFS volume. For our testing purposes, we set up a distributed volume, but you can choose another type. Each volume type has advantages and disadvantages as follows:
To set up a distributed GlusterFS volume, execute the following commands in order:
mkdir /data/brick1/gv0
mkdir /data/brick1/gv0
gluster vol create dist-vol glusterserver1:/rhs/brick1/dist-vol glusterserver2:/rhs/brick1/dist-vol
gluster volume start dist-vol
gluster volume info
gluster volume status
gluster volume add-brick dist-vol Server02:/rhs/brick1/dist-vol
gluster volume rebalance dist-vol fix-layout start
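When a cluster has many servers, typing the brick list by hand becomes error-prone. The following sketch builds the volume creation command from a list of host names and prints it for review rather than running it. The `build_create_cmd` function and the host names are hypothetical; the brick path matches the one used in the commands above.

```shell
#!/usr/bin/env bash
# build_create_cmd VOLUME BRICK_PATH SERVER...
# Prints (does not run) a "gluster volume create" command with one brick per server.
build_create_cmd() {
  local volume=$1 brick_path=$2
  shift 2
  local cmd="gluster volume create ${volume}"
  local server
  for server in "$@"; do
    cmd+=" ${server}:${brick_path}"
  done
  echo "$cmd"
}

# Hypothetical host names; substitute the names of your own server nodes.
build_create_cmd dist-vol /rhs/brick1/dist-vol \
  glusterserver1 glusterserver2 glusterserver3 glusterserver4
```

Printing the command first makes it easy to check the brick list before creating the volume.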
For automatic high-availability failover in our tests, we mounted the GlusterFS volume as a FUSE client, although Network File System (NFS) is also supported.
yum install glusterfs-fuse
mkdir /mnt/gluster
mount -t glusterfs server0:/dist-vol /mnt/gluster
To evaluate the performance of our GlusterFS file system, we ran tests to measure maximum throughput and IOPs just as we did with the Lustre file system. We changed various IOR test parameters, including the transfer size, block size, number of processes, and number of client nodes. As with our Lustre tests, the performance of GlusterFS varied depending on the following:
To measure read and write throughput, we used the following IOR syntax:
mpiexec --hosts=<client_hostname> -np <n_procs> /gluster/software/ior/bin/ior -a MPIIO -v -B -F -w -t 32m -b 4G -o /mnt/gluster/test.`date +"%Y-%m-%d_%H-%M-%S"`
Where:
-v indicates verbose mode
-B indicates bypassing the cache
-z indicates random task ordering
-F indicates file per process
-w indicates write operation
-r indicates read operation
-t indicates transfer size (size of each transfer in bytes)
-b indicates block size
-o indicates output test file
<client_hostname> is the name of the host used for the client
<n_procs> is the number of processes
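To see how the -t and -b values relate, the sketch below converts IOR-style sizes to bytes and works out how many transfers each process issues and how much data a run writes in total. It assumes IOR's default of one segment per process and binary units (1k = 1,024 bytes). The `to_bytes` helper and the process count of 16 are illustrative, not values from our tests.

```shell
#!/usr/bin/env bash
# to_bytes SIZE
# Converts an IOR-style size such as 4k, 32m, or 4G to bytes (binary units).
to_bytes() {
  local size=$1
  local number=${size%[kKmMgG]}   # strip a trailing unit suffix, if any
  case $size in
    *[kK]) echo $(( number * 1024 )) ;;
    *[mM]) echo $(( number * 1024 * 1024 )) ;;
    *[gG]) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)     echo "$number" ;;
  esac
}

transfer=$(to_bytes 32m)   # -t 32m
block=$(to_bytes 4G)       # -b 4G
n_procs=16                 # hypothetical process count

echo "transfers per process: $(( block / transfer ))"
echo "total data written:    $(( n_procs * block / 1024 / 1024 / 1024 )) GiB"
```

With the values from the command above, each process writes its 4 GiB block as 128 transfers of 32 MiB, so the total volume of data scales directly with the number of processes.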
Table 1 shows the results.
Table 1. Maximum throughput results for GlusterFS
Write throughput results are shown in Figure 1. The y-axis shows maximum throughput and the x-axis shows the number of GlusterFS server nodes.
Figure 1. Write throughput results for GlusterFS showing linear performance as servers are added
Read throughput results are displayed in Figure 2. The y-axis shows maximum throughput and the x-axis shows the number of GlusterFS server nodes.
Figure 2. Read throughput results for GlusterFS showing linear performance as servers are added
To measure IOPs on Azure, we used the following syntax:
mpiexec --hosts=<client_hostname> -np <n_procs> /lustre/software/ior/bin/ior -a MPIIO -v -z -F -w -r -t 4k -b 320m -o /mnt/lustre/test.`date +"%Y-%m-%d_%H-%M-%S"`
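The small transfer size is what turns this run into an IOPs test rather than a throughput test. As a rough sketch, the arithmetic below shows how many 4 KiB operations each process performs for a 320 MiB block, and how an IOPs figure translates into bandwidth at that transfer size. The `iops_to_mib_per_sec` helper and the sample figure of 100,000 IOPs are illustrative only and are not measured results.

```shell
#!/usr/bin/env bash
transfer_bytes=$(( 4 * 1024 ))          # -t 4k
block_bytes=$(( 320 * 1024 * 1024 ))    # -b 320m

ops_per_process=$(( block_bytes / transfer_bytes ))
echo "operations per process: ${ops_per_process}"

# iops_to_mib_per_sec IOPS
# Prints the bandwidth (whole MiB/s) implied by an IOPs figure at 4 KiB per operation.
iops_to_mib_per_sec() {
  echo $(( $1 * transfer_bytes / 1024 / 1024 ))
}

# Hypothetical IOPs figure, for illustration only (not a measured result).
echo "100000 IOPs is about $(iops_to_mib_per_sec 100000) MiB/s"
```

Each process therefore issues 81,920 operations per pass, and even a high IOPs number corresponds to modest bandwidth at this transfer size.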
The results are shown in Table 2.
Table 2. IOPs results for GlusterFS
The write IOPs results show a linear increase as the number of GlusterFS servers is increased, but only up to a certain point, as Figure 3 shows. We observed no real gain above 12 servers but ran out of time to try more performance tuning, which we suspect would help.
Figure 3. Write IOPs results per number of GlusterFS servers
The read IOPs performance increases as the number of GlusterFS servers is increased, as Figure 4 shows.
Figure 4. Read IOPs results per number of GlusterFS servers
GlusterFS performance was good, and we thought the overall user experience was subjectively better than Lustre. Unfortunately, the results may not be entirely reliable, and we hope to investigate further in the future. Here are our observations and conclusions:
Figure 5. Transfer endpoint error during GlusterFS testing
Thank you for reading!
Azure CAT Guidance
"Hands-on solutions, with our heads in the Cloud!"