I was recently asked to help out our friends at Channel 9 with a storage problem. They capture large video files as part of their recordings and need them stored locally for fast access while editing. They currently use a proprietary Linux box for NFS storage, which is just about full. I have spec'ed out a Storage Spaces Direct setup on Windows Server 2019 from the Azure Stack HCI catalog - but it won't be in for a while. This is the first bare metal system I've had to rack and stack in a while and it's part of a longer project - so I thought it would be good to write up a short set of topical blog posts to go along with the process.
Finding the right physical server from on-premises inventory
First off - I am spoiled by running a simple Azure command to provision whatever size box I need in a matter of minutes - this is going back in time for me. Luckily, I have an old DataOn Cluster-in-a-box system from a lab with a decent amount of storage (70 TB) to hold them over until the new system arrives.
Like a fresh coat of paint - flattening and rebuilding the servers from USB media was kind of refreshing. In no time at all I had two nodes of Windows Server 2019 up and running.
Installing Data Deduplication
The files they are storing include large video files copied up to the system by various camera operators in the field. There isn't a sophisticated logging system to track which file came from where or whether files have already been dumped onto the share. As a result - we've got duplicates. I figured one easy way of resolving this space issue is to enable deduplication. Data Deduplication in Windows Server 2019 is more efficient than previous releases - it can run in the background on multiple volumes with minimal impact on other workloads. If you want some more details on Deduplication - we've got a good article over on Microsoft Docs you should check out.
The temporary server in question has a fresh install of Windows Server 2019 with all available drives configured in a storage pool with a single volume. I really just needed to get this thing up, enable deduplication and copy over the data from the Linux box until the new Azure Stack HCI configuration comes in.
The fastest way to install the Data Deduplication feature is to crack open a PowerShell window and type in Add-WindowsFeature -Name FS-Data-Deduplication
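If you want to confirm the feature actually landed before moving on, something like the following should do it. This is a minimal sketch, assuming an elevated PowerShell session on the Windows Server 2019 box. Install-WindowsFeature is the current name for the cmdlet that Add-WindowsFeature aliases - the verification step is my addition, not something the install itself requires.

```powershell
# Install the Data Deduplication role service (Add-WindowsFeature is an alias for this cmdlet)
Install-WindowsFeature -Name FS-Data-Deduplication

# Verify: the Installed property should read True once the feature is in place
Get-WindowsFeature -Name FS-Data-Deduplication |
    Select-Object Name, Installed, InstallState
```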
Enabling and Configuring Data Deduplication on Data Drive
To enable the default settings for Data Deduplication on my E: drive - it was as simple as typing in Enable-DedupVolume E:
Because this is a temporary server - I want to configure the minimum file age to be "0" in order for the server to start deduplicating files on the next job schedule or manual kickoff. To change this parameter - type in Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 0
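Put together, the enable-and-configure steps look roughly like the sketch below. The first two commands mirror the ones above (with the -Volume parameter spelled out). The Start-DedupJob and Get-DedupStatus calls are my assumption of how you would do the "manual kickoff" and check on the results - adjust the drive letter for your own volume.

```powershell
# Enable deduplication on the data volume with the default (general purpose file server) settings
Enable-DedupVolume -Volume "E:" -UsageType Default

# Let files be optimized right away instead of waiting for them to age
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 0

# Manual kickoff: queue an optimization job rather than waiting for the schedule
Start-DedupJob -Volume "E:" -Type Optimization

# Check on running or queued jobs, then on the savings once the job has finished
Get-DedupJob
Get-DedupStatus -Volume "E:" | Format-List
```

On a volume that is still empty the status output won't show much - the numbers only get interesting once the files from the Linux box have been copied over and a job has run.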
Time to Finish Rack and Stack / Install
At this point we're all ready to go. It's a matter of transporting and installing this box into its temporary home and transferring the files over to it from the Linux NAS box. But for this - you'll have to wait until the next post in this short series.
From start to finish - it took the better part of an afternoon to provision some bare metal with 70 TB of storage as a temporary server. That is a tad longer than what seems like a very long 5 minutes to provision a system in Azure - but the proximity and fast access to these source files is what is important to the team at Channel 9.
All this being said - as I was going through this - I was thinking of all the value-add services from Azure I could bring to this on-prem bare metal box. Once the more permanent solution is in place - I'll go through the process of adding on a few of these Azure services to help me out: Azure File Sync, Azure Backup and a bunch of the Management add-ons to monitor the health of the system and so on.
What about you? Have you considered or started to explore any of the Hybrid Azure-to-on-prem services?