What are the pros and cons of CycleCloud vs Azure Batch when it comes to HPC?


Hi everyone,

As you know, there are several ways to run HPC workloads on Azure.

Azure Batch and CycleCloud are two good options.

 

But what makes them different from each other? 

To put it another way, which one is better, and when?

6 Replies

Hi @Khoi Thinh - does this image help?


CycleCloud creates HPC clusters that include third-party, industry-standard schedulers (e.g. Slurm or LSF). It's mostly aimed at traditional Linux HPC admins. Batch is mostly aimed at developers, i.e. folks building a capability into their own product or service, and it includes its own scheduler to run jobs. Both Batch and CycleCloud create sets of VMs to run your application, but they use different schedulers and interfaces.

 

299D08F1-5CA7-4BBB-9B81-6528FFA761CC.jpeg

Thanks @karlpodesta 

Can you give me a reference for the original source of the image?

Further, how do the costs vary between CycleCloud and Batch (for sporadic use of, say, 10 HPC nodes)?

m

I also see there is a deeper dive at
https://learn.microsoft.com/en-us/azure/architecture/topics/high-performance-computing
I'm still interested to hear about relative costing (I can give a more detailed example if you wish). m

The original image came from some slide decks we use for customer conversations. For relative costing, it would be great if you could provide more detail. Batch and CycleCloud are both free to use; you only pay for the compute, networking, and storage.

I will claim some personal responsibility for the slide - I made it :) (that includes any inaccuracies!). Originally made for an internal Microsoft training, but also now used in slide decks for external presentations as Andy said.

To add a bit more detail on a cost example - while both products are free to use, CycleCloud uses an Azure VM to host its web/db service, which is a cost on top of the HPC clusters you create (e.g. ~$150/month if left on all the time). You can choose a small VM, and you can turn it off when you're not using it (much like the HPC clusters themselves!), or discount it substantially with a 1- or 3-year reservation.

Cost is a big topic. We'll write and share more in future! You can always play with the Azure Pricing Calculator - I made a sample HPC cluster estimate for 10 nodes (with our latest HBv4 machines, i.e. 1760 cores, 6 TiB RAM, 36 TB disk) running for 100 hours, which comes to ~$7238. BUT: it's not just about the cores/RAM/disk. It's about how fast you can get your simulation to run with the right configuration (time is also money), and about the value this creates for your business or scenario. This might be the fastest/cheapest cluster for a large CFD job, but not for a genomics pipeline. The estimate also assumes pay-as-you-go pricing without any reservations, spot (up to 90% cheaper!!) or other discounts. https://azure.com/e/6e1736448e7f43c5866d5c688252149b
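
For anyone who wants to play with the numbers quoted in this thread, here is a rough sketch of the arithmetic in Python. The dollar figures come from the replies above (~$7238 for 10 nodes x 100 hours, ~$150/month for the CycleCloud VM). The function names and the 730 hours-per-month figure are my own assumptions for illustration, and the 90% discount is only the best-case spot figure, not a quoted price.

```python
# Figures quoted in the thread (pay-as-you-go, no reservations or spot).
CLUSTER_COST_USD = 7238.0            # ~10 HBv4 nodes for 100 hours
NODES = 10
CORES = 1760
HOURS = 100
CYCLECLOUD_VM_USD_PER_MONTH = 150.0  # approximate, if left on all the time

# Assumption (not from the thread): average hours in a month, for prorating.
HOURS_PER_MONTH = 730.0


def per_node_hour(total_usd, nodes, hours):
    """Cost of one node for one hour."""
    return total_usd / (nodes * hours)


def per_core_hour(total_usd, cores, hours):
    """Cost of one core for one hour."""
    return total_usd / (cores * hours)


def with_discount(total_usd, discount_fraction):
    """Apply a fractional discount, e.g. 0.9 for a best-case 90% saving."""
    return total_usd * (1.0 - discount_fraction)


def cyclecloud_vm_cost(hours_on, usd_per_month=CYCLECLOUD_VM_USD_PER_MONTH):
    """Prorated cost of the CycleCloud host VM if it only runs for hours_on."""
    return usd_per_month * hours_on / HOURS_PER_MONTH


if __name__ == "__main__":
    print("per node-hour:", round(per_node_hour(CLUSTER_COST_USD, NODES, HOURS), 3))
    print("per core-hour:", round(per_core_hour(CLUSTER_COST_USD, CORES, HOURS), 4))
    print("best-case spot:", round(with_discount(CLUSTER_COST_USD, 0.9), 2))
    print("CycleCloud VM for the same 100 h:", round(cyclecloud_vm_cost(HOURS), 2))
```

The point of the last line is that for sporadic use, the CycleCloud host VM only adds a small amount if you switch it off along with the cluster.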

One large and often overlooked cost is that CycleCloud encourages keeping data on disk even after the simulation is complete, which can be a huge ongoing cost even when the compute nodes are done.
Batch forces you to think about moving the simulation data to cheaper blob storage and then mounting it or copying it back to disk when required.
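
A quick way to see how much this matters is to compare the monthly bill for keeping results on disk versus in blob storage. The sketch below is only that: the two per-GB prices are hypothetical placeholders (not real Azure prices), and the function names are made up for illustration. Plug in the rates from the pricing calculator for your region and tier.

```python
# HYPOTHETICAL placeholder prices in USD per GB per month.
# These are NOT real Azure prices; replace them with calculator values.
DISK_USD_PER_GB_MONTH = 0.10
BLOB_USD_PER_GB_MONTH = 0.02


def monthly_storage_cost(size_gb, usd_per_gb_month):
    """Monthly cost of keeping size_gb stored at the given rate."""
    return size_gb * usd_per_gb_month


def monthly_saving(size_gb, disk_rate, blob_rate):
    """Monthly saving from holding the data in blob storage instead of on disk."""
    return (monthly_storage_cost(size_gb, disk_rate)
            - monthly_storage_cost(size_gb, blob_rate))


if __name__ == "__main__":
    size_gb = 36_000  # the 36 TB of disk from the example cluster in this thread
    print("disk per month:", monthly_storage_cost(size_gb, DISK_USD_PER_GB_MONTH))
    print("blob per month:", monthly_storage_cost(size_gb, BLOB_USD_PER_GB_MONTH))
    print("saving per month:",
          monthly_saving(size_gb, DISK_USD_PER_GB_MONTH, BLOB_USD_PER_GB_MONTH))
```

Unlike compute, this cost keeps accruing every month after the job has finished, which is why it is easy to overlook.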