Educator Developer Blog

Setting up more than 18 GPU Instances on Azure using VMs or Containers

Lee_Stott
Microsoft
Mar 21, 2019
First published on MSDN on Mar 14, 2017

I have been getting a number of questions about the availability of Azure N-Series GPU instances. At present we have two SKUs: NC (GPU Compute) and NV (GPU Visualisation).

This blog post explains the differences between the SKUs and where NC vs NV hardware instances should be used: https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/01/10/azure-cloud-gpu-for-datascience-and-academic-activities-such-as-cloud-rendering/

DataCenter & OS Availability

For NV machines (visualization):
OS Availability – Windows Server 2016 or Windows Server 2012 R2
Region Availability – South Central US, West EU, South East Asia

For NC machines (compute):
OS Availability – Windows Server 2016, Windows Server 2012 R2 or Ubuntu 16.04 LTS
Region Availability – South Central US, East US
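
As a quick sanity check before you try to create a VM, the two availability lists above can be written down as a small lookup table. This is only a sketch: the `AVAILABILITY` dictionary and the `is_available` helper are hypothetical names made up for illustration, the region and OS names are spelled exactly as in this post, and the data reflects availability at the time of writing, so it will go out of date.

```python
# Availability data copied from the lists in this post (time of writing).
# Region and OS names are spelled exactly as they appear above.
AVAILABILITY = {
    "NV": {
        "os": {"Windows Server 2016", "Windows Server 2012 R2"},
        "regions": {"South Central US", "West EU", "South East Asia"},
    },
    "NC": {
        "os": {"Windows Server 2016", "Windows Server 2012 R2", "Ubuntu 16.04 LTS"},
        "regions": {"South Central US", "East US"},
    },
}


def is_available(series: str, os_name: str, region: str) -> bool:
    """Return True if the series/OS/region combination appears in the lists above."""
    entry = AVAILABILITY.get(series.upper())
    if entry is None:
        # Unknown series: this post only covers NC and NV.
        return False
    return os_name in entry["os"] and region in entry["regions"]


print(is_available("NC", "Ubuntu 16.04 LTS", "East US"))           # True
print(is_available("NV", "Ubuntu 16.04 LTS", "South Central US"))  # False
```

South Central US is the only region that appears in both lists, so it is the one to pick if a course needs both NC and NV machines in the same place.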

Getting Started

– When creating the VM, you should choose HDD and not SSD as the disk type

– You should install the necessary GPU drivers. All documentation is available here: https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-n-series-driver-setup (Windows) and https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-linux-n-series-driver-setup (Linux)

Capacity Restriction

At present, provisioning Azure GPU instances via portal.azure.com is limited to a maximum of 18 instances. If you require more instances, the following is a quick overview of the steps you should take.

Provisioning Process

Step 1. Make a provisioning request via the Azure Portal, http://portal.azure.com, by simply trying to create the VM.

Step 2. You will initially get a failure, as the maximum number of NC24 is 20 instances. This places your request on hold, so please don't try to allocate anything over 18 instances without letting me know first.

The error is "Operation results in exceeding quota limits of Core. Maximum allowed: 20, Current in use: 18," when trying to allocate the VMs in South Central US
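
If you are scripting deployments and want to see how much headroom is left when this error comes back, the two numbers can be pulled out of the message text. The sketch below is an assumption-laden illustration: it matches only the wording quoted above (which is cut off after the trailing comma), and `QUOTA_ERROR` and `remaining_quota` are hypothetical names, not part of any Azure SDK.

```python
import re

# Matches only the part of the error text quoted in this post.
QUOTA_ERROR = re.compile(
    r"Maximum allowed:\s*(?P<maximum>\d+),\s*Current in use:\s*(?P<in_use>\d+)"
)


def remaining_quota(message: str) -> int:
    """Return how many units of quota are left according to the error text."""
    match = QUOTA_ERROR.search(message)
    if match is None:
        raise ValueError("message does not look like the quota error quoted above")
    return int(match["maximum"]) - int(match["in_use"])


message = (
    "Operation results in exceeding quota limits of Core. "
    "Maximum allowed: 20, Current in use: 18,"
)
print(remaining_quota(message))  # 2
```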

Step 3. You will receive an email within 24 hours from the provisioning team, who look at the request and generally get in touch to confirm this wasn't a mistake, as NC-series instances are costly.

To request a quota increase, you must open a Support Request with Microsoft. Load the Microsoft Azure Portal and click the question mark icon in the upper-right corner to get started.

The email you receive may come from an alias or from one of the provisioning team directly, but it will be titled like this:

RE: [REG:XXXXXXXX (is a number)] Quota request - Cores Initial Response

Information you need to submit

If you require over 18 instances for your courses, it's always good to be clear about what these are being used for:

The Azure Region you wish to deploy to
Course Name/Title
Requirement: Number of Instances x Type of GPU Instance, e.g. 20 x NC6 server instances (an example specification and its costs are below)

What the GPU will be used for

e.g. GPU cores for use within a Deep Learning curriculum

Course Codes:

Course Title: e.g. Data Analysis and Probabilistic Inference

Course begin date: e.g. 16/1/17

Course end date: e.g. 1/4/17

Number of participating students: 120

Profile of students: 4th-year machine learning students (undergraduate)

Proposed MS Azure utilisation in support of course teaching: estimated 880 hours of teaching utilisation. Be mindful of the costs, which will be charged to your Azure Subscription.

So in this example you have 20 x NC6 for 880 hours per instance, which is 17,600 hours (733 days) of compute. At $0.90 per hour, that is a $15,840 charge to your Azure Subscription.
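
The arithmetic in this example is easy to redo for your own course. Here is a minimal sketch using the figures from the example above; `estimate_cost` is a hypothetical helper name, and the $0.90 hourly rate is simply the figure quoted in this post, so check current pricing before relying on it.

```python
from decimal import Decimal


def estimate_cost(instances: int, hours_per_instance: int, rate_per_hour: Decimal):
    """Return (total compute hours, total charge) for a block of identical instances."""
    total_hours = instances * hours_per_instance
    return total_hours, total_hours * rate_per_hour


# Figures from the example above: 20 x NC6, 880 hours each, $0.90 per hour.
total_hours, cost = estimate_cost(20, 880, Decimal("0.90"))

print(total_hours)              # 17600
print(round(total_hours / 24))  # 733 (days of compute)
print(cost)                     # 15840.00
```

`Decimal` is used rather than `float` so that the currency figure comes out exact.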

Number of Students: e.g. 125

What students will be doing: e.g. SSH access for students, approx. 3 students per NC6
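
If you make these requests regularly, it can help to keep the details above in one place and generate the text to paste into the support request. The following is only a sketch: the `QuotaRequest` class and its field names are hypothetical, the support team does not prescribe this format, and the values are the example figures from this post (with South Central US assumed as the region).

```python
from dataclasses import dataclass


@dataclass
class QuotaRequest:
    """Details to include in a GPU quota request (hypothetical structure)."""
    region: str
    course_title: str
    instances: int
    instance_type: str
    purpose: str
    start_date: str
    end_date: str
    students: int
    hours_per_instance: int

    def to_text(self) -> str:
        """Render the request as plain text, one field per line."""
        lines = [
            f"Azure region: {self.region}",
            f"Course title: {self.course_title}",
            f"Requirement: {self.instances} x {self.instance_type}",
            f"GPU usage: {self.purpose}",
            f"Course begin date: {self.start_date}",
            f"Course end date: {self.end_date}",
            f"Number of participating students: {self.students}",
            f"Estimated utilisation: {self.hours_per_instance} hours per instance",
        ]
        return "\n".join(lines)


request = QuotaRequest(
    region="South Central US",  # assumed region for this example
    course_title="Data Analysis and Probabilistic Inference",
    instances=20,
    instance_type="NC6",
    purpose="GPU cores for use within Deep Learning curriculum",
    start_date="16/1/17",
    end_date="1/4/17",
    students=120,
    hours_per_instance=880,
)
print(request.to_text())
```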

The provisioning team then checks with the capacity team to see whether a pre-authorisation has been given for you and for the details above. They then pass or fail the request; if it passes, your machines can be provisioned.

This typically takes 2 days.

Step 4. You will receive a confirmation email. Simply go back to portal.azure.com and create the VMs again, in the same region and with the same instance size. These get deployed within 15 mins.

Resources

Azure Batch Shipyard Data Science Containers https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/02/13/deep-learning-using-cntk-caffe-keras-theanotorch-tensorflow-on-docker-with-microsoft-azure-batch-shipyard/

Jupyter notebooks: TensorFlow Deep Learning https://notebooks.azure.com/library/OEdO6ybBxM4/dashboard?page=1

DataScience VM https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/01/28/data-science-virtual-machines-windows-vm-update-jan-2017/

Updated Mar 21, 2019
Version 2.0