This article describes how to scale up a Service Fabric cluster primary node type with minimal downtime using PowerShell. In-place SKU upgrades are not recommended on Service Fabric cluster nodes because they can cause data and availability loss, so you can follow the method below to scale up a Service Fabric node type instead.
Note: We will use PowerShell cmdlets for all the changes we perform in the cluster. If you want to follow the ARM template approach to add a VMSS instead, refer to Scale up an Azure Service Fabric primary node type - Azure Service Fabric | Microsoft Learn. This document does not apply to adding a node type that uses remote storage (i.e., managed disks).
Take care of the following prerequisites before you start the process.
- The cluster is healthy.
- There will still be sufficient capacity after the node type is removed, e.g., a sufficient number of nodes to place the required replica count.
- Move any services that have placement constraints referencing the node type to a different node type.
- Modify the Application / Service Manifest to no longer reference the node type.
- Make sure you also have the Service Fabric runtime and the Service Fabric SDK installed: Install-the-SDK-and-tools
- Then validate that none of the services modified above are still running on the nodes belonging to the existing node type, and that all services are healthy.
- Also, this article makes use of the Az cmdlets, so make sure the Az.ServiceFabric module is present on your system; if not, install it with the command below:
Install-Module -Name Az.ServiceFabric -Scope CurrentUser -Repository PSGallery -Force
1. To begin with, you must connect to your Azure account before working with the rest of the cmdlets:
### Login to your Azure account
Connect-AzAccount
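The Service Fabric cmdlets used in the later steps (Get-ServiceFabricNode, Disable-ServiceFabricNode, and so on) also need a connection to the cluster itself. A minimal sketch, assuming a secure cluster and a client certificate installed in CurrentUser\My; the endpoint and thumbprint below are placeholders:
### Connect to the Service Fabric cluster (placeholders: endpoint and certificate thumbprint)
$endpoint = 'mycluster.westus.cloudapp.azure.com:19000'
$thumbprint = '<client certificate thumbprint>'
Connect-ServiceFabricCluster -ConnectionEndpoint $endpoint `
    -X509Credential -ServerCertThumbprint $thumbprint `
    -FindType FindByThumbprint -FindValue $thumbprint `
    -StoreLocation CurrentUser -StoreName My
### Confirm the "cluster is healthy" prerequisite
Get-ServiceFabricClusterHealth | Select-Object AggregatedHealthState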
2. Add a new node type (NT2) to your Service Fabric cluster, backed by your upgraded (or modified) virtual machine scale set SKU and configuration. This step also sets up a new load balancer, subnet, and public IP for the scale set with the default configuration.
### Create a new node type
$password = ConvertTo-SecureString -String 'enter a password' -AsPlainText -Force
$resourceGroup = "enter resource group name"
$clusterName = "enter the name of your cluster"
$nodeTypeName = "enter the new node type name"
Add-AzServiceFabricNodeType -ResourceGroupName $resourceGroup -Name $clusterName -NodeType $nodeTypeName -Capacity 5 -VmUserName 'adminvm' -VmPassword $password -VmSku Standard_D2_V2 -DurabilityLevel Silver -Verbose -IsPrimaryNodeType $True
Additionally, depending on your environment's needs, you can specify values for other parameters such as VMImagePublisher, VMImageOffer, VMImageSku, and VMImageVersion. Below are the optional parameters:
For reference: Add-AzServiceFabricNodeType (Az.ServiceFabric) | Microsoft Learn
Add-AzServiceFabricNodeType
[-ResourceGroupName] <String>
[-Name] <String>
-Capacity <Int32>
-VmUserName <String>
-VmPassword <SecureString>
[-VmSku <String>]
[-Tier <String>]
[-DurabilityLevel <DurabilityLevel>]
[-IsPrimaryNodeType <Boolean>]
[-VMImagePublisher <String>]
[-VMImageOffer <String>]
[-VMImageSku <String>]
[-VMImageVersion <String>]
-NodeType <String>
[-DefaultProfile <IAzureContextContainer>]
[-WhatIf]
[-Confirm]
[<CommonParameters>]
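For example, here is a variant of the step 2 command that pins the VM image explicitly; the image values are illustrative placeholders, not recommendations:
### Create a new node type with an explicit VM image (illustrative values)
Add-AzServiceFabricNodeType -ResourceGroupName $resourceGroup -Name $clusterName -NodeType $nodeTypeName `
    -Capacity 5 -VmUserName 'adminvm' -VmPassword $password -VmSku Standard_D2_V2 -DurabilityLevel Silver `
    -VMImagePublisher 'MicrosoftWindowsServer' -VMImageOffer 'WindowsServer' `
    -VMImageSku '2019-Datacenter' -VMImageVersion 'latest' -IsPrimaryNodeType $True -Verbose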
Now, once the new node type (NT2) deployment is complete, you can check the newly added VMSS, load balancer, and public IP in the Azure portal under the resource group.
Make sure all the instances of the VMSS and the other resources are healthy in the Azure portal.
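You can also script this check; a sketch assuming the new scale set shares the node type's name (the default when Add-AzServiceFabricNodeType creates it):
### Check the new scale set instances and overall cluster health
Get-AzVmssVM -ResourceGroupName $resourceGroup -VMScaleSetName $nodeTypeName |
    Select-Object Name, ProvisioningState
Get-ServiceFabricClusterHealth | Select-Object AggregatedHealthState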
3. The next step is to set the -IsPrimaryNodeType parameter of the old primary node type (NT1) to false, which converts the original primary to a non-primary node type:
### Update the older primary into non primary Nodetype
Update-AzServiceFabricNodeType -ResourceGroupName 'enter resource group name' -Name 'enter the name of your cluster' -NodeType 'enter the old primary node type name' -IsPrimaryNodeType $false
A cluster upgrade will be triggered and will take some time to complete.
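You can watch the rolling upgrade from PowerShell; a simple polling sketch (the sleep interval is arbitrary):
### Monitor the cluster upgrade triggered by the node type change
do {
    $upgrade = Get-ServiceFabricClusterUpgrade
    Write-Host "Upgrade state: $($upgrade.UpgradeState)"
    Start-Sleep -Seconds 60
} while ($upgrade.UpgradeState -ne 'RollingForwardCompleted')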
4. The original (NT1) and upgraded (NT2) scale sets will be running side by side. You can check the updated settings (isPrimary = true for NT2 and false for NT1) from resources.azure.com under the Microsoft.ServiceFabric provider.
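Alternatively, a quick PowerShell check, reusing the variables defined in step 2:
### Verify which node type is now primary
(Get-AzServiceFabricCluster -ResourceGroupName $resourceGroup -Name $clusterName).NodeTypes |
    Select-Object Name, IsPrimary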
Deactivate (remove data) the nodes one by one from the original node type (NT1) so that the system services (and replicas of stateful services) migrate to the new scale set.
RemoveNode intent: specifies that the data on the node is to be permanently lost. The cmdlet creates copies of the replicas running on the node on other nodes to ensure high availability. You specify this setting when the node is being removed from the cluster.
$nodeType = 'name of node type'
$nodes = Get-ServiceFabricNode
foreach($node in $nodes)
{
if ($node.NodeType -eq $nodeType)
{
$node.NodeName
Disable-ServiceFabricNode -Intent RemoveNode -NodeName $node.NodeName -Force
}
}
Alternatively, you can perform the same step from SFX by enabling the advanced settings from the tool icon at the top right, then selecting Deactivate (remove data) from the drop-down to the right of the specific node.
Note:
- For Bronze durability, wait for all nodes to reach the Disabled state.
- For Silver and Gold durability, some nodes will go into the Disabled state while the rest remain Disabling. Check the details tab of the nodes in the Disabling state; if they are all stuck on ensuring quorum for the infrastructure service partitions, it is safe to continue. You can also wait on the node status with the small polling loop below.
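A minimal polling sketch, assuming the $nodeType variable from the snippet above and Bronze durability (for Silver/Gold, stop once the remaining Disabling nodes are only waiting on infrastructure service quorum):
### Wait for the old node type's nodes to report Disabled
do {
    Start-Sleep -Seconds 30
    $pending = Get-ServiceFabricNode |
        Where-Object { $_.NodeType -eq $nodeType -and $_.NodeStatus -ne 'Disabled' }
} while ($pending)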
5. During the initial deployment of the cluster, the management endpoint resolves to the public IP of the primary node type (NT1) created with the first deployment. Therefore, before deleting the cluster's old primary node type (NT1), you need to map the cluster's endpoint to the new public IP that was created with the new node type (NT2) deployment.
a. In the Azure portal, go to the public IP address of the old node type (NT1), then under Settings click Configuration.
b. The DNS name you see here is the same as the cluster FQDN. You can verify the IP linked to this DNS name; it should resolve to the same IP as the cluster endpoint's IP.
c. To map this DNS name to the new primary node type (NT2), first change it to some other name, because two public IPs cannot share the same DNS name, and save the changes.
d. Now navigate to the configuration settings of the public IP address of the new node type (NT2), set its DNS name to the cluster FQDN, and save the changes. It will take a few minutes to update.
To verify the changes are working as expected, do an nslookup of the cluster endpoint; it should resolve to the public IP of the new node type.
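For example, with the Resolve-DnsName cmdlet (the FQDN below is a placeholder for your cluster endpoint):
### Confirm the cluster FQDN now resolves to the new public IP
Resolve-DnsName -Name 'mycluster.westus.cloudapp.azure.com' -Type A |
    Select-Object Name, IPAddress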
6. Stop data for the node type. Wait until all the nodes of the node type are marked Down.
$nodes = Get-ServiceFabricNode
foreach($node in $nodes)
{
if ($node.NodeType -eq $nodeType)
{
$node.NodeName
Start-ServiceFabricNodeTransition -Stop -OperationId (New-Guid) -NodeInstanceId $node.NodeInstanceId -NodeName $node.NodeName -StopDurationInSeconds 10000
}
}
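To script the wait, you can poll until no node of the old node type is still up (a sketch, reusing $nodeType):
### Wait until all nodes of the old node type are marked Down
do {
    Start-Sleep -Seconds 30
    $notDown = Get-ServiceFabricNode |
        Where-Object { $_.NodeType -eq $nodeType -and $_.NodeStatus -ne 'Down' }
} while ($notDown)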
7. Ensure the cluster and the new nodes are healthy, then remove the original scale set (i.e., NT1) from the portal and clean up its related resources (load balancer, public IP address).
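If you prefer to script the cleanup instead of using the portal, here is a sketch with Az cmdlets. The scale set is normally named after the node type, but the load balancer and public IP names below are placeholders; confirm the actual resource names in the portal before deleting anything:
### Remove the old scale set and its networking resources (names are placeholders)
Remove-AzVmss -ResourceGroupName $resourceGroup -VMScaleSetName 'NT1' -Force
Remove-AzLoadBalancer -ResourceGroupName $resourceGroup -Name '<old load balancer name>' -Force
Remove-AzPublicIpAddress -ResourceGroupName $resourceGroup -Name '<old public IP name>' -Force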
8. Now remove the node state for the deleted nodes from SFX one by one.
Or use the command below if you have a large number of nodes.
$nodeType = 'name of node type'
$nodes = Get-ServiceFabricNode
foreach($node in $nodes)
{
if ($node.NodeType -eq $nodeType)
{
$node.NodeName
Remove-ServiceFabricNodeState -NodeName $node.NodeName -Force
}
}
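Afterwards, the old node type should no longer show any nodes; the following should return nothing:
### Verify no nodes from the old node type remain
Get-ServiceFabricNode | Where-Object { $_.NodeType -eq $nodeType }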
9. For Silver and higher durability clusters only, update the cluster resource from Resource Explorer (resources.azure.com) by removing the reference to the old node type (NT1) from the cluster; otherwise, your infrastructure service will go into a warning state.
Changes to be done in Resource Explorer:
Go to Resource Explorer (resources.azure.com) > select your subscription > enable Read/Write > select the respective resource group > go to Microsoft.ServiceFabric > Edit > remove the old node type (NT1) section.
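Once the edit is saved, you can confirm from PowerShell that only the new node type remains (reusing the variables from step 2):
### Confirm the old node type is gone from the cluster resource
(Get-AzServiceFabricCluster -ResourceGroupName $resourceGroup -Name $clusterName).NodeTypes |
    Select-Object Name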
With this, you have upgraded the primary node type successfully.
Happy Learning!