Azure Service Fabric

How Do You Handle Multiple Server Certificate Thumbprints in Azure Service Fabric Managed Clusters?
Hi everyone, I wanted to share a common challenge we've encountered in DevOps pipelines when working with Azure Service Fabric Managed Clusters (SFMC), and open it up for discussion to hear how others are handling it.

🔍 The Issue

When retrieving the cluster certificate thumbprints using PowerShell:

(Get-AzResource -ResourceId "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RG_NAME>/providers/Microsoft.ServiceFabric/managedclusters/<CLUSTER_NAME>").Properties.clusterCertificateThumbprints

…it often returns multiple thumbprints. This typically happens due to certificate renewals or rollovers, and including all of them in your DevOps configuration isn't practical.

✅ Solution 1: What Worked for Us

We've had success using the last thumbprint in the list, assuming it's the most recently active certificate:

(Get-AzResource -ResourceId "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RG_NAME>/providers/Microsoft.ServiceFabric/managedclusters/<CLUSTER_NAME>").Properties.clusterCertificateThumbprints | Select-Object -Last 1

This approach has helped us maintain stable and secure connections in our pipelines.

🔍 Solution 2: Get the Current Server Certificate

You can also verify the active certificate using OpenSSL:

openssl s_client -connect <MyCluster>.<REGION>.cloudapp.azure.com:19080 -servername <MyCluster>.<REGION>.cloudapp.azure.com | openssl x509 -noout -fingerprint -sha1

🛠️ Tip for New Deployments

If you're deploying a new SFMC, consider setting the following property in your ARM or Bicep template:

"autoGeneratedDomainNameLabelScope": "ResourceGroupReuse"

This ensures the domain name is reused within the resource group, which helps reduce certificate churn and keeps the thumbprint list clean and manageable.

⚠️ Note: This setting only applies during initial deployment and cannot be retroactively applied to existing clusters.
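If you'd rather not assume the last thumbprint is the active one, the two solutions can be combined: query the list from the resource and cross-check it against the certificate the endpoint actually serves. Below is a minimal PowerShell sketch of that idea, not from the original post; the resource ID and cluster FQDN are placeholders you must fill in.

# Hedged sketch (placeholders throughout): pick the thumbprint from the managed
# cluster resource that matches the certificate actually served on port 19080.
$resourceId  = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RG_NAME>/providers/Microsoft.ServiceFabric/managedclusters/<CLUSTER_NAME>"
$clusterFqdn = "<MyCluster>.<REGION>.cloudapp.azure.com"

# All thumbprints currently recorded on the managed cluster resource
$thumbprints = (Get-AzResource -ResourceId $resourceId).Properties.clusterCertificateThumbprints

# Fetch the certificate presented on the HTTP gateway port (19080); validation is
# skipped because we only want to inspect the certificate, not trust it
$validation = [System.Net.Security.RemoteCertificateValidationCallback]{ param($s, $cert, $chain, $errors) $true }
$tcp = [System.Net.Sockets.TcpClient]::new($clusterFqdn, 19080)
$ssl = [System.Net.Security.SslStream]::new($tcp.GetStream(), $false, $validation)
$ssl.AuthenticateAsClient($clusterFqdn)
$served = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($ssl.RemoteCertificate)
$ssl.Dispose(); $tcp.Dispose()

# Keep only the thumbprint that is both on the resource and actively served
$thumbprints | Where-Object { $_ -eq $served.Thumbprint }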
Guidance for Certificate Use in CI/CD Pipelines for Service Fabric

In non-interactive CI/CD scenarios where certificates are used to authenticate with Azure Service Fabric, consider the following best practices.

Use Admin Certificates Instead of Cluster Certificates

Cluster certificates are used for node-to-node and cluster-level authentication and are highly privileged. For CI/CD pipelines, prefer a dedicated Admin client certificate:

- It grants administrative access only at the client level.
- It limits the blast radius in case of exposure.
- It is easier to rotate or revoke without impacting cluster internals.

Best practices to protect your Service Fabric certificates:

- Provision a dedicated Service Fabric Admin certificate specifically for the CI/CD pipeline instead of the cluster certificate. This certificate should not be reused across other services or users.
- Restrict access to this certificate strictly to the pipeline environment. It should never be distributed beyond what is necessary.
- Secure the pipeline itself, as it is part of the cluster's supply chain and a high-value target for attackers.
- Implement telemetry and monitoring to detect potential exposure, such as unauthorized access to the CI/CD machine or unexpected distribution of the certificate.
- Establish a revocation and rotation plan to quickly respond if the certificate is compromised.
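For the pipeline itself, the connection step with a dedicated admin client certificate is typically non-interactive. A minimal sketch follows, assuming the Service Fabric PowerShell module is available on the agent and the admin certificate is installed in the agent's CurrentUser\My store; the endpoint and thumbprints are placeholders.

# Hedged sketch: authenticate to the cluster from a pipeline agent with a
# dedicated admin client certificate. Endpoint and thumbprints are placeholders.
$clusterEndpoint  = "<MyCluster>.<REGION>.cloudapp.azure.com:19000"
$serverThumbprint = "<SERVER_CERT_THUMBPRINT>"
$adminThumbprint  = "<ADMIN_CLIENT_CERT_THUMBPRINT>"

# The server thumbprint validates the cluster; FindValue selects the admin
# client certificate from the local store for authentication
Connect-ServiceFabricCluster -ConnectionEndpoint $clusterEndpoint `
    -X509Credential `
    -ServerCertThumbprint $serverThumbprint `
    -FindType FindByThumbprint -FindValue $adminThumbprint `
    -StoreLocation CurrentUser -StoreName My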
Installing AzureMonitoringAgent and linking it to your Log Analytics Workspace

Service Fabric clusters are currently equipped with the MicrosoftMonitoringAgent (MMA) as the default installation. However, it is essential to note that MMA is retired as of 31 August 2024; for more details refer to We're retiring the Log Analytics agent in Azure Monitor on 31 August 2024 | Azure updates | Microsoft Azure. Therefore, if you are currently utilizing MMA, it is imperative to initiate the migration to AzureMonitoringAgent (AMA).

Installation and linking of AzureMonitoringAgent to a Log Analytics workspace:

1. Create a Log Analytics workspace (if not already established):
- In the Azure portal, search for "Log Analytics Workspace" and create a new Log Analytics workspace.
- Ensure that you select the same resource group and region where your cluster is located.
- Detailed explanation: Create Log Analytics workspaces - Azure Monitor | Microsoft Learn

2. Create a Data Collection Rule (DCR):
- In the Azure portal, search for "Data Collection Rules (DCR)".
- Select the same resource group and region as your cluster.
- For Platform type, select the type of instances you have: Windows, Linux, or both. You can leave the data collection endpoint blank.
- In the Resources section, add the Virtual Machine Scale Set (VMSS) resource attached to the Service Fabric cluster.
- In the "Collect and deliver" section, click Add data source and add both Performance Counters and Windows Event Logs, one by one. Choose Azure Monitor Logs as the destination for both data sources, select the Log Analytics workspace created in step 1 in the Account or namespace dropdown, and click Add data source.
- Click Review + create.
Note: For a more detailed explanation of how to create a DCR and the various ways of creating it, follow Collect events and performance counters from virtual machines with Azure Monitor Agent - Azure Monitor | Microsoft Learn.

3. Associate the VMSS resource with the DCR:
- Once the DCR is created, click Resources in the left panel.
- Check whether the VMSS resource added while creating the DCR is listed. If not, click Add, navigate to the VMSS attached to the Service Fabric cluster, and click Apply.
- Refresh the Resources tab to confirm the VMSS now appears in the Resources section; retry adding a couple of times if needed.

4. Query logs and verify the AzureMonitoringAgent setup:
- Allow a 10-15 minute waiting period. After this time has elapsed, navigate to your Log Analytics workspace and open the Logs section from the left panel.
- Run your queries to see the logs. For example, to check the heartbeat of all instances:

Heartbeat | where Category contains "Azure Monitor Agent" | where OSType contains "Windows"

- The logs appear in the bottom panel, and you can modify the query to fit your requirements. For more details on Log Analytics queries, refer to Log Analytics tutorial - Azure Monitor | Microsoft Learn.

5. Uninstall the MicrosoftMonitoringAgent (MMA):
- Once you have verified that logs are being generated, go to the Virtual Machine Scale Set, open the "Extensions + applications" section, and delete the old MMA extension from the VMSS.
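If you prefer to attach the AMA extension to the scale set explicitly rather than relying on the portal flow, an Az PowerShell sketch might look like the following. This is not from the original post: the resource group and VMSS names are placeholders, and the publisher/type/version values should be verified against the current Azure Monitor documentation.

# Hedged sketch: install the Azure Monitor Agent extension on the VMSS that
# backs the Service Fabric cluster. <RG_NAME> and <VMSS_NAME> are placeholders.
$vmss = Get-AzVmss -ResourceGroupName "<RG_NAME>" -VMScaleSetName "<VMSS_NAME>"

# AzureMonitorWindowsAgent is the AMA extension for Windows scale sets
Add-AzVmssExtension -VirtualMachineScaleSet $vmss `
    -Name "AzureMonitorWindowsAgent" `
    -Publisher "Microsoft.Azure.Monitor" `
    -Type "AzureMonitorWindowsAgent" `
    -TypeHandlerVersion "1.0" `
    -AutoUpgradeMinorVersion $true

# Push the updated model to the scale set
Update-AzVmss -ResourceGroupName "<RG_NAME>" -VMScaleSetName "<VMSS_NAME>" -VirtualMachineScaleSet $vmss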
Preserve Disk space in ImageStore for Service Fabric Managed Clusters

As mentioned in this article: Service Fabric: Best Practices to preserve disk space in Image Store, the ImageStore keeps copied packages and provisioned packages. In this article, we will discuss how you can configure clean-up of copied application packages for a Service Fabric Managed Cluster (SFMC).

The mitigation is to set "AllowRuntimeCleanupUnusedApplicationTypesPolicy": "true". For properties, specify the following tags:

"applicationTypeVersionsCleanupPolicy": {
    "maxUnusedVersionsToKeep": 3
}

Below is a step-by-step guide to automatically remove unwanted application versions in your Service Fabric managed cluster.

Scenario: I have deployed 4 versions of my app (1 in use, 3 unused) to my managed cluster.

Symptom: I need Service Fabric to automatically clean up the unused application versions and keep only the last 3, so the disk does not fill up.

Mitigation steps:

1. From https://resources.azure.com/, open your managed cluster resource and switch to read/write mode.
2. Add the tag "AllowRuntimeCleanupUnusedApplicationTypesPolicy": "true".
3. Under fabricSettings, add the parameter "name": "CleanupUnusedApplicationTypes", "value": "true" and set "maxUnusedVersionsToKeep": 3.
4. Click PUT to save the changes. I then deployed the 5th version (1.0.4) to the cluster, which triggers cleanup of the oldest version (1.0.0).

Note: The automatic clean-up becomes effective about 24 hours after making these changes. When I later deployed another new version, I could see that the oldest version was cleaned up as well.

For manual cleanup of the ImageStoreService, you can use PowerShell commands to delete copied packages and unregister application types as needed. This includes using Get-ServiceFabricImageStoreContent to retrieve content and Remove-ServiceFabricApplicationPackage to delete it, as well as Unregister-ServiceFabricApplicationType to remove application packages from the image store and image cache on nodes; a sketch follows below.
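For reference, a minimal sketch of that manual flow might look like this. It is not from the original post: the package path, type name, and version are placeholders, and it assumes you have already connected to the cluster with Connect-ServiceFabricCluster.

# Hedged sketch: inspect the image store, remove a copied package, and
# unregister an unused application type version. All names are placeholders.

# List what is currently stored in the image store
Get-ServiceFabricImageStoreContent

# Delete a copied application package that is no longer needed
Remove-ServiceFabricApplicationPackage -ApplicationPackagePathInImageStore "<MyAppPackage>"

# Unregister an unused version of an application type
Unregister-ServiceFabricApplicationType -ApplicationTypeName "<MyAppType>" -ApplicationTypeVersion "<1.0.0>" -Force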
Not enough disk space issue in Service Fabric cluster

From time to time, when you use a Service Fabric cluster, the cluster may run into different issues whose reported error/warning message indicates that one or more specific nodes do not have enough disk space. This can have several causes, and this blog will talk about the common solutions to the issue.

Possible root causes:

There are many possible root causes for the not-enough-disk-space issue. In this blog, we'll mainly talk about the following five:

1. Diagnostic log files (.trace and .etl) consume too much space
2. The paging file consumes too much space
3. Too many application packages exist on the node
4. Too many registered versions of an application type
5. Too many images exist (only for clusters using containers)

To identify which one matches your own scenario, please check the following descriptions:

1. For the log files, we can RDP into the node reporting not enough disk space and check the size of the folder D:\SvcFab\Log (see the helper sketch after this list). If the size of this folder is bigger than expected, we can try to reconfigure the cluster to decrease the size limit of the diagnostic log files.
2. The paging file is a built-in feature of the Windows system; for a detailed introduction, please check this document. To verify whether we have this issue, we can RDP into the node and check whether the hidden file D:\pagefile.sys exists. If it does, your Service Fabric cluster is consuming some disk space as RAM, and we can consider configuring the paging file to be saved on disk C instead of disk D.
3. For too many application packages on a node consuming too much disk space, we can verify it in Service Fabric Explorer (SFX). Opening SFX from the Azure portal Service Fabric overview page, go to the Image Store page of the cluster and check whether there is any record with a name different from Store and WindowsFabricStore. If yes, click the Load Size button to check its size.
4. Similar to point 3, for too many registered versions of an application type, we can check the same page, but pay attention to the size of Store and see whether it consumes a lot of disk space. When a version of an application type is registered in the SF cluster, SF saves the files used to deploy the services included in that version onto each node. The more versions are registered, the more disk space is consumed.
5. The too-many-images cause only applies when the Service Fabric cluster runs with the container feature. We can RDP into the node and use the command docker image ls to list all images on the node. Images that were used before but were not removed/pruned after falling out of use consume a lot of disk space, since image files are typically huge; for example, the Windows Server Core image is more than 10 GB.
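Before picking a mitigation, it can help to measure which folders under D:\SvcFab actually consume the space. Here is a small helper sketch, not from the original post, that you could run on a node over RDP; it assumes the default D:\SvcFab layout described above.

# Hedged helper: list the folders under D:\SvcFab by total size, largest first.
Get-ChildItem D:\SvcFab -Directory | ForEach-Object {
    [pscustomobject]@{
        Folder = $_.FullName
        SizeGB = [math]::Round((Get-ChildItem $_.FullName -Recurse -File -ErrorAction SilentlyContinue |
                 Measure-Object -Property Length -Sum).Sum / 1GB, 2)
    }
} | Sort-Object SizeGB -Descending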
Possible solutions:

Now let's talk about the solutions to the above five kinds of issue.

1. To reconfigure the size limit of the diagnostic log files, we need to open a PowerShell window with the Az module installed (please refer to the official document for how to install it). After logging in successfully, we can use the following command to set the expected size limit:

Set-AzServiceFabricSetting -ResourceGroupName SF-normal -Name sfhttpjerry -Section Diagnostics -Parameter MaxDiskQuotaInMB -Value 25600

Please remember to replace the resource group name, Service Fabric cluster name, and the size limit before running the command. Once this command runs successfully, it may not take effect immediately: the Service Fabric cluster scans the size of the diagnostic logs periodically, so we need to wait until the next scan is triggered. Once it is, and if the size of the diagnostic log files is bigger than your configured limit (25600 MB = 25 GB in my example), the cluster will automatically delete some log files to release disk space.

2. To change the path of the paging file, follow these steps:
- Check the status of the Service Fabric cluster in Service Fabric Explorer to make sure every node, service, and application is healthy.
- RDP into the VMSS node.
- In the search bar, type "Advanced System Settings", then choose Advanced -> Advanced -> Change. Next, set drive D to "No paging file" and drive C to "System managed size".
- This setting change requires a reboot of the VMSS node to take effect. Reboot the node and wait until everything is back to a healthy status in Service Fabric Explorer before RDPing into the next node.
- Repeat the above steps for all nodes.

3. Cleaning up application packages is easy to do in SFX. Go to the same Image Store page used to diagnose this root cause. On the left side there is a menu to delete the unneeded package. After typing the name in the confirmation window and selecting Delete Image Store content, the cluster automatically deletes the unneeded package on every node.

4. For the issue caused by too many registered versions of an application type, we need to manually unregister the versions that are not needed. In Service Fabric Explorer, we can click on the application/application type to see the currently existing versions. If any version is not currently used and no longer needed, use the following command to unregister it:

Unregister-ServiceFabricApplicationType -ApplicationTypeName "application type name" -ApplicationTypeVersion "version number" -Force

5. For the issue caused by too many images, we can configure the cluster to automatically delete unused images. The detailed configuration can be found in this document, and the cluster configuration is updated as follows:
a. Visit Azure Resource Explorer in Read/Write mode, log in, and find the Service Fabric cluster.
b. Click the Edit button and modify the JSON-format cluster configuration as needed; for this solution, add the configuration to the fabricSettings part.
c. Send the request to save the new configuration by clicking the green PUT button, and wait until the provisioning status of the cluster becomes Succeeded.

To make this solution work, one more thing we need to do is unregister all unnecessary and unused applications. This can also be done with the command documented here. Since the ApplicationTypeName and ApplicationTypeVersion parameters are both required for this command, we can only unregister one version of one application type per invocation.
But since you may have many versions and many application types, here are two possible approaches:

- If there are versions of some application types that you want to keep registered for future use in this cluster, unregister only the unnecessary versions by running Unregister-ServiceFabricApplicationType -ApplicationTypeName VotingType -ApplicationTypeVersion 1.0.1 (remember to replace the ApplicationTypeName and ApplicationTypeVersion, and use step 2.e to connect to the cluster first).
- If there is no version of any application type that you want to keep specially, meaning we only need to keep the application type versions used by running applications, we can use step 2.e to connect to the cluster and then run the following script:

$apptypes = Get-ServiceFabricApplicationType
$apps = Get-ServiceFabricApplication
$using = $false
foreach ($apptype in $apptypes) {
    $using = $false
    foreach ($app in $apps) {
        if ($apptype.ApplicationTypeName -eq $app.ApplicationTypeName -and $apptype.ApplicationTypeVersion -eq $app.ApplicationTypeVersion) {
            $using = $true
            break
        }
    }
    if ($using -eq $false) {
        Unregister-ServiceFabricApplicationType -ApplicationTypeName $apptype.ApplicationTypeName -ApplicationTypeVersion $apptype.ApplicationTypeVersion -Force
    }
}

In addition to the above five possible causes and solutions, there are three more possible mitigations for the "not enough disk space" issue, explained below.

Scale out the VMSS: Sometimes scaling out, i.e. increasing the number of nodes in the Service Fabric cluster, also helps mitigate a full disk. This operation not only improves CPU and memory usage, it also auto-balances the distribution of services among nodes, improving disk usage. When using Silver or higher durability, we can scale out the node count in the VMSS directly.

Scale up the VMSS: This point is easy to understand. Since the issue is a full disk, we can simply change the VM SKU to a bigger size with more disk space. But please work through all the solutions above first, to make sure we really do need more disk space to handle more data. For example, if our application has stateful services and the disk fills up because our stateful services save too much data, we should consider improving the code logic rather than scaling up the VMSS; otherwise, with a bigger VM SKU, the issue will still reproduce sooner or later. To scale up the VMSS, there are two ways:

- We can use the command Update-AzVmss to update the state of the VMSS. This is the simple way, but it is not recommended, because there is a small risk of data loss or instances going down. When using Silver or higher durability, the risk is mitigated because those tiers support repair tasks.
- The second way to upgrade the size of the SF primary node type is to add a new node type with the bigger SKU. This option is much more difficult than option one, but it is officially recommended; check the document for more information.

Reconfigure the ReplicatorLog size: Please be careful: the ReplicatorLog does not store any kind of log file. It stores important data for both the Service Fabric cluster and the applications, and deleting this folder can cause data loss. The size of this folder is fixed to the configured size, 8 GB by default; it always occupies the same amount of space no matter how much data is saved.
It is NOT recommended to modify this setting; it runs the risk of data loss and should only be done if absolutely required.

For the ReplicatorLog size, as mentioned above, the key point is to add a customized KtlLogger setting to the Service Fabric cluster. To do that:

a. Visit Azure Resource Explorer in Read/Write mode, log in, and find the Service Fabric cluster.
b. Add the KtlLogger setting to the fabricSettings part. The expected expression is as follows:

{
    "name": "KtlLogger",
    "parameters": [{
        "name": "SharedLogSizeInMB",
        "value": "4096"
    }]
}

c. Send the request to save the new configuration by clicking the green PUT button, and wait until the provisioning status of the cluster becomes Succeeded.
d. Visit SFX and check that everything is in a healthy state.
e. Open a PowerShell window on a computer where the cluster certificate is installed. If the Service Fabric module is not installed yet, please refer to our document to install it first. Then run the following commands to connect to the Service Fabric cluster. Here the thumbprint is that of the cluster certificate; also remember to replace the cluster name with the correct URL.

$ClusterName = "xxx.australiaeast.cloudapp.azure.com:19000"
$CertThumbprint = "7279972D160AB4C3CBxxxxx34EA2BCFDFAC2B42"
Connect-ServiceFabricCluster -ConnectionEndpoint $ClusterName -KeepAliveIntervalInSec 10 -X509Credential -ServerCertThumbprint $CertThumbprint -FindType FindByThumbprint -FindValue $CertThumbprint -StoreLocation CurrentUser -StoreName My

f. Use the following command to disable one node in the Service Fabric cluster (_nodetype1_0 in the example):

Disable-ServiceFabricNode -NodeName "_nodetype1_0" -Intent RemoveData -Force

g. Monitor in SFX until the node from the last command has status Disabled.
h. RDP into this node and manually delete the D:\SvcFab\ReplicatorLog folder. Attention: this operation removes everything in ReplicatorLog. Please double-check whether any content there is still needed before deletion.
i. Use the following command to enable the disabled node, and monitor until the node has status Up:

Enable-ServiceFabricNode -NodeName "_nodetype1_0"

j. Wait until everything is healthy in SFX, then repeat steps f to i on every node. After that, the ReplicatorLog folder on each node will have the new customized size.
Azure Logic Apps : HTTP Request OR Custom Connector

Hello,

As far as I know, we use HTTP requests when consuming first-party/third-party APIs, so when should we use a custom connector instead? In which business cases should one use an HTTP request in Power Automate and Power Apps, and in which should one use a custom connector in Power Apps and Power Automate? What are the pros and cons of an HTTP request versus a custom connector?

Thanks and regards,
-Sri
SSL/TLS connection issue troubleshooting guide

You may experience exceptions or errors when establishing TLS connections with Azure services. The exceptions vary dramatically depending on the client and server types; typical ones include "Could not create SSL/TLS secure channel" and "SSL Handshake Failed". In this article we will discuss common causes of TLS-related issues and troubleshooting steps.
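As a starting point for that troubleshooting, a minimal PowerShell sketch (not part of the original article) can reproduce the handshake from the client side and surface either the negotiated protocol or the failure message; the host name is a placeholder.

# Hedged sketch: attempt a TLS handshake against an endpoint and report the
# negotiated protocol and cipher, or the handshake error. Host is a placeholder.
$target = "<myservice>.azurewebsites.net"
$tcp = $null; $ssl = $null
try {
    $tcp = [System.Net.Sockets.TcpClient]::new($target, 443)
    $ssl = [System.Net.Security.SslStream]::new($tcp.GetStream())
    $ssl.AuthenticateAsClient($target)
    "Handshake OK: protocol {0}, cipher {1}" -f $ssl.SslProtocol, $ssl.CipherAlgorithm
} catch {
    "Handshake failed: $($_.Exception.Message)"
} finally {
    if ($ssl) { $ssl.Dispose() }
    if ($tcp) { $tcp.Dispose() }
}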