When it comes to managing your Azure SQL Managed Instances and ensuring their reliability and performance, understanding planned maintenance events is important. In this blog post, we will demystify how maintenance works, what to expect during the maintenance process, and some of the considerations and best practices associated with them.
Across Azure services, a unified process is followed to ensure that changes are carefully introduced and validated before reaching the production stage. This approach revolves around the following key principles:
To automate change deployment while adhering to the principles above, Azure follows the Safe Deployment Practice (SDP) framework. This applies to all logical (SQL-related) upgrades of Azure SQL Managed Instance. The SDP framework ensures that all code and configuration changes progress through specific stages, each monitored by health metrics. If any degradation is detected, automated actions and alerts are triggered so that potential issues are addressed quickly.
The deployment journey begins with developers modifying their code and testing it on their systems. The code then moves to staging environments, where interaction between different components is tested. Azure's integration environment comes into play here, as it is dedicated to testing interactions between specific Azure services.
The subsequent stages include:
Azure maintenance windows are designated periods of time during which Microsoft performs routine maintenance tasks on its infrastructure. These tasks can include updates to the underlying infrastructure, software patches, or other improvements aimed at enhancing security, performance, or stability.
Azure's maintenance windows are the key to orchestrating smooth updates and patches across its vast infrastructure. Once a change reaches the region where your instance is deployed, the maintenance window restricts the time slots in which that change can be applied to your instance. Narrowing these slots to predictable, typically off-peak hours minimizes potential disruptions caused by maintenance activities.
During maintenance windows, Azure services remain online but might experience transient faults such as brief network interruptions or a temporary loss of connectivity to a database. The impact of these transient faults can be mitigated with proper retry logic at the application level, which handles them by automatically retrying failed operations.
Currently, Azure SQL Managed Instance offers three distinct maintenance window options, each tailored to different needs:
Default Window: This window spans from 5 PM to 8 AM of the next day, encompassing all days of the week. While it does not mean that every day within this range will see maintenance events, it signifies that these days are candidates for deployments.
Weekday Slot: Operating from Monday to Thursday, this window runs from 10 PM until 6 AM the next morning. The narrower time span allows for focused maintenance activities, while your Azure SQL Managed Instances stay online for most of the window.
Weekend Slot: Extending from Friday to Sunday, the weekend slot provides an extended period for maintenance activities. Like the other slots, instances remain operational during this window, with potential failovers having limited impact.
Note: The local time zone is determined by the location of the Azure region that hosts the resource and may observe daylight saving time in accordance with the local time zone definition. It is not determined by the time zone configured on the managed instance.
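To make the window definitions above concrete, here is a minimal sketch that checks whether a given local time (in the hosting region's time zone) falls inside a window. The `WINDOWS` table and the `in_maintenance_window` helper are hypothetical names used only for illustration. The Default and Weekday hours come from the descriptions above; the Weekend hours are not stated in this post, so the sketch assumes they match the Weekday hours.

```python
from datetime import datetime, time, timedelta

# (days on which the window opens, opening time, closing time the next morning).
# Monday is 0, Sunday is 6.
WINDOWS = {
    "default": (range(0, 7), time(17, 0), time(8, 0)),
    "weekday": (range(0, 4), time(22, 0), time(6, 0)),
    # Assumption: weekend hours are not given in the text; same hours as weekday.
    "weekend": (range(4, 7), time(22, 0), time(6, 0)),
}


def in_maintenance_window(window: str, local_dt: datetime) -> bool:
    """Return True if local_dt (region-local time) is inside the named window."""
    open_days, opens, closes = WINDOWS[window]
    if local_dt.time() >= opens:
        # Evening part: the window opened earlier today.
        return local_dt.weekday() in open_days
    if local_dt.time() < closes:
        # Early-morning part: the window opened the previous evening.
        return (local_dt - timedelta(days=1)).weekday() in open_days
    return False
```

For example, Friday at 3 AM still counts as part of the Weekday window, because that window opened on Thursday evening.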
Once the maintenance window is selected and the service configuration is complete, planned maintenance occurs only during the window of your choice. Maintenance does not happen in every window. When a window opens and maintenance has been scheduled for the underlying services, Azure's deployment process begins. The virtual machines hosting the managed instances go through a rolling upgrade, in which individual virtual machines are updated one by one rather than all at once. If all the necessary virtual machines or hosts are patched within the window, the event concludes. If updates are still outstanding when the window closes, the event continues in the next window, which may be the next day or the next week depending on the maintenance window configuration.
A maintenance event may contain updates for hardware, firmware, the operating system, satellite software components, or the SQL database engine. Updates are typically combined into a single batch to minimize the number of maintenance events. In the case of SQL Managed Instance, updates are combined into two batches: one focused on physical infrastructure, and another focused on the SQL engine and logical infrastructure.
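The rolling behavior described above can be pictured with a toy model. This is only an illustrative sketch, not Azure's actual scheduler: the function name `patch_in_window`, the fixed per-host duration, and the host names are all assumptions made for the example.

```python
def patch_in_window(pending_hosts, minutes_per_host, window_minutes):
    """Toy model: patch hosts one by one until the window runs out.

    Returns (patched, remaining). Hosts left in `remaining` carry over
    to the next maintenance window.
    """
    patched = []
    remaining = list(pending_hosts)
    elapsed = 0
    # Only start a host if it can finish before the window closes.
    while remaining and elapsed + minutes_per_host <= window_minutes:
        patched.append(remaining.pop(0))
        elapsed += minutes_per_host
    return patched, remaining


# Hypothetical numbers: five hosts, 3 hours each, 8-hour weekday window.
hosts = ["vm1", "vm2", "vm3", "vm4", "vm5"]
first_night, carry_over = patch_in_window(hosts, 180, 480)
second_night, carry_over = patch_in_window(carry_over, 180, 480)
```

In this model, two hosts are patched on each of the first two nights and the last host waits for a third window, which mirrors how an event can extend beyond a single window.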
Throughout the planned maintenance event, resources within your Azure SQL Managed Instance environment remain accessible, so for the most part your database stays operational and responsive. Towards the end of the maintenance event, a brief reconfiguration takes place, during which changes are applied to your database resources. This period is intentionally kept extremely short, typically lasting less than 8 seconds, which minimizes the impact on your application's availability.
If your application is in the middle of a long-running operation (for example, a long-running query) when the reconfiguration occurs, it may need to reestablish its connection to the database. This is akin to what happens in an on-premises scenario when a primary database fails over to a secondary one. Robust retry logic in your application is therefore crucial during planned maintenance events. It should handle temporary connection interruptions gracefully: if a connection is lost, the application should automatically attempt to reconnect and retry the interrupted operation. A well-implemented retry mechanism lets your application continue its work after a brief interruption.
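As a rough illustration of such retry logic, the sketch below reconnects and retries with exponential backoff and jitter. It is deliberately driver-agnostic: `TransientConnectionError`, `run_with_retry`, and the `connect` and `operation` callables are hypothetical placeholders, and a real application would map its database driver's transient error types onto this pattern. It also assumes the operation is safe to run again from the start.

```python
import random
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a driver's transient connection error."""


def run_with_retry(connect, operation, max_attempts=5,
                   base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Open a fresh connection and run operation(conn), retrying on transient errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # Reconnect on every attempt: the old connection is gone after a failover.
            conn = connect()
            return operation(conn)
        except TransientConnectionError:
            if attempt == max_attempts:
                raise  # Give up and surface the error to the caller.
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay, plus jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

With the default settings, the waits between attempts comfortably cover a reconfiguration that lasts a few seconds. The `sleep` parameter exists only so the delay can be replaced in tests.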
Azure SQL Managed Instance failover for high availability refers to the process and mechanisms Azure employs to ensure that your SQL Managed Instances remain highly available, especially in the face of potential disruptions such as hardware or software failures, maintenance activities, or other unexpected events. Azure SQL Managed Instance automatically handles failover in the background. In the event of a failure, it automatically switches to a standby replica to ensure minimal disruption to services.
Failovers play a crucial role in maintaining service availability during maintenance windows. If a host or VM running the primary replica of an Azure SQL Managed Instance is being patched, a failover swiftly transfers operations to another replica to ensure continuity. These failovers typically occur one to two times during a maintenance slot, each lasting around 8 seconds.
For Azure SQL Managed Instances in the Business Critical service tier, failovers are even faster due to the optimized setup of Always On availability groups and local storage.
If you implement disaster recovery by configuring auto-failover groups in Azure SQL Managed Instance, we recommend that you replicate workloads across regional pairs to benefit from Azure’s isolation and availability policies. Failover groups in paired regions also perform better than those in unpaired regions. Azure guarantees that paired regions are not deployed to at the same time. However, the order of deployment is not guaranteed, so it is not possible to predict which region will be upgraded first. Sometimes your geo-primary instance will be upgraded first, and sometimes it will be the geo-secondary.
If your Azure SQL Managed Instance uses auto-failover groups that are not aligned with the Azure region pairing, you should select different maintenance window schedules for your geo-primary and geo-secondary instances. For example, you can select the Weekday maintenance window for your geo-secondary instance and the Weekend maintenance window for your geo-primary instance.
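This recommendation can be expressed as a simple configuration check. The sketch below is hypothetical (the names `WINDOW_START_DAYS`, `windows_can_overlap`, and `check_failover_group` are invented for illustration) and compares only the days on which each window can open. It assumes that windows opening on the same evening overlap in time, which holds for the Default and Weekday hours described earlier.

```python
# Days (Monday=0 ... Sunday=6) on which each window can open.
WINDOW_START_DAYS = {
    "default": set(range(0, 7)),  # every day
    "weekday": set(range(0, 4)),  # Monday to Thursday
    "weekend": set(range(4, 7)),  # Friday to Sunday
}


def windows_can_overlap(window_a: str, window_b: str) -> bool:
    """True if both windows can open on the same day (assumed to overlap in time)."""
    return bool(WINDOW_START_DAYS[window_a] & WINDOW_START_DAYS[window_b])


def check_failover_group(primary_window: str, secondary_window: str,
                         regions_are_paired: bool) -> str:
    """Flag failover groups whose two sides could be maintained at the same time."""
    if regions_are_paired:
        return "ok: paired regions are not deployed to at the same time"
    if windows_can_overlap(primary_window, secondary_window):
        return "warning: both instances could be in maintenance at the same time"
    return "ok: maintenance windows do not overlap"
```

Note that under this model the Default window can coincide with either of the other two, so for unpaired regions only the Weekday and Weekend combination keeps the two instances apart.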
Although rare, failures or interruptions during a maintenance event can occur. In case of a failure, changes are rolled back, and the maintenance will be rescheduled to another time.
You can opt in to receive a notification 24 hours before the maintenance event, immediately before maintenance starts, and when the maintenance is completed. You can also check Resource Health for more information. To receive emails, advance notifications must be configured. For more information, see Advance notifications.
As of this writing, advance notifications for maintenance windows are in public preview for Azure SQL Managed Instance.
By understanding what to expect during a planned maintenance event and having appropriate measures in place, such as retry logic, error handling, and the proper connectivity mode, you can help ensure a smooth transition through these events and maintain a high level of service availability for your Azure SQL Managed Instance.