This is a question I receive often, and although Oracle doesn't support RAC in any third-party cloud, it's an important topic as more workloads lift and shift to Azure, and there are real reasons to have, or not to have, Real Application Clusters (RAC) as part of them. The only option currently available for RAC on Azure is Flashgrid.
The goal of this post is to push past the idea that a lift and shift should always be a 1:1 move. When moving to the cloud, it's important to use the right tools, not just the tools you've always used, and that is a very important lesson when it comes to Oracle Real Application Clusters (RAC).
I'm going to repeat this once more: Oracle doesn't support RAC on Azure. Flashgrid does provide excellent support for their RAC solution in Azure, but that support comes through Flashgrid. So let's start discussing why architecting for the cloud may be very different from architecting on-prem by dispelling some myths.
RAC is Not High Availability
This may be an unpopular opinion with many in the Oracle world, but RAC doesn't meet many of the requirements for HA. If a solution fails to meet even one requirement, do you really have a solution?
- RAC does have rolling patches that eliminate some of the downtime for patching, but there is no guarantee that ALL patches are delivered this way. Patches are built by different teams at Oracle, and when a team doesn't build a rolling patch for RAC, or doesn't have enough time to, there will be downtime.
- One of the biggest flaws in RAC for meeting HA requirements is that, by default, all nodes reside in one datacenter unless an extended distance cluster has been deployed. In the cloud, all nodes will be in a single Availability Zone (AZ), which means that if the AZ goes down, so do all the nodes and the shared storage, and you still need to fail over to a Data Guard standby for High Availability.
- RAC has only one database, shared by multiple nodes, unlike SQL Server Always On Availability Groups, which keep multiple copies of the database. A single database also means RAC doesn't protect against data corruption; that protection relies on Data Guard.
RAC was architected for scalability and instance resiliency, which it does very well, but with the default deployment a datacenter failure results in the loss of all nodes and the database, failing HA requirements.
- Another consistent issue is that most applications connected to RAC databases still aren't "RAC aware". This results in outages when a failover or patch occurs, which is another challenge to the HA guarantee. Every tier of the stack must be included in an HA solution.
- RAC environments have a number of additional components that add complexity and can cause failures, both during failover and outside of it.
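To put rough numbers on the single-datacenter point above, here is a minimal sketch of the arithmetic. The availability figures are hypothetical placeholders, not measured or published values, and the model assumes independent failures and a negligible failover time, which real systems won't match exactly.

```python
def cluster_availability(node_avail: float, nodes: int, site_avail: float) -> float:
    """Availability of `nodes` redundant instances that all sit in one site.

    Simplifying assumptions: node failures are independent, one surviving
    node is enough to serve the workload, and the shared storage is lost
    together with the site.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    all_nodes_down = (1.0 - node_avail) ** nodes
    return site_avail * (1.0 - all_nodes_down)


def with_standby(primary_avail: float, standby_avail: float) -> float:
    """Availability when an independent standby in another site can take over.

    Assumes failover time is negligible, which is optimistic.
    """
    return 1.0 - (1.0 - primary_avail) * (1.0 - standby_avail)


if __name__ == "__main__":
    SITE = 0.999  # hypothetical availability of one zone/datacenter
    NODE = 0.99   # hypothetical availability of one node
    for n in (1, 2, 4):
        print(n, "node(s), one site:", cluster_availability(NODE, n, SITE))
    single = cluster_availability(NODE, 1, SITE)
    print("single instance plus standby in a second site:",
          with_standby(single, single))
```

Under these assumptions, adding nodes inside one site can never lift availability above the site's own figure, while a standby in a second site can.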
There have been a significant number of times when I've reviewed an AWR report and thought the best thing for the environment was to UnRAC it. The code and database design simply weren't built to run effectively or efficiently on RAC, resulting in high global cache waits, etc.
RAC, in many experts' view, is for scalability, and in my experience that scalability is as likely to be needed because of a lack of resources or experience to build and manage solutions that handle the workload as it is to be needed for growth.
Only one project out of 100 that I've worked on at Microsoft has had a real need for RAC, and yes, I work mostly with multi-terabyte workloads, so don't assume these were just small databases.
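As a rough illustration of that kind of AWR review, the sketch below adds up the time spent on global cache waits and expresses it as a share of DB time. The sample figures are made up for illustration, the function name is hypothetical, and matching on the "gc " prefix is an assumption about how the cluster-related events are named; what share counts as "too high" is a judgment call rather than a fixed rule.

```python
def global_cache_share(wait_seconds: dict[str, float], db_time_seconds: float) -> float:
    """Fraction of DB time spent on wait events whose name starts with 'gc '."""
    if db_time_seconds <= 0:
        raise ValueError("db_time_seconds must be positive")
    gc_time = sum(
        seconds for event, seconds in wait_seconds.items()
        if event.startswith("gc ")
    )
    return gc_time / db_time_seconds


if __name__ == "__main__":
    # Hypothetical numbers copied by hand from a top-events section of a report.
    sample_waits = {
        "gc buffer busy acquire": 300.0,
        "gc cr block busy": 200.0,
        "db file sequential read": 400.0,
        "log file sync": 100.0,
    }
    share = global_cache_share(sample_waits, db_time_seconds=2000.0)
    print(f"global cache share of DB time: {share:.0%}")
```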
Deploying RAC in Azure
For RAC to work on Azure today (not counting the new private preview), a third-party service is required that works as a communication center for the software cluster.
Flashgrid works with IaaS VM images and is supported through Flashgrid if any issues arise. A high-level architecture diagram looks like the following:
If the customer is running anything more than a simple Oracle RAC environment, for example they've deployed a complex data model, complex code or a complex application layer, or they don't meet any of the requirements for scaling with RAC, I'm going to try to convince them to architect for the cloud instead.
Architecting for the Cloud
An Azure datacenter is built on a globally distributed infrastructure, which contains numerous layers of redundancy and a resilient interconnected network. This is far superior to an on-prem datacenter because it HAS TO BE. Geo-regions are fault tolerant in case of complete regional datacenter failure, which means the way we architect for the cloud is often different than the way we architect on-prem.
The Keep It Simple, Silly (KISS) principle comes in handy here, as complexity only adds to management, deployment and licensing costs for our Azure lift and shift projects. The best practice for Oracle, which is also stated in the docs, is as follows:
- With the scalability of Azure cloud VMs, the database to deploy should be a single-instance Oracle database (or Oracle-supported product).
- To help scale, consider using Oracle Active Data Guard to leverage one, two or more secondary databases for reporting, feeding ELT/ETL or backups.
- If deploying a secondary Data Guard database to another region, consider using an Oracle Far Sync instance to assist in keeping it up to date.
- Also use Oracle Active Data Guard, configured for automatic failover, for DR purposes, designating sequential failover steps as required.
- Use Azure Site Recovery (ASR) to take snapshots of the Oracle VM(s) and create new copies that can be used to quickly do a final recovery to a consistent state vs. cloning or recovering from a full backup.
- Use RMAN to take backups and save backups to Azure Blob storage.
Oracle on Azure High Level Diagram
If the database needs more resources, it is easy to scale the VM(s) up as necessary. I spend a larger amount of time calculating IO to make sure the disk IO has room to grow over time. Disk is separate from the VMs and is important to Oracle, but I'd like to save that for another post, so I will leave you with this:
- Have the discussion about what the Azure cloud, or any cloud, is and how it is architected differently from an on-prem datacenter.
- Ask the customer why they are using RAC, and then ask whether their RAC environment actually meets the HA or scalability needs it was deployed for.
- Seriously consider how Oracle Data Guard, either passive or active, can play a role in a strong HA and DR story for the customer. The product is incredibly robust and more than meets the cloud needs of customers' Oracle databases. As Data Guard costs less than RAC, it can save the customer a considerable amount of money on licensing, too.
It's all right that the DBA might simply want to keep their RAC skills up to date by having it. I understand; I've got two decades under my belt as a DBA. The thing is, there are so many cool new tools and products to learn, like Azure CLI, Azure services and automation with DevOps, that they'll acquire plenty of new skills that make them more valuable than just knowing Oracle RAC.
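On the IO sizing point mentioned before the list above, a minimal sketch of the "room to grow" check might look like the following. The growth rate, the peak IOPS and the storage limit are hypothetical inputs you would replace with figures from your own workload and the limits of the storage you pick; the function names are illustrative only.

```python
def projected_peak(current_peak: float, annual_growth: float, years: int) -> float:
    """Peak IO demand after compounding `annual_growth` for `years` years."""
    return current_peak * (1.0 + annual_growth) ** years


def has_room_to_grow(current_peak: float, annual_growth: float, years: int,
                     storage_limit: float, headroom: float = 0.2) -> bool:
    """True if the projected peak still fits under the limit minus a safety margin."""
    usable = storage_limit * (1.0 - headroom)
    return projected_peak(current_peak, annual_growth, years) <= usable


if __name__ == "__main__":
    # Hypothetical workload: 10,000 peak IOPS today, growing 20% a year,
    # checked against a 20,000 IOPS storage limit with 20% headroom.
    for years in (1, 2, 3):
        print(years, "year(s):", has_room_to_grow(10_000, 0.20, years, 20_000))
```

The same check works for throughput (MB/s) by swapping the units, since both tend to matter for an Oracle workload.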