Database Platform CSE
Diagnosing and Resolving Oracle to Azure PostgreSQL Migration Failures with Ora2Pg
Introduction

Migrating from Oracle to Azure PostgreSQL is a complex endeavor with multiple coordinated steps. Ora2Pg is a widely used tool that accelerates both schema conversion and data migration. For installing Ora2Pg, see the step-by-step installation guide for Windows and Linux here.

Even after a successful schema migration, data migration can be complicated and time-consuming, especially for tables containing millions or billions of rows. The process may also fail because of data errors related to platform, data type, or character set incompatibility. Identifying these errors can be challenging, but the following debugging methods and troubleshooting steps help resolve them efficiently during large and complex migrations.

Steps to debug the errors:

1. Locate the configuration file: Open your ora2pg.conf file. It is typically found in the directory where you initialized your Ora2Pg project.
2. Find the LOG_ON_ERROR parameter in the Ora2Pg configuration file. If it doesn't exist, add it, and set its value to 1 to enable error logging (a brief example of these ora2pg.conf settings appears after the use case below).
3. Specify the output directory: Ensure that you have specified an output directory where the error logs will be saved. This can be done using the OUTPUT directive in the same configuration file.
4. Proceed with the data import. When Ora2Pg encounters an error in a COPY or INSERT statement sent to PostgreSQL, it logs the problematic statements and data to a file named TABLENAME_error.log in the output directory and continues the migration with the next batch of data. By default this setting is disabled, and the import aborts on the first error. Enabling it lets you troubleshoot the failures and manually reload the error log file after fixing the issues.
5. Review each column of the failing row to identify the problematic data. If you still can't determine the issue, edit the log file to remove the SQL syntax, keeping the data in tab-delimited format, then use the pgAdmin import utility to load the file and check whether it reproduces the same error for that row.
6. Capture the problematic rows and attempt to insert them into a test environment with the same table structure (or recreate the table). Run Ora2Pg to import the data and check whether the issue can be reproduced.
7. If specific errors need different handling, adjust other configuration parameters or review the logged errors and make the necessary corrections before re-running the migration.

Use case:

1. Data migration fails and the debug logs show an error message.
2. Enable the LOG_ON_ERROR parameter in the configuration file and re-run the data migration with Ora2Pg, which generates a file containing the failing data in the output directory.
3. Review the log file to identify the problematic data and apply the appropriate remedies. Search for errors and possible solutions here.
4. If no fix for the data errors is available, consider one of the following methods to load the data manually:
   - Use Azure Data Factory.
   - Correct the bad data manually in the TABLENAME_error.log file and import it using pgAdmin.
   - Export the table to a flat file using Oracle SQL Developer and import it using pgAdmin.
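As referenced in the debugging steps above, here is a minimal sketch of the relevant ora2pg.conf settings; the file name and directory path are illustrative and should be adjusted to your project layout:

# Log failed COPY/INSERT statements and their data to TABLENAME_error.log
# in the output directory instead of aborting the import on the first error.
LOG_ON_ERROR    1

# Output file and base directory referenced in the steps above.
OUTPUT          output.sql
OUTPUT_DIR      /migration/ora2pg/data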
Let's go through some scenarios to illustrate why data copying might fail with Ora2Pg but succeed with Azure Data Factory or manual insertion using pgAdmin.

Scenario 1: Data Type Handling
Issue: Ora2Pg encounters an error while migrating a table with a NUMBER data type whose precision and scale values are unsupported by PostgreSQL.
- Ora2Pg: Throws an error due to the unsupported precision and scale.
- Azure Data Factory: Automatically converts the NUMBER data type to a compatible PostgreSQL data type, such as NUMERIC.
- pgAdmin: Allows manual intervention to adjust the data type during insertion, avoiding errors caused by the precision mismatch between NUMBER and NUMERIC.

Scenario 2: Character Set Issues
Issue: The table contains special characters or emojis that are not supported by the configured character set in PostgreSQL.
- Ora2Pg: Fails to migrate the data due to character set incompatibility.
- Azure Data Factory: Handles the character set conversion more gracefully, ensuring the special characters are correctly encoded.
- pgAdmin: Allows manual encoding adjustments during data insertion, bypassing the character set issue.

Scenario 3: Platform-Specific Incompatibilities
Issue: The table includes Oracle-specific functions or features (for example, built-in functions, views, or UDFs) that are not directly compatible with PostgreSQL.
- Ora2Pg: Encounters errors due to the strict enforcement of platform-specific rules.
- Azure Data Factory: Uses built-in transformations to convert Oracle-specific features to their PostgreSQL equivalents.
- pgAdmin: Allows manual editing of the data or SQL queries to ensure compatibility with PostgreSQL.

Scenario 4: Error Handling and Logging
Issue: A specific row contains invalid data that violates PostgreSQL constraints.
- Ora2Pg: Stops the migration process and logs the error, requiring manual intervention to fix the data.
- Azure Data Factory: Logs the error but continues with the migration, allowing you to address the problematic rows later.
- pgAdmin: Provides immediate feedback during manual insertion, allowing you to correct the data on the spot.

Scenario 5: Configuration Differences
Issue: Different default settings for data transfer and conversion.
- Ora2Pg: Uses strict default settings that might not be optimal for all data types and structures.
- Azure Data Factory: Offers more flexible configuration options that can be adjusted to handle various data scenarios.
- pgAdmin: Allows for manual configuration and adjustments during the data insertion process.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please send an email to the Azure Databases CSE Engineering Team.

Optimizing Data Archival with Partitioning in Azure PostgreSQL for Oracle Migrations
Introduction

As enterprises migrate mission-critical workloads from heterogeneous databases like Oracle to Azure Database for PostgreSQL Flexible Server, managing large datasets while ensuring compliance with strict data retention policies becomes a key priority. Industries such as retail, telecommunications, and transportation and logistics, among others, enforce stringent data retention requirements to safeguard customer information, preserve operational efficiency, and maintain service reliability. Failure to meet these standards can lead to increased risk, including data loss, inefficiencies, and potential non-compliance. Implementing a robust data retention and partitioning strategy in PostgreSQL helps organizations efficiently manage and archive historical data while optimizing performance.

Azure Database for PostgreSQL Flexible Server provides powerful partitioning capabilities that allow organizations to manage large volumes of data effectively. By partitioning tables based on time intervals or other logical segments, we can improve query performance, automate archival processes, and ensure efficient data purging, all while maintaining referential integrity across complex schemas.

Migration Challenges

While both Azure Database for PostgreSQL Flexible Server and Oracle support partitioning, their approaches differ significantly. In Azure Database for PostgreSQL Flexible Server, partitions are not created automatically; each partition must be explicitly defined with its own CREATE TABLE statement. When setting up a partitioned table, each partition therefore has to be created separately, which requires careful planning and implementation.

This blog explores best practices for implementing range partitioning in Azure Database for PostgreSQL Flexible Server, maintaining referential integrity across multiple levels, and leveraging Azure Blob Storage through the azure_storage extension to efficiently archive partitioned data.

Implementing Partitioning with Example

In this example, we demonstrate a partitioning strategy within the test_part schema, where a parent table logs and a child table logs_child are structured using range partitioning with a monthly interval. Depending on specific business requirements, the partitioning strategy can also be adjusted to quarterly or yearly intervals to optimize storage and query performance.
--Create and set schema for the session
CREATE SCHEMA test_part;
SET SEARCH_PATH=test_part;

--Create a partitioned table
CREATE TABLE logs (
    id integer not null,
    log_date date not null,
    message text
) PARTITION BY RANGE (log_date);

--Add the primary key constraint on the parent partitioned table
ALTER TABLE ONLY logs ADD primary key (id, log_date);

--Define a partition for each month
CREATE TABLE logs_2024_01 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE logs_2024_02 PARTITION OF logs FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

--Create a child partitioned table
CREATE TABLE logs_child (
    id integer,
    log_date date,
    message text,
    logs_parent_id integer
) PARTITION BY RANGE (log_date);

--Add constraints
ALTER TABLE ONLY logs_child ADD primary key (id, log_date);
ALTER TABLE logs_child ADD CONSTRAINT logs_child_fk FOREIGN KEY (logs_parent_id, log_date) REFERENCES logs (id, log_date) ON DELETE CASCADE;

--Define a partition for each month
CREATE TABLE logs_child_2024_01 PARTITION OF logs_child FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE logs_child_2024_02 PARTITION OF logs_child FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

--Insert data into the parent partitioned table
INSERT INTO logs (id, log_date, message) VALUES (1, '2024-01-15', 'Log message 1');
INSERT INTO logs (id, log_date, message) VALUES (11, '2024-01-15', 'Log message 1');
INSERT INTO logs (id, log_date, message) VALUES (2, '2024-02-15', 'Log message 2');
INSERT INTO logs (id, log_date, message) VALUES (22, '2024-02-15', 'Log message 2');

--Insert data into the child partitioned table
INSERT INTO logs_child VALUES (1, '2024-01-15', 'Log message 1', 1);
INSERT INTO logs_child VALUES (2, '2024-01-15', 'Log message 1', 1);
INSERT INTO logs_child VALUES (5, '2024-02-15', 'Log message 2', 22);
INSERT INTO logs_child VALUES (6, '2024-02-15', 'Log message 2', 2);

--Review the data
SELECT * FROM logs;
SELECT * FROM logs_2024_01;
SELECT * FROM logs_2024_02;
SELECT * FROM logs_child_2024_01;
SELECT * FROM logs_child_2024_02;

Detach the partition: When detaching partitions, follow this order: first detach the child table partition, then drop its foreign key constraint, and only then detach the parent table partition.

--Detach the partitions
ALTER TABLE logs_child DETACH PARTITION logs_child_2024_02;
ALTER TABLE logs_child_2024_02 DROP CONSTRAINT logs_child_fk;
ALTER TABLE logs DETACH PARTITION logs_2024_02;

Archive the partition table in Azure Blob Storage: The following steps demonstrate how to archive the detached partition data to Azure Blob Storage using Microsoft Entra ID for authorization. Create the azure_storage extension by following the steps provided in the link here.

--Create the extension
CREATE EXTENSION azure_storage;
SET search_path=azure_storage;

--Add the storage account (Entra ID or authentication key steps are provided in the reference document linked here)
SELECT * FROM azure_storage.account_options_managed_identity('shaystorage','blob');
SELECT * FROM azure_storage.account_add('{
  "auth_type": "managed-identity",
  "account_name": "shayrgstorage",
  "account_type": "blob"
}');

SET SEARCH_PATH=test_part;

--Archive the detached partition to Blob Storage as a CSV file
COPY test_part.logs_child_2024_02 TO 'https://shayrgstorage.blob.core.windows.net/pgtable/logs_child_2024_02.csv' WITH (FORMAT 'csv', header);
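Before truncating the local partition, you can confirm the CSV landed in the container by listing its blobs; a quick check with the extension's blob_list function, reusing the illustrative account and container names from the example above:

--List the blobs in the container to verify the archived file is present
SELECT * FROM azure_storage.blob_list('shayrgstorage', 'pgtable');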
View or load the archived partitioned table

--After truncating data from the partition, view the data from the .csv file in Azure Storage.

When the archived data is needed for read-only purposes:

TRUNCATE TABLE test_part.logs_child_2024_02;
SELECT * FROM test_part.logs_child_2024_02;
SELECT * FROM azure_storage.blob_get
    ('shayrgstorage'
    ,'pgtable'
    ,'logs_child_2024_02.csv'
    ,NULL::test_part.logs_child_2024_02
    ,options := azure_storage.options_csv_get(delimiter := ',', header := 'true')
    );

--Load data from the .csv file into the Azure Database for PostgreSQL flexible server table.
When the data needs to be restored for update purposes:

TRUNCATE TABLE test_part.logs_child_2024_02;
INSERT INTO test_part.logs_child_2024_02 (id, log_date, message, logs_parent_id)
SELECT * FROM azure_storage.blob_get
    ('shayrgstorage'
    ,'pgtable'
    ,'logs_child_2024_02.csv'
    ,NULL::test_part.logs_child_2024_02
    ,options := azure_storage.options_csv_get(delimiter := ',', header := 'true')
    );

Attach the partition table

--Attach the partitions again to make the data visible in the partitioned tables for operational purposes
ALTER TABLE test_part.logs ATTACH PARTITION test_part.logs_2024_02 FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
ALTER TABLE test_part.logs_child ATTACH PARTITION test_part.logs_child_2024_02 FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

Alternative Data Archival Strategies Based on Business Requirements:

- Deploy a lower-SKU Azure Database for PostgreSQL server, such as the Burstable or General Purpose service tier, and use the postgres_fdw extension to move data between tables that reside in different PostgreSQL databases or instances (a minimal sketch follows this section). Burstable servers are available with up to 64 TB of storage. Automate database start/stop processes to minimize expenses when loading or extracting data.
- If the database size is relatively small, consider removing a partition from a partitioned table with the ALTER TABLE ... DETACH PARTITION command, converting it into a standalone table for easier archival.
- Use long-term retention (LTR) options to retain database backups for up to 10 years, depending on business requirements, and restore them when needed. For more information, review here.
- Utilize an Azure Data Factory (ADF) pipeline to move data into Azure Storage and restore it as needed using automation scripts.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please contact the Databases SQL Customer Success Engineering (Ninja) Team (datasqlninja@microsoft.com). Thanks for your support!

Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.
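As referenced in the alternative strategies above, the following is a minimal postgres_fdw sketch for pushing a detached partition to a separate, lower-cost archive server. The server name, connection details, and remote table are illustrative assumptions, and the archive table is assumed to already exist on the remote server:

--On the source server: enable the foreign data wrapper
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

--Register the archive server (hostname, database, and port are placeholders)
CREATE SERVER archive_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'archive-server.postgres.database.azure.com', dbname 'archivedb', port '5432');

--Map the local role to a role on the archive server (credentials are placeholders)
CREATE USER MAPPING FOR CURRENT_USER
    SERVER archive_srv
    OPTIONS (user 'archive_admin', password '<password>');

--Expose the remote archive table locally
CREATE FOREIGN TABLE test_part.logs_child_2024_02_archive (
    id integer,
    log_date date,
    message text,
    logs_parent_id integer
) SERVER archive_srv OPTIONS (schema_name 'test_part', table_name 'logs_child_2024_02');

--Copy the detached partition's rows to the archive server
INSERT INTO test_part.logs_child_2024_02_archive
SELECT * FROM test_part.logs_child_2024_02;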
Azure SQL Managed Instance – Sync Agent Jobs and Logins in Failover Groups

In an Azure SQL Managed Instance setup with a failover group, the schema, data, and database-level users are always synced between the primary and secondary instances. As of October 2021, there is no sync mechanism for SQL Server Agent jobs or server logins/roles, because the functionality of replicating system databases does not exist. This document provides guidance and scripts to implement a workaround solution that keeps Agent jobs and server logins/roles synced between the primary and secondary instances.
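The document's scripts are not reproduced here, but as a rough, hypothetical illustration of the login-sync idea, SQL logins can be scripted from the primary with their original SIDs and password hashes and replayed on the secondary so that database users keep resolving after failover:

-- Run on the primary instance: generate CREATE LOGIN statements that preserve
-- each SQL login's SID and password hash.
SELECT 'CREATE LOGIN ' + QUOTENAME(name)
     + ' WITH PASSWORD = ' + CONVERT(varchar(max), LOGINPROPERTY(name, 'PasswordHash'), 1) + ' HASHED'
     + ', SID = ' + CONVERT(varchar(max), sid, 1) + ';'
FROM sys.sql_logins
WHERE name NOT LIKE '##%'   -- skip internal certificate-mapped logins
  AND name <> 'sa';

-- Execute the generated statements on the secondary instance (for example via a
-- scheduled Agent job or an automation runbook). Microsoft Entra logins and
-- server roles need separate handling.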
Oracle to Azure SQL Post Migration Optimization: Handling Oracle Function-Based Indexes in Azure SQL

Why Post-Migration Optimization Matters

For customers migrating from Oracle to Azure SQL Database, understanding Azure SQL optimization techniques can make a significant difference in overall database performance. During the application stress-testing phase, it is essential to identify query patterns, evaluate index usage, and refine queries to prevent performance bottlenecks and ensure system efficiency. While Azure SQL Database's Intelligent Query Processing (IQP) can significantly improve performance with minimal intervention, certain complex scenarios may still require manual query rewrites to achieve the desired level of performance.

Leveraging SARGable Queries in SQL Server

One such optimization technique involves ensuring that queries in SQL Server are SARGable (Search ARGument-able). SARGable queries enable the SQL engine to leverage indexes efficiently, significantly reducing resource consumption and execution time. For instance, small adjustments such as replacing column transformations with range-based conditions, creating computed columns, and adding the required indexes on those computed columns can have a significant impact on query performance.

Metric | Non-SARGable Query | SARGable Query
Execution Plan | Index Scan (usually less efficient) | Index Seek (usually more efficient)
IO Cost | High (reads many pages) | Low (reads fewer pages)
CPU Time | Higher (function evaluation for rows) | Lower (direct use of index)
Logical Reads | High (entire index/table scanned) | Low (only relevant index pages accessed)

Handling Function-Based Indexes: Oracle vs SQL Server

Oracle: In Oracle, function-based indexes allow indexing on the result of a function or expression. This enables efficient query execution when functions or expressions are used in WHERE or JOIN conditions. The TRUNC(APPLN_START_DT, 'YYYY') expression is precomputed and stored in the index, so queries can read the value of the expression from the index instead of computing it. The following is an example of this in practice:

-- Create a function-based index on the TRUNC(APPLN_START_DT, 'YYYY') expression
CREATE INDEX idx_appln_start_dt ON appln_data ( trunc(appln_start_dt, 'YYYY') );

-- Filter rows where the year of APPLN_START_DT matches the current year.
SELECT *
FROM appln_data
WHERE trunc(appln_start_dt, 'YYYY') = trunc(sysdate, 'YYYY');

SQL Server: SQL Server takes a slightly different approach to indexes and query optimization, particularly when working with large datasets or tables containing millions of rows. The SQL optimizer excels when queries are SARGable, meaning the predicates are written to allow direct index utilization without applying transformations to columns. Let's analyze two versions of a query on a table [APPLN_DATA] with 50 million rows to see how query formulation impacts performance, resource utilization, and execution efficiency.

Comparing SARGable and Non-SARGable Queries

Non-SARGable query:

-- Filter rows where the year of APPLN_START_DT matches the current year.
SELECT *
FROM [dbo].[APPLN_DATA]
WHERE YEAR(APPLN_START_DT) = YEAR(GETDATE());

Execution Plan: Index Scan
Logical Reads: 928,804
CPU Time: 7,562 ms
Physical Reads: 2
Elapsed Time: 63,130 ms

Why it's inefficient: The YEAR() function transforms the APPLN_START_DT column, which prevents the optimizer from performing an Index Seek and forces an Index Scan.
The execution plan confirms this by showing an Index Scan, where SQL Server reads through all rows instead of directly seeking the relevant ones using the index. This behavior increases resource consumption, including logical reads, CPU usage, I/O cost, and overall query execution time.

SARGable query:

SELECT *
FROM [dbo].[APPLN_DATA]
WHERE APPLN_START_DT >= CAST(YEAR(GETDATE()) AS CHAR(4)) + '-01-01'
  AND APPLN_START_DT <  CAST(YEAR(GETDATE()) + 1 AS CHAR(4)) + '-01-01';

Execution Plan: Index Seek
Logical Reads: 173,041
CPU Time: 3,547 ms
Elapsed Time: 52,092 ms

Why it's better: The query uses range-based conditions (>= and <) directly on the APPLN_START_DT column without applying any functions. This makes the query SARGable, allowing SQL Server to perform an Index Seek, as shown in the execution plan. An Index Seek ensures that SQL Server retrieves only the relevant rows, significantly reducing logical reads, CPU usage, and overall query execution time.

Understanding the Key Differences and Impact of SARGable and Non-SARGable Queries

Aspect | Non-SARGable | SARGable
Query Example | WHERE YEAR(APPLN_START_DT) = YEAR(GETDATE()) | WHERE APPLN_START_DT >= '2024-01-01'
Index Usage | Index Scan (inefficient, due to the YEAR() function) | Index Seek (efficient)
Logical Reads | High (entire index/table scanned) | Low (only relevant index pages accessed)
CPU Time | High | Low
Elapsed Time | Longer | Shorter
Query Optimization | Applies a function to the column | Uses range conditions without transformations

Additional Optimization Techniques:

- A computed column with an index was not used here because the query retrieves rows for an entire year, making range-based conditions more efficient. Computed-column indexes are better suited to scenarios that apply specific transformations to smaller subsets of data, such as calculations or filtering for a particular month or day (a brief sketch follows the key takeaways below).
- In this example we use SELECT * for simplicity, but real-world workloads typically retrieve specific columns based on application requirements. Selecting only the required columns reduces I/O and improves query performance. Additionally, covering indexes (indexes that include all queried columns) should be considered to eliminate key lookups, further enhancing performance for frequently accessed queries.
- If your table has billions of rows, even efficient Index Seeks may take considerable time. Filtered indexes, indexed views, and partitioning the table on a logical key (for example, date ranges) can help improve query performance further.

Key Takeaways for a Successful Heterogeneous Migration:

- Oracle's function-based indexes can be adapted in SQL Server using range-based conditions, computed columns with indexes, indexed views, or application-level optimizations.
- Aim to write SARGable queries to leverage SQL Server's indexing capabilities effectively, and avoid applying functions in query predicates.
- Evaluate and consider the use of indexed views and filtered indexes.
- Iteratively analyze and update index statistics, evaluate index access patterns, review histograms, monitor query plans and fragmentation, and rebuild indexes during the application stress-testing phase to achieve optimal performance.
- Collaborate with application developers to refactor query logic when needed.
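As noted above, a computed column with an index is one way to adapt an Oracle function-based index when a transformed value must be filtered directly. A minimal sketch against the [APPLN_DATA] example; the column name APPLN_START_MONTH and the month-level filter are illustrative assumptions:

-- Add a persisted computed column that stores the month boundary of APPLN_START_DT,
-- then index it so predicates on the computed value can use an Index Seek.
ALTER TABLE [dbo].[APPLN_DATA]
    ADD APPLN_START_MONTH AS DATEFROMPARTS(YEAR(APPLN_START_DT), MONTH(APPLN_START_DT), 1) PERSISTED;

CREATE INDEX IX_APPLN_DATA_START_MONTH
    ON [dbo].[APPLN_DATA] (APPLN_START_MONTH);

-- Queries can now filter on the computed column directly and remain SARGable.
SELECT *
FROM [dbo].[APPLN_DATA]
WHERE APPLN_START_MONTH = '2024-02-01';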
References:

- Intelligent query processing details - SQL Server | Microsoft Learn
- Create indexed views - SQL Server | Microsoft Learn
- Create filtered indexes - SQL Server | Microsoft Learn
- Migrating Oracle Databases to SQL Server (OracleToSQL) - SQL Server | Microsoft Learn
- Automatic tuning - SQL Server | Microsoft Learn

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please contact the Databases SQL Ninja Engineering Team (datasqlninja@microsoft.com). Thanks for your support!

Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.

Troubleshooting page-related performance issues in Azure SQL
Azure SQL is a family of managed, secure, and intelligent products that use the SQL Server database engine in the Azure cloud. Although Azure SQL is built on the familiar SQL Server engine, there are some differences between SQL Server and Azure SQL, such as the availability of certain diagnostic commands like DBCC PAGE. DBCC PAGE is a very useful command in SQL Server for troubleshooting and inspecting the internal structure of data pages, but it is not supported in Azure SQL because of differences in the underlying infrastructure and management approach. This limitation can present challenges for database administrators and developers who depend on DBCC PAGE for troubleshooting. Nevertheless, Azure SQL provides alternative methods and tools for database troubleshooting, ensuring that DBAs can still achieve effective results even without DBCC PAGE. This article explores these alternatives, though they do not fully replace DBCC PAGE.
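For example, one documented way to inspect page headers without DBCC PAGE is the sys.dm_db_page_info dynamic management function; the sketch below is illustrative (file and page numbers are placeholders) and may not be the exact tool the article focuses on:

-- Inspect the header of a specific page: (database_id, file_id, page_id, mode).
-- 'DETAILED' returns the full set of header columns.
SELECT page_type_desc, object_id, index_id, slot_count, free_bytes
FROM sys.dm_db_page_info(DB_ID(), 1, 168, 'DETAILED');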
Data Archiving Strategies for SQL Server

As databases grow, keeping them performing well becomes increasingly difficult. Pruning data that is no longer required is often a better strategy. This blog explores, at a high level, different techniques to remove or archive data. Some of the techniques depend on the flavor of SQL Server you are running (the box product, Azure SQL Database/Hyperscale, or Azure SQL Managed Instance), while others apply to all.

Database Schema Compare Tool
Migration to Azure SQL is far easier than before. The availability of schema migration, database migration, database comparison, and schema comparison tools from Microsoft makes migration easier and lower-risk. This blog describes how the database schema comparison tool can be used to compare Db2 and Azure SQL schemas.

Part 1 - Azure SQL DB Hyperscale Table Partitioning - Best Practices & Recommendations
Implementing table partitioning on an exceptionally large table in Azure SQL Database Hyperscale is not trivial, due to the large data movement operations involved and the potential downtime needed to accomplish them efficiently. To help customers implement partitioning on these large tables, this two-part article goes over the details. Part 1 aims to help you consider table partitioning fundamentals alongside other database features and concepts, in preparation for implementing table partitioning on an actual table, which is covered in Part 2.