Azure SQL Database
Lesson Learned #515: Recommendations for Troubleshooting - Login failed for user 'XXX'. (18456)
During a recent support case, a customer encountered the error pyodbc.InterfaceError: ('28000', "[28000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Login failed for user 'XXX'. (18456) (SQLDriverConnect); ") from a Python application. I would like to share the lessons learned while fixing this issue. Error code 18456 is typically caused by login problems, such as incorrect or missing credentials, rather than by connectivity or networking issues. In our investigation, we identified the root cause and put together recommendations to prevent and resolve similar issues.

Root Cause

The application was configured to retrieve the database server and host name from environment variables. However:

Missing Environment Variables: One or more variables were not set.
Default Value Misconfiguration: When a variable was missing, the code fell back to a hardcoded default. For example, the server defaulted to "localhost", which was not the intended database server.

As a result, the application attempted to connect to an unintended server with incorrect or missing credentials, leading to the Login failed error.

Recommendations

1. Validate Environment Variables. Always validate critical environment variables such as server, username, and password. If a required variable is missing or empty, the application should raise an explicit error or log a clear warning.
2. Avoid Misleading Defaults. Use placeholder values, such as "NOT_SET", as defaults for critical variables. This makes misconfigurations immediately visible instead of failing silently.
3. Log Connection Details. Log critical details such as the server and database being accessed, so this information is available in the application logs for troubleshooting. Avoid logging sensitive information such as passwords.

Python Solution

To improve the Python code, I added validation of the environment variables, error handling, and logging of critical connection details:

import os

def get_env_variable(var_name, default_value=None, allow_empty=False):
    """
    Retrieves and validates an environment variable.

    :param var_name: The name of the environment variable.
    :param default_value: The default value if the variable is missing.
    :param allow_empty: If False, raises an error for empty or placeholder values.
    :return: The value of the environment variable or default_value.

    Example:
        server = get_env_variable("DB_SERVER", default_value="NOT_SET")
    """
    value = os.getenv(var_name, default_value)
    # Fail fast if the variable is missing, empty, or still set to the "NOT_SET" placeholder.
    if value is None or (not allow_empty and value.strip() in ("", "NOT_SET")):
        raise ValueError(f"Environment variable '{var_name}' is required but not set.")
    return value
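Building on that helper, here is a usage sketch that applies recommendations 1 and 3. The environment variable names (DB_SERVER, DB_NAME, DB_USER, DB_PASSWORD) and the get_connection helper are illustrative choices, not taken from the customer's application:

import logging
import pyodbc

logging.basicConfig(level=logging.INFO)

def get_connection():
    # Validate every critical variable up front; a missing value fails fast with a clear error.
    server = get_env_variable("DB_SERVER", default_value="NOT_SET")
    database = get_env_variable("DB_NAME", default_value="NOT_SET")
    user = get_env_variable("DB_USER", default_value="NOT_SET")
    password = get_env_variable("DB_PASSWORD", default_value="NOT_SET")

    # Log the server and database (recommendation 3), never the password.
    logging.info("Connecting to server=%s database=%s", server, database)

    conn_str = (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};DATABASE={database};UID={user};PWD={password};"
        "Encrypt=yes;TrustServerCertificate=no"
    )
    return pyodbc.connect(conn_str)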
Lesson Learned #514: Optimizing Bulk Insert Performance in Parallel Data Ingestion - Part 1

While working on a support case, we encountered an issue where bulk inserts were taking longer than our customer's SLA allowed. During troubleshooting we identified three main factors affecting performance:

High Transaction Log (TLOG) Threshold: The transaction log was under significant pressure, slowing down bulk inserts.
Excessive Indexes: The table had multiple indexes, adding overhead to every insert.
High Index Fragmentation: Index fragmentation was high, further slowing data ingestion.

We found that the customer was using partitioning within a single table, and we began considering a restructuring approach that would partition data across multiple databases rather than within a single table. This approach provided several advantages:

Individual transaction log thresholds for each partition, reducing log pressure.
The ability to disable indexes before each insert and rebuild them afterward; in a single table with partitioning, it is currently not possible to disable indexes per partition individually (see the sketch after the C# code below).
Specific Service Level Objectives (SLOs) for each database, optimizing resource allocation based on volume and SLA requirements.
Indexed Views for Real-Time Data Retrieval: Partitioned databases allowed us to create indexed views that are updated together with the inserts; because each partition holds a smaller volume, the views refresh faster, improving read performance and consistency.
Reduced Fragmentation and Improved Query Performance: By rebuilding indexes after each bulk insert, we maintained an optimal index structure across all partitions, significantly improving read speeds.

By implementing these changes, we achieved a significant reduction in bulk insert times, helping our customer meet its SLA targets.

I would like to share the following code. It reads CSV files from a specified folder and performs parallel bulk inserts for each file. Each CSV file has a header, and the data is separated by the | character. While reading, the code takes the value in column 20, a date in the format YYYY-MM-DD, and converts it to YYYY-MM, which determines the target database for the row. For example, if the value in column 20 is 2023-08-15, it extracts 2023-08 and directs the record to the corresponding database for that period, for example db202308. Once the batch size is reached for a partition (rows for a given YYYY-MM read from the CSV file), the application executes the SqlBulkCopy operation in parallel.
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;
using System.IO;
using System.Runtime.Remoting.Contexts;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

namespace BulkInsert
{
    class Program
    {
        static string sqlConnectionStringTemplate = "Server=servername.database.windows.net;Database={0};Authentication=Active Directory Managed Identity;User Id=xxxx;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Pooling=true;Max Pool size=300;Min Pool Size=100;ConnectRetryCount=3;ConnectRetryInterval=10;Connection Lifetime=0;Application Name=ConnTest Check Jump;Packet Size=32767";
        static string localDirectory = @"C:\CsvFiles";
        static int batchSize = 1248000;
        static SemaphoreSlim semaphore = new SemaphoreSlim(10);

        static async Task Main(string[] args)
        {
            var files = Directory.GetFiles(localDirectory, "*.csv");
            Stopwatch totalStopwatch = Stopwatch.StartNew();

            foreach (var filePath in files)
            {
                Console.WriteLine($"Processing file: {filePath}");
                await LoadDataFromCsv(filePath);
            }

            totalStopwatch.Stop();
            Console.WriteLine($"Total processing finished in {totalStopwatch.Elapsed.TotalSeconds} seconds.");
        }

        static async Task LoadDataFromCsv(string filePath)
        {
            Stopwatch fileReadStopwatch = Stopwatch.StartNew();
            var dataTables = new Dictionary<string, DataTable>();
            var bulkCopyTasks = new List<Task>();

            using (StreamReader reader = new StreamReader(filePath, System.Text.Encoding.UTF8, true, 819200))
            {
                string line;
                string key;
                long lineCount = 0;
                long lShow = batchSize / 2;

                reader.ReadLine();
                fileReadStopwatch.Stop();
                Console.WriteLine($"File read initialization took {fileReadStopwatch.Elapsed.TotalSeconds} seconds.");

                Stopwatch processStopwatch = Stopwatch.StartNew();
                string[] values = new string[31];

                while (!reader.EndOfStream)
                {
                    line = await reader.ReadLineAsync();
                    lineCount++;
                    if (lineCount % lShow == 0)
                    {
                        Console.WriteLine($"Read {lineCount}");
                    }

                    values = line.Split('|');
                    key = DateTime.Parse(values[19]).ToString("yyyyMM");

                    if (!dataTables.ContainsKey(key))
                    {
                        dataTables[key] = CreateTableSchema();
                    }

                    var batchTable = dataTables[key];
                    DataRow row = batchTable.NewRow();
                    for (int i = 0; i < 31; i++)
                    {
                        row[i] = ParseValue(values[i], batchTable.Columns[i].DataType, i);
                    }
                    batchTable.Rows.Add(row);

                    if (batchTable.Rows.Count >= batchSize)
                    {
                        Console.WriteLine($"BatchSize processing {key} - {batchTable.Rows.Count}.");
                        Stopwatch insertStopwatch = Stopwatch.StartNew();
                        bulkCopyTasks.Add(ProcessBatchAsync(dataTables[key], key, insertStopwatch));
                        dataTables[key] = CreateTableSchema();
                    }
                }

                processStopwatch.Stop();
                Console.WriteLine($"File read and processing of {lineCount} lines completed in {processStopwatch.Elapsed.TotalSeconds} seconds.");
            }

            foreach (var key in dataTables.Keys)
            {
                if (dataTables[key].Rows.Count > 0)
                {
                    Stopwatch insertStopwatch = Stopwatch.StartNew();
                    bulkCopyTasks.Add(ProcessBatchAsync(dataTables[key], key, insertStopwatch));
                }
            }

            await Task.WhenAll(bulkCopyTasks);
        }

        static async Task ProcessBatchAsync(DataTable batchTable, string yearMonth, Stopwatch insertStopwatch)
        {
            await semaphore.WaitAsync();
            try
            {
                using (SqlConnection conn = new SqlConnection(string.Format(sqlConnectionStringTemplate, $"db{yearMonth}")))
                {
                    await conn.OpenAsync();
                    using (SqlTransaction transaction = conn.BeginTransaction())
                    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, transaction)
                    {
                        DestinationTableName = "dbo.dummyTable",
                        BulkCopyTimeout = 40000,
                        BatchSize = batchSize,
                        EnableStreaming = true
                    })
                    {
                        await bulkCopy.WriteToServerAsync(batchTable);
                        transaction.Commit();
                    }
                }

                insertStopwatch.Stop();
                Console.WriteLine($"Inserted batch of {batchTable.Rows.Count} rows to db{yearMonth} in {insertStopwatch.Elapsed.TotalSeconds} seconds.");
            }
            finally
            {
                semaphore.Release();
            }
        }

        static DataTable CreateTableSchema()
        {
            DataTable dataTable = new DataTable();
            dataTable.Columns.Add("Column1", typeof(long));
            dataTable.Columns.Add("Column2", typeof(long));
            dataTable.Columns.Add("Column3", typeof(long));
            dataTable.Columns.Add("Column4", typeof(long));
            dataTable.Columns.Add("Column5", typeof(long));
            dataTable.Columns.Add("Column6", typeof(long));
            dataTable.Columns.Add("Column7", typeof(long));
            dataTable.Columns.Add("Column8", typeof(long));
            dataTable.Columns.Add("Column9", typeof(long));
            dataTable.Columns.Add("Column10", typeof(long));
            dataTable.Columns.Add("Column11", typeof(long));
            dataTable.Columns.Add("Column12", typeof(long));
            dataTable.Columns.Add("Column13", typeof(long));
            dataTable.Columns.Add("Column14", typeof(DateTime));
            dataTable.Columns.Add("Column15", typeof(double));
            dataTable.Columns.Add("Column16", typeof(double));
            dataTable.Columns.Add("Column17", typeof(string));
            dataTable.Columns.Add("Column18", typeof(long));
            dataTable.Columns.Add("Column19", typeof(DateTime));
            dataTable.Columns.Add("Column20", typeof(DateTime));
            dataTable.Columns.Add("Column21", typeof(DateTime));
            dataTable.Columns.Add("Column22", typeof(string));
            dataTable.Columns.Add("Column23", typeof(long));
            dataTable.Columns.Add("Column24", typeof(double));
            dataTable.Columns.Add("Column25", typeof(short));
            dataTable.Columns.Add("Column26", typeof(short));
            dataTable.Columns.Add("Column27", typeof(short));
            dataTable.Columns.Add("Column28", typeof(short));
            dataTable.Columns.Add("Column29", typeof(short));
            dataTable.Columns.Add("Column30", typeof(short));
            dataTable.Columns.Add("Column31", typeof(short));
            dataTable.BeginLoadData();
            dataTable.MinimumCapacity = batchSize;
            return dataTable;
        }

        static object ParseValue(string value, Type targetType, int i)
        {
            if (string.IsNullOrWhiteSpace(value)) return DBNull.Value;
            if (long.TryParse(value, out long longVal)) return longVal;
            if (double.TryParse(value, out double doubleVal)) return doubleVal;
            if (DateTime.TryParse(value, out DateTime dateVal)) return dateVal;
            return value;
        }
    }
}
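The advantages list above mentions disabling indexes before each load and rebuilding them afterward. As a hedged sketch of that step in Python (the connection string, the monthly database db202308, and the table dbo.dummyTable mirror the example above; none of this is the customer's actual loader), the nonclustered indexes of the target table can be disabled before the bulk insert and rebuilt once it finishes. The clustered index, or the heap, must stay enabled so the table can still accept inserts:

import pyodbc

# Illustrative connection string for one monthly partition database (for example db202308).
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=servername.database.windows.net;DATABASE=db202308;"
    "Authentication=ActiveDirectoryMsi;Encrypt=yes"
)
TABLE = "dbo.dummyTable"

def set_nonclustered_indexes(conn, table, action):
    """Disable or rebuild every nonclustered index on the target table.

    action must be 'DISABLE' or 'REBUILD'. The clustered index (or heap) is not touched,
    because a table whose clustered index is disabled cannot accept inserts.
    """
    assert action in ("DISABLE", "REBUILD")
    cursor = conn.cursor()
    cursor.execute(
        "SELECT i.name FROM sys.indexes AS i "
        "WHERE i.object_id = OBJECT_ID(?) AND i.type_desc = 'NONCLUSTERED' AND i.name IS NOT NULL",
        table,
    )
    for (index_name,) in cursor.fetchall():
        # Index names come from sys.indexes, so building the statement with an f-string is acceptable here.
        cursor.execute(f"ALTER INDEX [{index_name}] ON {table} {action};")
    conn.commit()

# Usage around a bulk load of one partition database:
# conn = pyodbc.connect(CONN_STR)
# set_nonclustered_indexes(conn, TABLE, "DISABLE")
# ...  # run the bulk insert for this partition
# set_nonclustered_indexes(conn, TABLE, "REBUILD")
# conn.close()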
Lesson Learned #513: Using SQL Copilot and Python Retry Logic to Resolve Deadlocks in Azure SQL DB

A few weeks ago, I worked on a service request where our customer had a Python application reporting a deadlock with the following error message:

INFO:root:Connected to the Database in dbname - Thread id 21132 - 0.0454 seconds ---
INFO:root:Connected to the Database in dbname - Thread id 39856 - 0.4066 seconds ---
ERROR:root:Transaction 2 failed: ('40001', '[40001] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Transaction (Process ID 84) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLExecDirectW)')
INFO:root:Transaction 1 completed successfully.

In many of our articles and public resources, such as Analyze and prevent deadlocks in Azure SQL Database or Lesson Learned #19: How to obtain the deadlocks of your Azure SQL Database or Managed Instance?, we review ways to detect and resolve deadlocks by analyzing their details. This time, I would like to share my experience using Copilot in Azure SQL Database to pinpoint the deadlock.

After receiving the error, we started with prompts such as "Give me the deadlock that occurred in my database recently" or "Provide detailed information about the last deadlock in my database, including involved tables and queries", and we got the following information:

Based on the internal Azure SQL telemetry, the following deadlock query hashes have been identified in your database within the past 2 hours:

Deadlock Details
Query Hash 1: 0xD57AFD9584A2C053
Query Text 1: (@2 tinyint,@1 varchar(8000))UPDATE [dbo].[MS_TableA_MS] SET [ValueA] = @1 WHERE [ID]=@2;
Query Hash 2: 0xBEAB84B27CE1F534
Query Text 2: (@2 tinyint,@1 varchar(8000))UPDATE [dbo].[MS_TableB_MS] SET [ValueB] = @1 WHERE [ID]=@2;

Having identified the queries involved in the deadlock, the second prompt we used was "Suggest options to prevent or mitigate the recent deadlock in my database". It returned very useful information about multiple options to prevent or mitigate the deadlock, such as index optimization, forcing a query plan, rewriting transactions, and retry logic in the application or in T-SQL code.

For application retry logic, we traced the source code that triggered the issue and added a retry mechanism to handle the deadlock. Here is a simple example of what we implemented. This small piece of code creates two tables, runs two transactions in two different threads and, depending on the value of the retry parameter, retries the deadlocked operation or not.

# Constants for table names used in the deadlock scenario
TABLE_A_NAME = "dbo.MS_TableA_MS"
TABLE_B_NAME = "dbo.MS_TableB_MS"
TABLE_A_COLUMN = "ValueA"
TABLE_B_COLUMN = "ValueB"
ITEM_ID = 1
TABLE_A_INITIAL_VALUE = 'Value A1'
TABLE_B_INITIAL_VALUE = 'Value B1'

def simulate_deadlock(retry=False, retry_attempts=3):
    """
    Simulates a deadlock by running two transactions in parallel that lock resources.
    If retry is enabled, the transactions will attempt to retry up to a specified number of attempts.
    """
    setup_deadlock_tables()
    thread1 = threading.Thread(target=run_transaction_with_retry, args=(deadlock_transaction1, retry_attempts, retry))
    thread2 = threading.Thread(target=run_transaction_with_retry, args=(deadlock_transaction2, retry_attempts, retry))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()

def run_transaction_with_retry(transaction_func, attempt_limit, retry):
    attempt = 0
    while attempt < attempt_limit:
        try:
            transaction_func(retry)
            break
        except pyodbc.Error as e:
            if 'deadlock' in str(e).lower() and retry:
                attempt += 1
                logging.warning(f"Deadlock detected. Retrying transaction... Attempt {attempt}/{attempt_limit}")
                time.sleep(1)
            else:
                logging.error(f"Transaction failed: {e}")
                break

def setup_deadlock_tables():
    """
    Sets up the tables required for the deadlock simulation, using constants for table and column names.
    """
    conn, dbNameReturn = ConnectToTheDB()
    if conn is None:
        logging.info('Error establishing connection to the database. Exiting setup.')
        return
    cursor = conn.cursor()
    try:
        cursor.execute(f"""
            IF OBJECT_ID('{TABLE_A_NAME}', 'U') IS NULL
            CREATE TABLE {TABLE_A_NAME} (
                ID INT PRIMARY KEY,
                {TABLE_A_COLUMN} VARCHAR(100)
            );
        """)
        cursor.execute(f"""
            IF OBJECT_ID('{TABLE_B_NAME}', 'U') IS NULL
            CREATE TABLE {TABLE_B_NAME} (
                ID INT PRIMARY KEY,
                {TABLE_B_COLUMN} VARCHAR(100)
            );
        """)
        cursor.execute(f"SELECT COUNT(*) FROM {TABLE_A_NAME}")
        if cursor.fetchone()[0] == 0:
            cursor.execute(f"INSERT INTO {TABLE_A_NAME} (ID, {TABLE_A_COLUMN}) VALUES ({ITEM_ID}, '{TABLE_A_INITIAL_VALUE}');")
        cursor.execute(f"SELECT COUNT(*) FROM {TABLE_B_NAME}")
        if cursor.fetchone()[0] == 0:
            cursor.execute(f"INSERT INTO {TABLE_B_NAME} (ID, {TABLE_B_COLUMN}) VALUES ({ITEM_ID}, '{TABLE_B_INITIAL_VALUE}');")
        conn.commit()
        logging.info("Tables prepared successfully.")
    except Exception as e:
        logging.error(f"An error occurred in setup_deadlock_tables: {e}")
        conn.rollback()
    finally:
        conn.close()

def deadlock_transaction1(retry=False):
    conn, dbNameReturn = ConnectToTheDB()
    if conn is None:
        logging.info('Error establishing connection to the database. Exiting transaction 1.')
        return
    cursor = conn.cursor()
    try:
        cursor.execute("BEGIN TRANSACTION;")
        cursor.execute(f"UPDATE {TABLE_A_NAME} SET {TABLE_A_COLUMN} = 'Transaction 1' WHERE ID = {ITEM_ID};")
        time.sleep(2)
        cursor.execute(f"UPDATE {TABLE_B_NAME} SET {TABLE_B_COLUMN} = 'Transaction 1' WHERE ID = {ITEM_ID};")
        conn.commit()
        logging.info("Transaction 1 completed successfully.")
    except pyodbc.Error as e:
        logging.error(f"Transaction 1 failed: {e}")
        conn.rollback()
        if 'deadlock' in str(e).lower() and retry:
            raise e  # Rethrow the exception to trigger retry in run_transaction_with_retry
    finally:
        conn.close()

def deadlock_transaction2(retry=False):
    conn, dbNameReturn = ConnectToTheDB()
    if conn is None:
        logging.info('Error establishing connection to the database. Exiting transaction 2.')
        return
    cursor = conn.cursor()
    try:
        cursor.execute("BEGIN TRANSACTION;")
        cursor.execute(f"UPDATE {TABLE_B_NAME} SET {TABLE_B_COLUMN} = 'Transaction 2' WHERE ID = {ITEM_ID};")
        time.sleep(2)
        cursor.execute(f"UPDATE {TABLE_A_NAME} SET {TABLE_A_COLUMN} = 'Transaction 2' WHERE ID = {ITEM_ID};")
        conn.commit()
        logging.info("Transaction 2 completed successfully.")
    except pyodbc.Error as e:
        logging.error(f"Transaction 2 failed: {e}")
        conn.rollback()
        if 'deadlock' in str(e).lower() and retry:
            raise e  # Rethrow the exception to trigger retry in run_transaction_with_retry
    finally:
        conn.close()
Lesson Learned #509: KeepAliveTime parameter in HikariCP

Today, I have been working on a service request where, at certain times, connections were being dropped by external components such as firewalls because of inactivity policies. For this reason, I would like to share my experience using the KeepAliveTime parameter.
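HikariCP is a Java connection pool, so its keepAliveTime setting is configured on the Java side. Purely as a conceptual sketch (the connection string, interval, and pool size are illustrative, and this is not HikariCP's API), the same idea can be reproduced in Python by periodically running a lightweight query on idle pooled connections so that inactivity policies on firewalls or other intermediate components do not silently drop them:

import queue
import threading
import time
import pyodbc

# Illustrative connection string; replace with your own server and credentials.
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=servername.database.windows.net;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword;Encrypt=yes"
)
KEEP_ALIVE_SECONDS = 120  # plays the same role as keepAliveTime in HikariCP

class KeepAlivePool:
    """A tiny connection pool that pings idle connections so they are not dropped for inactivity."""

    def __init__(self, size=5):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(pyodbc.connect(CONN_STR))
        threading.Thread(target=self._keep_alive_loop, daemon=True).start()

    def _keep_alive_loop(self):
        while True:
            time.sleep(KEEP_ALIVE_SECONDS)
            refreshed = []
            while not self._idle.empty():
                conn = self._idle.get()
                try:
                    conn.execute("SELECT 1").fetchone()  # lightweight keep-alive query
                    refreshed.append(conn)
                except pyodbc.Error:
                    refreshed.append(pyodbc.connect(CONN_STR))  # replace a dead connection
            for conn in refreshed:
                self._idle.put(conn)

    def get(self):
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)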
Lesson Learned #511: Timeout Attempting to Open the Connection in High-Thread Applications

Recently, I worked on a service request in which a customer application reported the following error when connecting to the database: "Timeout attempting to open the connection. The time period elapsed prior to attempting to open the connection has been exceeded. This may have occurred because of too many simultaneous non-pooled connection attempts." I would like to share the lessons learned here.
Lesson Learned #510: Using CProfiler to Analyze Python Call Performance in Support Scenarios

Last week, while working on a support case, our customer was facing performance issues in their Python application. After some investigation, I decided to suggest cProfile, a built-in Python module that profiles your code and shows where time is being spent, to identify which function call was taking the most time and use that as a starting point for troubleshooting. Profiling the Python code became essential to pinpoint the bottleneck.
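As a minimal sketch of that suggestion (slow_workload is a placeholder, not the customer's code, which is not shown here), cProfile can wrap any call and report which functions consume the most cumulative time:

import cProfile
import io
import pstats

def slow_workload():
    # Placeholder for the application code being investigated.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_workload()
profiler.disable()

# Print the 10 functions with the highest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())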
Lesson Learned #497: Understanding the Ordering of uniqueidentifier in SQL Server

Today, I worked on a service request in which our customer asked how SQL Server sorts the uniqueidentifier data type. The uniqueidentifier type stores globally unique identifiers (GUIDs), which are widely used as unique keys because of their extremely low probability of duplication. One common way to generate a GUID in SQL Server is the NEWID() function. However, the ordering of GUIDs, especially those generated by NEWID(), can appear non-intuitive. I would like to share my lessons learned on how to determine the sort order of uniqueidentifier values generated with NEWID().
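The full analysis is in the article; as a quick way to observe the behavior (a sketch with an illustrative connection string and a temporary table), you can insert a handful of NEWID() values and compare their insertion order with the order SQL Server returns when sorting the uniqueidentifier column:

import pyodbc

# Illustrative connection string; replace with your own server and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=servername.database.windows.net;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword;Encrypt=yes"
)
cursor = conn.cursor()

# Create a small temporary table and fill it with GUIDs generated by NEWID().
cursor.execute(
    "CREATE TABLE #GuidOrder (InsertOrder INT IDENTITY(1,1), Id UNIQUEIDENTIFIER DEFAULT NEWID())"
)
for _ in range(10):
    cursor.execute("INSERT INTO #GuidOrder DEFAULT VALUES")

# Sorting by the uniqueidentifier column usually does not match the insertion order,
# because SQL Server compares GUIDs by byte groups starting from the rightmost group.
for row in cursor.execute("SELECT InsertOrder, Id FROM #GuidOrder ORDER BY Id"):
    print(row.InsertOrder, row.Id)

conn.close()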
Lesson Learned #508: Monitoring Wait Stats and Handling Large Data Set

Sometimes we are asked how much data the client application is receiving from Azure SQL Database, along with the time spent and the number of rows returned. I would like to share a simple example that allows us to approximate these details. I hope you find it useful.
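The article describes the full approach; as a rough Python sketch (the connection string and the query are illustrative, and sys.getsizeof only approximates the payload), you can measure the elapsed time, the number of rows, and an estimate of the data volume received by the client like this:

import sys
import time
import pyodbc

# Illustrative connection string; replace with your own server and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=servername.database.windows.net;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword;Encrypt=yes"
)
cursor = conn.cursor()

start = time.perf_counter()
cursor.execute("SELECT * FROM dbo.MyLargeTable")  # illustrative query

row_count = 0
approx_bytes = 0
while True:
    rows = cursor.fetchmany(10000)  # fetch in chunks to avoid holding the whole result in memory
    if not rows:
        break
    row_count += len(rows)
    # Rough approximation of the data volume materialized on the client side.
    approx_bytes += sum(sys.getsizeof(value) for row in rows for value in row)

elapsed = time.perf_counter() - start
print(f"Rows: {row_count}, approx. size: {approx_bytes / (1024 * 1024):.2f} MB, elapsed: {elapsed:.2f} s")
conn.close()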