KQL
97 TopicsQuery Acceleration for Delta External Tables (Preview)
An external table is a schema entity that references data stored external to a Kusto database. Queries run over external tables can be less performant than on data that is ingested due to various factors such as network calls to fetch data from storage, the absence of indexes, and more. Query acceleration allows specifying a policy on top of external delta tables. This policy defines a number of days to cache data for high-performance queries. Query Acceleration policy allows customers to set a policy on top of external delta tables to define the number of days to cache. Behind the scenes, Kusto continuously indexes and caches the data for that period, allowing customers to run performant queries on top. QAP is supported by Azure Data Explorer (ADX) over ADLSgen2/blob storage and Eventhouse over OneLake/ADLSgen2/blob storage. Query Acceleration policy We are introducing a new policy to enable acceleration for delta external tables: Syntax .alter external table <TableName> policy query_acceleration 'Policy' Where: <TableName> is the name of a Delta Parquet external table. <Policy> is a string literal holding a JSON property bag with the following properties: IsEnabled : Boolean, required. - If true, query acceleration is enabled. Hot: TimeSpan, last 'N' days of data to cache. Steps to enable Query Acceleration Create a delta external table as described inthis document: .create-or-alter external table <TableName> kind=delta ( h@'https://storageaccount.blob.core.windows.net/container;<credentials> ) Set a query acceleration policy .alter external table <TableName> policy query_acceleration ```{ "IsEnabled": true, "Hot": "36500d" }``` Query the table. external_table('TableName') Note: Indexing and caching might take some time depending on the volume of data and cluster size. For monitoring the progress, see Monitoring command Costs/Billing Enabling Query Acceleration does come with some additional costs. The accelerated data will be ingested in Kusto and count towards the SSD storage, similar to native Kusto tables. You can control the amount of data to accelerate by configuring number of days to cache. Conclusion Query Acceleration is a powerful feature designed to enhance your data querying capabilities on PetaBytes of data. By understanding when and how to use this feature, you can significantly improve the efficiency and speed of your data operations - whether you are dealing with large datasets, complex queries, or real-time analytics, Query Acceleration provides the performance boost you need to stay ahead. Get started with Azure Data Explorer. Get started with Eventhouse in Microsoft Fabric.117Views1like0CommentsCountry and Region Information in current_principal_details
Discover how Eventhouse's newCountry and Region Informationfeature in the current_principal_details() function is revolutionizing data security and accessibility. By leveraging Microsoft Entra ID, this feature provides country and region information, ensuring sensitive data is accessible only to authorized individuals within specified locations.1.1KViews0likes0CommentsOptimizing Vector Similarity Search on Azure Data Explorer – Performance Update
This post is co-authored byAnshul_Sharma (Senior Program Manager, Microsoft). This blog is an update of Optimizing Vector Similarity Searches at Scale. We continue to improve the performance of vector similarity search in Azure Data Explorer (Kusto). We present the new functions and policies to maximize performance and the resulting search times. The following table and chart present the search time for the top 3 most similar vectors to a supplied vector: # of vectors Total time [sec.] 25,000 0.03 50,000 0.035 100,000 0.047 200,000 0.062 400,000 0.094 800,000 0.125 1,600,000 0.14 3,200,000 0.15 6,400,000 0.19 12,800,000 0.35 25,600,000 0.55 51,200,000 1.1 102,400,000 2.3 204,800,000 3.9 409,600,000 7.6 This benchmark was done on a medium size Kusto cluster (containing 29 nodes), searching for the most similar vectors in a table of Azure OpenAI embedding vectors. Each vector was generated using ‘text-embedding-ada-002’ embedding model and contains 1536 coefficients. These are the steps to achieve the best performance of similarity search: Use series_cosine_similarity(), the new optimized native function to calculate cosine similarity Set the encoding of the embeddings column to Vector16, the new 16 bit encoding of the vectors coefficients (instead of the default 64 bit) Store the embedding vectors table on all nodes with at least one shard per processor. This can be achieved by limiting the number of embedding vectors per shard by altering ShardEngineMaxRowCount of the sharding policy and RowCountUpperBoundForMerge of the merging policy. Suppose our table contains 1M vectors and our Kusto cluster has 20 nodes each has 16 processors. The table’s shards should contain at most 1000000/(20*16)=3125 rows. These are the KQL commands to create the empty table and set the required policies and encoding: .create table embedding_vectors(vector_id:long, vector:dynamic) // more columns can be added .alter-merge table embedding_vectors policy sharding '{ "ShardEngineMaxRowCount" : 3125 }' .alter-merge table embedding_vectors policy merge '{ "RowCountUpperBoundForMerge" : 3125 }' .alter column embedding_vectors.vector policy encoding type = 'Vector16' Now we can ingest the vectors into the table. And here is a typical search query: let searched_vector = repeat(1536, 0); // to be replaced with real embedding vector. embedding_vectors | extend similarity = series_cosine_similarity_fl(vector, searched_vector, 1, 1) | top 10 by similarity desc The current semantic search times enable usage of ADX as embedding vectors storage platform for RAG (Retrieval Augmented Generation) scenarios and beyond, We continue to improve vector search performance, stay tuned!4.4KViews4likes2CommentsADX Continuous Export to Delta Table - Public Preview
We're excited to announce that continuous export to Delta table is now available in Preview. Continuous export in ADX allows you to export data from Kusto to an external table with a periodically run query.The results are stored in the external table, which defines the destination, such as Azure Blob Storage, and the schema of the exported data. This process guarantees that all records are exported "exactly once", with someexceptions. Continous export previously supported CSV, TSV, JSON and Parquet formats.4KViews1like1CommentThe #1 factor in ADX/KQL database performance
The most important thing determining the performance of a KQL query is making sure that the minimum part of the data is scanned. In almost all cases a filter on a datetime column is used to determine what part of the data is relevant for the query results. The filter can be expressed in many ways on the actual table or a joined table. All variations are returning the correct results but the difference in performance can be 50X The different variations are described, and the reasons why are performant, and some are not.1.9KViews0likes3Comments