How do you take two great Azure services and create a greater solution? You integrate them together. What if there is a built-in integration between the two?
Azure Cosmos DB and Azure Data Explorer (ADX) both deal with big, semi-structured data generated at high throughput.
Azure Cosmos DB is a strong transactional database: zoom in on a micro-partition, scan a few records and return data in milliseconds. Example: 30,000 IoT sensors emit readings every 5 seconds; what is the latest temperature value of sensor 4A12?
ADX is a strong analytical database specialized in time-stamped data (logs, time series and user activities): take a slice of time, scan billions of records and return aggregations in under a second. Example: in the same IoT scenario, which 10 sensors had the temperature values that fluctuated the most over the last hour?
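To make the contrast between the two example questions concrete, here is a minimal Python sketch of both. It is an illustration only: neither service runs Python for this, the tuple layout `(sensor_id, timestamp, temperature)` and the function names are assumptions made for the example, and "fluctuation" is taken to mean the max-min range within the window.

```python
from datetime import datetime, timedelta


def latest_temperature(readings, sensor_id):
    """Transactional-style question: newest reading of a single sensor."""
    own = [r for r in readings if r[0] == sensor_id]
    if not own:
        return None
    # The newest reading is the one with the greatest timestamp.
    return max(own, key=lambda r: r[1])[2]


def top_fluctuating(readings, now, window=timedelta(hours=1), n=10):
    """Analytical-style question: sensors with the widest temperature range."""
    lo, hi = {}, {}
    for sensor_id, ts, temp in readings:
        if now - window <= ts <= now:  # keep only the requested time slice
            lo[sensor_id] = min(temp, lo.get(sensor_id, temp))
            hi[sensor_id] = max(temp, hi.get(sensor_id, temp))
    spread = {s: hi[s] - lo[s] for s in lo}
    # Widest range first; sensor id breaks ties so the order is stable.
    return sorted(spread.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical sample data.
now = datetime(2023, 1, 1, 12, 0, 0)
readings = [
    ("4A12", now - timedelta(minutes=50), 20.0),
    ("4A12", now - timedelta(minutes=5), 21.5),
    ("7B03", now - timedelta(minutes=40), 18.0),
    ("7B03", now - timedelta(minutes=10), 25.0),
    ("9C44", now - timedelta(hours=3), 40.0),  # outside the one-hour window
    ("9C44", now - timedelta(minutes=1), 19.0),
]

print(latest_temperature(readings, "4A12"))
print(top_fluctuating(readings, now, n=2))
```

The first function touches only one sensor's records, while the second has to scan every record in the time slice, which is the kind of workload the analytical side is built for.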
Both services complement each other. The new Azure Cosmos DB to Azure Data Explorer Synapse Link integration is designed to bring them together and give you the best of both worlds (Hybrid Transactional-Analytical Processing, or HTAP): a world-class transactional and analytical solution. This is done in a managed way, without you having to get the proverbial duct tape out!
Here are five reasons why you should use this new integration.
Having an out-of-the-box integration saves you cycles both at deployment time and during the lifetime of your solution.
The new integration takes seconds to set up, after which data flows from Cosmos DB to ADX. With streaming ingestion, the latency between a document being committed to Cosmos DB and being queryable in ADX can be as low as a few seconds, making it a real-time integration.
Building your own integration between the two services would require many components (e.g. Azure Functions and Azure Event Hubs) that you would need to provision (and script, to automate the provisioning). You would then need to monitor those components and learn their failure patterns to monitor them well.
The new data connection is an ARM resource that can easily be scripted with an ARM or Bicep template, or with Terraform. It leverages the existing ingestion metrics, so you can monitor it right away with metrics you already know.
The data connection is built with resilience and all corner cases in mind and will behave in a predictable manner when different failures occur in your cloud environment (e.g. Cosmos DB is down, the ADX cluster is shut down or restarted, etc.).
ADX allows you to get all kinds of insights about your NoSQL data. You can ask a lot of questions you can't easily ask a transactional database. You can aggregate data across partitions, over billions of records, in seconds. You can look at time series trends. You can compare users, sensors, devices, etc. and find the needle in the haystack.
You can enrich your Cosmos DB data with reference data. You can join it with data coming from other Cosmos DB containers or from completely different sources and get richer information from the combination.
When ingesting Cosmos DB data into ADX, you have complete control over your data: you can filter the data you want to ingest (e.g. dropping data specific to application logic), route different document types to different Kusto tables, expand an array of values over multiple rows, etc.
Your Cosmos DB container data might be polymorphic: different documents might have different schemas. You might have sensor readings for temperature and GPS that have different formats. You might have e-commerce site transactions mixed with cart states and user profiles. Inferring a single schema that fits all those documents can lead to complexity. An easier path can be to route each of those document types to a different Kusto table, where it can be represented with a strong schema.
If a part of a document does vary and it doesn't make sense to expand it into a strong schema, you can leave it as JSON, as Kusto understands and indexes JSON.
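The filtering, routing and array expansion described above can be sketched as follows. This is a plain Python illustration of the logic, not how it is configured in ADX (where such transformations would be written in Kusto Query Language). The `type` discriminator, the field names and the table names are all hypothetical.

```python
from collections import defaultdict


def route(documents):
    """Split polymorphic documents into per-type tables with fixed columns."""
    tables = defaultdict(list)
    for doc in documents:
        kind = doc.get("type")  # hypothetical discriminator field
        if kind == "temperature":
            # Expand the array of values: one output row per reading.
            for value in doc.get("values", []):
                tables["Temperature"].append(
                    {"sensorId": doc["sensorId"], "celsius": float(value)}
                )
        elif kind == "gps":
            tables["Gps"].append(
                {
                    "sensorId": doc["sensorId"],
                    "lat": float(doc["lat"]),
                    "lon": float(doc["lon"]),
                    # The varying part is kept as nested, JSON-like data.
                    "extra": doc.get("extra", {}),
                }
            )
        # Any other document type is filtered out and never ingested.
    return dict(tables)


# Hypothetical sample documents from one polymorphic container.
documents = [
    {"type": "temperature", "sensorId": "4A12", "values": [20, 21.5]},
    {"type": "gps", "sensorId": "7B03", "lat": "45.5", "lon": "-73.6",
     "extra": {"satellites": 7}},
    {"type": "uiState", "panel": "collapsed"},  # application-specific, dropped
]

tables = route(documents)
print(tables["Temperature"])
print(tables["Gps"])
```

Each output table has a fixed set of typed columns, while the `extra` column shows the alternative of leaving a varying part of the document unexpanded.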
Since ADX can ingest Azure Cosmos DB data in real time and can be queried in real time by data visualization tools (e.g. Power BI), you can visualize your Cosmos DB data in real time with hundreds of concurrent users!
A common scenario for the new data connection is to do analytics on the "latest state" of a Cosmos DB container.
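If every update to a document arrives as a separate record, the "latest state" is simply the newest record per document. Here is a minimal Python sketch of that idea, assuming each ingested record carries the Cosmos DB `id` and `_ts` (last-modified time, in epoch seconds) properties; the `status` field and the sample values are made up for the example.

```python
def latest_state(records):
    """Keep, for each document id, the record with the highest _ts."""
    latest = {}
    for rec in records:
        current = latest.get(rec["id"])
        # ">=" lets a later-arriving record win when timestamps are equal.
        if current is None or rec["_ts"] >= current["_ts"]:
            latest[rec["id"]] = rec
    return latest


# Hypothetical ingested records: one per document version.
records = [
    {"id": "order-1", "_ts": 1700000000, "status": "created"},
    {"id": "order-2", "_ts": 1700000010, "status": "created"},
    {"id": "order-1", "_ts": 1700000050, "status": "shipped"},
]

state = latest_state(records)
print(state["order-1"]["status"])
print(state["order-2"]["status"])
```

Analytics on the latest state then run over this reduced set rather than over every version of every document.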
Another common scenario is to look at the evolution, over time, of documents in Cosmos DB. Each time a document is updated, it creates a record in Cosmos DB's change feed, which can be ingested as a separate record in ADX. This can turn ADX into an audit log or archive of a Cosmos DB container.
From there, one can perform point-in-time retrieval queries as well as time series analysis. For instance, which documents have been edited the most in the last 30 days? Which products are trending up? What seasonality patterns do the sales of product X show?
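As an illustration of the first question, here is a minimal Python sketch that counts edits per document over the last 30 days. It assumes each ingested record carries the Cosmos DB `id` and `_ts` (last-modified time, in epoch seconds) properties; the function name and sample data are hypothetical, and in ADX the same question would be asked in Kusto Query Language.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def most_edited(records, now, days=30, n=3):
    """Return the n document ids with the most versions in the last `days` days."""
    cutoff = (now - timedelta(days=days)).timestamp()
    counts = Counter(rec["id"] for rec in records if rec["_ts"] >= cutoff)
    return counts.most_common(n)


def ts(now, days_ago):
    """Helper for the sample data: epoch seconds for `days_ago` days before now."""
    return int((now - timedelta(days=days_ago)).timestamp())


# Hypothetical audit-log records: one per document version.
now = datetime(2023, 3, 1, tzinfo=timezone.utc)
history = [
    {"id": "a", "_ts": ts(now, 1)},
    {"id": "a", "_ts": ts(now, 2)},
    {"id": "a", "_ts": ts(now, 3)},
    {"id": "b", "_ts": ts(now, 5)},
    {"id": "b", "_ts": ts(now, 40)},  # older than 30 days, not counted
    {"id": "c", "_ts": ts(now, 60)},  # older than 30 days, not counted
]

print(most_edited(history, now, n=2))
```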
In summary, Azure Cosmos DB and Azure Data Explorer can complement each other in a data solution. The former excels at transactional workloads while the latter is unparalleled in analytics.
The new data connection, Azure Cosmos DB to Azure Data Explorer Synapse Link, provides a managed experience so you can get up and running quickly and keep the TCO of the integration low. It enables you to get analytical insights into your Cosmos DB data. It allows you to control your data and schema as they enter ADX. Different data visualization tools (e.g. Power BI) can then connect to ADX in real time with hundreds of concurrent users. Finally, you can create an audit trail or archive of your Cosmos DB data.
Try the data connection on your data. With a couple of clicks you can get data flowing into ADX and start asking questions about it. Learning about your data is an empowering experience and ADX is a proven tool on that journey!