performance
120 TopicsReal-time performance monitoring for Azure SQL Database Managed Instance
First published on MSDN on Sep 26, 2018 This article was co-authored by Dimitri Furman and Denzil RibeiroReviewed by: Danimir Ljepava, Borko Novakovic, Jovan Popovic, Rajesh Setlem, Mike WeinerIntroductionIn our ongoing engagements with the SQL DB Managed Instance preview customers a common requirement has been to monitor database workload performance in real time.19KViews6likes4CommentsMost effecient way to identify duplicates in data?
We're moving data analytics towards Kusto and one feature we would like to have is to sanity-check our data and find duplicates of data in a dataset. The problem I want to solve is to make sure we don't accidentally ingest duplicates and then report too high sales numers. Our plan now is to introduce an extra column with a sha1 hash of the row and do something like "summarize count() by sha1 | where count_ > 1" ... but that would need to touch every column in the data set. I realize that's in the nature of the problem, but I'd just like to investigate strategies what would make this as effecient as possible. Strategies I've thought of would be to first limit the dataset to a certain timewindow or perhaps by customerId. I also know about the ingestion tags but I don't want to impact extents too much since this is more of a sanity-check. What other strategies could we use to make this operation as efficient as possible?19KViews0likes1Comment