Forum Discussion

Deyan_Petrov's avatar
Deyan_Petrov
Copper Contributor
Jan 22, 2024

Materialized view with inner join over 2 tables skips data

Hi,

 

I have 2 tables which are ingested from event hubs with standard injection policy (10 minutes etc).

 

I have created a materialized view which has roughly the following definition:

 

.create materialized-view TransactionsWithCustDataMatView on table Transactions
{
Transactions
| join kind=inner Customers on CustomerId
| where CreatedOn > LastModifiedOn1
| summarize arg_max(LastModifiedOn1, *) by TransactionId
}
 
The problem is that sometimes a customer registers and within 1 minute also performs a transactions. However, since Customers and Transactions are different tables, each with its own ingestion (batching), it happens that sometimes the transaction of a new customer is ingested before the corresponding customer is ingested, and then the above materialized view "skips" the transaction due to the inner join. 
 
Is there a way how to control/sync the ingestions of the 2 source table, or schedule the materialized view to run with some delay of 10-20 minutes (when ingestions to both tables would have happened)?
 
Br,
Deyan

4 Replies

  • Deyan_Petrov's avatar
    Deyan_Petrov
    Copper Contributor
    Note, the column CreatedOn belongs to Transactions table, and the column LastModifiedOn1 belongs to the Customers table. The purpose of this condition is to select the last data record for a customer at the time the transaction was executed (new customer data records may be written also after the transaction has been executed, but we need the customer data *at* the time the transaction was executed).
    • wernerzirkel's avatar
      wernerzirkel
      Brass Contributor

      Deyan_Petrov if it was my decision, I would try to avoid having this join in the materialized view definition. Instead, I would prepare a base table thich contains information of both tables. I would .append new data every 20 minutes in an ADF task from both tables. The materialized view would just refer to this base table.

      The basetable could have a retention policy of a few days, so you would not keep the data twice.

      • Deyan_Petrov's avatar
        Deyan_Petrov
        Copper Contributor
        wernerzirkel, thanks for the advice but how do you "prepare a base table thich contains information of both tables"? I have 2 different event hubs (Customers and Transactions) which are mapped via ingestion mapping each to a separate table. How do you map to different event hubs to the same table?

Resources