Forum Discussion

Bartek_insight's avatar
Bartek_insight
Copper Contributor
Mar 25, 2024

Why does ADX caching result from related dimension table/mv/function

I'm testing materialized views based on the sample queries below:

  • Lookup to another materialized view (bar_mv) storing last timestamps in field LAST_FOO_TS for some key(a,b,c)
.create materialized-view with (dimensionMaterializedViews = "bar_mv") 
foo_mv on table events_table{
events_table 
|lookup materialized_view('bar_mv') on a,b,c
|where TIMESTAMP > LAST_FOO_TS
}
  • Lookup to stored function (bar_v) returning last timestamps in field LAST_FOO_TS for some key(a,b,c)
.create materialized-view
foo_mv1 on table events_table{
events_table 
|lookup bar_v() on a,b,c
|where TIMESTAMP > LAST_FOO_TS
}

In both cases, looks like ADX is caching results of the lookup materialized view/function and not refresing it for long time or ever. Is there any way to force refresh of those to get the newest values of LAST_FOO_TS in "where" condition?

5 Replies

  • Bartek_insight  ADX caches the results for a certain period by default to avoid unnecessary re-computation. This caching behavior can lead to data needing to be updated promptly. Unfortunately, there is no direct way to force an immediate refresh of the cached results. However, you can try the following approaches:

     

    1- Adjust Cache Policy: Check if you can adjust the cache policy for the materialized view or function. You can reduce the cache duration or disable caching altogether, but this could impact query performance.

     

    2- Use a Hybrid Approach: Consider using a hybrid approach in which you periodically refresh the materialized view or function. For example, schedule a job to refresh the data regularly, like every hour. This way, you balance performance gains from caching with the need for fresh data.

     

    Lastly, if real-time data is critical, you might need to manually trigger a refresh. You can do this by re-creating the materialized view or invoking the function again. This approach requires careful handling to avoid unnecessary overhead.

     

    Remember that caching is a trade-off between performance and freshness. Evaluate your specific use case and choose a strategy that best aligns with your requirements.

     

    I suggest you read:

    Guy_Reginiano post on Monitor and optimize the cache usage with ADX Insights

     and

    Bhaskar post on ADX Query Performance Unleashed: Best Practices and Pro Tips 

    --

    If this post is helpful, please give my response a thumbs up! You can also mark it as the solution to help others find it easily.

     

    Thanks

    • Bartek_insight's avatar
      Bartek_insight
      Copper Contributor

      BabatundeDallas thanks for your input. I will comment your proposals below:
      1. Caching policy has nothing to do in this case, it is only about storage setup if we want to store data in RAM/SSD or regular disks.

       

      2. What do you mean by refreshing materialized view? 

       

      3. Yes, real-time is a must, so I would need to recreate the materialized view each few seconds with backfill of 14 days, so it won't suite here.

       

      It, would be great if materialized view give an option to switch off subquery/lookup results...

      • BabatundeDallas's avatar
        BabatundeDallas
        Brass Contributor

        Bartek_insight You’re correct that the caching policy doesn’t directly impact your scenario; it was just a possible suggestion for debugging.

         

         When I mention “refreshing,” I mean updating the materialized view with fresh data. In ADX, materialized views are precomputed and stored. To refresh them, you’d typically recompute the view based on the underlying data. However, your requirement for real-time data makes frequent recomputation impractical.

         

        Recreating the materialized view every few seconds with a 14-day backfill is inefficient because it’s resource-intensive and may not align with your use case. I understand your need for an option to control subquery/lookup results. Unfortunately, ADX doesn’t currently provide a direct switch to disable caching or force immediate refresh for subqueries.

         

        However, I recommend exploring alternative approaches like Custom Logic, Scheduled Recomputation, or a Hybrid Approach.

         

        1) Implement custom logic in your queries to handle real-time conditions. For example, materialized view results can be combined with fresh data using UNION or other techniques.
        2) Schedule periodic recomputation of the materialized view, such as an hourly one, if feasible. This balances performance and freshness.
        3) Or keep the materialized view and query the raw data directly when real-time accuracy is essential. Use the materialized view for historical data and the raw data for recent updates.

         

        Note: ADX doesn’t offer an immediate solution to disable subquery caching. Exploring these alternatives might help you achieve your real-time requirements. I will contact the Microsoft developer team about new features that would provide ADX with a direct switch to disable caching or force immediate refresh for subqueries.

         

        --

        If this post is helpful, please give my response a thumbs up! You can also mark it as the solution to help others find it easily.

         

        Thanks