codestories
Essential Microsoft Resources for MVPs & the Tech Community from the AI Tour
Unlock the power of Microsoft AI with redeliverable technical presentations, hands-on workshops, and open-source curriculum from the Microsoft AI Tour! Whether you're a Microsoft MVP, developer, or IT professional, these expertly crafted resources empower you to teach, train, and lead AI adoption in your community. Explore top breakout sessions covering GitHub Copilot, Azure AI, generative AI, and security best practices, designed to simplify AI integration and accelerate digital transformation. Dive into interactive workshops that provide real-world applications of AI technologies. Take it a step further with Microsoft's open-source AI curriculum, offering beginner-friendly courses on AI, machine learning, data science, cybersecurity, and GitHub Copilot, perfect for upskilling teams and fostering innovation. Don't just learn: lead. Access these resources, host impactful training sessions, and drive AI adoption in your organization. Start sharing today! Explore now: Microsoft AI Tour Resources.
Excited to be at JavaOne 2022 in person!
I will be a part of JavaOne's technical keynote on Day 2 of the conference (Oct 19 at 1:15 PM), where I will talk about how several Microsoft products and divisions run on Java and how we empower Java developers at one of our customer organizations to achieve truly relevant business impact. Below is our line-up of sessions and booth talks. I hope you will join us at JavaOne. I look forward to seeing you there.
How To: Retrieve from CosmosDB using Azure API Management
In this How To, I will show a simple mechanism for reading items from CosmosDB using Azure API Management (APIM). There are many scenarios where you might want to do this in order to leverage the capabilities of APIM while having a highly scalable, flexible data store.
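To make the idea concrete, here is a minimal sketch of a client calling an APIM-fronted endpoint that proxies reads from Cosmos DB. The gateway URL, API path, and item id are hypothetical placeholders; only the standard Ocp-Apim-Subscription-Key header and plain HttpClient usage are assumed to carry over to a real deployment, and the actual read logic lives in the APIM operation's policy.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical APIM gateway URL, API path, and item id fronting a Cosmos DB read.
var apimBaseUrl = "https://contoso-apim.azure-api.net";
var itemId = "12345";

using var client = new HttpClient { BaseAddress = new Uri(apimBaseUrl) };

// APIM subscription keys are passed in the Ocp-Apim-Subscription-Key header.
client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<your-subscription-key>");

// The APIM operation (and its policy) translates this call into a Cosmos DB
// read or query and returns the item as JSON.
var response = await client.GetAsync($"/orders/items/{itemId}");
response.EnsureSuccessStatusCode();

Console.WriteLine(await response.Content.ReadAsStringAsync());
```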
From Tunisian classroom full of boys to architect for Canadian government: A journey of perseverance
Hamida Rebai Trabelsi is a passionate learner who juggles multiple roles with ease: a senior technology professional at one of Canada's government agencies, a Microsoft MVP, and a mother. Learn more about her.
Postgres as a Distributed Cache Unlocks Speed and Simplicity for Modern .NET Workloads
In the world of high-performance, modern software engineering, developers often face a tough tradeoff: how to achieve lightning-fast data retrieval without adding complexity, sacrificing reliability, or getting locked into specialized, external caching products or platforms. What if you could harness the power and flexibility of your existing Postgres database to solve this challenge? Enter the Microsoft.Extensions.Caching.Postgres library, a new nuget.org package that brings distributed caching to Postgres, unlocking speed, simplicity, and seamless integration for modern .NET workloads. In this article, we take a closer look at the Postgres caching store, which introduces a new option for .NET developers implementing a distributed cache, such as HybridCache paired with a Postgres database acting as the distributed backplane.

One data platform for multiple workloads
Postgres' reputation for reliability, extensibility, and standards compliance has long been respected, with Postgres databases driving some of today's largest and most popular platforms. Increasingly, developers, data engineers, and entrepreneurs alike are rallying to apply these benefits. One of the most compelling aspects of Postgres is its adaptability: it is a data platform that can simultaneously handle everything from transactional workloads to analytical queries, JSON documents to geospatial data, and even time-series and vectorized AI search. In an era of specialized services, Postgres is proving that one platform can do it all, and do it well. Intrepid engineers have also discovered that Postgres is often just as proficient at handling workloads traditionally supported by very different technology solutions, such as lakehouses, pub-sub, message queues, job schedulers, and session store caches. These roles are all now being powered by Postgres databases, while Postgres simultaneously continues to deliver the same scalable, battle-tested, mission-critical, ACID-compliant relational database operations we've all come to expect.

When speed matters most
Database-backed cache stores are by no means a new concept; the first version of a database cache library for .NET, Microsoft.Extensions.Caching.SqlServer, was made available on nuget.org in June 2016. This library included several useful features, such as expiration policies, serialization, and dependency injection, making it ideal for multi-instance applications requiring shared cache functionality. It was especially useful in environments where Redis or other cache providers were not available. The convenience of using a transactional database as a distributed cache comes with some tradeoffs, especially when compared against services such as Redis or Memcached; in a word: speed. All the features that make your database data durable, reliable, and consistent require additional clock cycles and I/O operations, and this overhead translates into a performance cost compared to memory-based caching systems. What if it were possible to keep all those familiar and convenient interfaces for connecting to your database, while precisely configuring specific tables to shed the burden of crash consistency and replication logging? What if, for only the tables we select, we could trade this durability for pure speed? Enter Postgres' UNLOGGED tables.
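As a rough illustration of the idea (not the schema the package actually creates), an unlogged key-value cache table might be provisioned along these lines. The connection string, table name, and column names are hypothetical; only the CREATE UNLOGGED TABLE syntax itself is standard Postgres.

```csharp
using Npgsql;

// A rough sketch (hypothetical table and column names) of provisioning an unlogged
// key-value cache table. UNLOGGED tables skip the write-ahead log, trading crash
// recovery and physical replication for noticeably faster writes.
const string connectionString = "Host=localhost;Database=appdb;Username=app;Password=secret";

const string ddl = @"
    CREATE UNLOGGED TABLE IF NOT EXISTS cache_entries (
        cache_key   text PRIMARY KEY,
        cache_value bytea NOT NULL,
        expires_at  timestamptz NOT NULL
    );";

await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();

await using var cmd = new NpgsqlCommand(ddl, conn);
await cmd.ExecuteNonQueryAsync();
```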
Postgres' adaptable performance
Another compelling aspect of Postgres is the ability to significantly speed up write performance by bypassing the Write-Ahead Log (WAL). The WAL is designed to ensure that data is crash-consistent (and replicable): writing to your database is a transparent two-step process in which your data is written to your database tables and the changes are also committed to a separate log file to guarantee the data's persistence. In some circumstances, trading that crash consistency for increased performance is worth it, especially for short-lived, temporary data such as a cache store. The unlogged setting is scoped to individual tables, which allows "logged" and "unlogged" tables to operate side by side within the same database instance. The net result: Postgres can provide incredibly performant response times when used as a distributed cache, rivaling the performance of other popular cache stores, while also providing the simplicity, familiarity, and consistency that the Postgres engine naturally offers.

HybridCache for your .NET solutions
It was this capability*, combined with inspiration from the SQL Server library, that led to the creation of the Microsoft.Extensions.Caching.Postgres package on nuget.org. As a longtime .NET developer, I have personally witnessed the incredible evolution of the .NET platform and the amazing growth of the languages, tooling, runtimes, and the incredible people behind each of these contributions. The recent addition of HybridCache is especially exciting to incorporate into your .NET solutions because it dramatically simplifies the steps required to add caching to your project, while linking an in-memory cache with a second-level, tiered cache service. This integration gives your application the best of both worlds: blazing-fast in-memory retrieval paired with a resilient, similarly performant backplane for when an application instance blinks, scales up, or scales out. Don't just take my word for it; let's look at some benchmarks comparing a Redis cache and a Postgres database. The tests consist of synchronous and asynchronous read/write operations across three payload sizes (128, 1,024, and 10,240 bytes), with both single and concurrent messages, at fixed and random positions. The tests are further divided into two types of cache expiration windows: absolute (non-sliding) and sliding (relative). Consider the output from the suite of benchmark tests, keeping in mind that the results are measured in microseconds (1,000 microseconds equal 1 millisecond). What do these results reveal? In certain respects, there aren't many surprises: bespoke memory-based key-value cache systems like Redis continue to outperform relational databases in terms of pure speed and low latency. What is really exciting to see is that Postgres comes very close to Redis performance for the more intensive operations.

Finding the right fit for your solution
I'm excited to make this Postgres package available to everyone considering distributed caching in their solution designs. The combination of HybridCache paired with your choice of backplane allows you to select the technologies and tools best suited for your solution.
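To ground this, here is a minimal sketch of HybridCache in a generic .NET host, using only the public HybridCache API. The line where the Postgres-backed IDistributedCache would be registered is left as a comment, because the exact extension method is defined by the Microsoft.Extensions.Caching.Postgres package itself; the cache key and factory are illustrative only.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Hybrid;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// 1. Register the Postgres-backed IDistributedCache from Microsoft.Extensions.Caching.Postgres
//    here; see the package README for its registration extension method and options.

// 2. Register HybridCache. It layers an in-process memory cache over whatever
//    IDistributedCache is registered, using it as the second-level (distributed) tier.
builder.Services.AddHybridCache();

using var app = builder.Build();

// Read-through usage: the factory runs only on a cache miss, and the result is
// written to both the in-memory tier and the distributed backplane.
var cache = app.Services.GetRequiredService<HybridCache>();

string forecast = await cache.GetOrCreateAsync<string>(
    "weather:seattle",
    async cancellationToken =>
    {
        // Placeholder for the expensive lookup (database query, HTTP call, etc.).
        await Task.Delay(100, cancellationToken);
        return "Cloudy, 14C";
    });

Console.WriteLine(forecast);
```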
Our GitHub repo also contains a variety of sample applications that demonstrate how to configure and use HybridCache together with the Postgres distributed cache library, in both a console app service and an Aspire-based sample Web API. I encourage you to explore these examples and share your thoughts and ideas. I look forward to any feedback you may have about the Microsoft.Extensions.Caching.Postgres package. Keep advancing, keep improving, and keep contributing; be a "builder" and an active part of our incredible community!

* This package extension is highly configurable: you can choose whether to enable or disable bypassing the WAL for your cache table, along with several other options that can be adjusted for your particular use case.
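For context on what toggling that WAL bypass means at the database level, the statements below are standard Postgres DDL for switching an existing table between logged and unlogged modes. The table name reuses the hypothetical cache_entries table from the earlier sketch, not necessarily what the package creates.

```csharp
using Npgsql;

// Standard Postgres DDL for toggling a table between unlogged (fast, not crash-safe)
// and logged (fully durable) modes; cache_entries is the hypothetical table from above.
await using var conn = new NpgsqlConnection("Host=localhost;Database=appdb;Username=app;Password=secret");
await conn.OpenAsync();

await using (var toUnlogged = new NpgsqlCommand("ALTER TABLE cache_entries SET UNLOGGED;", conn))
{
    await toUnlogged.ExecuteNonQueryAsync();
}

// ...and back to fully durable if the tradeoff is not acceptable for a given table.
await using (var toLogged = new NpgsqlCommand("ALTER TABLE cache_entries SET LOGGED;", conn))
{
    await toLogged.ExecuteNonQueryAsync();
}
```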
Optimizing Change Data Capture (CDC) on PostgreSQL for Enhanced Data Management
Managing the ODS layer becomes crucial, especially when source systems lack physical primary keys. An Operational Data Store (ODS) helps offload OLTP workloads by shifting certain queries to alternative databases. Customers typically use secondary servers to reduce the primary servers' workload, and database replication in an active-active mode can make this approach even more efficient.

Microsoft Fabric offers a mirroring option on OneLake to optimize resource utilization when moving data from OLTP databases such as SQL Server. This feature replicates data with minimal delay, ensuring seamless data availability. The need for such a feature in PostgreSQL environments is evident. For instance, customers running multiple PostgreSQL OLTP systems want to enable their ODS and Data Warehouse (DW) on Fabric in both real time and near real time, which not only reduces workloads on the PostgreSQL OLTP systems but also facilitates downstream analytics. However, this functionality is currently available for PostgreSQL in Private Preview only. This article explores the options we have until it becomes generally available.

Identifying Change Data Capture (CDC) options on PostgreSQL
WAL2JSON utility: This method utilizes PostgreSQL's Write-Ahead Logging (WAL) to capture data changes in JSON format. It provides flexibility for custom processing and error handling, but requires manual intervention for WAL file cleanup and operates in near real-time.
RTI (Debezium) CDC Connector: A Fabric-integrated solution that automates data capture from PostgreSQL to Eventhouse using Eventstream. It ensures efficient data movement by storing changes in a structured payload format, including metadata such as operation type and transaction details. However, it requires PostgreSQL administrators to manage WAL slots for optimal performance.

Comparison of CDC options

| CDC Method | Description | Pros | Cons |
| --- | --- | --- | --- |
| RTI (Debezium) CDC Connector | Efficiently moves data from PostgreSQL to Fabric (Eventstream/Eventhouse). Stores changes in a payload format, capturing database type, operation, and the changed record. | Automated; integrates well with Fabric; captures full change history | Requires the PostgreSQL team to manage WAL slot cleanup |
| WAL2JSON method | Provides programmatic control for error tracking and WAL file cleanup. Operates in near real-time. | More direct control; flexible error handling | Requires manual intervention; slightly delayed processing |
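To make the WAL2JSON route more concrete, the sketch below shows the standard PostgreSQL functions involved: creating a logical replication slot bound to the wal2json plugin, peeking at pending changes as JSON, and dropping the slot (the manual cleanup step the comparison above refers to). The slot name and connection string are illustrative, and the wal2json plugin must already be installed on the server.

```csharp
using System;
using Npgsql;

// Illustrative connection string and slot name.
const string connectionString = "Host=localhost;Database=appdb;Username=app;Password=secret";
const string slotName = "ods_wal2json_slot";

await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();

// 1. Create a logical replication slot bound to the wal2json output plugin.
await using (var create = new NpgsqlCommand(
    "SELECT pg_create_logical_replication_slot(@slot, 'wal2json');", conn))
{
    create.Parameters.AddWithValue("slot", slotName);
    await create.ExecuteNonQueryAsync();
}

// 2. Peek at pending changes as JSON without consuming them
//    (use pg_logical_slot_get_changes to consume and advance the slot).
await using (var peek = new NpgsqlCommand(
    "SELECT data FROM pg_logical_slot_peek_changes(@slot, NULL, NULL);", conn))
{
    peek.Parameters.AddWithValue("slot", slotName);
    await using var reader = await peek.ExecuteReaderAsync();
    while (await reader.ReadAsync())
        Console.WriteLine(reader.GetString(0));
}

// 3. Drop the slot when it is no longer needed so the server can recycle WAL files.
await using (var drop = new NpgsqlCommand("SELECT pg_drop_replication_slot(@slot);", conn))
{
    drop.Parameters.AddWithValue("slot", slotName);
    await drop.ExecuteNonQueryAsync();
}
```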
Approaches for Data Management

1. Truncate and Load (Direct Source Connection)
Establish a direct connection to the source tables in PostgreSQL for efficient data extraction. Truncate and reload tables in the Silver layer to maintain data integrity and consistency, and implement a logging mechanism in the Landing table so that modifications are accurately recorded and traceable. Despite initial resistance from the source PostgreSQL team, the added workload on PostgreSQL is necessary to meet project goals.

2. Adding a Row ID Column in the Source Table
A row ID column is essential to uniquely identify records and facilitate the necessary deletions and insertions. However, the source systems have not approved this modification, limiting our ability to implement this approach.

3. Using Transaction ID in PostgreSQL
Transaction IDs are inconsistent across operations and may overlap with other database transactions. Due to this inconsistency, they cannot be reliably used for tracking changes.

4. Truncate and Load Using the Landing Table
Select distinct records with the latest timestamp from the Landing table and truncate/load them into the Silver layer. Implementation strategy: develop two separate procedures, one for tables with a primary key and another for tables without a primary key, and move non-primary-key tables to primary-key-based processing once key columns become available, as determined by the respective table owners. Performance consideration: large tables may take significant time to process. Current outcome: this approach has not yielded the expected results.

5. Replica Identity FULL Approach (non-primary-key tables only)
REPLICA IDENTITY FULL is enabled for tables without a primary key in Postgres. This causes the WAL to capture the before values for all columns, regardless of whether they changed. Retrieve the old record details from the before-values JSON, use them for deletions, and insert the updated records into the table. When key columns are unavailable, generate a hash key from the before values and use it to identify and delete the corresponding records in the target table. To optimize performance, an additional hash key column can be created in the target table, reducing delete operation times. Primary-key tables will not use hash key columns; non-primary-key tables will incorporate them. It is assumed that the application layer will send accurate updates to prevent duplicate records.

Considerations
Logging overhead: increased Write-Ahead Log (WAL) generation may impact performance and storage.
ETL recommendation: the product team recommends an Extract, Transform, Load (ETL) approach as the preferred solution.
Bronze-to-Silver loading: re-evaluate the strategy once the Mirroring feature becomes available, as it will be essential for this implementation.
Performance impact: setting replica identity to FULL will increase WAL size and add logging overhead, potentially affecting database performance.
Data integrity: the ETL process must ensure accurate change tracking to maintain data integrity.
Testing: conduct thorough staging-environment testing to assess performance implications and validate the replication process.

Conclusion
Adding a Row ID column (Option 2) is the recommended approach, as it allows precise identification of records. However, if modifying source tables is not feasible, using Replica Identity FULL (Option 5) is a viable alternative despite its impact on performance and storage. Establishing a Standard Operating Procedure (SOP) in collaboration with the source system to identify and utilize primary keys or unique keys whenever available will significantly enhance the overall efficiency and accuracy of the process.

References
Debezium for PostgreSQL Documentation
Mirroring Azure Database for PostgreSQL Flexible Server in Microsoft Fabric
Add PostgreSQL Database CDC source to an eventstream