Potential Consequences of Using Postgres as a Job Queue
Introduction At small scale, using Postgres as a job queue is totally fine, and I’d even say it’s the right call. Fewer moving parts, one less system to manage, ACID guarantees on your jobs. What’s not to love? The problem is that “small scale” has a ceiling, and the ceiling is lower than most people expect. When you’ve got thousands of concurrent workers hammering a jobs table with SELECT ... FOR UPDATE SKIP LOCKED , things start to behave in ways that aren’t obvious from the application layer. CPU usage creeps up. Also vacuum sometimes can’t keep up. Finally, in the wait event stats, you start seeing ominous entries like LWLock:MultiXactSLRU stacking up across many backends. This pattern has tripped up teams more than a few times, and it usually plays out the same way: everything works fine in dev and staging, then goes off a cliff in production once the concurrency gets real. So let’s dig into why this happens, and what the alternatives look like. The Typical Pattern When using Postgres as a job queue, the standard approach looks something like this: CREATE TABLE job_queue ( id bigserial PRIMARY KEY, status text NOT NULL DEFAULT 'pending', payload jsonb NOT NULL, created_at timestamptz NOT NULL DEFAULT now(), locked_by text, locked_at timestamptz ); CREATE INDEX idx_job_queue_status ON job_queue (status) WHERE status = 'pending'; Workers grab jobs with: UPDATE job_queue SET status = 'processing', locked_by = 'worker-42', locked_at = now() WHERE id = ( SELECT id FROM job_queue WHERE status = 'pending' ORDER BY created_at LIMIT 1 FOR UPDATE SKIP LOCKED ) RETURNING *; And then mark them done: UPDATE job_queue SET status = 'completed' WHERE id = $1; Some users may DELETE the row entirely. Either way, the lifecycle is: insert, lock-and-update, update-or-delete. Repeated thousands of times per second. At low concurrency, this works very smoothly. SKIP LOCKED means workers don’t block each other waiting for the same row. Postgres handles the locking, visibility, and ordering. It’s elegant. So where does it break? The MultiXact SLRU Problem When multiple transactions hold locks on the same row, Postgres stores the set of lockers as a MultiXact ID – a pointer into a side structure under pg_multixact/ . With SELECT ... FOR UPDATE SKIP LOCKED , users might think MultiXacts aren’t involved – after all, SKIP LOCKED is supposed to avoid contention. But in practice, with many concurrent workers all racing to lock rows, there are brief windows where multiple transactions reference the same row before one of them “wins” and the others skip. If you combine this with any FOR SHARE or FOR KEY SHARE locks (which are commonly created implicitly by foreign key checks), MultiXact IDs start accumulating quickly. The MultiXact data lives in SLRU buffers (Simple Least Recently Used) – a small, fixed-size shared memory cache. When backends need to read or write MultiXact data, they acquire LWLocks to access these buffers. Under high concurrency, this becomes a bottleneck: wait_event_type | wait_event -----------------+------------------- LWLock | MultiXactMemberSLRU LWLock | MultiXactOffsetSLRU You’ll see dozens or hundreds of backends piled up on these waits. The SLRU cache is small (by design – it’s a fixed number of pages in shared memory), and when the working set of MultiXact lookups exceeds what fits in the cache, you get constant eviction and re-reads from disk. 
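If you want to check whether your own system is hitting this, a quick way is to aggregate wait events across active backends. The following is a minimal diagnostic sketch using the standard pg_stat_activity view; exact wait event names vary slightly between PostgreSQL versions.

-- Count active backends by wait event to spot SLRU pile-ups
SELECT wait_event_type, wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY backends DESC;

If MultiXactOffsetSLRU or MultiXactMemberSLRU dominates this list during peak load, the job queue table is a likely culprit.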
Every lock acquisition and release on a job row potentially triggers a MultiXact SLRU lookup, and at thousands of concurrent sessions, those lookups serialize on LWLocks. The result: CPU gets pegged, throughput collapses, and latency spikes – not because the queries are expensive, but because the locking infrastructure itself is overwhelmed. Bloat: The Silent Killer The other side of this coin is table and index bloat. Every job row goes through multiple updates (and possibly a delete), and each of those operations creates a new tuple version in the heap. The old versions stick around until VACUUM cleans them up. On a busy job queue table: Dead tuples accumulate faster than autovacuum can clean them. By the time autovacuum finishes one pass, tens of thousands of new dead tuples have appeared. The table grows and grows. Index bloat compounds the problem. Every index on the table also accumulates dead entries. The partial index on status = 'pending' gets thrashed especially hard, since rows constantly enter and leave that condition. Sequential scans get slower. As the table bloats, even index scans start doing more I/O because the heap pages are sparsely populated. Vacuum reclaims space at the end of the table, but can’t reclaim space in the middle (unless the pages are completely empty). Job queue tables can grow to tens of gigabytes when the actual “live” data was only a few megabytes. It makes everything slower: scans, vacuum, even pg_dump. You can mitigate this by running vacuum more aggressively (lower autovacuum_vacuum_scale_factor , higher autovacuum_vacuum_cost_limit ), or by partitioning the table and dropping old partitions. But at some point, you’re fighting the fundamental mismatch between MVCC’s design goals and the write pattern of a job queue. CPU and Lock Overhead Beyond the SLRU contention and bloat, there’s just the raw overhead of using Postgres’s full transactional machinery for what is essentially a FIFO dispatch operation: Every lock/unlock is a full WAL-logged transaction. Grabbing a job writes WAL. Marking it complete writes WAL. Deleting it writes WAL. On a system processing thousands of jobs per second, the WAL volume from the job queue alone can saturate your wal_writer and checkpoint processes. SKIP LOCKED still touches rows. The name suggests rows are skipped, but Postgres still has to find them, check their lock status, and move on. With high concurrency, many workers end up scanning past the same locked rows before finding one they can claim. This is wasted CPU. Snapshot management overhead also becomes an issue. Each transaction needs a consistent snapshot, and with thousands of concurrent transactions, the ProcArray (the structure that tracks active transactions) becomes a contention point itself. You might see LWLock:ProcArrayLock waits alongside the MultiXact ones. Vacuum contention. While vacuum is cleaning up dead tuples, it needs locks too. On a table under constant write pressure, vacuum can interfere with the workers and vice versa. I’ve seen systems where disabling autovacuum on the job queue table improved throughput in the short term. Better Alternatives So what should you use instead? It depends on your requirements, but there are several options that handle high-throughput job dispatch more gracefully than a Postgres table. Advisory Locks (Staying in Postgres) If you want to stay within Postgres and avoid adding infrastructure, advisory locks are worth considering for certain queue patterns. 
Instead of locking rows, you lock on an abstract numeric key: -- Worker tries to acquire a lock on the job ID SELECT pg_try_advisory_lock(id) FROM job_queue WHERE status = 'pending' ORDER BY created_at LIMIT 1; Advisory locks are lightweight – they don’t touch the heap, don’t create MultiXact entries, and don’t generate dead tuples. They live entirely in shared memory. The trade-off is that you lose the atomicity of FOR UPDATE SKIP LOCKED : you need to handle the case where a lock is acquired but the job processing fails, and you need to release the lock explicitly (or rely on session-end cleanup). This approach works well when the queue depth is manageable and you want to avoid the MVCC overhead. But it’s still Postgres, so you’re still subject to connection limits, ProcArray overhead, and general resource contention at very high session counts. pgq (Skytools) pgq is purpose-built for exactly this problem. It’s a queue implementation that sits inside Postgres but uses a batching model that avoids most of the row-level locking and MVCC pitfalls. Events are written to a queue table, but consumers read them in batches and the queue maintenance is done via a ticker process that manages rotation. The key advantages: No row-level contention. Consumers don’t lock individual rows. Built-in batch processing. Events are consumed in chunks, reducing transaction overhead. Efficient cleanup. Old events are rotated out rather than vacuumed row-by-row. The downside is that pgq is not as actively maintained as it once was, and it adds operational complexity (the ticker daemon, consumer registration, etc.). But for teams already deep in the Postgres ecosystem, it’s a battle-tested option. Redis For many teams, Redis is the natural choice for job queues. Using Redis lists (BRPOPLPUSH or the Streams API), you get: Sub-millisecond dispatch latency. No disk I/O, no MVCC, no vacuum. Atomic pop operations. Workers grab jobs without any locking protocol. Simple scaling. Redis handles thousands of concurrent consumers trivially. The trade-off is durability. Redis can persist to disk, but it’s not ACID. If Redis crashes between a pop and the job completing, you might lose or duplicate work (though Redis Streams with consumer groups mitigate this significantly). For most job queue use cases, at-least-once delivery is acceptable, and Redis does that well. Kafka For truly high-throughput, distributed workloads, Apache Kafka is the heavyweight option. Kafka partitions give you parallel consumption with ordering guarantees per partition, durable storage, and replay capability. It’s the right tool when: You need to process thousands of events per second Multiple consumers need to read the same events You want event replay or audit trails Your architecture is already event-driven The operational overhead is nontrivial – ZooKeeper (or KRaft), brokers, topic management, consumer group coordination. But for teams already running Kafka for other reasons, adding a job queue topic is practically free. Choosing the Right Tool Here’s a rough decision guide: Under 100 concurrent workers, simple jobs, Postgres with SKIP LOCKED is fine Moderate concurrency, want to stay in Postgres, Advisory locks or pgq High throughput, low-latency dispatch, Redis (Lists or Streams) Massive scale, distributed, event replay, Kafka Many teams that start with Postgres (reasonably) hit scaling problems and then try to fix Postgres rather than recognizing that the workload has outgrown the tool. 
They throw more autovacuum workers at it, increase max_connections, add connection poolers – all of which help at the margins, but don’t address the fundamental issue: Postgres’s MVCC and locking machinery wasn’t designed for this access pattern at high concurrency.

Conclusion
Postgres is great, but it can’t be the best tool for every job. Using it as a job queue is a perfectly valid choice when your scale is modest. But when you’re running thousands of concurrent workers, the combination of MultiXact SLRU contention, heap bloat, vacuum pressure, and raw locking overhead will eventually push you toward a purpose-built solution. The good news is that you don’t have to rip out everything. Advisory locks can buy you headroom without adding infrastructure. Redis can handle dispatch while Postgres keeps owning the data. And if you’re already using Kafka, a job topic is a natural fit. Take your pick – there are many queueing options out there!

Stop Experimenting, Start Building: AI Apps & Agents Dev Days Has You Covered
The AI landscape has shifted. The question is no longer “Can we build AI applications?” it’s “Can we build AI applications that actually work in production?” Demos are easy. Reliable, scalable, resilient AI systems that handle real-world complexity? That’s where most teams struggle. If you’re an AI developer, software engineer, or solution architect who’s ready to move beyond prototypes and into production-grade AI, there’s a series built specifically for you. What Is AI Apps & Agents Dev Days? AI Apps & Agents Dev Days is a monthly technical series from Microsoft Reactor, delivered in partnership with Microsoft and NVIDIA. You can explore the full series at https://developer.microsoft.com/en-us/reactor/series/s-1590/ This isn’t a slide deck marathon. The series tagline says it best: “It’s not about slides, it’s about building.” Each session tackles real-world challenges, shares patterns that actually work, and digs into what’s next in AI-driven app and agent design. You bring your curiosity, your code, and your questions. You leave with something you can ship. The sessions are led by experienced engineers and advocates from both Microsoft and NVIDIA, people like Pamela Fox, Bruno Capuano, Anthony Shaw, Gwyneth Peña-Siguenza, and solutions architects from NVIDIA’s Cloud AI team. These aren’t theorists; they’re practitioners who build and ship the tools you use every day. What You’ll Learn The series covers the full spectrum of building AI applications and agent-based systems. Here are the key themes: Building AI Applications with Azure, GitHub, and Modern Tooling Sessions walk through how to wire up AI capabilities using Azure services, GitHub workflows, and the latest SDKs. The focus is always on code-first learning, you’ll see real implementations, not abstract architecture diagrams. Designing and Orchestrating AI Agents Agent development is one of the series’ strongest threads. Sessions cover how to build agents that orchestrate long-running workflows, persist state automatically, recover from failures, and pause for human-in-the-loop input, without losing progress. For example, the session “AI Agents That Don’t Break Under Pressure” demonstrates building durable, production-ready AI agents using the Microsoft Agent Framework, running on Azure Container Apps with NVIDIA serverless GPUs. Scaling LLM Inference and Deploying to Production Moving from a working prototype to a production deployment means grappling with inference performance, GPU infrastructure, and cost management. The series covers how to leverage NVIDIA GPU infrastructure alongside Azure services to scale inference effectively, including patterns for serverless GPU compute. Real-World Architecture Patterns Expect sessions on container-based deployments, distributed agent systems, and enterprise-grade architectures. You’ll learn how to use services like Azure Container Apps to host resilient AI workloads, how Foundry IQ fits into agent architectures as a trusted knowledge source, and how to make architectural decisions that balance performance, cost, and scalability. Why This Matters for Your Day Job There’s a critical gap between what most AI tutorials teach and what production systems actually require. This series bridges that gap: Production-ready patterns, not demos. Every session focuses on code and architecture you can take directly into your projects. You’ll learn patterns for state persistence, failure recovery, and durable execution — the things that break at 2 AM. Enterprise applicability. 
The scenarios covered (travel planning agents, multi-step workflows, GPU-accelerated inference) map directly to enterprise use cases. Whether you’re building internal tooling or customer-facing AI features, the patterns transfer. Honest trade-off discussions. The speakers don’t shy away from the hard questions: When do you need serverless GPUs versus dedicated compute? How do you handle agent failures gracefully? What does it actually cost to run these systems at scale?

Watch On-Demand, Build at Your Own Pace
Every session is available on-demand. You can watch, pause, and build along at your own pace, with no need to rearrange your schedule. The full playlist is available at the series link above. This is particularly valuable for technical content. Pause a session while you replicate the architecture in your own environment. Rewind when you need to catch a configuration detail. Build alongside the presenters rather than just watching passively.

What You’ll Walk Away With
After working through the series, you’ll have:
Practical agent development skills: how to design, orchestrate, and deploy AI agents that handle real-world complexity, including state management, failure recovery, and human-in-the-loop patterns
Production architecture patterns: battle-tested approaches for deploying AI workloads on Azure Container Apps, leveraging NVIDIA GPU infrastructure, and building resilient distributed systems
Infrastructure decision-making confidence: a clearer understanding of when to use serverless GPUs, how to optimise inference costs, and how to choose the right compute strategy for your workload
Working code and reference implementations: the sessions are built around live coding and sample applications (like the Travel Planner agent demo), giving you starting points you can adapt immediately
A framework for continuous learning: with new sessions each month, you’ll stay current as the AI platform evolves and new capabilities emerge

Start Building
The AI applications that will matter most aren’t the ones with the flashiest demos; they’re the ones that work reliably, scale gracefully, and solve real problems. That’s exactly what this series helps you build. Whether you’re designing your first AI agent system or hardening an existing one for production, the AI Apps & Agents Dev Days sessions give you the patterns, tools, and practical knowledge to move forward with confidence. Explore the series at https://developer.microsoft.com/en-us/reactor/series/s-1590/ and start watching the on-demand sessions at the link above. The best time to level up your AI engineering skills was yesterday. The second-best time is right now, and these sessions make it easy to start.

Building a Restaurant Management System with Azure Database for MySQL
In this hands-on tutorial, we'll build a Restaurant Management System using Azure Database for MySQL. This project is perfect for beginners looking to understand cloud databases while creating something practical.

Cascading Read Replicas Now Generally Available!
We’re excited to announce the General Availability of cascading read replicas in Azure Database for PostgreSQL. This capability allows you to create read replicas for your Azure Database for PostgreSQL instance not only from a primary server, but also from existing read replicas, enabling multi‑level replication chains. Coordinating read‑heavy database workloads across multiple regions can be challenging, especially when you’re trying to deliver low‑latency read response experiences to users spread across different geographic locations. One effective way to address this is by placing read replicas closer to where your users are, allowing applications to serve read requests with significantly reduced latency and improved performance. What are cascading read replicas? With cascading read replicas, you can scale read‑intensive workloads more effectively, distribute read traffic efficiently, and support advanced deployment topologies such as globally distributed applications. Each read replica can act as a source for additional replicas, forming a tree‑like replication structure. For example, if your primary server is deployed in one region, you can create direct replicas in nearby regions and then cascade additional replicas to more distant locations. This approach helps spread read traffic evenly while minimizing latency for users around the world. We support up to 2 levels of replication with this feature. Level 1 will be all the read replicas and level 2 will be cascading read replicas. Why use cascading read replicas? Improved scalability Cascading read replicas support multi‑level replication, making it easier to handle high volumes of read traffic without overloading a single instance by scaling up to 30 read replicas. Geographic distribution By placing replicas closer to your global user base, you can significantly reduce read latency and deliver faster, more responsive application experiences. Efficient read traffic distribution Distributing read workloads across multiple replicas helps balance load, improving overall performance and reliability. Additionally, cascading read replicas offer operational flexibility. If you observe replication lag, you can perform a switchover operation between a cascading read replica with its source or intermediate replica, helping you maintain optimal performance and availability for your replicas. How does replication work with cascading read replicas? The primary server acts as a source for the read replica. Data is asynchronously replicated to these replicas. When we add cascading replicas, the previous replicas act as a data source for replication. In the diagram above, “primary-production-server” is the primary server with three read replicas. One of these replicas, “readreplica01”, serves as the source for another read replica, “readreplica11” which is a cascading read replica. With cascading read replicas, you can add up to five read replicas per source and replicate data across two levels, as shown in the diagram. This allows you to create up to 30 read replicas in total five read replicas directly from the primary server, and up to 25 additional replicas at the second level (each second-level replica can have up to five read replicas). If you notice replication lag between an intermediate read replica and a cascading read replica, you can use a switchover operation to swap “readreplica01” and “readreplica11”, helping reduce the impact of lag. 
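To see whether a switchover is warranted, it helps to measure lag at each hop first. The queries below use standard PostgreSQL views and functions and are only an illustrative sketch; the Azure portal also surfaces replica lag metrics.

-- On the primary, check lag to each directly attached replica:
SELECT application_name, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;

-- On any replica (including a cascading one), check how far behind replay is:
SELECT now() - pg_last_xact_replay_timestamp() AS approximate_replay_lag;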
To learn more about cascading read replicas, please refer to our documentation: Cascading read replicas

Deploying cascading read replicas on the Azure portal
Navigate to the "Replication" tab and click "Create replica". Once the read replica is created, the Replication view shows one read replica attached to the primary instance. Click on the created replica and open its Replication tab; the source server is "read-replica-01", and we will create a cascading read replica under it. Once the cascading read replica is created, the role of "read-replica-01" changes to Source, Replica. You can perform a switchover by clicking the Promote button for the cascading read replica.

Deploy cascading read replicas with Terraform
Before you start, make sure you have:
An existing primary PostgreSQL Flexible Server
At least one read replica already created from the primary
The AzureRM provider at the latest version
Proper permissions on the Azure subscription and resource group

Configure the AzureRM provider: start by configuring the AzureRM provider in your Terraform project.

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80"
    }
  }
}

provider "azurerm" {
  features {}
}

Reference the existing read replica server using a data block.

data "azurerm_postgresql_flexible_server" "source_replica" {
  name                = "my-read-replica-1"
  resource_group_name = "my-resource-group"
}

Now create a new PostgreSQL Flexible Server and point it to the replica using create_source_server_id.

resource "azurerm_postgresql_flexible_server" "cascading_replica" {
  name                    = "my-cascading-replica"
  resource_group_name     = "my-resource-group"
  location                = data.azurerm_postgresql_flexible_server.source_replica.location
  version                 = data.azurerm_postgresql_flexible_server.source_replica.version
  delegated_subnet_id     = data.azurerm_postgresql_flexible_server.source_replica.delegated_subnet_id
  private_dns_zone_id     = data.azurerm_postgresql_flexible_server.source_replica.private_dns_zone_id
  create_mode             = "Replica"
  create_source_server_id = data.azurerm_postgresql_flexible_server.source_replica.id
  storage_mb              = 32768
  sku_name                = "Standard_D4s_v3"

  depends_on = [
    data.azurerm_postgresql_flexible_server.source_replica
  ]
}

Apply the Terraform configuration:

terraform init
terraform plan
terraform apply

Key Considerations
Cascading read replicas allow for up to 5 read replicas per source and two levels of replication.
Creating cascading read replicas is supported in PostgreSQL version 14 and above.
The promote operation is not supported for intermediate read replicas with cascading read replicas.

Conclusion
Cascading read replicas in Azure Database for PostgreSQL offer a scalable way to distribute your read traffic within and across regions, reducing the read workload on the primary database. For globally distributed applications, this can improve read latency as well as resilience and performance. This design supports horizontal scaling as your application demand grows, ensuring you can handle a high volume of read requests without compromising speed. Get started with this feature today and scale your read workloads.

Why Windows Should Adopt ReFS as a Bootable Filesystem
ReFS could become a bootable filesystem: it only needs a few missing layers. No need to copy NTFS, just implement what the Windows boot process requires.
Key missing pieces:
System-level journaling (not only metadata)
Full hardlink + extended attribute support
EFS, ACLs, USN Journal for security + Windows Update
Boot-critical atomicity for safe system file updates
Bootloader-compatible APIs (BCD, BitLocker pre-boot, WinRE, Secure Boot)
Goals: Use NTFS as a reference map, add the missing capabilities to ReFS, and optimize them using ReFS features (copy-on-write, integrity streams, block cloning).
Result: A modern, resilient filesystem that can finally boot Windows, without losing its benefits.

Handling Unique Constraint Conflicts in Logical Replication
Authors: Ashutosh Sharma, Senior Software Engineer, and Gauri Kasar, Product Manager Logical replication can keep your PostgreSQL environments in sync, helping replicate selected tables with minimal impact on the primary workload. But what happens when your subscriber hits a duplicate key error and replication grinds to a halt? If you’ve seen a unique‑constraint violation while replicating between Azure Database for PostgreSQL servers, you’re not alone. This blog covers common causes, prevention tips, and practical recovery options. In PostgreSQL logical replication, the subscriber can fail with a unique-constraint error when it tries to apply a change that would create a duplicate key. duplicate key value violates unique constraint Understanding why this happens? When an INSERT or UPDATE would create a value that already exists in a column (or set of columns) protected by a UNIQUE constraint (including a PRIMARY KEY). In logical replication, this most commonly occurs because of local writes on the subscriber or if the table is being subscribed from multiple publishers. These conflicts are resolved on the subscriber side. Local writes on the subscriber: a row with the same primary key/unique key is inserted on the subscriber before the apply worker processes the corresponding change from the publisher. Multi-origin / multi-master without conflict-free keys: two origins generate (or replicate) the same unique key. Initial data synchronization issues: the subscriber already contains data when the subscription is created with initial copy enabled, resulting in duplicate inserts during the initial table sync. How to avoid this? Avoid local writes on subscribed tables (treat the subscriber as read-only for replicated relations). Avoid subscribing to the same table from multiple publishers unless you have explicit conflict handling and a conflict-free key design. Enabling server logs can help you identify and troubleshoot unique‑constraint conflicts more effectively. Refer to the official documentation to configure and access PostgreSQL logs. How to handle conflicts (recovery options) Option 1: Delete the conflicting row on the subscriber Use the subscriber logs to identify the key (or row) causing the conflict, then delete the row on the subscriber with a DELETE statement. Resume apply and repeat if more conflicts appear. Option 2: Use conflict logs and skip the conflicting transaction (PostgreSQL 17+) Starting with PostgreSQL 17, logical replication provides detailed conflict logging on the subscriber, making it easier to understand why replication stopped and which transaction caused the failure. When a replicated INSERT would violate a non‑deferrable unique constraint on the subscriber for example, when a row with the same key already exists the apply worker detects this as an insert_exists conflict and stops replication. In this case, PostgreSQL logs the conflict along with the transaction’s finish LSN, which uniquely identifies the failing transaction. ERROR: conflict detected on relation "public.t2": conflict=insert_exists ... in transaction 754, finished at 0/034F4090 ALTER SUBSCRIPTION <subscription_name> SKIP (lsn = '0/034F4090'); Option 3: Rebuild (re-sync) the table Rebuilding (re‑syncing) a table is the safest and most deterministic way to resolve logical replication conflicts caused by pre‑existing data differences or local writes on the subscriber. 
This approach is especially useful when a table repeatedly fails with unique-constraint violations and it is unclear which rows are out of sync.

Step 1 (subscriber): Disable the subscription.
ALTER SUBSCRIPTION <subscription_name> DISABLE;

Step 2 (subscriber): Remove the local copy of the table so it can be re-copied.
TRUNCATE TABLE <conflicting_table>;

Step 3 (publisher): Ensure the publication will (re)send the table (one approach is to recreate the publication entry for that table).
ALTER PUBLICATION <pub_with_conflicting_table> DROP TABLE <conflicting_table>;
CREATE PUBLICATION <pub_with_conflicting_table_rebuild> FOR TABLE <conflicting_table>;

Step 4 (subscriber): Create a new subscription (or refresh the existing one) to re-copy the table.
CREATE SUBSCRIPTION <sub_rebuild> CONNECTION '<connection_string>' PUBLICATION <pub_with_conflicting_table_rebuild>;

Step 5 (subscriber): Re-enable the original subscription (if applicable).
ALTER SUBSCRIPTION <subscription_name> ENABLE;

Conclusion
In most cases, these conflicts occur due to local changes on the subscriber or differences in data that existed before logical replication was fully synchronized. It is recommended to avoid direct modifications on subscribed tables and to ensure that the replication setup is properly planned, especially when working with tables that have unique constraints.

Build a Fully Offline AI App with Foundry Local and CAG
A hands-on guide to building an on-device AI support agent using Context-Augmented Generation, JavaScript, and Foundry Local. You have probably heard the AI pitch: "just call our API." But what happens when your application needs to work without an internet connection? Perhaps your users are field engineers standing next to a pipeline in the middle of nowhere, or your organisation has strict data privacy requirements, or you simply want to build something that works without a cloud bill. This post walks you through how to build a fully offline, on-device AI application using Foundry Local and a pattern called Context-Augmented Generation (CAG). By the end, you will have a clear understanding of what CAG is, how it compares to RAG, and the practical steps to build your own solution. The finished application: a browser-based AI support agent that runs entirely on your machine. What Is Context-Augmented Generation? Context-Augmented Generation (CAG) is a pattern for making AI models useful with your own domain-specific content. Instead of hoping the model "knows" the answer from its training data, you pre-load your entire knowledge base into the model's context window at startup. Every query the model handles has access to all of your documents, all of the time. The flow is straightforward: Load your documents into memory when the application starts. Inject the most relevant documents into the prompt alongside the user's question. Generate a response grounded in your content. There is no retrieval pipeline, no vector database, and no embedding model. Your documents are read from disc, held in memory, and selected per query using simple keyword scoring. The model generates answers grounded in your content rather than relying on what it learnt during training. CAG vs RAG: Understanding the Trade-offs If you have explored AI application patterns before, you have likely encountered Retrieval-Augmented Generation (RAG). Both CAG and RAG solve the same core problem: grounding an AI model's answers in your own content. They take different approaches, and each has genuine strengths and limitations. CAG (Context-Augmented Generation) How it works: All documents are loaded at startup. The most relevant ones are selected per query using keyword scoring and injected into the prompt. Strengths: Drastically simpler architecture with no vector database, no embeddings, and no retrieval pipeline Works fully offline with no external services Minimal dependencies (just two npm packages in this sample) Near-instant document selection with no embedding latency Easy to set up, debug, and reason about Limitations: Constrained by the model's context window size Best suited to small, curated document sets (tens of documents, not thousands) Keyword scoring is less precise than semantic similarity for ambiguous queries Adding documents requires an application restart RAG (Retrieval-Augmented Generation) How it works: Documents are chunked, embedded into vectors, and stored in a database. At query time, the most semantically similar chunks are retrieved and injected into the prompt. 
Strengths: Scales to thousands or millions of documents Semantic search finds relevant content even when the user's wording differs from the source material Documents can be added or updated dynamically without restarting Fine-grained retrieval (chunk-level) can be more token-efficient for large collections Limitations: More complex architecture: requires an embedding model, a vector database, and a chunking strategy Retrieval quality depends heavily on chunking, embedding model choice, and tuning Additional latency from the embedding and search steps More dependencies and infrastructure to manage Want to compare these patterns hands-on? There is a RAG-based implementation of the same gas field scenario using vector search and embeddings. Clone both repositories, run them side by side, and see how the architectures differ in practice. When Should You Choose Which? Consideration Choose CAG Choose RAG Document count Tens of documents Hundreds or thousands Offline requirement Essential Optional (can run locally too) Setup complexity Minimal Moderate to high Document updates Infrequent (restart to reload) Frequent or dynamic Query precision Good for keyword-matchable content Better for semantically diverse queries Infrastructure None beyond the runtime Vector database, embedding model For the sample application in this post (20 gas engineering procedure documents on a local machine), CAG is the clear winner. If your use case grows to hundreds of documents or requires real-time ingestion, RAG becomes the better choice. Both patterns can run offline using Foundry Local. Foundry Local: Your On-Device AI Runtime Foundry Local is a lightweight runtime from Microsoft that downloads, manages, and serves language models entirely on your device. No cloud account, no API keys, no outbound network calls (after the initial model download). In this sample, your application is responsible for deciding which model to use, and it does that through the foundry-local-sdk . The app creates a FoundryLocalManager , asks the SDK for the local model catalogue, and then runs a small selection policy from src/modelSelector.js . That policy looks at the machine's available RAM, filters out models that are too large, ranks the remaining chat models by preference, and then returns the best fit for that device. Why does it work this way? Because shipping one fixed model would either exclude lower-spec machines or underuse more capable ones. A 14B model may be perfectly reasonable on a 32 GB workstation, but the same choice would be slow or unusable on an 8 GB laptop. By selecting at runtime, the same codebase can run across a wider range of developer machines without manual tuning. What makes it particularly useful for developers: No GPU required — runs on CPU or NPU, making it accessible on standard laptops and desktops Native SDK bindings — in-process inference via the foundry-local-sdk npm package, with no HTTP round-trips to a local server Automatic model management — downloads, caches, and loads models automatically Dynamic model selection — the SDK can evaluate your device's available RAM and pick the best model from the catalogue Real-time progress callbacks — ideal for building loading UIs that show download and initialisation progress The integration code is refreshingly minimal. 
Here is the core pattern: import { FoundryLocalManager } from "foundry-local-sdk"; // Create a manager and get the model catalogue const manager = FoundryLocalManager.create({ appName: "my-app" }); // Auto-select the best model for this device based on available RAM const models = await manager.catalog.getModels(); const model = selectBestModel(models); // Download if not cached, then load into memory if (!model.isCached) { await model.download((progress) => { console.log(`Download: ${progress.toFixed(0)}%`); }); } await model.load(); // Create a chat client for direct in-process inference const chatClient = model.createChatClient(); const response = await chatClient.completeChat([ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "How do I detect a gas leak?" } ]); That is it. No server configuration, no authentication tokens, no cloud provisioning. The model runs in the same process as your application. The download step matters for a simple reason: offline inference only works once the model files exist locally. The SDK checks whether the chosen model is already cached on the machine. If it is not, the application asks Foundry Local to download it once, store it locally, and then load it into memory. After that first run, the cached model can be reused, which is why subsequent launches are much faster and can operate without any network dependency. Put another way, there are two cooperating pieces here. Your application chooses which model is appropriate for the device and the scenario. Foundry Local and its SDK handle the mechanics of making that model available locally, caching it, loading it, and exposing a chat client for inference. That separation keeps the application logic clear whilst letting the runtime handle the heavy lifting. The Technology Stack The sample application is deliberately simple. No frameworks, no build steps, no Docker: Layer Technology Purpose AI Model Foundry Local + auto-selected model Runs locally via native SDK bindings; best model chosen for your device Back end Node.js + Express Lightweight HTTP server, everyone knows it Context Markdown files pre-loaded at startup No vector database, no embeddings, no retrieval step Front end Single HTML file with inline CSS No build step, mobile-responsive, field-ready The total dependency footprint is two npm packages: express and foundry-local-sdk . Architecture Overview The four-layer architecture, all running on a single machine. The system has four layers, all running in a single process on your device: Client layer: a single HTML file served by Express, with quick-action buttons and a responsive chat interface Server layer: Express.js starts immediately and serves the UI plus an SSE status endpoint; API routes handle chat (streaming and non-streaming), context listing, and health checks CAG engine: loads all domain documents at startup, selects the most relevant ones per query using keyword scoring, and injects them into the prompt AI layer: Foundry Local runs the auto-selected model on CPU/NPU via native SDK bindings (in-process inference, no HTTP round-trips) Building the Solution Step by Step Prerequisites You need two things installed on your machine: Node.js 20 or later: download from nodejs.org Foundry Local: Microsoft's on-device AI runtime: winget install Microsoft.FoundryLocal Foundry Local will automatically select and download the best model for your device the first time you run the application. 
You can override this by setting the FOUNDRY_MODEL environment variable to a specific model alias. Getting the Code Running # Clone the repository git clone https://github.com/leestott/local-cag.git cd local-cag # Install dependencies npm install # Start the server npm start Open http://127.0.0.1:3000 in your browser. You will see a loading overlay with a progress bar whilst the model downloads (first run only) and loads into memory. Once the model is ready, the overlay fades away and you can start chatting. Desktop view Mobile view How the CAG Pipeline Works Let us trace what happens when a user asks: "How do I detect a gas leak?" The query flow from browser to model and back. 1 Server starts and loads documents When you run npm start , the Express server starts on port 3000. All .md files in the docs/ folder are read, parsed (with optional YAML front-matter for title, category, and ID), and grouped by category. A document index is built listing all available topics. 2 Model is selected and loaded The model selector evaluates your system's available RAM and picks the best model from the Foundry Local catalogue. If the model is not already cached, it downloads it (with progress streamed to the browser via SSE). The model is then loaded into memory for in-process inference. 3 User sends a question The question arrives at the Express server. The chat engine selects the top 3 most relevant documents using keyword scoring. 4 Prompt is constructed The engine builds a messages array containing: the system prompt (with safety-first instructions), the document index (so the model knows all available topics), the 3 selected documents (approximately 6,000 characters), the conversation history, and the user's question. 5 Model generates a grounded response The prompt is sent to the locally loaded model via the Foundry Local SDK's native bindings. The response streams back token by token through Server-Sent Events to the browser. A response with safety warnings and step-by-step guidance The sources panel shows which documents were used Key Code Walkthrough Loading Documents (the Context Module) The context module reads all markdown files from the docs/ folder at startup. Each document can have optional YAML front-matter for metadata: // src/context.js export function loadDocuments() { const files = fs.readdirSync(config.docsDir) .filter(f => f.endsWith(".md")) .sort(); const docs = []; for (const file of files) { const raw = fs.readFileSync(path.join(config.docsDir, file), "utf-8"); const { meta, body } = parseFrontMatter(raw); docs.push({ id: meta.id || path.basename(file, ".md"), title: meta.title || file, category: meta.category || "General", content: body.trim(), }); } return docs; } There is no chunking, no vector computation, and no database. The documents are held in memory as plain text. Dynamic Model Selection Rather than hard-coding a model, the application evaluates your system at runtime: // src/modelSelector.js const totalRamMb = os.totalmem() / (1024 * 1024); const budgetMb = totalRamMb * 0.6; // Use up to 60% of system RAM // Filter to models that fit, rank by quality, boost cached models const candidates = allModels.filter(m => m.task === "chat-completion" && m.fileSizeMb <= budgetMb ); // Returns the best model: e.g. phi-4 on a 32 GB machine, // or phi-3.5-mini on a laptop with 8 GB RAM This means the same application runs on a powerful workstation (selecting a 14B parameter model) or a constrained laptop (selecting a 3.8B model), with no code changes required. 
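The repository's actual selection policy lives in src/modelSelector.js; the ranking step below is only a simplified sketch of the idea (prefer the largest model that fits the RAM budget, with a boost for models already cached), not the project's exact code.

// Hypothetical ranking sketch, assuming the fields shown above
// (task, fileSizeMb, isCached) are available on each catalogue entry.
function selectBestModel(allModels, budgetMb) {
  const candidates = allModels.filter(
    (m) => m.task === "chat-completion" && m.fileSizeMb <= budgetMb
  );
  if (candidates.length === 0) {
    throw new Error("No chat model fits within the RAM budget");
  }
  candidates.sort((a, b) => {
    // Larger models score higher; already-cached models get a big boost.
    const scoreA = a.fileSizeMb + (a.isCached ? budgetMb : 0);
    const scoreB = b.fileSizeMb + (b.isCached ? budgetMb : 0);
    return scoreB - scoreA;
  });
  return candidates[0];
}

Under a policy like this, a 32 GB workstation tends to land on a larger model such as phi-4, while an 8 GB laptop falls back to something like phi-3.5-mini, which matches the behaviour described above.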
This is worth calling out because it is one of the most practical parts of the sample. Developers do not have to decide up front which single model every user should run. The application makes that decision at startup based on the hardware budget you set, then asks Foundry Local to fetch the model if it is missing. The result is a smoother first-run experience and fewer support headaches when the same app is used on mixed hardware. The System Prompt For safety-critical domains, the system prompt is engineered to prioritise safety, prevent hallucination, and enforce structured responses: // src/prompts.js export const SYSTEM_PROMPT = `You are a local, offline support agent for gas field inspection and maintenance engineers. Behaviour Rules: - Always prioritise safety. If a procedure involves risk, explicitly call it out. - Do not hallucinate procedures, measurements, or tolerances. - If the answer is not in the provided context, say: "This information is not available in the local knowledge base." Response Format: - Summary (1-2 lines) - Safety Warnings (if applicable) - Step-by-step Guidance - Reference (document name + section)`; This pattern is transferable to any safety-critical domain: medical devices, electrical work, aviation maintenance, or chemical handling. Adapting This for Your Own Domain The sample project is designed to be forked and adapted. Here is how to make it yours in three steps: 1. Replace the documents Delete the gas engineering documents in docs/ and add your own markdown files. The context module handles any markdown content with optional YAML front-matter: --- title: Troubleshooting Widget Errors category: Support id: KB-001 --- # Troubleshooting Widget Errors ...your content here... 2. Edit the system prompt Open src/prompts.js and rewrite the system prompt for your domain. Keep the structure (summary, safety, steps, reference) and update the language to match your users' expectations. 3. Override the model (optional) By default the application auto-selects the best model. To force a specific model: # See available models foundry model list # Force a smaller, faster model FOUNDRY_MODEL=phi-3.5-mini npm start # Or a larger, higher-quality model FOUNDRY_MODEL=phi-4 npm start Smaller models give faster responses on constrained devices. Larger models give better quality. The auto-selector picks the largest model that fits within 60% of your system RAM. Building a Field-Ready UI The front end is a single HTML file with inline CSS. No React, no build tooling, no bundler. This keeps the project accessible to beginners and easy to deploy. Design decisions that matter for field use: Dark, high-contrast theme with 18px base font size for readability in bright sunlight Large touch targets (minimum 48px) for operation with gloves or PPE Quick-action buttons for common questions, so engineers do not need to type on a phone Responsive layout that works from 320px to 1920px+ screen widths Streaming responses via SSE, so the user sees tokens arriving in real time The mobile chat experience, optimised for field use. Visual Startup Progress with SSE A standout feature of this application is the loading experience. 
When the user opens the browser, they see a progress overlay showing exactly what the application is doing: Loading domain documents Initialising the Foundry Local SDK Selecting the best model for the device Downloading the model (with a percentage progress bar, first run only) Loading the model into memory This works because the Express server starts before the model finishes loading. The browser connects immediately and receives real-time status updates via Server-Sent Events. Chat endpoints return 503 whilst the model is loading, so the UI cannot send queries prematurely. // Server-side: broadcast status to all connected browsers function broadcastStatus(state) { initState = state; const payload = `data: ${JSON.stringify(state)}\n\n`; for (const client of statusClients) { client.write(payload); } } // During initialisation: broadcastStatus({ stage: "downloading", message: "Downloading phi-4...", progress: 42 }); This pattern is worth adopting in any application where model loading takes more than a few seconds. Users should never stare at a blank screen wondering whether something is broken. Testing The project includes unit tests using the built-in Node.js test runner, with no extra test framework needed: # Run all tests npm test Tests cover configuration, server endpoints, and document loading. Use them as a starting point when you adapt the project for your own domain. Ideas for Extending the Project Once you have the basics running, there are plenty of directions to explore: Conversation memory: persist chat history across sessions using local storage or a lightweight database Hybrid CAG + RAG: add a vector retrieval step for larger document collections that exceed the context window Multi-modal support: add image-based queries (photographing a fault code, for example) PWA packaging: make it installable as a standalone offline application on mobile devices Custom model fine-tuning: fine-tune a model on your domain data for even better answers Ready to Build Your Own? Clone the CAG sample, swap in your own documents, and have an offline AI agent running in minutes. Or compare it with the RAG approach to see which pattern suits your use case best. Get the CAG Sample Get the RAG Sample Summary Building a local AI application does not require a PhD in machine learning or a cloud budget. With Foundry Local, Node.js, and a set of domain documents, you can create a fully offline, mobile-responsive AI agent that answers questions grounded in your own content. The key takeaways: CAG is ideal for small, curated document sets where simplicity and offline capability matter most. No vector database, no embeddings, no retrieval pipeline. RAG scales further when you have hundreds or thousands of documents, or need semantic search for ambiguous queries. See the local-rag sample to compare. Foundry Local makes on-device AI accessible: native SDK bindings, in-process inference, automatic model selection, and no GPU required. The architecture is transferable. Replace the gas engineering documents with your own content, update the system prompt, and you have a domain-specific AI agent for any field. Start simple, iterate outwards. Begin with CAG and a handful of documents. If your needs outgrow the context window, graduate to RAG. Both patterns can run entirely offline. Clone the repository, swap in your own documents, and start building. The best way to learn is to get your hands on the code. This project is open source under the MIT licence. 
It is a scenario sample for learning and experimentation, not production medical or safety advice. local-cag on GitHub · local-rag on GitHub · Foundry Local

Building a Smart Building HVAC Digital Twin with AI Copilot Using Foundry Local
Introduction Building operations teams face a constant challenge: optimizing HVAC systems for energy efficiency while maintaining occupant comfort and air quality. Traditional building management systems display raw sensor data, temperatures, pressures, CO₂ levels—but translating this into actionable insights requires deep HVAC expertise. What if operators could simply ask "Why is the third floor so warm?" and get an intelligent answer grounded in real building state? This article demonstrates building a sample smart building digital twin with an AI-powered operations copilot, implemented using DigitalTwin, React, Three.js, and Microsoft Foundry Local. You'll learn how to architect physics-based simulators that model thermal dynamics, implement 3D visualizations of building systems, integrate natural language AI control, and design fault injection systems for testing and training. Whether you're building IoT platforms for commercial real estate, designing energy management systems, or implementing predictive maintenance for building automation, this sample provides proven patterns for intelligent facility operations. Why Digital Twins Matter for Building Operations Physical buildings generate enormous operational data but lack intelligent interpretation layers. A 50,000 square foot office building might have 500+ sensors streaming metrics every minute, zone temperatures, humidity levels, equipment runtimes, energy consumption. Traditional BMS (Building Management Systems) visualize this data as charts and gauges, but operators must manually correlate patterns, diagnose issues, and predict failures. Digital twins solve this through physics-based simulation coupled with AI interpretation. Instead of just displaying current temperature readings, a digital twin models thermal dynamics, heat transfer rates, HVAC response characteristics, occupancy impacts. When conditions deviate from expectations, the twin compares observed versus predicted states, identifying root causes. Layer AI on top, and operators get natural language explanations: "The conference room is 3 degrees too warm because the VAV damper is stuck at 40% open, reducing airflow by 60%." This application focuses on HVAC, the largest building energy consumer, typically 40-50% of total usage. Optimizing HVAC by just 10% through better controls can save thousands of dollars monthly while improving occupant satisfaction. The digital twin enables "what-if" scenarios before making changes: "What happens to energy consumption and comfort if we raise the cooling setpoint by 2 degrees during peak demand response events?" Architecture: Three-Tier Digital Twin System The application implements a clean three-tier architecture separating visualization, simulation, and state management: The frontend uses React with Three.js for 3D visualization. Users see an interactive 3D model of the three-floor building with color-coded zones indicating temperature and CO₂ levels. Click any equipment, AHUs, VAVs, chillers, to see detailed telemetry. The control panel enables adjusting setpoints, running simulation steps, and activating demand response scenarios. Real-time charts display KPIs: energy consumption, comfort compliance, air quality levels. The backend Node.js/Express server orchestrates simulation and state management. It maintains the digital twin state as JSON, the single source of truth for all equipment, zones, and telemetry. REST API endpoints handle control requests, simulation steps, and AI copilot queries. 
WebSocket connections push real-time updates to the frontend for live monitoring. The HVAC simulator implements physics-based models: 1R1C thermal models for zones, affinity laws for fan power, chiller COP calculations, CO₂ mass balance equations. Foundry Local provides AI copilot capabilities. The backend uses foundry-local-sdk to query locally running models. Natural language queries ("How's the lobby temperature?") get answered with building state context. The copilot can explain anomalies, suggest optimizations, and even execute commands when explicitly requested. Implementing Physics-Based HVAC Simulation Accurate simulation requires modeling actual HVAC physics. The simulator implements several established building energy models: // backend/src/simulator/thermal-model.js class ZoneThermalModel { // 1R1C (one resistance, one capacitance) thermal model static calculateTemperatureChange(zone, delta_t_seconds) { const C_thermal = zone.volume * 1.2 * 1000; // Heat capacity (J/K) const R_thermal = zone.r_value * zone.envelope_area; // Thermal resistance // Internal heat gains (occupancy, equipment, lighting) const Q_internal = zone.occupancy * 100 + // 100W per person zone.equipment_load + zone.lighting_load; // Cooling/heating from HVAC const airflow_kg_s = zone.vav.airflow_cfm * 0.0004719; // CFM to kg/s const c_p_air = 1006; // Specific heat of air (J/kg·K) const Q_hvac = airflow_kg_s * c_p_air * (zone.vav.supply_temp - zone.temperature); // Envelope losses const Q_envelope = (zone.outdoor_temp - zone.temperature) / R_thermal; // Net energy balance const Q_net = Q_internal + Q_hvac + Q_envelope; // Temperature change: Q = C * dT/dt const dT = (Q_net / C_thermal) * delta_t_seconds; return zone.temperature + dT; } } This model captures essential thermal dynamics while remaining computationally fast enough for real-time simulation. It accounts for internal heat generation from occupants and equipment, HVAC cooling/heating contributions, and heat loss through the building envelope. 
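To make the energy balance concrete, here is a small usage sketch that advances one zone by a single simulation step. The zone values are purely illustrative assumptions, not data from the sample's twin state, but the fields match what ZoneThermalModel.calculateTemperatureChange reads above.

// Illustrative zone snapshot (hypothetical values)
const zone = {
  volume: 450,            // zone air volume
  r_value: 2.5,           // envelope thermal resistance per unit area
  envelope_area: 180,     // exterior envelope area
  occupancy: 6,           // people in the zone
  equipment_load: 1200,   // W
  lighting_load: 800,     // W
  temperature: 74.5,      // current zone temperature (°F)
  outdoor_temp: 90,       // outdoor temperature (°F)
  vav: { airflow_cfm: 600, supply_temp: 55 },
};

// Advance the zone by one 5-minute step (300 seconds)
const nextTemp = ZoneThermalModel.calculateTemperatureChange(zone, 300);
console.log(`Zone temperature after 5 minutes: ${nextTemp.toFixed(2)}`);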
The CO₂ model uses mass balance equations: class AirQualityModel { static calculateCO2Change(zone, delta_t_seconds) { // CO₂ generation from occupants const G_co2 = zone.occupancy * 0.0052; // L/s per person at rest // Outdoor air ventilation rate const V_oa = zone.vav.outdoor_air_cfm * 0.000471947; // CFM to m³/s // CO₂ concentration difference (indoor - outdoor) const delta_CO2 = zone.co2_ppm - 400; // Outdoor ~400ppm // Mass balance: dC/dt = (G - V*ΔC) / Volume const dCO2_dt = (G_co2 - V_oa * delta_CO2) / zone.volume; return zone.co2_ppm + (dCO2_dt * delta_t_seconds); } } These models execute every simulation step, updating the entire building state: async function simulateStep(twin, timestep_minutes) { const delta_t = timestep_minutes * 60; // Convert to seconds // Update each zone for (const zone of twin.zones) { zone.temperature = ZoneThermalModel.calculateTemperatureChange(zone, delta_t); zone.co2_ppm = AirQualityModel.calculateCO2Change(zone, delta_t); } // Update equipment based on zone demands for (const vav of twin.vavs) { updateVAVOperation(vav, twin.zones); } for (const ahu of twin.ahus) { updateAHUOperation(ahu, twin.vavs); } updateChillerOperation(twin.chiller, twin.ahus); updateBoilerOperation(twin.boiler, twin.ahus); // Calculate system KPIs twin.kpis = calculateSystemKPIs(twin); // Detect alerts twin.alerts = detectAnomalies(twin); // Persist updated state await saveTwinState(twin); return twin; } 3D Visualization with React and Three.js The frontend renders an interactive 3D building view that updates in real-time as conditions change. Using React Three Fiber simplifies Three.js integration with React's component model: // frontend/src/components/BuildingView3D.jsx import { Canvas } from '@react-three/fiber'; import { OrbitControls } from '@react-three/drei'; export function BuildingView3D({ twinState }) { return ( {/* Render building floors */} {twinState.zones.map(zone => ( selectZone(zone.id)} /> ))} {/* Render equipment */} {twinState.ahus.map(ahu => ( ))} ); } function ZoneMesh({ zone, onClick }) { const color = getTemperatureColor(zone.temperature, zone.setpoint); return ( ); } function getTemperatureColor(current, setpoint) { const deviation = current - setpoint; if (Math.abs(deviation) < 1) return '#00ff00'; // Green: comfortable if (Math.abs(deviation) < 3) return '#ffff00'; // Yellow: acceptable return '#ff0000'; // Red: uncomfortable } This visualization immediately shows building state at a glance, operators see "hot spots" in red, comfortable zones in green, and can click any area for detailed metrics. Integrating AI Copilot for Natural Language Control The AI copilot transforms building data into conversational insights. Instead of navigating multiple screens, operators simply ask questions: // backend/src/routes/copilot.js import { FoundryLocalClient } from 'foundry-local-sdk'; const foundry = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT }); router.post('/api/copilot/chat', async (req, res) => { const { message } = req.body; // Load current building state const twin = await loadTwinState(); // Build context for AI const context = buildBuildingContext(twin); const completion = await foundry.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: `You are an HVAC operations assistant for a 3-floor office building. Current Building State: ${context} Answer questions about equipment status, comfort conditions, and energy usage. Provide specific, actionable information based on the current data. 
Do not speculate beyond provided information.`
      },
      { role: 'user', content: message }
    ],
    temperature: 0.3,
    max_tokens: 300
  });

  res.json({
    response: completion.choices[0].message.content,
    model: 'phi-4',
    timestamp: new Date().toISOString()
  });
});

function buildBuildingContext(twin) {
  const alerts = twin.alerts.filter(a => a.active);
  const zoneStatus = twin.zones.map(z =>
    `${z.name}: ${z.temperature.toFixed(1)}°F (setpoint: ${z.setpoint}°F), CO₂: ${z.co2_ppm}ppm, ${z.occupancy} people`
  );

  return `
Zones:
${zoneStatus.join('\n')}

Active Alerts: ${alerts.length}
${alerts.map(a => `- ${a.severity}: ${a.message}`).join('\n')}

Energy: ${twin.kpis.total_power_kw.toFixed(1)} kW
Chiller: ${twin.chiller.percent_load}% load, ${twin.chiller.cop.toFixed(2)} COP
  `.trim();
}

Example interactions:

User: "Why is the third floor conference room warm?"
AI: "The Executive Suite on Floor 3 is currently 75.2°F, 3 degrees above the 72°F setpoint. This is due to high occupancy (8 people) and VAV-F3-EXEC operating at reduced airflow (285 CFM vs 450 CFM capacity). AHU-2 is supplying adequate cool air at 55°F. Recommend checking VAV damper position—it may be partially stuck."

User: "What's our current energy usage?"
AI: "Total building power consumption is 47.3 kW. Breakdown: Chiller (24.1 kW, 51%), AHU fans (12.8 kW, 27%), Pumps (7.2 kW, 15%), Other (3.2 kW, 7%). Chiller is operating at 62% load with a COP of 4.8, which is efficient for current conditions."

User: "Activate demand response level 2"
AI: "I cannot directly control building systems for safety reasons. To activate DR Level 2, use the Control Panel and click 'Demand Response' → 'Level 2'. This will raise cooling setpoints by 3°F and reduce auxiliary loads, targeting 15% energy reduction."

The AI provides grounded, specific answers citing actual equipment IDs and metrics. It refuses to directly execute control commands, instead guiding operators to explicit control interfaces, a critical safety pattern for building systems.
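To give a sense of the request and response shape, a client call to the chat route above might look like the sketch below; the path and payload follow the route definition, while the host and port are assumptions:

// Illustrative client call to the copilot route; base URL is an assumption
const res = await fetch('http://localhost:3000/api/copilot/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: "Why is the third floor conference room warm?" })
});

const data = await res.json();
console.log(`[${data.model} @ ${data.timestamp}]`);
console.log(data.response); // grounded answer built from the building context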
Fault Injection for Testing and Training

Real building operations experience equipment failures: stuck dampers, sensor drift, communication losses. The digital twin includes comprehensive fault injection capabilities to train operators and test control logic:

// backend/src/simulator/fault-injector.js
const FAULT_CATALOG = {
  chillerFailure: {
    description: 'Chiller compressor failure',
    apply: (twin) => {
      twin.chiller.status = 'FAULT';
      twin.chiller.cooling_output = 0;
      twin.alerts.push({
        id: 'chiller-fault',
        severity: 'CRITICAL',
        message: 'Chiller compressor failure - no cooling available',
        equipment: 'CHILLER-01'
      });
    }
  },

  stuckVAVDamper: {
    description: 'VAV damper stuck at current position',
    apply: (twin, vavId) => {
      const vav = twin.vavs.find(v => v.id === vavId);
      vav.damper_stuck = true;
      vav.damper_position_fixed = vav.damper_position;
      twin.alerts.push({
        id: `vav-stuck-${vavId}`,
        severity: 'HIGH',
        message: `VAV ${vavId} damper stuck at ${vav.damper_position}%`,
        equipment: vavId
      });
    }
  },

  sensorDrift: {
    description: 'Temperature sensor reading 5°F high',
    apply: (twin, zoneId) => {
      const zone = twin.zones.find(z => z.id === zoneId);
      zone.sensor_drift = 5.0;
      zone.temperature_measured = zone.temperature_actual + 5.0;
    }
  },

  communicationLoss: {
    description: 'Equipment communication timeout',
    apply: (twin, equipmentId) => {
      const equipment = findEquipmentById(twin, equipmentId);
      equipment.comm_status = 'OFFLINE';
      equipment.stale_data = true;
      twin.alerts.push({
        id: `comm-loss-${equipmentId}`,
        severity: 'MEDIUM',
        message: `Lost communication with ${equipmentId}`,
        equipment: equipmentId
      });
    }
  }
};

router.post('/api/twin/fault', async (req, res) => {
  const { faultType, targetEquipment } = req.body;
  const twin = await loadTwinState();

  const fault = FAULT_CATALOG[faultType];
  if (!fault) {
    return res.status(400).json({ error: 'Unknown fault type' });
  }

  fault.apply(twin, targetEquipment);
  await saveTwinState(twin);

  res.json({
    message: `Applied fault: ${fault.description}`,
    affectedEquipment: targetEquipment,
    timestamp: new Date().toISOString()
  });
});

Operators can inject faults to practice diagnosis and response. Training scenarios might include: "The chiller just failed during a heat wave, how do you maintain comfort?" or "Multiple VAV dampers are stuck, which zones need immediate attention?"

Key Takeaways and Production Deployment

Building a physics-based digital twin with AI capabilities requires balancing simulation accuracy with computational performance, providing intuitive visualization while maintaining technical depth, and enabling AI assistance without compromising safety.

Key architectural lessons:

- Physics models enable prediction: Comparing predicted vs observed behavior identifies anomalies that simple thresholds miss
- 3D visualization improves spatial understanding: Operators immediately see which floors or zones need attention
- AI copilots accelerate diagnosis: Natural language queries get answers in seconds vs. minutes of manual data examination
- Fault injection validates readiness: Testing failure scenarios prepares operators for real incidents
- JSON state enables integration: Simple file-based state makes connecting to real BMS systems straightforward

For production deployment, connect the twin to actual building systems via BACnet, Modbus, or MQTT integrations. Replace simulated telemetry with real sensor streams. Calibrate model parameters against historical building performance. Implement continuous learning where the twin's predictions improve as it observes actual building behavior.
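As one example of swapping simulated telemetry for real streams, a minimal MQTT ingestion adapter could subscribe to sensor topics and overwrite the simulated zone readings. The topic layout, broker URL, message format, and import path below are assumptions for illustration, not part of the published implementation:

// Illustrative MQTT adapter: feed live sensor readings into the twin state.
// Topic scheme, broker URL, payload shape, and state-helper path are assumptions.
import mqtt from 'mqtt';
import { loadTwinState, saveTwinState } from '../simulator/state.js';

const client = mqtt.connect('mqtt://bms-broker.local:1883');

client.on('connect', () => {
  // e.g. building/hq/zone/zone-301/temperature
  client.subscribe('building/+/zone/+/temperature');
});

client.on('message', async (topic, payload) => {
  const zoneId = topic.split('/')[3];
  const reading = JSON.parse(payload.toString()); // assumed shape: { value_f: 73.4 }

  const twin = await loadTwinState();
  const zone = twin.zones.find(z => z.id === zoneId);
  if (zone) {
    zone.temperature = reading.value_f; // replace the simulated value with the measured one
    await saveTwinState(twin);
  }
});

The same pattern extends to CO₂, occupancy, and equipment-status topics; the simulator then becomes a prediction and fallback layer rather than the source of truth.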
The complete implementation with simulation engine, 3D visualization, AI copilot, and fault injection system is available at github.com/leestott/DigitalTwin. Clone the repository and run the startup scripts to explore the digital twin; no building hardware is required.

Resources and Further Reading

- Smart Building HVAC Digital Twin Repository - Complete source code and simulation engine
- Setup and Quick Start Guide - Installation instructions and usage examples
- Microsoft Foundry Local Documentation - AI integration reference
- HVAC Simulation Documentation - Physics model details and calibration
- Three.js Documentation - 3D visualization framework
- ASHRAE Standards - Building energy modeling standards

Native Vulkan in Windows System Manifest
Vulkan’s been in Windows for ages as a loader and runtime, but the OS still doesn't use it as a native backend. It's wild, because so many studios start with Vulkan now and only port to DirectX later, which just adds a ton of extra work for no reason. If Windows actually supported Vulkan natively, we'd get rid of the translation layers that cause CPU overhead and shaky frame times. In CPU-bound games, we're talking 20–30% more performance just by cutting out the middleman. Plus, it would mean consistent performance across NVIDIA, AMD, and Intel. I really think Vulkan should be a priority in Windows, not just an add-on. What do you guys think? Is it worth pushing Microsoft on this, or are they too locked into DX12?

Community Call to Action

If you believe Windows deserves a modern, efficient graphics backend —
If you've ever seen performance lost to translation layers —
If you want Vulkan to be treated as a first-class citizen inside Windows —
Then speak up. Share your thoughts. Test, compare, and challenge the status quo.

This isn't just about games. It's about the future of UI, recovery, and system performance. Let's show Microsoft that the community is ready for native Vulkan. No translation. No compromise.