monitoring

10 Topics

Drasi is Fluent in GQL: Integrating the New Graph Query Standard
Drasi, the open-source Rust data change processing platform, simplifies the creation of change-driven systems through continuous queries, reactions, and clearly defined change semantics. Continuous queries enable developers to specify precisely what data changes matter, track these changes in real time, and react immediately as changes occur. Unlike traditional database queries, which provide static snapshots of data, continuous queries constantly maintain an up-to-date view of query results, automatically notifying reactions of precise additions, updates, and deletions to the result set as they happen.

To date, Drasi has supported only openCypher, a powerful declarative graph query language, for writing continuous queries. Recently, Drasi has added support for Graph Query Language (GQL), the new international ISO standard for querying property graphs. In this article, we describe what GQL means for writing continuous queries and how we implemented GQL support.

A Standardized Future for Graph Queries

GQL is the first officially standardized database language since SQL in 1987. Published by ISO/IEC in April 2024, it defines a global specification for querying property graphs. Unlike the relational model, which structures data into tables, the property graph model structures data inside the database as a graph.

With GQL support, Drasi enables users to benefit from a query language that we expect to be widely adopted across the database industry, ensuring compatibility with future standards in graph querying. Drasi continues to support openCypher, allowing users to select the query language that best fits their requirements and existing knowledge. With the introduction of GQL, Drasi users can now write continuous queries using the new international standard.
Example GQL Continuous Query: Counting Unique Messages

Event-driven architectures traditionally involve overhead for parsing event payloads, filtering irrelevant data, and managing contextual state to identify precise data transitions. Drasi eliminates much of this complexity through continuous queries, which maintain accurate real-time views of data and generate change notifications.

Imagine a simple database with a message table containing the text of each message. Suppose you want to know, in real time, how many times the same message has been sent. Traditionally, addressing this kind of scenario involves polling databases at set intervals, using middleware to detect state changes, and developing custom logic to handle reactions. It could also mean setting up change data capture (CDC) to feed a message broker and processing events through a stream processing system. These methods can quickly become complex and difficult to maintain, especially when handling numerous or more sophisticated scenarios.

Drasi simplifies this process by employing a change-driven architecture. Rather than relying on polling or other methods, Drasi uses continuous queries that actively monitor data for specific conditions. The moment a specified condition is met or changes, Drasi proactively sends notifications, ensuring real-time responsiveness.

The following example shows the continuous query in GQL that counts the frequency of each unique message:

    MATCH (m:Message)
    LET Message = m.Message
    RETURN Message, count(Message) AS Frequency

You can explore this example in the Drasi Getting Started tutorial.

Key Features of the GQL Language

OpenCypher had a significant influence on GQL, and the two languages have much in common; however, there are also some important differences. A new statement introduced in GQL is NEXT, which enables linear composition of multiple statements.
It forms a pipeline where each subsequent statement receives the working table resulting from the previous statement. One application for NEXT is the ability to filter results after an aggregation. For example, to find colors associated with more than five vehicles, the following query can be used:

    MATCH (v:Vehicle)
    RETURN v.color AS color, count(v) AS vehicle_count
    NEXT
    FILTER vehicle_count > 5
    RETURN color, vehicle_count

Equivalent openCypher:

    MATCH (v:Vehicle)
    WITH v.color AS color, count(v) AS vehicle_count
    WHERE vehicle_count > 5
    RETURN color, vehicle_count

GQL introduces additional clauses and statements: LET, YIELD, and FILTER.

The LET statement allows users to define new variables or computed fields for every row in the current working table. Each LET expression can reference existing columns in scope, and the resulting variables are added as new columns. Example:

    MATCH (v:Vehicle)
    LET makeAndModel = v.make + ' ' + v.model
    RETURN makeAndModel, v.year

Equivalent openCypher:

    MATCH (v:Vehicle)
    WITH v, v.make + ' ' + v.model AS makeAndModel
    RETURN makeAndModel, v.year

The YIELD clause projects and optionally renames columns from the working table, limiting the set of columns available in scope. Only specified columns remain in scope after YIELD. Example:

    MATCH (v:Vehicle)-[e:LOCATED_IN]->(z:Zone)
    YIELD v.color AS vehicleColor, z.type AS location
    RETURN vehicleColor, location

FILTER is a standalone statement that removes rows from the current working table based on a specified condition. While GQL still supports a WHERE clause for filtering during the MATCH phase, the FILTER statement provides additional flexibility by allowing results to be filtered after previous steps. It does not create a new table; instead, it updates the working table. Unlike openCypher’s WHERE clause, which is tied to a MATCH or WITH, GQL's FILTER can be applied independently at various points in the query pipeline.
Example:

    MATCH (n:Person)
    FILTER n.age > 30
    RETURN n.name, n.age

GQL also provides control over how aggregations are grouped. The GROUP BY clause can be used to explicitly define the grouping keys, ensuring results are aggregated exactly as intended.

    MATCH (v:Vehicle)-[:LOCATED_IN]->(z:Zone)
    RETURN z.type AS zone_type, v.color AS vehicle_color, count(v) AS vehicle_count
    GROUP BY zone_type, vehicle_color

If the GROUP BY clause is omitted, GQL defaults to implicit grouping: all non-aggregated columns in the RETURN clause are automatically used as the grouping keys.

While many of the core concepts, like pattern matching, projections, and filtering, will feel familiar to openCypher users, GQL’s statements are distinct in their usage. Supporting these differences in Drasi required design changes, described in the following section, that led to support for multiple query languages within the platform.

Refactoring Drasi for Multi-Language Query Support

Instead of migrating Drasi from openCypher to GQL, we saw this as an opportunity to address multi-language support in the system. Drasi's initial architecture was designed exclusively for openCypher. In this model, the query parser generated an Abstract Syntax Tree (AST) for openCypher. The execution engine was designed to process this AST format, executing the query it represented to produce the resulting dataset. Built-in functions (such as toUpper() for string case conversion) followed openCypher naming and were implemented within the same module as the engine. This created an architectural challenge for supporting additional query languages, such as GQL.

To enable multi-language support, the system was refactored to separate parsing, execution, and function management. A key insight was that the existing AST structure, originally created for openCypher, was flexible enough to be used for GQL.
Although GQL and openCypher are different languages, their core operations (matching patterns, filtering data, and projecting results) could be represented by this AST.

The diagram shows the dependencies within this new architecture, highlighting the separation and interaction between the components. The language-specific function modules for openCypher and GQL provide the functions to the execution engine. The language-specific parsers for openCypher and GQL produce an AST, and the execution engine operates on this AST. The engine only needs to understand this AST format, making it language-agnostic.

The AST structure is based on a sequence of QueryPart objects. Each QueryPart represents a distinct stage of the query, containing clauses for matching, filtering, and returning data. The execution engine processes these QueryParts sequentially.

    pub struct QueryPart {
        pub match_clauses: Vec<MatchClause>,
        pub where_clauses: Vec<Expression>,
        pub return_clause: ProjectionClause,
    }

The process begins when a query is submitted in either GQL or openCypher. The query is first directed to its corresponding language-specific parser, which handles the lexical analysis and transforms the raw query string into the standardized AST. When data changes occur in the graph, the execution engine uses the MATCH clauses from the first QueryPart to find affected graph patterns and captures the matched data. This matched data then flows through each QueryPart in sequence. The WHERE portion of the AST filters out data that does not meet the specified conditions. The RETURN portion transforms the data by selecting specific fields, computing new values, or performing aggregations. Each QueryPart's output becomes the next one's input, creating a pipeline that incrementally produces query results as the underlying graph changes.

To support functions from multiple languages in this AST, we introduced a function registry to abstract a function's name from its implementation.
Function names can differ (e.g., toUpper() in openCypher versus Upper() in GQL). For any given query, language-specific modules populate this registry, mapping each function name to its corresponding behavior. Functions with shared logic can be implemented once in the engine and registered under multiple names in specific function crates, preventing code duplication. Meanwhile, language-exclusive functions can be registered and implemented separately within their respective modules. When processing an AST, the engine uses the registry attached to that query to resolve and execute the correct function. The separate function modules allow developers to introduce their own function registry, supporting custom implementations or names.

Conclusion

By adding support for GQL, Drasi now offers developers a choice between openCypher and the new GQL standard. This capability ensures that teams can use the syntax that best fits their skills and project requirements. In addition, the architectural changes set the foundation for additional query languages.

You can check out the code on our GitHub organization, dig into the technical details on our documentation site, and join our developer community on Discord.
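As an illustration of the function registry described in the refactoring section, here is a minimal Rust sketch. It is not Drasi's actual code: the names ScalarFn, FunctionRegistry, upper_impl, cypher_registry, and gql_registry are hypothetical, and functions are simplified to take and return strings. It only shows the idea of one shared implementation registered under two language-specific names.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified function signature: one string in, one string out.
type ScalarFn = fn(&str) -> String;

/// Shared logic, implemented once and reused by both languages.
fn upper_impl(input: &str) -> String {
    input.to_uppercase()
}

/// Hypothetical registry that maps a function name to its implementation.
#[derive(Default)]
struct FunctionRegistry {
    functions: HashMap<String, ScalarFn>,
}

impl FunctionRegistry {
    /// Registers an implementation under a language-specific name.
    fn register(&mut self, name: &str, function: ScalarFn) {
        self.functions.insert(name.to_string(), function);
    }

    /// Resolves a name and applies the function; None if the name is unknown.
    fn call(&self, name: &str, arg: &str) -> Option<String> {
        self.functions.get(name).map(|function| function(arg))
    }
}

/// Registry as an openCypher function module might populate it.
fn cypher_registry() -> FunctionRegistry {
    let mut registry = FunctionRegistry::default();
    registry.register("toUpper", upper_impl);
    registry
}

/// Registry as a GQL function module might populate it.
fn gql_registry() -> FunctionRegistry {
    let mut registry = FunctionRegistry::default();
    registry.register("Upper", upper_impl);
    registry
}
```

In this sketch, calling cypher_registry().call("toUpper", "drasi") and gql_registry().call("Upper", "drasi") both reach upper_impl, while looking up "toUpper" in the GQL registry finds nothing. The engine only ever sees the registry attached to the query, which is what keeps it language-agnostic.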
CollinBrian
Oct 09, 2025, Linux and Open Source Blog
The Open-Source Paradox: How Microsoft is giving back
Microsoft is sponsoring All Things Open 2025 to address open-source sustainability by showcasing solutions and fostering community collaboration.
LachlanEvenson
Oct 02, 2025, Linux and Open Source Blog
Troubleshooting Network Issues with Retina
An overview of how Retina solves key challenges of performing packet captures in a Kubernetes environment, and additional debug tools which are provided within the Retina Shell.
kamilp
Aug 22, 2025, Linux and Open Source Blog
Responding to the Absence of Change in Change-Driven Systems
Drasi, an open-source Data Change Processing Platform, simplifies the creation of change-driven systems because it provides a consistent way of thinking about, detecting, and reacting to change. Sometimes, you need to detect and react when data doesn’t change. Drasi provides an approach to detecting the absence of change and makes building such systems easy.

When there is no change

In the world of change-driven systems, certain scenarios challenge conventional response mechanisms. Among these challenges is the subtle yet complex problem of responding to the absence of change rather than the arrival of an individual event. This requirement often arises in monitoring systems, IoT devices, and other applications where a condition must persist for a given duration to warrant a reaction.

Consider an example: a freezer’s temperature sensor emits an event when the temperature changes, and at one point, the temperature registers above 32°F. While this measurement is significant, the system should only react if the freezer’s temperature remains above 32°F for at least 15 minutes. There is, however, no explicit event that confirms this persistence. The difficulty lies in establishing a reliable mechanism to track and respond to sustained states without direct event notification of their continuity.

We’ll describe polling and timers, which are traditional solutions, and then describe how Drasi solves this problem.

Traditional solutions

Polling

Polling is often the standard approach to this problem. In this method, the system periodically scans the last 15 minutes of data to determine whether the temperature was above the threshold continuously for 15 minutes. This approach is inherently limited by its non-real-time nature, as the system only identifies qualifying conditions during scheduled intervals. Consequently, there may be delays in detecting and responding to critical conditions, especially in scenarios where timely action is paramount.
Furthermore, polling can lead to increased computational overhead, especially in large-scale systems, as it requires frequent queries to ensure no conditions are missed.

Timers

An alternative to polling involves using the initial event that triggers a state change to start a timer. In this approach, the system initiates a countdown the moment a condition arises, such as the temperature rising above 32°F. If the condition persists for the defined threshold (15 minutes for the freezer), the system initiates the required response. Conversely, if the condition is resolved before the timer expires, the timer is canceled. While this approach addresses some limitations of polling by introducing real-time responsiveness, it introduces its own complexities and overhead.

Managing timers at scale is not trivial, particularly in distributed systems with thousands of tracked conditions. Each timer must be initiated, monitored, and terminated. To do this effectively, a specialized timer management service must be built or adopted. This service needs to manage the timers, ensure high reliability, and scale to large volumes of tracked conditions. Ensuring failover and recovery mechanisms for timers, particularly in distributed systems, introduces further complexity. For example, if a node managing active timers fails, the system must ensure that no timer is lost or incorrectly reset, which often requires sophisticated state replication and recovery strategies.

Ultimately, this timer-based approach necessitates the deployment and management of custom-built services. These services bring inherent costs not only in terms of development and maintenance but also in operational overhead. As such, while this method can deliver superior responsiveness compared to polling, its implementation comes with a steep tradeoff in system complexity and costs.
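To make the detection delay of the polling approach concrete, here is a minimal Rust sketch of the check a poller might run on each interval. It is an illustration only: the Reading type, the sustained_above function, and the sample data are hypothetical, and times are simplified to minutes since midnight.

```rust
/// One sensor event: minutes since midnight and the temperature in °F.
#[derive(Clone, Copy, Debug)]
struct Reading {
    minute: u32,
    temp_f: f64,
}

/// Polling check: was the temperature above `threshold_f` for the whole
/// `window` minutes ending at `now`? `readings` must be sorted by time.
/// The sensor only reports changes, so the value in effect at the start
/// of the window is the last reading at or before that moment.
fn sustained_above(readings: &[Reading], now: u32, window: u32, threshold_f: f64) -> bool {
    let start = match now.checked_sub(window) {
        Some(start) => start,
        None => return false,
    };
    let initial = match readings.iter().rev().find(|r| r.minute <= start) {
        Some(reading) => reading,
        None => return false, // no data covering the start of the window
    };
    initial.temp_f > threshold_f
        && readings
            .iter()
            .filter(|r| r.minute > start && r.minute <= now)
            .all(|r| r.temp_f > threshold_f)
}

/// Hypothetical sample data: the freezer warms past 32°F at 12:14 (minute 734).
fn sample_readings() -> Vec<Reading> {
    vec![
        Reading { minute: 722, temp_f: 30.0 },
        Reading { minute: 734, temp_f: 34.0 },
    ]
}
```

With this sample data the condition first holds at 12:29 (minute 749). A poller that runs every five minutes sees false at 12:25 and true at 12:30, so it reacts a minute late, and it rescans the window on every run even when nothing has changed.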
Using Drasi to detect the absence of change

Central to Drasi is the Continuous Query Pattern, implemented using the openCypher graph query language. A Continuous Query runs perpetually, fed by change logs from one or more data sources, maintaining the current query result set and generating notifications when those results change. Unlike producer-defined event streams, this pattern empowers consumers to specify the relevant properties and their relationships using a familiar database-type query. Drasi solves the “absence of change” problem through a suite of “future” functions that can be used within a Continuous Query.

Verifying Sustained Conditions with Drasi: A Freezer Monitoring Example

The freezer example can be expressed as a simple openCypher query, using the “trueFor” function unique to Drasi. The “trueFor” function takes an expression that must evaluate to “true” for the duration specified. If the expression holds true for the entire duration, the WHERE clause resolves to true, and only then is a notification emitted that a new item has been added to the result set.

    MATCH
      (f:Freezer)
    WHERE drasi.trueFor(f.temp > 32, duration( { minutes: 15 } ))
    RETURN
      f.id AS id,
      f.temp AS temp

Under the hood

To achieve this, Drasi internally uses a specialized priority queue with unique access patterns that is ordered by future timestamps. When the WHERE clause is first evaluated, some metadata about the associated graph elements is pushed into the priority queue. This metadata can later be used to surgically re-evaluate a given condition using cached indexes. The position in the queue is determined by the future timestamp at which the condition can be re-evaluated.

The "trueFor" function takes a condition and a duration for which the condition needs to be true. The function only returns “true” when the condition has held true continuously for the specified duration.
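The bookkeeping behind such a future-time queue can be sketched in a few lines of Rust. This is a simplified illustration, not Drasi's implementation: the FutureQueue type and its methods are hypothetical, times are minutes since midnight, and a BTreeSet paired with a HashMap stands in for the specialized priority queue. The sketch assumes that a change making the condition false removes the pending entry and that an element already queued is not queued again.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the future-time queue: entries are ordered by
/// the time at which the condition should be re-evaluated.
#[derive(Default)]
struct FutureQueue {
    by_due: BTreeSet<(u32, String)>,
    due_of: HashMap<String, u32>,
}

impl FutureQueue {
    /// Called for each source change with the freshly evaluated condition.
    fn on_change(&mut self, id: &str, now: u32, condition: bool, duration: u32) {
        if condition {
            // Queue the element only if it is not already waiting.
            if !self.due_of.contains_key(id) {
                let due = now + duration;
                self.due_of.insert(id.to_string(), due);
                self.by_due.insert((due, id.to_string()));
            }
        } else if let Some(due) = self.due_of.remove(id) {
            // Continuity is broken, so the pending re-evaluation is dropped.
            self.by_due.remove(&(due, id.to_string()));
        }
    }

    /// Removes and returns every element whose due time has elapsed; these
    /// are the elements that would be reprocessed through the query.
    fn pop_due(&mut self, now: u32) -> Vec<String> {
        let mut fired = Vec::new();
        loop {
            let head = match self.by_due.iter().next() {
                Some(entry) if entry.0 <= now => entry.clone(),
                _ => break,
            };
            self.by_due.remove(&head);
            self.due_of.remove(&head.1);
            fired.push(head.1);
        }
        fired
    }

    /// Peeks at the entry with the earliest due time.
    fn head(&self) -> Option<(u32, String)> {
        self.by_due.iter().next().cloned()
    }
}
```

A caller would evaluate the condition (for the freezer, temp > 32) on every change and pass the result to on_change, then call pop_due as time advances. Only elements that stayed on the queue for the full duration are ever returned.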
Let's consider the freezer example with the following temperature changes:

    At 12:00 - The freezer temp is 35
    At 12:01 - The freezer temp is 36
    At 12:02 - The freezer temp is 30
    At 12:14 - The freezer temp is 34

Given the value of 30 at 12:02 and the value of 34 at 12:14, the alert should not fire until 12:29. To achieve this, the time at which the freezer crosses 32 degrees needs to be tracked so that it can be determined whether the condition has been true for at least 15 minutes.

When the query engine first evaluates this function, it tests the “temp > 32” expression passed to it. If the condition resolves true, the element metadata is added to the queue, but only if it is not already on the queue. If the condition resolves false and the metadata is already on the queue, it is removed from the queue, because continuity has been broken. If the metadata reaches the head of the queue and its timestamp elapses, the element is reprocessed through the query, and the function returns a “true” result, which triggers a reaction.

Following these rules, the priority queue would look as follows after each change (where "f1" represents the metadata for "Freezer 1"):

    12:00 (temp 35) - f1 queued, due at 12:15
    12:01 (temp 36) - f1 unchanged, still due at 12:15
    12:02 (temp 30) - f1 removed, queue empty
    12:14 (temp 34) - f1 queued, due at 12:29

Future-Time Evaluation with Drasi: A Payment Authorization Example

The continuity feature of the “trueFor” function may not be desired in every use case. Take another example: an online payment system, where a payment is first authorized and the customer funds are put on hold to secure an order. If the order is not completed within fifteen minutes, then the funds must be released, and the reserved inventory must be made available again.

This example can also be expressed as a simple openCypher query, using the “trueLater” function. This function takes an expression that must evaluate to “true” at a given future time. If it evaluates to “true” at that time, the WHERE clause resolves to true, and only then is a notification emitted that a new item has been added to the result set.
    MATCH (p:Payment)
    WHERE drasi.trueLater(p.status = 'auth', p.expires_at)
    RETURN p.id, p.amount, p.customer

Under the hood

When the WHERE clause is first evaluated, if the timestamp provided to the function is in the future, the function pushes the element metadata to the priority queue and returns an "AWAITING" result, which is the equivalent of false. In the payment example, the WHERE clause therefore filters out this potential result. If the provided timestamp is in the past, the function returns the result of evaluating the condition.

Try out the “Absence of Change” tutorial to see these functions in action.

Conclusion

Detecting the absence of change in change-driven systems is a subtle yet critical challenge, often complicated by the inefficiencies of traditional approaches like polling or the complexities of managing timers at scale. Drasi addresses this with the Continuous Query Pattern and functions like "trueFor" and "trueLater", enabling developers to build responsive, scalable systems without building custom timer services. By using familiar openCypher queries, Drasi delivers real-time reactions with minimal overhead.

Ready to simplify your change-driven systems? Explore Drasi today, experiment with its Continuous Queries, and join the conversation to share your insights!

Further reading: Reference | Drasi Docs

Join the Drasi community

If you're a developer interested in solving real-world problems, exploring modern architectures, or just looking to contribute to something meaningful, we’d love to have you onboard. You can check out the code on our GitHub organization, dig into the technical details on our documentation site, and join our developer community on Discord.
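For readers who want to see the "trueLater" behavior from the payment example in code, here is a minimal Rust sketch. It is an illustration under stated assumptions, not Drasi's implementation: the Evaluation enum, the PendingQueue alias, and the true_later function are hypothetical, timestamps are plain seconds, and a timestamp equal to the current time is assumed to count as already reached.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical result of evaluating the function at a point in time.
#[derive(Debug, PartialEq)]
enum Evaluation {
    /// The timestamp is still in the future; treated as false by WHERE.
    Awaiting,
    /// The timestamp has been reached; carries the condition's value.
    Resolved(bool),
}

impl Evaluation {
    /// Only a resolved, true evaluation lets the row into the result set.
    fn passes_where(&self) -> bool {
        matches!(self, Evaluation::Resolved(true))
    }
}

/// Min-heap of (due time, element id), standing in for the future-time queue.
type PendingQueue = BinaryHeap<Reverse<(u64, String)>>;

/// Sketch of the evaluation rule: queue the element and wait if the
/// timestamp is in the future, otherwise return the condition's value.
fn true_later(
    queue: &mut PendingQueue,
    id: &str,
    condition: bool,
    due_at: u64,
    now: u64,
) -> Evaluation {
    if due_at > now {
        queue.push(Reverse((due_at, id.to_string())));
        Evaluation::Awaiting
    } else {
        Evaluation::Resolved(condition)
    }
}
```

In the payment scenario, the first evaluation of a freshly authorized payment returns Awaiting and schedules a re-evaluation at the expiry time. When that time arrives, the row enters the result set only if the status is still 'auth'.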
CollinBrian
Jul 22, 2025, Linux and Open Source Blog
Ubuntu Pro FIPS 22.04 LTS on Azure: Secure, compliant, and optimized for regulated industries
Organizations across government (including local and federal agencies and their contractors), finance, healthcare, and other regulated industries running workloads on Microsoft Azure now have a streamlined path to meet rigorous FIPS 140-3 compliance requirements. Canonical is pleased to announce the availability of Ubuntu Pro FIPS 22.04 LTS on the Azure Marketplace, featuring newly certified cryptographic modules.

This offering extends the stability and comprehensive security features of Ubuntu Pro, tailored for state agencies, federal contractors, and industries requiring a FIPS-validated foundation on Azure. It provides the enterprise-grade Ubuntu experience, optimized for performance on Azure in collaboration with Microsoft, and enhanced with critical compliance capabilities. For instance, if you are building a Software as a Service (SaaS) application on Azure that requires FedRAMP authorization, utilizing Ubuntu Pro FIPS 22.04 LTS can help you meet specific controls like SC-13 (Cryptographic Protection), as FIPS 140-3 validated modules are a foundational requirement. This significantly streamlines your path to achieving FedRAMP compliance.

What is FIPS 140-3 and why does it matter?

FIPS 140-3 is the latest iteration of the benchmark U.S. government standard for validating cryptographic module implementations, superseding FIPS 140-2. Managed by NIST, it's essential for federal agencies and contractors and is a recognized best practice in many regulated industries like finance and healthcare. Using FIPS-validated components helps ensure cryptography is implemented correctly, protecting sensitive data in transit and at rest.

Ubuntu Pro FIPS 22.04 LTS includes FIPS 140-3 certified versions of the Linux kernel and key cryptographic libraries (like OpenSSL, Libgcrypt, GnuTLS) pre-enabled, which are drop-in replacements for the standard packages, greatly simplifying deployment for compliance needs.
The importance of security updates (fips-updates)

A FIPS certificate applies to a specific module version at its validation time. Over time, new vulnerabilities (CVEs) are discovered in these certified modules. Running code with known vulnerabilities poses a significant security risk. This creates a tension between strict certification adherence and maintaining real-world security.

Recognizing this, Canonical provides security fixes for the FIPS modules via the fips-updates stream, available through Ubuntu Pro. We ensure these security patches do not alter the validated cryptographic functions. This approach aligns with modern security thinking, including recent FedRAMP guidance, which acknowledges that unpatched vulnerabilities pose a greater risk than departing from the original certified binaries. Canonical strongly recommends all users enable the fips-updates repository to ensure their systems are both compliant and secure against the latest threats.

FIPS 140-3 vs 140-2

The new FIPS 140-3 standard allows modern protocols such as TLS v1.3 and deprecates older algorithms like MD5. If you are upgrading systems and workloads to FIPS 140-3, it will be necessary to perform rigorous testing to ensure that applications continue to work correctly.

Compliance tooling included

Ubuntu Pro FIPS also includes access to Canonical's Ubuntu Security Guide (USG) tooling, which assists with automated hardening and compliance checks against benchmarks like CIS and DISA-STIG, a key requirement for FedRAMP deployments.

How to get Ubuntu Pro FIPS on Azure

You can leverage Ubuntu Pro FIPS 22.04 LTS on Azure in two main ways:

Deploy the Marketplace Image: Launch a new VM directly from the dedicated Ubuntu Pro FIPS 22.04 LTS listing on the Azure Marketplace. This image comes with the FIPS modules pre-enabled for immediate use.
Enable on an Existing Ubuntu Pro VM: If you already have an Ubuntu Pro 22.04 LTS VM running on Azure, you can enable the FIPS modules using the Ubuntu Pro Client (pro enable fips-updates).

Upgrading standard Ubuntu: If you have a standard Ubuntu 22.04 LTS VM on Azure, you first need to attach Ubuntu Pro to it. This is a straightforward process detailed in the Azure documentation for getting Ubuntu Pro. Once Pro is attached, you can enable FIPS as described above.

Learn More

Ubuntu Pro FIPS provides a robust, maintained, and compliant foundation for your sensitive workloads on Azure.

Watch Joel Sisko from Microsoft speak with Ubuntu experts in this webinar
Explore all features of Ubuntu Pro on Azure
Read details on the FIPS 140-3 certification for Ubuntu 22.04 LTS
Official NIST certification link
shreyabaheti
May 20, 2025, Linux and Open Source Blog
Optimizing Change-Driven Architectures with Drasi
By: Allen Jones, Principal Software Engineer, Azure Incubations, Microsoft

The need to detect and react to data changes is pervasive across modern software systems. Whether it’s Kubernetes adapting to resource updates, building management systems responding to sensor shifts, or enterprise applications tracking business state, the ability to act precisely when data in databases or stateful systems changes is critical. Event-driven architectures, underpinned by technologies like Apache Kafka, Azure Event Hubs, and serverless functions, are the go-to solution. Examples include processing IoT telemetry to trigger alerts or handling user activity streams in real-time analytics. Yet, while these systems excel at generic event processing, they demand significant additional effort when the goal is to isolate and respond to specific data changes, a specialized subset of this paradigm we call change-driven architecture.

Drasi, the open-source Data Change Processing platform, is an exciting new option for building change-driven solutions easily. This article explores how Drasi’s unique capabilities, centered around clearly defined change semantics, the continuous query pattern, and the reaction pattern, provide a more efficient and effective alternative to traditional event-driven implementations. By codifying consistent patterns and reducing complexity, Drasi empowers solution architects and developers to build responsive systems faster with greater precision and less complexity, which makes them less brittle and easier to update as solution requirements change.

The Limits of Generic Event Processing

Event-driven architectures are powerful options for handling high-volume data streams to support data transfer, aggregation, and analytics use cases, but they fall short when the task is to react to specific, often complex and contextual, data transitions.
Developers must bridge the gap between generic events and actionable changes, often by:

Parsing general-purpose event payloads to infer state changes, e.g., decoding Kubernetes pod events to detect a "Running" to "Failed" transition. Because the implementor of the event-generating system does not know all possible consumer use cases, they often produce very generic events (e.g. xxxAdded, xxxUpdated, xxxDeleted), or large all-encompassing event documents to accommodate as many use cases as possible.

Filtering irrelevant events to isolate meaningful updates, e.g., sifting through audit logs to identify compliance violations. Most event-generating systems don’t produce just the events consumers need; they produce a firehose of events covering everything in the system, leaving it to the consumer to filter this down to what they need.

Maintaining state to track context across events, e.g., aggregating inventory transactions to detect threshold breaches. Most events relate to a single thing occurring. In the case of updates, they sometimes provide before and after versions of the changed data, but they lack any ability to relate the change to other elements of the system. The consumer must create and maintain an accurate set of state.

Polling, a common alternative, wastes resources querying unchanged data and risks delaying, or even missing entirely, important changes due to the polling intervals, while custom change log processing requires intricate logic to map low-level updates to high-level conditions. For instance, detecting when a database replication lag exceeds 30 seconds might involve correlating logs from multiple systems. These options may be tolerable occasionally but become costly to build, support, and maintain when repeated across an organization for many scenarios.

Changes vs. Events

Drasi distinguishes changes from events.
Events signal that something happened, but their producer-defined structure and ambiguous semantics require consumers to interpret intent; for example, a "UserUpdated" event might not clarify what changed. In contrast, a change in Drasi has precise semantics: a set of additions, updates, and/or deletions to a continuous query result set occurring because of changes to the source data. The structure of the change is defined by the author of the query (i.e. the consumer), not the designer of the source system. This consumer-driven data model reduces dependency on producer assumptions and simplifies engineering and downstream processing.

Continuous Query Pattern

Central to Drasi is the continuous query pattern, implemented using the openCypher graph query language. A continuous query runs perpetually, fed by change logs from one or more data sources, maintaining the current set of query results and generating notifications when those results change. Unlike producer-defined event streams, this pattern empowers consumers to specify the relevant properties and their relationships using a familiar database-type query. Consider this example monitoring infrastructure health:

    MATCH (s:Server)-[:HOSTS]->(a:Application)-[:DEPENDS_ON]->(db:Database)
    WHERE s.status = 'active' AND db.replicationLag > 30
    RETURN s.id AS serverId, s.location AS datacenter, a.name AS appName, db.replicationLag AS lagSeconds

At any time, this continuous query result contains the answer to the question “Which applications are experiencing a DB replication lag of more than 30 seconds?”. The MATCH clause spans servers, applications, and databases, potentially from distinct systems like an inventory database and a monitoring tool. The WHERE clause specifies the condition (active servers with lagging databases), and the RETURN clause shapes the result set. Drasi ensures this result set stays current, notifying subscribers only when it changes.
With strict temporal semantics, historical result sets can also be retrieved, aiding audits or debugging.

Reaction Pattern

The reaction pattern leverages the consistency of change notifications. Because a query defines inclusion semantics, state transitions are clear. For example, in a building management system:

    MATCH (r:Room)-[:HAS_SENSOR]->(t:TemperatureSensor), (r:Room)-[:LOCATED_IN]->(b:Building)
    WHERE b.id = 'HQ1' AND t.currentReading < 18 AND r.occupied = true
    RETURN r.id AS roomId, t.currentReading AS temp, b.id AS buildingId

A room enters the result set when its temperature drops below 18°C while occupied, triggering heating. It exits when conditions change, stopping the action. The standard structure of change notifications, either room additions or removals, ensures straightforward reaction logic.

Real-World Applications

Drasi’s change-driven approach shines across many domains. For resource optimization, this continuous query identifies underutilized VMs to trigger cost-saving reactions:

    MATCH (vm:VirtualMachine)-[:DEPLOYED_IN]->(s:Subscription)
    WHERE vm.status = 'running' AND vm.utilizationPercent < 10 AND duration.between(vm.lastHighUtilization, datetime()) > duration('P7D')
    RETURN vm.id AS vmId, vm.utilizationPercent AS usage, s.owner AS owner

For business exceptions, this continuous query flags delayed orders due to stock shortages:

    MATCH (o:Order)-[:HAS_ITEM]->(i:OrderItem)-[:REQUIRES]->(p:Product)
    WHERE o.status = 'processing' AND p.stockLevel < i.quantity AND duration.between(o.placementDate, datetime()) > duration('P2D')
    RETURN o.id AS orderId, p.id AS productId, p.stockLevel AS stock

In both cases, Drasi simplifies detection and reaction, avoiding the custom logic required in event-based systems.
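The change semantics and reaction pattern described above can be illustrated with a small Rust sketch. It is not how Drasi works internally (Drasi maintains results incrementally from change logs rather than comparing snapshots), and the Change enum, ResultSet alias, diff, and react names are hypothetical. The sketch only shows what a change looks like to a consumer, using the room heating query: rows are keyed by room id and carry the temperature reading.

```rust
use std::collections::BTreeMap;

/// A change to a continuous query result set, keyed by row id.
#[derive(Debug, PartialEq)]
enum Change {
    Added(String),
    Updated(String),
    Deleted(String),
}

/// Hypothetical result set for the room query: room id -> temperature (°C).
type ResultSet = BTreeMap<String, i32>;

/// Derives the additions, updates, and deletions between two result sets.
fn diff(before: &ResultSet, after: &ResultSet) -> Vec<Change> {
    let mut changes = Vec::new();
    for (id, new_value) in after {
        match before.get(id) {
            None => changes.push(Change::Added(id.clone())),
            Some(old_value) if old_value != new_value => {
                changes.push(Change::Updated(id.clone()))
            }
            Some(_) => {}
        }
    }
    for id in before.keys() {
        if !after.contains_key(id) {
            changes.push(Change::Deleted(id.clone()));
        }
    }
    changes
}

/// Reaction logic: a room entering the result set starts heating, a room
/// leaving it stops heating, and an update needs no action in this sketch.
fn react(change: &Change) -> Option<String> {
    match change {
        Change::Added(room) => Some(format!("start heating {}", room)),
        Change::Deleted(room) => Some(format!("stop heating {}", room)),
        Change::Updated(_) => None,
    }
}
```

Because the query itself defines when a room belongs in the result set, the reaction needs no filtering or state tracking of its own. It maps additions and deletions directly to actions.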
Addressing Misconceptions and Scaling Efficiently Because Drasi is fundamentally a new type of data processing service, it’s easy to underestimate its value, which can come from underestimating the complexity of change detection or assuming event-driven tools suffice. While building a solution with events or polling might work as a one-off solution, repeating the same approach multiple times across an organization is inefficient and expensive. The solutions are also complex and brittle, making them difficult to maintain and update to address changing requirements. Drasi’s standardized approach, rooted in its change-driven architecture and theoretical underpinnings, streamlines change-driven systems, enabling teams to focus on defining queries and reactions rather than rebuilding infrastructure. Conclusion For change-driven architectures, Drasi is superior to generic event-based technologies because it targets the specific challenge of reacting to data changes. Its continuous query pattern, precise change semantics, and streamlined reaction pattern deliver cleaner, more efficient solutions than the traditional approaches. Available on GitHub under the Apache 2.0 License, Drasi offers solution architects and developers a robust, scalable foundation to transform complex change detection into declarative, cost-effective implementations—making it the optimal choice for responsive, data-centric systems.
CollinBrian
Apr 17, 2025 Place Linux and Open Source Blog
eBPF-Powered Observability Beyond Azure: A Multi-Cloud Perspective with Retina
Kubernetes simplifies container orchestration but introduces observability challenges due to dynamic pod lifecycles and complex inter-service communication. eBPF technology addresses these issues by providing deep system insights and efficient monitoring. The open-source Retina project leverages eBPF for comprehensive, cloud-agnostic network observability across AKS, GKE, and EKS, enhancing troubleshooting and optimization through real-world demo scenarios.
Simone_Rodigari
Apr 17, 2025 Place Linux and Open Source Blog
Azure Image Testing for Linux (AITL)
As cloud and AI evolve at an unprecedented pace, the need to deliver high-quality, secure, and reliable Linux VM images has never been greater. Azure Image Testing for Linux (AITL) is a self-service validation tool designed to help developers, ISVs, and Linux distribution partners ensure their images meet Azure’s standards before deployment. With AITL, partners can streamline testing, reduce engineering overhead, and ensure compliance with Azure’s best practices, all in a scalable and automated manner. Let’s explore how AITL is redefining image validation and why it’s proving to be a valuable asset for both developers and enterprises.

Before AITL, image validation was largely a manual and repetitive process. Engineers were often required to perform frequent checks, resulting in several key challenges:
Time-Consuming: Manual validation processes delayed image releases.
Inconsistent Validation: Each distro had different methods for testing, leading to varying quality levels.
Limited Scalability: Resource constraints restricted the ability to validate a broad set of images.

AITL addresses these challenges by enabling partners to integrate image validation into their existing pipelines through APIs. By executing tests within their own Azure subscriptions prior to publishing, partners can ensure that only fully validated, high-quality Linux images are promoted to production in the Azure environment.

How AITL Works
AITL is powered by LISA, an open-source test framework that contains 400+ test cases. AITL provides a simple yet powerful workflow to run LISA test cases:
Registration: Partners register their images in AITL’s validation framework.
Automated Testing: AITL runs a suite of predefined validation tests using LISA.
Detailed Reporting: Developers receive comprehensive results highlighting compliance, performance, and security areas. All test logs are available to access.
Self-Service Fixes: Any detected issues can be addressed by the partner before submission, eliminating delays and back-and-forth communication.
Final Sign-Off: Once tests pass, partners can confidently publish their images, knowing they meet Azure’s quality standards.

Benefits of AITL
AITL delivers significant benefits across the Linux and cloud ecosystem:
Self-Service Capability: Enables developers and ISVs to independently validate their images without requiring direct support from Microsoft.
Scalable by Design: Supports concurrent testing of multiple images, driving greater operational efficiency.
Consistent and Standardized Testing: Offers a unified validation framework to ensure quality and consistency across all endorsed Linux distributions.
Proactive Issue Detection: Identifies potential issues early in the development cycle, helping prevent costly post-deployment fixes.
Seamless Pipeline Integration: Easily integrates with existing CI/CD workflows to enable fully automated image validation.

Use Cases for AITL
AITL is designed to support a diverse set of users across the Linux ecosystem:
Linux Distribution Partners: Organizations such as Canonical, Red Hat, and SUSE can validate their images prior to publishing on the Azure Marketplace, ensuring they meet Azure’s quality and compliance standards.
Independent Software Vendors (ISVs): Companies providing custom Linux images can verify that their Linux-based solutions are optimized for performance and reliability on Azure.
Enterprise IT Teams: Businesses managing their own Linux images on Azure can use AITL to validate updates proactively, reducing risk and ensuring smooth production deployments.

Current Status and Future Roadmap
AITL is currently in private preview, with five major Linux distros and select ISVs actively integrating it into their validation workflows.
Microsoft plans to expand AITL’s capabilities by adding:
Support for Private Test Cases: Allowing partners to run custom tests within AITL securely.
Kernel CI Integration: Enhancing low-level kernel validation for more robust testing and results for the community.
DPDK and Specialized Validation: Ensuring network and hardware performance for specialized SKUs (CVM, HPC) and workloads.

How to Get Started
Developers and partners interested in AITL can follow the steps below to onboard.

Register for Private Preview
AITL is currently hidden behind a preview feature flag. You must first register the AITL preview feature with your subscription so that you can then access the AITL Resource Provider (RP). These are one-time steps done for each subscription. Run the “az feature register” command to register the feature:
az feature register --namespace Microsoft.AzureImageTestingForLinux --name JobandJobTemplateCrud
Sign Up for Private Preview: Contact Microsoft’s Linux Systems Group to request access (Private Preview Sign Up).
To confirm that your subscription is registered, run the above command and check that properties.state = “Registered”.

Register the Resource Provider
Once the feature registration has been approved, the AITL Resource Provider can be registered by running the “az provider register” command:
az provider register --namespace Microsoft.AzureImageTestingForLinux
*If your subscription is not registered to Microsoft.Compute/Network/Storage, please do so. These are also prerequisites to using the service. This can be done for each namespace (Microsoft.Compute, Microsoft.Network, Microsoft.Storage) through this command:
az provider register --namespace Microsoft.Compute

Set Up Permissions
The AITL RP requires a permission set to create test resources, such as the VM and storage account. The permissions are provided through a custom role that is assigned to the AITL Service Principal named AzureImageTestingForLinux. We provide a script, setup_aitl.py, to simplify this.
The script creates the role and grants it to the service principal. Make sure the active subscription is the expected one, then download the script and run it in a Python environment.
https://raw.githubusercontent.com/microsoft/lisa/main/microsoft/utils/setup_aitl.py
You can run the command below:
python setup_aitl.py -s "/subscriptions/xxxx"
Before running this script, check that you have permission to create role definitions in your subscription.
*Note, it may take up to 20 minutes for the permission to be propagated.

Assign an AITL jobs access role
If you want to use a service principal or a registered application to call AITL APIs, the service principal or app must be assigned a role that grants access to AITL jobs. This role should include the following permissions:
az role definition create --role-definition '{
  "Name": "AITL Jobs Access Role",
  "Description": "Delegation role is to read and write AITL jobs and job templates",
  "Actions": [
    "Microsoft.AzureImageTestingForLinux/jobTemplates/read",
    "Microsoft.AzureImageTestingForLinux/jobTemplates/write",
    "Microsoft.AzureImageTestingForLinux/jobTemplates/delete",
    "Microsoft.AzureImageTestingForLinux/jobs/read",
    "Microsoft.AzureImageTestingForLinux/jobs/write",
    "Microsoft.AzureImageTestingForLinux/jobs/delete",
    "Microsoft.AzureImageTestingForLinux/operations/read",
    "Microsoft.Resources/subscriptions/read",
    "Microsoft.Resources/subscriptions/operationresults/read",
    "Microsoft.Resources/subscriptions/resourcegroups/write",
    "Microsoft.Resources/subscriptions/resourcegroups/read",
    "Microsoft.Resources/subscriptions/resourcegroups/delete"
  ],
  "IsCustom": true,
  "AssignableScopes": [
    "/subscriptions/01d22e3d-ec1d-41a4-930a-f40cd90eaeb2"
  ]
}'
You can create a custom role using the above command in the cloud shell, and assign this role to the service principal or the app. All set! Please go through the quick start below to try the AITL APIs.

Download AITL wrapper
AITL is served through the Azure management API. You can use any REST API tool to access it.
We provide a Python wrapper for a better experience. The AITL wrapper is composed of a Python script and input files. It calls “az login” and “az rest” to provide an experience similar to the az CLI. The input files are used for creating test jobs. Make sure the az CLI and Python 3 are installed. Clone the LISA code, or download only the files in the folder: lisa/microsoft/utils/aitl at main · microsoft/lisa (github.com). Use the commands below to check the help text.
python -m aitl job --help
python -m aitl job create --help

Create a job
Job creation consists of two entities: a job template and an image. The quickest way to get started with the AITL service is to create a Job instance with your job template properties in the request body. Replace the placeholders with the real subscription ID, resource group, and job name to start a test job. This example runs 1 test case with a marketplace image using the tier0.json template. You can create a new JSON file to customize the test job. The name is optional. If it’s not provided, the AITL wrapper will generate one.
python -m aitl job create -s {subscription_id} -r {resource_group} -n {job_name} -b '@./tier0.json'
The default request body is:
{
  "location": "westus3",
  "properties": {
    "jobTemplateInstance": {
      "selections": [
        { "casePriority": [ 0 ] }
      ]
    }
  }
}
This example runs the P0 test cases with the default image. You can choose to add fields to the request, such as the image to test. All possible fields are described in the API Specification – Jobs section. The “location” property is a required field that represents the location where the test job should be created; it doesn’t affect the location of VMs. AITL supports “westus”, “westus2”, or “westus3”. The image object in the request body JSON specifies the image type to be used for testing, as well as the CPU architecture and VHD generation. If the image object is not included, LISA will pick a Linux marketplace image that meets the requirements for running the specified tests.
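The request body described above can also be generated rather than hand-edited. The following Python sketch builds the default body and rejects unsupported locations before a job is submitted. The function name build_job_body and its parameters are hypothetical helpers for illustration and are not part of the AITL wrapper; the field names and the three supported locations come from the text above.

```python
import json
from typing import Any, Dict, Iterable, Optional

# Locations listed as supported for creating AITL jobs.
SUPPORTED_LOCATIONS = ("westus", "westus2", "westus3")


def build_job_body(
    location: str = "westus3",
    priorities: Iterable[int] = (0,),
    image: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a job request body shaped like the default tier0.json example."""
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(f"location must be one of {SUPPORTED_LOCATIONS}, got {location!r}")
    properties: Dict[str, Any] = {
        "jobTemplateInstance": {"selections": [{"casePriority": list(priorities)}]}
    }
    if image is not None:
        # Optional image object; when omitted, the service picks a marketplace image.
        properties["image"] = dict(image)
    return {"location": location, "properties": properties}


body = build_job_body()
print(json.dumps(body, indent=2))
```

The printed JSON can be saved to a file and passed to the wrapper with the -b '@./file.json' form shown earlier.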
When an image type is specified, additional information will be required based on the image type. Supported image types are VHD, Azure Marketplace image, and Shared Image Gallery.
- VHD requires the SAS URL.
- Marketplace image requires the publisher, offer, SKU, and version.
- Shared Image Gallery requires the gallery name, image definition, and version.
Example of how to include the image object for a shared image gallery (<> denotes a placeholder):
{
  "location": "westus3",
  "properties": {
    <...other properties from default request body here>,
    "image": {
      "type": "shared_gallery",
      "architecture": "x64",
      "vhdGeneration": 2,
      "gallery": "<Example: myAzureComputeGallery>",
      "definition": "<Example: myImage1>",
      "version": "<Example: 1.0.1>"
    }
  }
}

Check Job Status & Test Results
A job is an asynchronous operation that is updated throughout the job’s lifecycle with its operation status and ongoing test status. A job has 7 provisioning states: 5 are non-terminal states and 2 are terminal states. Non-terminal states represent ongoing operation stages and terminal states represent the status at completion. The job’s current state is reflected in the `properties.provisioningState` property located in the response body. The states are described below:

Operation States
State | Type | Description
Accepted | Non-terminal | Initial ARM state describing that the resource creation is being initialized.
Queued | Non-terminal | The job has been queued by AITL to run LISA using the provided job template parameters.
Scheduling | Non-terminal | The job has been taken off the queue and AITL is preparing to launch LISA.
Provisioning | Non-terminal | LISA is creating your VM within your subscription using the default or provided image.
Running | Non-terminal | LISA is running the specified tests on your image and VM configuration.
Succeeded | Terminal | LISA completed the job run and has uploaded the final test results to the job. There may be failed test cases.
Failed | Terminal | There was a failure during the job’s execution. Test results may be present and reflect the latest status for each listed test.

Test results are updated in near real-time and can be seen in the ‘properties.results’ property in the response body. Results begin to be updated during the “Running” state, and the final set of result updates happens prior to reaching a terminal state (“Succeeded” or “Failed”). For a complete list of possible test result properties, go to the API Specification – Test Results section. Run the command below to get detailed test results.
python -m aitl job get -s {subscription_id} -r {resource_group} -n {job_name}
The query argument can format or filter results with a JMESPath query. Please refer to the help text for more information. For example:
List test results and error messages.
python -m aitl job get -s {subscription_id} -r {resource_group} -n {job_name} -o table -q 'properties.results[].{name:testName,status:status,message:message}'
Summarize test results.
python -m aitl job get -s {subscription_id} -r {resource_group} -n {job_name} -q 'properties.results[].status|{TOTAL:length(@),PASSED:length([?@==`"PASSED"`]),FAILED:length([?@==`"FAILED"`]),SKIPPED:length([?@==`"SKIPPED"`]),ATTEMPTED:length([?@==`"ATTEMPTED"`]),RUNNING:length([?@==`"RUNNING"`]),ASSIGNED:length([?@==`"ASSIGNED"`]),QUEUED:length([?@==`"QUEUED"`])}'

Access Job Logs
To access logs and read from Azure Storage, the AITL user must have the “Storage Blob Data Owner” role. Check, likely with your administrator, whether you have permission to create role definitions in your subscription. For information on this role and instructions on how to add this permission, see this Azure documentation. To access job logs, send a GET request with the job name and use the logUrl in the response body to retrieve the logs, which are stored in an Azure storage container. For more details on interpreting logs, refer to the LISA documentation on troubleshooting test failures.
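Returning to job status and test results: when the job response is handled in a script rather than on the command line, the same checks can be written in plain Python. The sketch below assumes the response has already been parsed into a dictionary with the properties.provisioningState and properties.results fields described above; the helper names is_terminal and summarize are hypothetical, and the status values are taken from the summary query shown earlier.

```python
from collections import Counter
from typing import Any, Dict

# Terminal provisioning states from the Operation States table.
TERMINAL_STATES = {"Succeeded", "Failed"}
# Test result statuses used in the summary query.
STATUSES = ("PASSED", "FAILED", "SKIPPED", "ATTEMPTED", "RUNNING", "ASSIGNED", "QUEUED")


def is_terminal(job: Dict[str, Any]) -> bool:
    """Return True once the job has reached a terminal provisioning state."""
    return job.get("properties", {}).get("provisioningState") in TERMINAL_STATES


def summarize(job: Dict[str, Any]) -> Dict[str, int]:
    """Count test results per status, mirroring the JMESPath summary query."""
    results = job.get("properties", {}).get("results", [])
    counts = Counter(result.get("status") for result in results)
    summary = {"TOTAL": len(results)}
    for status in STATUSES:
        summary[status] = counts.get(status, 0)
    return summary


# Hypothetical response body; the test names and message are made up for illustration.
job = {
    "properties": {
        "provisioningState": "Succeeded",
        "results": [
            {"testName": "example_case_a", "status": "PASSED", "message": ""},
            {"testName": "example_case_b", "status": "FAILED", "message": "example failure"},
            {"testName": "example_case_c", "status": "SKIPPED", "message": ""},
        ],
    }
}
print(is_terminal(job), summarize(job))
```

Note that a "Succeeded" job can still contain failed test cases, so a pipeline gate would typically check both the provisioning state and the FAILED count.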
To quickly view logs online (note that file size limitations may apply), select a .log Blob file and click "edit" in the top toolbar of the Blob menu. To download the log, click the download button in the toolbar.

Conclusion
AITL represents a forward-looking approach to Linux image validation, bringing automation, scalability, and consistency to the forefront. By shifting validation earlier in the development cycle, AITL helps reduce risk, accelerate time to market, and ensure a reliable, high-quality Linux experience on Azure. Whether you're a developer, a Linux distribution partner, or an enterprise managing Linux workloads on Azure, AITL offers a powerful way to modernize and streamline your validation workflows. To learn more, get started, or request access to AITL, reach out to the Microsoft Linux Systems Group.
KashanK
Apr 10, 2025 Place Linux and Open Source Blog
Automating the Linux Quality Assurance with LISA on Azure
Introduction Building on the insights from our previous blog regarding how MSFT ensures the quality of Linux images, this article aims to elaborate on the open-source tools that are instrumental in securing exceptional performance, reliability, and overall excellence of virtual machines on Azure. While numerous testing tools are available for validating Linux kernels, guest OS images and user space packages across various cloud platforms, finding a comprehensive testing framework that addresses the entire platform stack remains a significant challenge. A robust framework is essential, one that seamlessly integrates with Azure's environment while providing the coverage for major testing tools, such as LTP and kselftest and covers critical areas like networking, storage and specialized workloads, including Confidential VMs, HPC, and GPU scenarios. This unified testing framework is invaluable for developers, Linux distribution providers, and customers who build custom kernels and images. This is where LISA (Linux Integration Services Automation) comes into play. LISA is an open-source tool specifically designed to automate and enhance the testing and validation processes for Linux kernels and guest OS images on Azure. In this blog, we will provide the history of LISA, its key advantages, the wide range of test cases it supports, and why it is an indispensable resource for the open-source community. Moreover, LISA is available under the MIT License, making it free to use, modify, and contribute. History of LISA LISA was initially developed as an internal tool by Microsoft to streamline the testing process of Linux images and kernel validations on Azure. Recognizing the value it could bring to the broader community, Microsoft open-sourced LISA, inviting developers and organizations worldwide to leverage and enhance its capabilities. 
This move aligned with Microsoft's growing commitment to open-source collaboration, fostering innovation and shared growth within the industry. LISA serves as a robust solution to validate and certify that Linux images meet the stringent requirements of modern cloud environments. By integrating LISA into the development and deployment pipeline, teams can: Enhance Quality Assurance: Catch and resolve issues early in the development cycle. Reduce Time to Market: Accelerate deployment by automating repetitive testing tasks. Build Trust with Users: Deliver stable and secure applications, bolstering user confidence. Collaborate and Innovate: Leverage community-driven improvements and share insights. Benefits of Using LISA Scalability: Designed to run test cases at scale, from 1 test case to 10k test cases in one command. Multiple platform orchestration: LISA has a modular design that supports running the same test cases on various platforms, including Microsoft Azure, Windows Hyper-V, bare metal, and other cloud-based platforms. Customization: Users can customize test cases, workflow, and other components to fit specific needs, allowing for targeted testing strategies. Examples include building kernels on the fly and sending results to a custom database. Community Collaboration: Being open source under the MIT License, LISA encourages community contributions, fostering continuous improvement and shared expertise. Extensive Test Coverage: It offers a rich suite of test cases covering various aspects of compatibility between Azure and Linux VMs, from kernel, storage, and networking to middleware. How it works Infrastructure LISA is designed to be componentized and to maximize compatibility with different distros. Test cases can focus only on test logic. Once test requirements (machines, CPU, memory, etc.) are defined, just write the test logic without worrying about environment setup or stopping services on different distributions. Orchestration.
LISA uses platform APIs to create, modify, and delete VMs. For example, LISA uses the Azure API to create VMs, run test cases, and delete VMs. While test cases are running, LISA uses the Azure API to collect the serial log and can hot add/remove data disks. If other platforms implement the same serial log and data disk APIs, the test cases can run on those platforms seamlessly. LISA ensures distro compatibility by abstracting over 100 commands used in test cases, so test authors can focus on validation logic rather than distro differences. The pre-processing workflow assists in building the kernel on the fly, installing the kernel from package repositories, or modifying all test environments. The test matrix lets a single run cover many combinations. For example, one run can test different VM sizes on Azure, different images, or different VM sizes and different images together. Anything that is parameterizable can be tested in a matrix. Customizable notifiers enable saving test results and files to any type of storage or database. Agentless and low dependency LISA operates test systems via SSH without requiring additional dependencies, ensuring compatibility with any system that supports SSH. Although some test cases require installing extra dependencies, LISA itself does not. This allows LISA to perform tests on systems with limited resources or even different operating systems. For instance, LISA can run on Linux, FreeBSD, Windows, and ESXi. Getting Started with LISA Ready to dive in? Visit the LISA project at aka.ms/lisa to access the documentation. Install: Follow the installation guide provided in the repository to set up LISA in your testing environment. Run: Follow the instructions to run LISA on a local machine, Azure, or existing systems. Extend: Follow the documentation to extend LISA with test cases, data sources, tools, platforms, workflows, etc. Join the Community: Engage with other users and contributors through forums and discussions to share experiences and best practices.
Contribute: Modify existing test cases or create new ones to suit your needs. Share your contributions with the community to enhance LISA's capabilities. Conclusion LISA offers open-source collaborative testing solutions designed to operate across diverse environments and scenarios, effectively narrowing the gap between enterprise demands and community-led innovation. By leveraging LISA, customers can ensure their Linux deployments are reliable and optimized for performance. Its comprehensive testing capabilities, combined with the flexibility and support of an active community, make LISA an indispensable tool for anyone involved in Linux quality assurance and testing. Your feedback is invaluable, and we would greatly appreciate your insights.
KashanK
Jan 28, 2025 Place Linux and Open Source Blog
Enhancing Observability with Inspektor Gadget
Thorough observability is essential to a pain-free cloud experience. Azure provides many general-purpose observability tools, but you may want to create custom tooling. Inspektor Gadget is an open-source framework that makes customizable data collection easy. Microsoft recently contributed new features to Inspektor Gadget that further enhance its modular framework, making it even easier to meet your specific systems inspection needs. Of course, we also made it easy for Azure Kubernetes Service (AKS) users to use.
chriskuehl
Oct 08, 2024 Place Linux and Open Source Blog