Build a Knowledge Graph in Azure HorizonDB with AI Functions and Apache AGE
Knowledge graphs appear in every AI architecture diagram, every conference keynote, and every AI strategy deck. Yet the most common question we hear from customers and engineers alike is: "What does a knowledge graph actually do for me?" That is a fair question, and one worth answering clearly, because most teams already have a knowledge graph problem and do not realize it.

The connections your relational tables cannot surface

Picture this: five incident tickets land over a week. One says the auth service returned 503s after an API gateway update, which broke checkout. Another says the payment service lost connectivity to fraud detection through a DNS failure. A third says auth got rate-limited by that same API gateway after a config change. Each ticket makes sense on its own. But no one in your postmortem can answer: "What upstream services most commonly trigger failures that reach checkout?" That question requires tracing relationships across tickets, teams, services, and root causes. Your relational tables store the facts. They do not store the connections between them. That is a knowledge graph problem.

What becomes queryable once you have a knowledge graph

Once you build a graph from those tickets, every node is an entity (a service, a team, an incident) and every edge is a relationship (CAUSED_FAILURE_IN, OPERATES_ON, INVOLVES). The graph does not just store data differently. It makes a new class of questions answerable:

- What is the most common upstream cause of checkout failures?
- Which team resolves the most cross-service incidents?
- Show me every cascading failure chain that touched the payment service in the last 90 days.
- What is the timeline of incidents involving the same shared service?

Each of these questions can be answered with a single Cypher query, without nested subqueries, recursive CTEs, or manually correlating data across spreadsheets.
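As a rough sketch of the first question, the query below counts how many failure paths of up to three hops lead from each upstream service to checkout. The graph name incident_graph, the Service label, and the name property are assumptions made for illustration; only the CAUSED_FAILURE_IN relationship type comes from the description above. It also assumes the Apache AGE extension is enabled on the server.

```sql
-- Hypothetical names: incident_graph, the Service label, and the name property
-- are illustrative assumptions, not taken from the tutorial.
SELECT upstream_service, failure_paths
FROM ag_catalog.cypher('incident_graph', $$
    -- Follow CAUSED_FAILURE_IN edges one to three hops toward checkout.
    MATCH (upstream:Service)-[:CAUSED_FAILURE_IN*1..3]->(target:Service {name: 'checkout'})
    RETURN upstream.name AS upstream_service, count(*) AS failure_paths
$$) AS (upstream_service agtype, failure_paths agtype)
-- Sort in SQL so the most frequent upstream causes come first.
ORDER BY failure_paths DESC;
```

Note that count(*) here counts matching paths, not distinct incidents; a real schema that routes failures through incident nodes would need a different pattern.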
Why graph-augmented RAG needs a knowledge graph first

Traditional RAG retrieves chunks of text by vector similarity. It works well when the answer lives in a single document. It falls apart when the answer requires connecting facts across multiple documents. Ask "does this contract conflict with existing obligations?" and vector search returns a relevant clause. But it cannot follow links across regions, obligation types, and counterparties to prove a real conflict.

Graph-augmented RAG combines vector search, semantic ranking, and graph traversal into one retrieval pipeline. The graph provides the structural context that vector search alone cannot: the actual chain of cause and effect, not just the five most similar paragraphs.

But here is the catch most people miss: you cannot run graph-augmented RAG without a knowledge graph. And building the graph has always been the hard part. That is exactly what the new tutorial solves.

Building a knowledge graph in five steps inside Azure HorizonDB

We published a hands-on tutorial on Microsoft Learn that takes you from raw incident tickets to a connected, queryable knowledge graph. No external NLP pipelines. No separate graph database. Just SQL. Here is the pipeline:

1. Extract entities and relationships from unstructured text with azure_ai.extract(). The LLM parses services, teams, root causes, and relationship triples in one SQL call.
2. Deduplicate entities with azure_ai.generate() using structured JSON output. "API gateway," "api-gateway," and "the gateway service" collapse into one canonical node.
3. Load into an Apache AGE graph using Cypher MERGE in PL/pgSQL loops. The tutorial builds service nodes, team nodes, incident hub nodes, all six relationship types, and a timeline chain linking incidents chronologically.
4. Query with Cypher traversals. Variable-length path patterns like *1..3 trace cascading failure chains up to three hops deep.
5. Visualize results in the PostgreSQL extension for VS Code, which renders Cypher output as an interactive node-edge graph.

The tutorial walks through every SQL statement, explains the tricky parts (like why EXECUTE format() is needed for parameterized Cypher, and how CROSS JOIN LATERAL expands team-service pairs correctly), and shows the exact output at each step.

The same pipeline applied to any domain

The tutorial uses incident tickets to keep things concrete. But the pipeline applies to any domain:

| Domain | Key Entities | Question It Answers |
| --- | --- | --- |
| Contract intelligence | Parties, clauses, obligations | Does this new vendor contract conflict with existing obligations? |
| E-commerce product catalog | Products, categories, customers, orders | What do customers who bought X typically buy next? |
| Fraud detection | Accounts, transactions, devices, IP addresses | Which accounts are connected through shared devices and circular transfers? |
| Healthcare clinical data | Patients, medications, conditions, providers | Does this new prescription conflict with existing medications? |
| Codebase dependency analysis | Tables, functions, views, triggers | If I alter this table, which downstream views and functions break? |
| Supply chain | Suppliers, components, facilities | Which tier-2 suppliers are single points of failure? |
| Research knowledge base | Papers, authors, concepts | What evidence chain supports this treatment for condition X? |
| Data lineage and ETL | Sources, transformations, dashboards | If this source schema changes, which dashboards break? |
| Identity and access management | Users, groups, roles, resources | Which users have transitive access to production through nested groups? |
| Regulatory compliance | Regulations, controls, systems | If this regulation changes, which controls need updating? |
| Customer 360 | Customers, interactions, campaigns | What sequence of touchpoints leads to churn for enterprise accounts? |
| Insurance claims | Claimants, policies, events, providers | Which claims share overlapping parties or event timelines? |
| M&A due diligence | Companies, IP assets, contracts, liabilities | What hidden liabilities are linked to this acquisition target? |

In every case, the shape is the same: azure_ai.extract() discovers the entities, azure_ai.generate() deduplicates them, and AGE stores and traverses the graph.

Get started

- Tutorial: Build a knowledge graph from unstructured text using AI Functions and Apache AGE
- Knowledge graph enhanced search: Graph-augmented RAG patterns for Azure HorizonDB
- Solution accelerators: GraphRAG Legal Research Copilot, GraphRAG with Docker and AI Agents

We would love to hear what you build. Share your feedback on the PostgreSQL Hub developer forum. Thank you!

General Availability of Graph Database Support in Azure Database for PostgreSQL
We are excited to announce the general availability of the Apache AGE extension for Azure Database for PostgreSQL! This marks a significant milestone in empowering developers and businesses to harness the potential of graph data directly within their PostgreSQL environments, offering a fully managed graph database service.

Unlocking Graph Data Capabilities

Apache AGE (A Graph Extension) is a powerful PostgreSQL extension. It allows users to store and query graph data within Postgres seamlessly, enabling advanced insights through intuitive graph database queries via the openCypher query language. Graph data is instrumental in applications such as social networks, recommendation systems, fraud detection, network analysis, and knowledge graphs. By integrating Apache AGE into Azure Database for PostgreSQL, developers can now benefit from a unified platform that supports both relational and graph data models, unlocking deeper insights and streamlining data workflows.

Benefits of Using Apache AGE in Azure Database for PostgreSQL

The integration of Apache AGE (AGE) in Azure Database for PostgreSQL brings numerous benefits to developers and businesses looking to leverage graph processing capabilities:

- Enterprise-grade Managed Graph Database Service: AGE in Azure Database for PostgreSQL provides a fully managed graph database solution, eliminating infrastructure management while delivering built-in security, updates, and high availability.
- Simplified Data Management: AGE's ability to integrate graph and relational data simplifies data management tasks, reducing the need for separate graph database solutions.
- Enhanced Data Analysis: With AGE, you can perform complex graph analyses directly within your PostgreSQL database, gaining deeper insights into relationships and patterns in your data.
- Cost Efficiency: By utilizing AGE within Azure Database for PostgreSQL, you can consolidate your database infrastructure, lowering overall costs and reducing the complexity of your data architecture.
- Security and Compliance: Leverage Azure's industry-leading security and compliance features, ensuring your graph data is protected and meets regulatory requirements.
- Index Support: Index graph properties with BTREE and GIN indexes.

Real-World Applications

Apache AGE opens up a range of possibilities for graph-powered applications. Here are just a few examples:

- Social Networks: Model and analyze complex relationships, such as user connections and interactions.
- Fraud Detection: Identify suspicious patterns and connections in financial transactions.
- Recommendation Systems: Leverage graph data to deliver personalized product or content recommendations.
- Knowledge Graphs: Structure facts and concepts as nodes and relationships, enabling AI-driven search and data discovery.

In the following example, we need to provide Procurement with an updated status of all statements of work (SOW) by vendor, including their invoice status. With AGE and Postgres, this once complex task becomes quite simple.

We’ll start by creating the empty graph.

    SELECT ag_catalog.create_graph('vendor_graph');

Then, we’ll create all the ‘vendor’ nodes from the vendors table.

    SELECT * FROM ag_catalog.cypher(
      'vendor_graph',
      $$
        UNWIND $rows AS v
        CREATE (:vendor { id: v.id, name: v.name })
      $$,
      ARRAY(
        SELECT jsonb_build_object('id', id, 'name', name)
        FROM vendors
      )
    );

Next, we’ll create all the ‘sow’ nodes.

    SELECT * FROM ag_catalog.cypher(
      'vendor_graph',
      $$
        UNWIND $rows AS s
        CREATE (:sow { id: s.id, number: s.number })
      $$,
      ARRAY(
        SELECT jsonb_build_object('id', id, 'number', number)
        FROM sows
      )
    );

Then, we’ll create the ‘has_invoices’ relationships (edges).

    SELECT * FROM ag_catalog.cypher(
      'vendor_graph',
      $$
        UNWIND $rows AS r
        MATCH (v:vendor { id: r.vendor_id })
        MATCH (s:sow { id: r.sow_id })
        CREATE (v)-[:has_invoices { payment_status: r.payment_status, amount: r.invoice_amount }]->(s)
      $$,
      ARRAY(
        SELECT jsonb_build_object(
          'vendor_id', vendor_id,
          'sow_id', sow_id,
          'payment_status', payment_status,
          'invoice_amount', amount
        )
        FROM invoices
      )
    );

Now that we’ve completed these steps, we have a fully populated vendor_graph with vendor nodes, sow nodes, and has_invoices edges with the invoice attributes. We’re ready to query the graph to start our report for Procurement.

    SELECT *
    FROM ag_catalog.cypher('vendor_graph', $$
      MATCH (v:vendor)-[rel:has_invoices]->(s:sow)
      RETURN v.id AS vendor_id,
             v.name AS vendor_name,
             s.id AS sow_id,
             s.number AS sow_number,
             rel.payment_status AS payment_status,
             rel.amount AS invoice_amount
    $$) AS graph_query(vendor_id BIGINT, vendor_name TEXT, sow_id BIGINT,
                       sow_number TEXT, payment_status TEXT, invoice_amount FLOAT);

This statement invokes Apache AGE’s Cypher engine and exposes the graph query result as a relational table:

- ag_catalog.cypher('vendor_graph', $$ … $$) executes the Cypher query against the graph named “vendor_graph.”
- The inner Cypher fragment, MATCH (v:vendor)-[rel:has_invoices]->(s:sow) RETURN v.id AS vendor_id, v.name AS vendor_name, s.id AS sow_id, s.number AS sow_number, rel.payment_status AS payment_status, rel.amount AS invoice_amount, finds every vendor node with outgoing has_invoices edges to sow nodes, then projects each vendor’s ID/name, the target sow’s ID/number, and the invoice attributes.
- Wrapping that in … ) AS graph_query( vendor_id BIGINT, vendor_name TEXT, sow_id BIGINT, sow_number TEXT, payment_status TEXT, invoice_amount FLOAT ); tells PostgreSQL how to map each returned column into a regular SQL result set with proper types.

The result? You get a standard table of rows (one per invoice edge) with those six columns populated and ready for further SQL joins, filters, aggregates, etc.
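To make that last point concrete, here is a minimal sketch that aggregates the Cypher result with ordinary SQL. It reuses the vendor_graph pattern and the typed column list from the query above; the output aliases invoice_count and total_amount are illustrative names, not part of the original example.

```sql
-- Summarize invoices per vendor and payment status.
-- The Cypher call is used like any other table expression in FROM.
SELECT vendor_name,
       payment_status,
       count(*)            AS invoice_count,  -- illustrative alias
       sum(invoice_amount) AS total_amount    -- illustrative alias
FROM ag_catalog.cypher('vendor_graph', $$
    MATCH (v:vendor)-[rel:has_invoices]->(s:sow)
    RETURN v.name AS vendor_name,
           rel.payment_status AS payment_status,
           rel.amount AS invoice_amount
$$) AS graph_query(vendor_name TEXT, payment_status TEXT, invoice_amount FLOAT)
GROUP BY vendor_name, payment_status
ORDER BY vendor_name, payment_status;
```

Returning only the three columns the report needs keeps the graph side small and leaves grouping and ordering to the PostgreSQL planner.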
Performance notes for this example: AGE will scan all “vendor–has_invoices–sow” paths in the graph. If the graph is large, consider an index on the vendor or sow label properties or filter by additional predicates. You can also push WHERE clauses into the Cypher fragment for more selective matching.

Scaling to Large Graphs with AGE

The Apache AGE extension in Azure Database for PostgreSQL enables seamless scaling to large graphs. Indexing plays a pivotal role in enhancing query performance, particularly for complex graph analyses.

Effective Indexing Strategies

To optimize graph queries, particularly those involving joins or range queries, implementing the following indexes is recommended:

- BTREE Index: Ideal for exact matches and range queries. For vertex tables, create an index on the unique identifier column (e.g., id).

      CREATE INDEX ON graph_name."VLABEL" USING BTREE (id);

- GIN Index: Designed for efficient searches within JSON fields, such as the properties column in vertex tables.

      CREATE INDEX ON graph_name."VLABEL" USING GIN (properties);

- Edge Table Indexes: For relationship traversal, use BTREE indexes on start_id and end_id columns.

      CREATE INDEX ON graph_name."ELABEL" USING BTREE (start_id);
      CREATE INDEX ON graph_name."ELABEL" USING BTREE (end_id);

Example: Targeted Key-Value Indexing

For targeted queries that focus on specific attributes within the JSON field, a smaller BTREE index can be created for precise filtering.

    CREATE INDEX ON graph_name.label_name USING BTREE (agtype_access_operator(VARIADIC ARRAY[properties, '"KeyName"'::agtype]));

These indexing strategies help keep query execution efficient as graphs grow. Additionally, the EXPLAIN command helps validate index utilization and optimize query plans for production workloads.

How to Get Started

Enabling Apache AGE in Azure Database for PostgreSQL is simple:

1. Update Server Parameters

Within the Azure Portal, navigate to the PostgreSQL Flexible Server instance and select the Server Parameters option. Adjust the following settings:

- azure.extensions: In the parameter filter, search for and enable AGE among the available extensions.
- shared_preload_libraries: In the parameter filter, search for and enable AGE.

Click Save to apply these changes. The server will restart automatically to activate the AGE extension.

Note: If AGE is not enabled in shared_preload_libraries, the following error appears when you first attempt to use the AGE schema in a query: “ERROR: unhandled cypher(cstring) function call error on first cypher query”

2. Enable AGE Within PostgreSQL

Once the server restart is complete, connect to the PostgreSQL instance using the psql interpreter. Execute the following command to enable AGE:

    CREATE EXTENSION IF NOT EXISTS AGE CASCADE;

3. Configure Schema Paths

AGE adds a schema called ag_catalog, which is essential for handling graph data. Ensure this schema is included in the search path by executing:

    SET search_path=ag_catalog,"$user",public;

That’s it! You’re ready to create your first graph within PostgreSQL on Azure.

Ready to dive in?

Experience the power of graph data with Apache AGE on Azure Database for PostgreSQL. Visit AGE on Azure Database for PostgreSQL Overview for more details, and explore how this extension can transform your data analysis and application development. Get started for free with an Azure free account.

Introducing support for Graph data in Azure Database for PostgreSQL (Preview)
We are excited to announce the addition of the Apache AGE extension in Azure Database for PostgreSQL, a significant advancement that provides graph processing capabilities within the PostgreSQL ecosystem. This new extension brings a powerful toolset for developers looking to leverage a graph database with the robust enterprise features of Azure Database for PostgreSQL.