
Microsoft Blog for PostgreSQL

Build a Knowledge Graph in Azure HorizonDB with AI Functions and Apache AGE

Aditi_Gupta
Microsoft
Jun 11, 2026

Knowledge graphs appear in every AI architecture diagram, every conference keynote, and every AI strategy deck. Yet the most common question we hear from customers and engineers alike is: "What does a knowledge graph actually do for me?"

That is a fair question, and one worth answering clearly, because most teams already have a knowledge graph problem and do not realize it.

The connections your relational tables cannot surface

Picture this: five incident tickets land over a week. One says the auth service returned 503s after an API gateway update, which broke checkout. Another says the payment service lost connectivity to fraud detection through a DNS failure. A third says auth got rate-limited by that same API gateway after a config change.

Each ticket makes sense on its own. But no one in your postmortem can answer: "What upstream services most commonly trigger failures that reach checkout?" That question requires tracing relationships across tickets, teams, services, and root causes. Your relational tables store the facts. They do not store the connections between them. That is a knowledge graph problem.
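Here is a small Python illustration of the problem. It is not taken from the tutorial. It assumes the tickets have already been reduced to cause chains (the service names paraphrase the tickets above, and the chains themselves are hypothetical). The sketch merges the chains into one graph and counts which root causes can reach checkout:

```python
from collections import Counter, deque

# Hypothetical cause chains paraphrasing the tickets: root cause first,
# last affected service at the end.
tickets = [
    ["api-gateway", "auth", "checkout"],    # 503s after a gateway update
    ["dns", "payment", "fraud-detection"],  # DNS failure cut payment off
    ["api-gateway", "auth"],                # rate limiting after a config change
]

def build_graph(chains):
    """Merge every chain into one adjacency map of CAUSED_FAILURE_IN edges."""
    graph = {}
    for chain in chains:
        for upstream, downstream in zip(chain, chain[1:]):
            graph.setdefault(upstream, set()).add(downstream)
    return graph

def reaches(graph, start, target):
    """Breadth-first search: can a failure in start propagate to target?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def upstream_causes_of(chains, target):
    """Count ticket root causes whose failures reach target in the merged graph."""
    graph = build_graph(chains)
    return Counter(
        chain[0] for chain in chains
        if chain[0] != target and reaches(graph, chain[0], target)
    )

print(upstream_causes_of(tickets, "checkout"))  # Counter({'api-gateway': 2})
```

The third ticket never mentions checkout. Only the merged graph, in which the first ticket links auth to checkout, shows that both gateway incidents sit upstream of checkout. Counting per-ticket rows would find just one.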

What becomes queryable once you have a knowledge graph

Once you build a graph from those tickets, every node is an entity (a service, a team, an incident) and every edge is a relationship (CAUSED_FAILURE_IN, OPERATES_ON, INVOLVES). The graph does not just store data differently. It makes a new class of questions answerable:

  • What is the most common upstream cause of checkout failures?
  • Which team resolves the most cross-service incidents?
  • Show me every cascading failure chain that touched the payment service in the last 90 days.
  • What is the timeline of incidents involving the same shared service?

Each of these questions can be answered with a single Cypher query, without nested subqueries, recursive CTEs, or manual correlation across spreadsheets.
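To build intuition for the last question, here is a Python stand-in for what such a query computes. It is neither Cypher nor the tutorial's code. The sketch groups incidents by the services they involve and links each incident to the next one in date order. The incident IDs, the dates, and the NEXT edge name are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical incident records: (incident id, date opened, services involved).
incidents = [
    ("INC-3", date(2026, 5, 7), {"auth", "api-gateway"}),
    ("INC-1", date(2026, 5, 4), {"auth", "checkout", "api-gateway"}),
    ("INC-2", date(2026, 5, 5), {"payment", "fraud-detection"}),
]

def timelines(records):
    """Return, per service, (earlier, later) incident pairs in date order.

    Each pair stands in for a hypothetical NEXT edge in a timeline chain.
    """
    by_service = defaultdict(list)
    for incident_id, opened, services in records:
        for service in services:
            by_service[service].append((opened, incident_id))
    chains = {}
    for service, events in by_service.items():
        ordered = [incident_id for _, incident_id in sorted(events)]
        chains[service] = list(zip(ordered, ordered[1:]))
    return chains

for service, links in sorted(timelines(incidents).items()):
    print(service, links)
```

Once the chain is stored as edges, "the timeline for auth" becomes a walk along those edges rather than a sort over a join.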

Why graph-augmented RAG needs a knowledge graph first

Traditional RAG retrieves chunks of text by vector similarity. It works well when the answer lives in a single document. It breaks down when the answer requires connecting facts across multiple documents. Ask "does this contract conflict with existing obligations?" and vector search returns a relevant clause, but it cannot follow the links across regions, obligation types, and counterparties needed to establish an actual conflict.

Graph-augmented RAG combines vector search, semantic ranking, and graph traversal into one retrieval pipeline. The graph provides the structural context that vector search alone cannot: the actual chain of cause and effect, not just the five most similar paragraphs.
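The retrieval idea can be sketched in a few lines of Python, under some loud assumptions. The embeddings are hand-made 3-dimensional vectors rather than model output. Semantic ranking is omitted. Each chunk is tagged with one graph entity. None of this reflects a specific Azure API.

```python
import math

# Hypothetical chunks: (toy 3-d embedding, graph entity the chunk mentions).
chunks = {
    "c1": ([0.9, 0.1, 0.0], "auth"),
    "c2": ([0.1, 0.9, 0.0], "payment"),
    "c3": ([0.0, 0.2, 0.9], "api-gateway"),
}
# Hypothetical CAUSED_FAILURE_IN edges between entities.
edges = [("api-gateway", "auth"), ("auth", "checkout")]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, k=1):
    """Vector step: top-k chunks by similarity.
    Graph step: add entities one hop from each hit's entity, in either direction."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid][0]), reverse=True)
    hits = ranked[:k]
    seeds = {chunks[cid][1] for cid in hits}
    context = set(seeds)
    for src, dst in edges:
        if src in seeds or dst in seeds:
            context.update((src, dst))
    return hits, context

hits, context = retrieve([1.0, 0.0, 0.0])
print(hits, sorted(context))  # ['c1'] ['api-gateway', 'auth', 'checkout']
```

Vector search alone returns only the auth chunk. The one-hop expansion adds the upstream gateway and the downstream checkout service. A real pipeline would also rank and trim the expanded context.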

But here is the catch most people miss: you cannot run graph-augmented RAG without a knowledge graph. And building the graph has always been the hard part. That is exactly what the new tutorial solves.

Building a knowledge graph in five steps inside Azure HorizonDB

We published a hands-on tutorial on Microsoft Learn that takes you from raw incident tickets to a connected, queryable knowledge graph. No external NLP pipelines. No separate graph database. Just SQL.

Here is the pipeline:

  1. Extract entities and relationships from unstructured text with azure_ai.extract(). The LLM parses services, teams, root causes, and relationship triples in one SQL call.
  2. Deduplicate entities with azure_ai.generate() using structured JSON output. "API gateway," "api-gateway," and "the gateway service" collapse into one canonical node.
  3. Load into an Apache AGE graph using Cypher MERGE in PL/pgSQL loops. The tutorial builds service nodes, team nodes, incident hub nodes, all six relationship types, and a timeline chain linking incidents chronologically.
  4. Query with Cypher traversals. Variable-length path patterns like *1..3 trace cascading failure chains up to three hops deep.
  5. Visualize results in the PostgreSQL extension for VS Code, which renders Cypher output as an interactive node-edge graph.
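The tutorial performs these steps in SQL with azure_ai and Apache AGE. The Python sketch below only illustrates what steps 2 through 4 compute, and it makes three substitutions. A hand-written alias map replaces the LLM deduplication. A set-based upsert imitates the create-if-absent behavior of MERGE. A depth-limited walk mirrors a *1..3 pattern. None of these are azure_ai or AGE APIs. The services "orders" and "billing" are hypothetical, and extraction (step 1) and visualization (step 5) are not modeled.

```python
# Step 2 stand-in: a hypothetical, hand-written alias map replaces the
# LLM-based deduplication that azure_ai.generate() performs in the tutorial.
ALIASES = {"api gateway": "api-gateway", "the gateway service": "api-gateway"}

def canonical(name):
    """Normalize case and whitespace, then map known aliases to one name."""
    key = name.strip().lower()
    return ALIASES.get(key, key)

class Graph:
    """Step 3 stand-in: sets make repeated merges idempotent, like MERGE."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()  # (source, relationship, target) triples

    def merge_edge(self, src, rel, dst):
        src, dst = canonical(src), canonical(dst)
        self.nodes.update((src, dst))
        self.edges.add((src, rel, dst))

    def paths(self, start, rel, min_hops=1, max_hops=3):
        """Step 4 stand-in: paths of min_hops..max_hops along rel, like
        -[:REL*1..3]->. Simplification: a path never revisits a node."""
        found = []

        def walk(path):
            hops = len(path) - 1
            if hops >= min_hops:
                found.append(tuple(path))
            if hops == max_hops:
                return
            for s, r, d in sorted(self.edges):
                if s == path[-1] and r == rel and d not in path:
                    walk(path + [d])

        walk([start])
        return found

g = Graph()
g.merge_edge("API gateway", "CAUSED_FAILURE_IN", "auth")
g.merge_edge("the gateway service", "CAUSED_FAILURE_IN", "auth")  # same edge after dedup
g.merge_edge("auth", "CAUSED_FAILURE_IN", "checkout")
g.merge_edge("checkout", "CAUSED_FAILURE_IN", "orders")
g.merge_edge("orders", "CAUSED_FAILURE_IN", "billing")  # a fourth hop

for path in g.paths("api-gateway", "CAUSED_FAILURE_IN"):
    print(" -> ".join(path))
```

The two gateway spellings collapse into a single edge. The chain reaching billing is left out because it needs four hops, one more than the 1..3 bound allows.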

The tutorial walks through every SQL statement, explains the tricky parts (like why EXECUTE format() is needed for parameterized Cypher, and how CROSS JOIN LATERAL expands team-service pairs correctly), and shows the exact output at each step.
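To see why per-row expansion matters, here is a Python analogy for what a lateral join over array columns does. It assumes the extracted teams and services arrive as lists on each incident row. The row shape and all values are hypothetical, and this is not the tutorial's SQL.

```python
from itertools import product

# Hypothetical extracted rows: one per incident, with list-valued columns.
rows = [
    {"incident": "INC-1", "teams": ["platform", "payments"], "services": ["api-gateway", "checkout"]},
    {"incident": "INC-2", "teams": ["payments"], "services": ["payment", "fraud-detection"]},
    {"incident": "INC-3", "teams": [], "services": ["auth"]},
]

def team_service_pairs(rows):
    """Pair every team with every service of the same row only.

    The cross product is computed inside each incident, never across
    incidents. A row with an empty list yields no pairs, much as an
    inner CROSS JOIN LATERAL over an empty array drops the row.
    """
    for row in rows:
        for team, service in product(row["teams"], row["services"]):
            yield row["incident"], team, service

for pair in team_service_pairs(rows):
    print(pair)
```

Crossing every team with every service across all rows would produce 15 pairs here, including links that no ticket supports, such as platform with payment. Per-row expansion yields only the 6 pairs the tickets actually state.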

The same pipeline applied to any domain

The tutorial uses incident tickets to keep things concrete. But the pipeline applies to any domain:

Domain | Key entities | Question it answers
--- | --- | ---
Contract intelligence | Parties, clauses, obligations | Does this new vendor contract conflict with existing obligations?
E-commerce product catalog | Products, categories, customers, orders | What do customers who bought X typically buy next?
Fraud detection | Accounts, transactions, devices, IP addresses | Which accounts are connected through shared devices and circular transfers?
Healthcare clinical data | Patients, medications, conditions, providers | Does this new prescription conflict with existing medications?
Codebase dependency analysis | Tables, functions, views, triggers | If I alter this table, which downstream views and functions break?
Supply chain | Suppliers, components, facilities | Which tier-2 suppliers are single points of failure?
Research knowledge base | Papers, authors, concepts | What evidence chain supports this treatment for condition X?
Data lineage and ETL | Sources, transformations, dashboards | If this source schema changes, which dashboards break?
Identity and access management | Users, groups, roles, resources | Which users have transitive access to production through nested groups?
Regulatory compliance | Regulations, controls, systems | If this regulation changes, which controls need updating?
Customer 360 | Customers, interactions, campaigns | What sequence of touchpoints leads to churn for enterprise accounts?
Insurance claims | Claimants, policies, events, providers | Which claims share overlapping parties or event timelines?
M&A due diligence | Companies, IP assets, contracts, liabilities | What hidden liabilities are linked to this acquisition target?

In every case, the shape is the same: azure_ai.extract() discovers the entities, azure_ai.generate() deduplicates them, and AGE stores and traverses the graph.
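Once the graph exists, most of the questions in the table become traversals. Take the identity and access management row as a hypothetical example. The Python sketch below uses made-up users, groups, and grants, and the MEMBER_OF naming is an assumption rather than part of the tutorial.

```python
from collections import deque

# Hypothetical MEMBER_OF edges (user or group -> parent groups) and grants.
member_of = {
    "alice": ["dev-team"],
    "bob": ["contractors"],
    "dev-team": ["engineering"],
    "engineering": ["prod-admins"],
    "contractors": [],
}
grants = {"prod-admins": ["production"]}

def transitive_groups(principal):
    """All groups reachable from principal through nested MEMBER_OF edges."""
    seen, queue = set(), deque([principal])
    while queue:
        for parent in member_of.get(queue.popleft(), []):
            if parent not in seen:  # also guards against membership cycles
                seen.add(parent)
                queue.append(parent)
    return seen

def users_with_access(users, resource):
    """Users holding resource through any directly or indirectly joined group."""
    return sorted(
        u for u in users
        if any(resource in grants.get(g, []) for g in transitive_groups(u))
    )

print(users_with_access(["alice", "bob"], "production"))  # ['alice']
```

Alice never appears in a production grant directly. The access only shows up after following three levels of group nesting, and that is the kind of question a variable-length path answers.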

Get started

We would love to hear what you build. Share your feedback on the PostgreSQL Hub developer forum.

Thank you!

Updated Jun 03, 2026
Version 1.0