Graph RAG for Security: Insights from a Microsoft Intern

Former Employee

Jul 30, 2025

As a software engineering intern at Microsoft Security, I had the exciting opportunity to explore how Graph Retrieval-Augmented Generation (Graph RAG) can enhance data security investigations. This blog post shares my learning journey and insights from working with this evolving technology.

In Gartner’s 2024 Impact Radar, Knowledge Graphs were highlighted as a transformative technology, on par with Generative AI. With Gartner predicting that AI agents will augment or automate 50% of business decisions by 2027, I was curious to see how graph-based reasoning could be applied to cybersecurity.

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a process that enhances large language models (LLMs) by retrieving relevant information from a knowledge base and supplying it to the model as context for its response. RAG is especially useful when reasoning over company-specific or internal documents that the model did not see during training.

Figure: How retrieval-augmented generation (RAG) works in three steps.
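The figure itself is not reproduced here, so the sketch below assumes the three steps are the usual retrieve, augment, and generate. It is a toy illustration only: the documents, the bag-of-words scoring, and the `generate` placeholder are all hypothetical stand-ins, not what any production RAG system uses.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical knowledge base; real systems index chunks of internal documents.
DOCUMENTS = [
    "The VPN gateway requires multi-factor authentication for all remote users.",
    "Phishing reports should be forwarded to the security operations mailbox.",
    "Badge readers in the parking garage are audited every quarter.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Step 1: rank documents by similarity to the query and keep the top k."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Step 2: augment the question with the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Step 3: placeholder for an LLM call (assumption: swap in a real client)."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

query = "Where should I forward phishing reports?"
passages = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, passages)
answer = generate(prompt)
```

A real pipeline would typically replace the word-count similarity with embedding-based search, but the flow of data through the three steps stays the same.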

Why Graph RAG?

Graph RAG builds upon baseline RAG by incorporating a knowledge graph into the search. Instead of retrieving only text passages, Graph RAG retrieves nodes (entities) and edges (relationships) from a graph, allowing the model to reason over connections like:

Which users are linked to which indicators of compromise (IOCs)?

How are malicious domains connected to specific systems?

What patterns emerge across multiple incidents?

Figure: How Graph RAG works in four steps.
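To make "retrieving nodes and edges" concrete, here is a minimal sketch of graph-based retrieval. It is not the GraphRAG library's API or indexing pipeline; the triples, relation names, and helper functions are hypothetical, and the example uses reserved documentation domains and IP addresses.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples, as might be extracted from emails.
TRIPLES = [
    ("alice@example.com", "received", "email-17"),
    ("email-17", "contains_url", "login-update.example.net"),
    ("login-update.example.net", "resolves_to", "203.0.113.7"),
    ("login-update.example.net", "serves", "invoice.exe"),
    ("bob@example.com", "received", "email-22"),
]

def build_graph(triples):
    """Index every edge under both of its endpoints."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target, "out"))
        graph[target].append((relation, source, "in"))
    return graph

def retrieve_subgraph(graph, entity):
    """Return all edges touching the entity as (source, relation, target)."""
    edges = []
    for relation, other, direction in graph.get(entity, []):
        if direction == "out":
            edges.append((entity, relation, other))
        else:
            edges.append((other, relation, entity))
    return edges

def edges_to_context(edges):
    """Serialize edges into text that can be placed in an LLM prompt."""
    return "\n".join(f"{s} --{r}--> {t}" for s, r, t in edges)

graph = build_graph(TRIPLES)
edges = retrieve_subgraph(graph, "login-update.example.net")
context = edges_to_context(edges)
```

Here the context for the suspicious domain includes the email that carried it, the IP address it resolves to, and the file it serves, which is the kind of connected evidence a text-only lookup would return as separate, unrelated passages.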

What I Discovered

I worked on a prompt that asked the model to identify IOCs, such as malicious IPs, domains, file hashes, and suspicious email addresses. I ran this prompt over a subset of the publicly available Enron email dataset with both approaches. Here’s what stood out:

Traditional RAG provided isolated facts. For example, it flagged a suspicious vehicle in a parking garage and a compliance issue.

Graph RAG revealed connections between encoded strings, phishing URLs, malware files, and compromised email accounts.

Comparison Chart: Traditional RAG vs Graph RAG Response

The snippets below illustrate the difference in context richness between the two methods:

Traditional RAG Response Snippet

The provided data does not contain explicit indicators of compromise (IOCs). Below is a summary of relevant information:

Suspicious vehicle reported in parking garage

Rude response from building security

Noncompliance issue

Graph RAG Response Snippet

Below is a detailed breakdown of the IOCs and their associations:

Encoded strings linked to malware and may represent obfuscated data or signatures

Malicious URLs and domains used in phishing campaigns

Compromised email addresses linked to emails containing viruses and associated with shared credentials

File artifacts include faxed image of a contract and executable file distributed via malicious URL

IP addresses accessing secure infrastructure, potentially indicating unauthorized activity

Key Takeaways

Traditional RAG matches the query against individual text chunks, so it tends to return isolated facts or concerns.

Graph RAG maps IOCs to affected systems, users, and network nodes, offering a more holistic view of threats with added context.

Final Thoughts

Although Graph RAG is more computationally expensive than baseline RAG, it offers a remarkable improvement in multi-hop reasoning. By traversing a knowledge graph, it can follow relationships across several entities and surface relevant context that a single text-chunk lookup would miss.
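Multi-hop reasoning can be illustrated with a plain breadth-first search over a small graph. This is a sketch under stated assumptions, not how GraphRAG itself performs search: the adjacency list, the entity names, and the `find_path` helper are hypothetical.

```python
from collections import deque

# Hypothetical adjacency list: each entity maps to the entities it points to.
EDGES = {
    "alice@example.com": ["email-17"],
    "email-17": ["login-update.example.net"],
    "login-update.example.net": ["203.0.113.7", "invoice.exe"],
    "bob@example.com": ["email-22"],
}

def find_path(edges, start, goal, max_hops=3):
    """Breadth-first search for a shortest path of at most max_hops edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted on this branch
        for neighbor in edges.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Three hops connect a user to a malware file: user -> email -> domain -> file.
path = find_path(EDGES, "alice@example.com", "invoice.exe")
```

No single email in this toy data mentions both the user and the executable; the link only appears once the intermediate nodes are chained together. The `max_hops` limit also hints at the cost trade-off, since each extra hop widens the set of nodes that must be explored.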

I hope this post inspires you to explore the potential of graph-powered security. GraphRAG is available as an open-source repository on GitHub, maintained by Microsoft, that you can experiment with on your own.

Updated Jul 31, 2025

Version 2.0
