As a software engineering intern at Microsoft Security, I had the exciting opportunity to explore how Graph Retrieval-Augmented Generation (Graph RAG) can enhance data security investigations. This blog post shares my learning journey and insights from working with this evolving technology.
In Gartner’s 2024 Impact Radar, Knowledge Graphs were highlighted as a transformative technology—on par with Generative AI. With Gartner predicting that AI agents will augment or automate 50% of business decisions by 2027, I was curious to see how graph-based reasoning could be applied to cybersecurity.
What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is a technique that improves the responses of large language models (LLMs) by retrieving relevant information from a knowledge base and supplying it to the model as context. RAG is especially useful when reasoning over company-specific or internal documents that the model did not see during training.
Figure: How retrieval-augmented generation (RAG) works in three steps.
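To make the flow concrete, here is a minimal sketch of a RAG pipeline. The text does not spell out the three steps, so the sketch assumes the common split into retrieve, augment, and generate. Every name in it (`retrieve`, `augment`, `generate`, `echo_llm`, `DOCS`) is hypothetical, the documents are invented, and simple word-overlap cosine similarity stands in for the embedding search a real system would more likely use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Step 1 (assumed): rank documents by similarity to the query, keep the top k."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def augment(query, passages):
    """Step 2 (assumed): place the retrieved passages in the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def generate(prompt, llm):
    """Step 3 (assumed): send the augmented prompt to a language model."""
    return llm(prompt)


# Hypothetical knowledge base, invented for illustration.
DOCS = [
    "The VPN policy requires multi-factor authentication for all remote users.",
    "Expense reports are due on the last Friday of each month.",
    "Phishing emails should be reported to the security team within one hour.",
]


def echo_llm(prompt):
    """Stand-in for a real model call; it only returns the prompt it was given."""
    return prompt


question = "How do I report phishing emails?"
passages = retrieve(question, DOCS, k=1)
answer = generate(augment(question, passages), echo_llm)
```

Swapping `echo_llm` for a real model client is the only change needed to get an actual answer; the retrieval and prompt-building steps stay the same.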
Why Graph RAG?
Graph RAG builds on baseline RAG by adding a knowledge graph to the retrieval step. Instead of retrieving only passages of text, Graph RAG retrieves nodes and edges from a graph, which lets the model reason over connections and answer questions like:
- Which users are linked to which indicators of compromise (IOCs)?
- How are malicious domains connected to specific systems?
- What patterns emerge across multiple incidents?
Figure: How Graph RAG works in four steps.
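As a rough illustration of retrieving nodes and edges rather than text, the sketch below stores a tiny graph as triples, finds the nodes mentioned in a question, and collects the edges around them as context. It does not reproduce the four steps from the figure. The triples, the relation names, and the helpers `build_graph`, `find_seeds`, `expand`, and `to_context` are all hypothetical, and matching node names by substring is a simplification of real entity linking.

```python
from collections import defaultdict

# Hypothetical triples (source, relation, target), invented for illustration.
TRIPLES = [
    ("alice@example.com", "received", "email-17"),
    ("email-17", "contains_url", "login-update.example.net"),
    ("login-update.example.net", "resolves_to", "203.0.113.5"),
    ("bob@example.com", "received", "email-42"),
    ("email-42", "has_attachment", "invoice.exe"),
]


def build_graph(triples):
    """Index each edge under both endpoints so traversal can go either way."""
    graph = defaultdict(list)
    for src, rel, dst in triples:
        graph[src].append((src, rel, dst))
        graph[dst].append((src, rel, dst))
    return graph


def find_seeds(query, graph):
    """Pick nodes whose name appears verbatim in the query (a simplification)."""
    lowered = query.lower()
    return [node for node in graph if node.lower() in lowered]


def expand(graph, seeds, hops=1):
    """Collect the edges reachable within `hops` steps of the seed nodes."""
    seen_nodes = set(seeds)
    frontier = set(seeds)
    edges = []
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for edge in graph.get(node, []):
                if edge not in edges:
                    edges.append(edge)
                for endpoint in (edge[0], edge[2]):
                    if endpoint not in seen_nodes:
                        seen_nodes.add(endpoint)
                        next_frontier.add(endpoint)
        frontier = next_frontier
    return edges


def to_context(edges):
    """Serialize edges as plain-text facts that can be placed in a prompt."""
    return "\n".join(f"{s} --{r}--> {d}" for s, r, d in edges)


graph = build_graph(TRIPLES)
seeds = find_seeds("Which accounts are linked to login-update.example.net?", graph)
context = to_context(expand(graph, seeds, hops=2))
```

With two hops from the domain node, the context includes the email that carried the URL, the account that received it, and the IP address the domain resolves to, while the unrelated `email-42` branch is left out.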
What I Discovered
I worked on a prompt for identifying IOCs, such as malicious IPs, domains, file hashes, and suspicious email addresses. I ran this prompt over a subset of the publicly available Enron email dataset with both approaches. Here’s what stood out:
- Traditional RAG provided isolated facts. For example, it flagged a suspicious vehicle in a parking garage and a compliance issue.
- Graph RAG revealed connections between encoded strings, phishing URLs, malware files, and compromised email accounts.
Comparison Chart: Traditional RAG vs Graph RAG Response
The table below illustrates the difference in context richness between the two methods:
| Traditional RAG Response Snippet | Graph RAG Response Snippet |
| --- | --- |
| The provided data does not contain explicit indicators of compromise (IOCs). Below is a summary of relevant information: | Below is a detailed breakdown of the IOCs and their associations: |
Key Takeaways
- Traditional RAG matches the query against individual text chunks, so it tends to return isolated facts or concerns.
- Graph RAG maps IOCs to affected systems, users, and network nodes—offering a more holistic view of threats with added context.
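The difference between isolated facts and mapped connections can be sketched as a path search. In the hypothetical graph below, no single edge links the user to the IP address, but a breadth-first search finds the chain that does. The node names, the edge list, and the helpers `neighbors` and `shortest_path` are invented for illustration and are not taken from the experiment described above.

```python
from collections import deque

# Hypothetical undirected edges, invented for illustration.
EDGES = [
    ("user:alice", "email-17"),
    ("email-17", "url:login-update.example.net"),
    ("url:login-update.example.net", "ip:203.0.113.5"),
    ("user:bob", "email-42"),
]


def neighbors(edges):
    """Build an adjacency list that treats every edge as bidirectional."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    if start not in adj or goal not in adj:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


adj = neighbors(EDGES)
alice_path = shortest_path(adj, "user:alice", "ip:203.0.113.5")
bob_path = shortest_path(adj, "user:bob", "ip:203.0.113.5")
```

Here `alice_path` runs through the email and the URL, which is three hops, while `bob_path` is `None` because nothing in the graph connects that user to the IP address.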
Final Thoughts
Although Graph RAG is more computationally expensive than baseline RAG, because the knowledge graph has to be built and then traversed, it noticeably improved multi-hop reasoning in my comparison. By following edges in a knowledge graph, it can surface related context that the query would not match directly.
I hope this post inspires you to explore the potential of graph-powered security. Microsoft maintains GraphRAG as an open-source project on GitHub, so you can experiment with it on your own.