By Khye Wei (Azure AI Search) & Amna Mubashar (Haystack)
We’re excited to announce the integration of Haystack with Azure AI Search! To demonstrate its capabilities, we’ll walk you through building an interactive review agent to efficiently retrieve and analyze customer reviews. By combining Azure AI Search’s hybrid retrieval with Haystack’s flexible pipeline architecture, this agent provides deeper insights through sentiment analysis and intelligent summarization tools.
Why Use Azure AI Search with Haystack?
Azure AI Search offers an enterprise-grade retrieval system with battle-tested AI search technology, built for high performance GenAI applications at any scale:
- Hybrid Search: Combining keyword-based BM25 and vector-based searches with reciprocal rank fusion (RRF).
- Semantic Ranking: Enhancing retrieval results using deep learning models.
- Scalability: Supporting high-performance GenAI applications.
- Secure, Enterprise-ready Foundation: Powering interactive experiences at scale on a trusted foundation.
Haystack complements Azure AI Search by providing an end-to-end framework that enables:
- Modular Architecture: Easily swap or configure components like document retrieval, language models, and pipelines to build customized AI applications.
- Flexible Pipeline Design: Adapt pipelines to various data flows and use cases.
- Scalable and Reproducible: Ensure consistent performance across deployments with reliable and scalable pipelines.
- Tools and Agentic Pipelines: Build sophisticated pipelines that let AI models interact with external functions and structured tools.
Indexing and Retrieval with Azure AI Search and Haystack
In this blog, we’ll demonstrate how to create an end-to-end pipeline that combines Haystack with Azure AI Search to process customer reviews. By enabling semantic search and leveraging Haystack Tools for interactive sentiment analysis and summarization, you can quickly uncover deeper insights on your data.
You can find the full working example and code in the linked recipe from our cookbook.
We’ll use an open-source customer reviews dataset from Kaggle (link to the dataset). The process includes:
- Converting the dataset to Haystack Documents and preparing it using Haystack preprocessors.
- Indexing the documents using AzureAISearchDocumentStore, with semantic search enabled.
- Building a query pipeline that leverages Azure AI Search’s hybrid retrieval.
- Creating an interactive review assistant that uses a custom sentiment analysis tool to provide insights.
Data Preparation
First, we read the data and convert it to JSON for efficient indexing:
import pandas as pd
from json import loads
path = "<path to dataset file>"
df = pd.read_csv(path, encoding='latin1',nrows=200)
df.rename(columns={'review-label': 'rating'}, inplace=True)
df['year'] = pd.to_datetime(df['year'], format='%Y %H:%M:%S').dt.year
# Convert DataFrame to JSON
json_data = {"reviews": loads(df.to_json(orient="records"))}
Next, we use Haystack’s JSONConverter to extract reviews as Haystack Documents, choosing which columns will be stored as metadata:
from haystack.components.converters import JSONConverter
from haystack.dataclasses import ByteStream
from json import dumps
converter = JSONConverter(
jq_schema=".reviews[]",
content_key="review",
extra_meta_fields={"store_location", "date", "month", "year", "rating"}
)
source = ByteStream.from_string(dumps(json_data))
documents = converter.run(sources=[source])['documents']
We apply Haystack’s DocumentCleaner to ensure the text is ASCII-only and remove any unwanted characters:
from haystack.components.preprocessors import DocumentCleaner
cleaner = DocumentCleaner(ascii_only=True, remove_regex="i12i12i12")
cleaned_documents = cleaner.run(documents=documents)
With the data loaded and cleaned, we’re ready to move on to indexing these documents in Azure AI Search.
Indexing Documents with Azure AI Search
Initialize the AzureAISearchDocumentStore with the desired metadata fields and semantic configuration enabled.
from azure.search.documents.indexes.models import (
SemanticConfiguration, SemanticField, SemanticPrioritizedFields, SemanticSearch
)
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore
semantic_config = SemanticConfiguration(
name="my-semantic-config",
prioritized_fields=SemanticPrioritizedFields(
content_fields=[SemanticField(field_name="content")]
)
)
semantic_search = SemanticSearch(configurations=[semantic_config])
document_store = AzureAISearchDocumentStore(
index_name="customer-reviews-analysis",
azure_endpoint="https://your-search-service.search.windows.net",
api_key="YOUR_AZURE_API_KEY",
embedding_dimension=1536,
metadata_fields={"month": int, "year": int, "rating": int, "store_location": str},
semantic_search=semantic_search
)
Now, we build an indexing pipeline that uses AzureAIDocumentEmbeddder to generate document embeddings and store them in the index. These embeddings are essential for hybrid retrieval, which combines vector retrieval with semantic search.
from haystack import Pipeline
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("document_embedder", AzureOpenAIDocumentEmbedder())
indexing_pipeline.add_component(instance=DocumentWriter(document_store=document_store), name="doc_writer")
indexing_pipeline.connect("document_embedder", "doc_writer")
indexing_pipeline.run({"document_embedder": {"documents": cleaned_documents["documents"]}})
Querying with Hybrid Retrieval
After indexing, we can query our data with a hybrid retrieval pipeline that uses the AzureAISearchHybridRetriever. The query is embedded with the AzureOpenAITextEmbedder and the embeddings are passed to the retriever along with the semantic configuration.
from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchHybridRetriever
from haystack.components.embedders import AzureOpenAITextEmbedder
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", AzureOpenAITextEmbedder())
query_pipeline.add_component("retriever", AzureAISearchHybridRetriever(
document_store=document_store,
query_type="semantic",
semantic_configuration_name="my-semantic-config"
))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query = "Which reviews are positive?"
result = query_pipeline.run({"text_embedder": {"text": query}, "retriever": {"query": query}})
print(result["retriever"]["documents"])
Building an Interactive Feedback Review Agent with Tools
After retrieving the relevant documents, we can create an interactive agent-based workflow using Haystack Tools. Our feedback review agent includes two specialized tools:
- An Aspect-Based Sentiment Analysis (ABSA) Tool (review_analysis): This tool examines specific aspects of reviews—such as product quality, shipping, customer service, and pricing—using the VADER sentiment analyzer. It computes sentiment scores, normalizes them to a 1–5 scale, and compares them against the original user-provided rating. Finally, it generates a visualization that highlights these comparisons.
- A Summarization Tool (review_summarization): This tool leverages Latent Semantic Analysis (LSA) to extract key sentences from each review, enabling quick scanning of main ideas and recurring themes to generate summaries.
The agent decides which tool to invoke based on user requests, making the review analysis process more flexible, automated, and intuitive. In the code sample below, we demonstrate how to configure these tools, integrate them into a Haystack pipeline, and create the above agentic review assistant.
Configuring Sentiment Analysis Tool
from haystack.tools import Tool
from haystack.components.tools import ToolInvoker
from typing import Dict, List
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
def analyze_sentiment(reviews: List[Dict]) -> Dict:
"""
Function that performs aspect-based sentiment analysis.
For each review that mentions keywords related to a specific topic, the function computes
sentiment scores using VADER and categorizes the sentiment as 'positive', 'negative', or 'neutral'.
It also normalizes the compound score to a 1-5 scale and then displays a bar chart comparing
the normalized analyzer rating to the original review rating.
"""
topics = {
"product_quality": [],
"shipping": [],
"customer_service": [],
"pricing": []
}
# Define keywords for each topic
keywords = {
"product_quality": ["quality", "material", "design", "fit", "size", "color", "style"],
"shipping": ["shipping", "delivery", "arrived"],
"customer_service": ["service", "support", "help"],
"pricing": ["price", "cost", "expensive", "cheap"]
}
# Store the sentiment distribution based on ratings
sentiments = {"positive": 0, "negative": 0, "neutral": 0}
for review in reviews:
rating = review.get("rating", 3)
if rating >= 4:
sentiments["positive"] += 1
elif rating <= 2:
sentiments["negative"] += 1
else:
sentiments["neutral"] += 1
# Initialize the VADER sentiment analyzer
analyzer = SentimentIntensityAnalyzer()
for review in reviews:
text = review.get("review", "").lower()
for topic, words in keywords.items():
if any(word in text for word in words):
# Compute sentiment scores using VADER
sentiment_scores = analyzer.polarity_scores(text)
compound = sentiment_scores['compound']
# Normalize compound score from [-1, 1] to [1, 5]
normalized_score = ((compound + 1) / 2) * 4 + 1
if compound >= 0.05:
sentiment_label = 'positive'
elif compound <= -0.05:
sentiment_label = 'negative'
else:
sentiment_label = 'neutral'
# Append the review along with its sentiment analysis result
topics[topic].append({
"review": review,
"sentiment": {
"analyzer_rating": normalized_score,
"label": sentiment_label
}
})
# Create the aspect-based sentiment analysis tool
sentiment_tool = Tool(
name="review_analysis",
description="Aspect based sentiment analysis tool that compares the sentiment of reviews by analyzer and rating",
function=analyze_sentiment,
parameters={
"type": "object",
"properties": {
"reviews": {
"type": "array",
"items": {
"type": "object",
"properties": {
"review": {"type": "string"},
"rating": {"type": "integer"},
"date": {"type": "string"}
}
}
},
},
"required": ["reviews"]
}
)
Configuring Summarization Tool
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from typing import Dict, List
def summarize_reviews(reviews: List[Dict]) -> Dict:
"""
Summarize the reviews by extracting key sentences.
"""
summaries = []
summarizer = LsaSummarizer()
for review in reviews:
text = review.get("review", "")
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summary = summarizer(parser.document, 2) # Extract 2 sentences, adjust as needed
summary_text = " ".join(str(sentence) for sentence in summary)
summaries.append({"review": text, "summary": summary_text})
return {"summaries": summaries}
# Create the text summarization tool
summarization_tool = Tool(
name="review_summarization",
description="Tool to summarize customer reviews by extracting key sentences.",
function=summarize_reviews,
parameters={
"type": "object",
"properties": {
"reviews": {
"type": "array",
"items": {
"type": "object",
"properties": {
"review": {"type": "string"},
"rating": {"type": "integer"},
"date": {"type": "string"}
}
}
},
},
"required": ["reviews"]
}
)
Creating Interactive Review Agent
Using Haystack’s chat architecture, we register both tools—review_analysis and review_summarization—within AzureOpenAIChatGenerator. The agent then autonomously decides which tool to invoke based on the user’s query. If someone asks for a summary, it calls the review_summarization tool; if they want deeper, aspect-based sentiment insights, it calls the review_analysis tool.
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import AzureOpenAIChatGenerator
def create_review_agent():
"""Creates an interactive review analysis assistant"""
chat_generator = AzureOpenAIChatGenerator(
tools=[sentiment_tool, summarization_tool]
)
system_message = ChatMessage.from_system(
"""
You are a customer review analysis expert. Your task is to perform aspect based sentiment analysis on customer reviews.
You can use two tools to get insights:
- review_analysis: to get the sentiment of reviews by analyzer and rating
- review_summarization: to get the summary of reviews.
Depending on the user's question, use the appropriate tool to get insights and explain them in a helpful way.
"""
)
return chat_generator, system_message
tool_invoker = ToolInvoker(tools=[sentiment_tool, summarization_tool])
# Example of how you might set up an interactive loop
chat_generator, system_message = create_review_agent()
messages = [system_message]
# Pseudocode for an interactive session
while True:
user_input = input("\n\nwaiting for input (type 'exit' or 'quit' to stop)\n: ")
if user_input.lower() in ["exit", "quit"]:
break
messages.append(ChatMessage.from_user(user_input))
print("🧑: ", user_input)
# This example references some 'retrieved_reviews' in the prompt
user_prompt = ChatMessage.from_user(f"""
{user_input}
Here are the reviews:
{{retrieved_reviews}}
analysis_type: "topics"
""")
messages.append(user_prompt)
while True:
print("⌛ iterating...")
replies = chat_generator.run(messages=messages)["replies"]
messages.extend(replies)
# Check for any tool calls
if not replies[0].tool_calls:
break
tool_calls = replies[0].tool_calls
# Print tool calls
for tc in tool_calls:
print("\n TOOL CALL:")
print(f"\t{tc.tool_name}")
# Execute the tool calls
tool_messages = tool_invoker.run(messages=replies)["tool_messages"]
messages.extend(tool_messages)
# Print the final AI response
print(f"🤖: {messages[-1].text}")
Get started today
Interested in building your own agentic RAG applications? Follow these steps:
- Explore Azure AI Search: Learn more about all the latest features. Try Azure AI Search for free - Azure AI Search | Microsoft Learn
- Dive into Haystack Documentation: Familiarize yourself with the framework’s capabilities and learn how to develop custom AI pipelines with agentic behavior.
- Try Haystack - Azure AI Search Integration: Combine the strengths of both platforms to create innovative, AI-driven search applications. Check out the integration docs.
- Engage with the Community: Join Haystack Discord channel and Azure developer community to share insights, ask questions, and collaborate on projects.