Blog Post

AI - Azure AI services Blog
4 MIN READ

From Vector Databases to Integrated Vector Databases: Revolutionizing AI-Powered Search

srikantan's avatar
srikantan
Icon for Microsoft rankMicrosoft
Jan 14, 2025

This post explores how Integrated Vector Databases revolutionize AI-powered search by seamlessly combining structured and unstructured data, enabling real-time hybrid analytics. It also highlights the power of building autonomous agents using LangGraph, showcasing their ability to deliver seamless, intelligent user experiences.

Semantic Search and Vector Search have been pivotal capabilities powering AI Assistants driven by Generative AI. They excel when dealing with unstructured data—such as PDF documents, text files, or Word documents—where embeddings can unlock contextually rich and meaningful search results.

But what happens when the data ecosystem is more complex? Imagine structured data like customer feedback ratings for timeliness, cleanliness, and professionalism intertwined with unstructured textual comments. To extract actionable insights, such as identifying service quality improvements across centers, traditional vector search alone won’t suffice. Enter Integrated Vector Databases.

What Makes Integrated Vector Databases a Game-Changer?

Unlike traditional vector databases that require frequent incremental updates of indexes stored separately from the original data, integrated vector databases seamlessly combine structured and unstructured data within the same environment. This integration eliminates the need for periodic indexing runs, enabling real-time search and analytics with reduced overhead. Furthermore, data and embeddings co-reside, streamlining workflows and improving query performance.

Major cloud providers, including Azure, now offer managed Integrated Vector Databases such as Azure SQL Database, Azure PostgreSQL Database, and Azure Cosmos DB. This evolution is critical for scenarios that require hybrid search capabilities across both structured and unstructured data.

A Real-World Scenario: Hybrid Feedback Analysis

To showcase the power of Integrated Vector Databases, let’s dive into a practical application: customer feedback analysis for a service business.

Here’s what this entails:

  1. Structured Data: Ratings on aspects like overall work quality, timeliness, politeness, and cleanliness.
  2. Unstructured Data: Free-flowing textual feedback from customers.

Using Python, the feedback is stored in an Azure SQL Database, with embeddings generated for the textual comments via Azure OpenAI’s embedding model. The data is then inserted into the database using a stored procedure, combining the structured ratings with vectorized embeddings for efficient retrieval and analysis.

Key Code Highlights

1. Generating Embeddings: The get_embedding function interfaces with Azure OpenAI to convert textual feedback from Customer input into vector embeddings:

def get_embedding(text):
    url = f"{az_openai_endpoint}openai/deployments/{az_openai_embedding_deployment_name}/embeddings?api-version=2023-05-15"
    response = requests.post(url, headers={"Content-Type": "application/json", "api-key": az_openai_key}, json={"input": text})
    return response.json()["data"][0]["embedding"] if response.status_code == 200 else raise Exception("Embedding failed")

2. Storing Feedback: A stored procedure inserts both structured ratings and text embeddings into the database:

 # Call the stored procedure
        stored_procedure = """
        EXEC InsertServiceFeedback ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
        """
        cursor.execute(
            stored_procedure,
            (
                schedule_id,
                customer_id,
                feedback_text,
                json.dumps(json.loads(str(get_embedding(feedback_text)))),
                rating_quality_of_work,
                rating_timeliness,
                rating_politeness,
                rating_cleanliness,
                rating_overall_experience,
                feedback_date,
            ),
        )
        connection.commit()
        print("Feedback inserted successfully.")
        response_message = (
            "Service feedback captured successfully for the schedule_id: " + str(schedule_id)
        )

Building an Autonomous Agent with LangGraph

The next step is building an intelligent system that automates operations based on customer input. Here’s where LangGraph, a framework for Agentic Systems, shines. The application we’re discussing empowers customers to:

  1. View available service appointment slots.
  2. Book service appointments.
  3. Submit feedback post-service.
  4. Search for information using an AI-powered search index over product manuals.

What Makes This Agent Special?

This agent exhibits autonomy through:

  • Tool Calling: Based on customer input and context, it decides which tools to invoke without manual intervention.
  • State Awareness: The agent uses a centralized state object to maintain context (e.g., customer details, past service records, current datetime) for dynamic tool execution.
  • Natural Interactions: Customer interactions are processed naturally, with no custom logic required to integrate data or format inputs.

For example, when a customer provides feedback, the agent autonomously:

  • Prompts for all necessary details.
  • Generates embeddings for textual feedback.
  • Inserts the data into the Integrated Vector Database after confirming the input.

 

Code Walkthrough: Creating the Agent

1. Define Tools: Tools are the building blocks of the agent, enabling operations like fetching service slots or storing feedback:

tools = [
    store_service_feedback,
    fetch_customer_information,
    get_available_service_slots,
    create_service_appointment_slot,
    perform_search_based_qna,
]

2. Define State: State ensures the agent remembers user context across interactions:

class State(TypedDict):
    messages: list[AnyMessage]
    customer_info: str
    current_datetime: str



# fetch the customer information from the database and load that into the context in the State
def customer_info(state: State):
    if state.get("customer_info"):
        return {"customer_info": state.get("customer_info")}
    else:
        state["customer_info"] = fetch_customer_information.invoke({})
        return {"customer_info": state.get("customer_info")}

 

3. Build the Graph: LangGraph’s state graph defines how tools, states, and prompts interact:

builder = StateGraph(State)
builder.add_node("chatbot", Assistant(service_scheduling_runnable))
builder.add_node("fetch_customer_info", customer_info)
builder.add_edge("fetch_customer_info", "chatbot")
builder.add_node("tools", tool_node)
builder.add_edge(START, "fetch_customer_info")
builder.add_edge("tools", "chatbot")
graph = builder.compile()

There is no custom code required to invoke the tools. It is automatically done based on the intent in the Customer input.

 

4. Converse with the Agent: The application seamlessly transitions between tools based on user input and state:

def stream_graph_updates(user_input: str):
    events = graph.stream(
        {"messages": [("user", user_input)]},
        config,
        subgraphs=True,
        stream_mode="values",
    )
    l_events = list(events)
    msg = list(l_events[-1])
    r1 = msg[-1]["messages"]
    # response_to_user = msg[-1].messages[-1].content

    print(r1[-1].content)


while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ["quit", "exit", "q"]:
            print("Goodbye!")
            break

        stream_graph_updates(user_input)
    except Exception as e:
        print("An error occurred:", e)
        traceback.print_exc()
        # stream_graph_updates(user_input)
        break

Agent Demo

See a demo of this app in action here:

The source code of this Agent App is available in this GitHub Repo

Conclusion

The fusion of Integrated Vector Databases with LangGraph’s agentic capabilities unlocks a new era of AI-powered applications. By unifying structured and unstructured data in a single system and empowering agents to act autonomously, organizations can streamline workflows and gain deeper insights from their data.

This approach demonstrates the power of evolving from simple vector search to hybrid, integrated systems—paving the way for smarter, more autonomous AI solutions.

Updated Jan 14, 2025
Version 1.0
No CommentsBe the first to comment