Azure OpenAI Service
Introducing the GPT-4o-Audio-Preview: A New Era of Audio-Enhanced AI Interaction
We are thrilled to announce the release of audio support accessible via the Chat Completions API, featuring the new GPT-4o-Audio-Preview model, now available in preview. Building on our recent launch of GPT-4o-Realtime-Preview, this groundbreaking addition to the GPT-4o family introduces support for audio prompts and the ability to generate spoken audio responses. This expansion enhances the potential for AI applications in text and voice-based interactions and audio analysis. Starting today, developers can unlock immersive, voice-driven experiences by harnessing the advanced capabilities of GPT-4o-Audio-Preview, now in public preview.

Key Benefits of GPT-4o-Audio-Preview
The Chat Completions API with the GPT-4o-Audio-Preview model is designed to transform the way users interact with AI by incorporating natural audio elements, adding depth to applications that require nuanced understanding and response generation.
Engaging Spoken Summaries: GPT-4o-Audio-Preview can generate spoken summaries from text content, offering a dynamic, engaging way to present information. This feature is ideal for applications that benefit from audio-based delivery, such as digital assistants, interactive training modules, and accessibility solutions.
Sentiment Analysis from Audio: With the ability to detect sentiment in audio recordings, this model can analyze vocal nuances and translate them into meaningful, text-based insights. This is particularly valuable for customer service and support applications, where understanding tone and mood can enhance user satisfaction and personalize responses.
Asynchronous Speech-In, Speech-Out Interactions: GPT-4o-Audio-Preview enables seamless asynchronous voice interactions, supporting applications where users can submit spoken queries or commands and receive spoken responses at a later time. This capability enhances user convenience and opens up possibilities for hands-free, voice-enabled applications in diverse environments.

Exploring Real-World Applications of GPT-4o-Audio-Preview
1. Create Immersive Stories from Existing Text
With the GPT-4o-Audio-Preview model, businesses can revolutionize content delivery by converting text articles into engaging spoken summaries. This feature caters to users who prefer listening over reading, creating a more immersive storytelling experience. For example, news websites can offer audio summaries of their articles, allowing users to stay informed while driving, exercising, or multitasking.
2. Improve Customer Support via Audio Analysis
Understanding customer sentiment is crucial for enhancing service quality and user satisfaction. GPT-4o-Audio-Preview can analyze recorded customer conversations to detect sentiment and emotional nuances. This capability helps businesses identify areas of improvement, personalize responses, and develop more effective customer support strategies. For instance, a call center can use this technology to assess the mood of customers during interactions and adjust their approach accordingly.
3. Enhance Interactive Education and Training Modules
Educational institutions and corporations can leverage GPT-4o-Audio-Preview to create interactive and dynamic training modules. This model can generate spoken explanations, quizzes, and feedback, making learning more engaging and accessible. For example, an online course platform can offer audio-based lessons and assessments that cater to auditory learners, enhancing the overall educational experience.
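To make the asynchronous speech-in, speech-out flow described above concrete, here is a minimal sketch of a Chat Completions call that sends a recorded question and asks for a spoken answer plus its transcript, using the OpenAI Python SDK against an Azure OpenAI resource. The endpoint, API version, and deployment name are placeholders to replace with your own values, and the exact response shape may vary slightly across preview versions.

import base64
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder endpoint
    api_key="<your-api-key>",
    api_version="2025-01-01-preview",  # placeholder: any API version exposing audio chat completions
)

# Read a short WAV recording of the user's spoken question and base64-encode it
with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # placeholder: your deployment name
    modalities=["text", "audio"],  # ask for both a transcript and spoken audio back
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this recording in two sentences."},
                {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }
    ],
)

# The spoken reply comes back base64-encoded, alongside a text transcript
message = response.choices[0].message
with open("answer.wav", "wb") as f:
    f.write(base64.b64decode(message.audio.data))
print(message.audio.transcript)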
Comparing Realtime API to Chat Completions API
The GPT-4o models associated with the Realtime API and the Chat Completions API both support audio and speech capabilities, each offering unique functionalities for AI-driven user experiences. However, they serve distinct purposes:
Realtime API with model GPT-4o-Realtime-Preview: Optimized for real-time, low-latency conversations, focusing on enabling natural back-and-forth interactions with minimal delay, ideal for chatbots and conversational AI systems.
Chat Completions API with model GPT-4o-Audio-Preview: Tailored for processing and generating audio content, supporting advanced features like speech recognition and audio synthesis, making it ideal for asynchronous speech-in, speech-out interactions and audio sentiment analysis.
Ready to get started? Learn more about Azure OpenAI Service. Try it out with Azure AI Foundry.

Working with the Realtime API of gpt-4o in python
This post is organized into sections that cover how to:
Connect to the Realtime API
Handle audio conversations
Handle text conversations
Handle tool calling
The sample web application is built using Chainlit.

Connecting to the Realtime API
Refer to the code snippet below to establish a WebSocket connection to the Server (API). After establishing that:
1. Implement the receive function to accept responses from the Server. It is used to handle the response content from the server, be it audio or text. More details on this function are provided later in the post, under each section.

url = f"{base_url}openai/realtime?api-version={api_version}&deployment={model_name}&api-key={api_key}"

async def connect(self):
    """Connects the client using a WS Connection to the Realtime API."""
    if self.is_connected():
        # raise Exception("Already connected")
        self.log("Already connected")
    self.ws = await websockets.connect(
        url,
        additional_headers={
            "Authorization": f"Bearer {api_key}",
            "OpenAI-Beta": "realtime=v1",
        },
    )
    print(f"Connected to realtime API....")
    asyncio.create_task(self.receive())
    await self.update_session()

2. Send a client event, session update, to set session-level configurations like the system prompt the model should use, the choice of using text or speech or both during the conversation, the neural voice to use in the response, and so forth.

self.system_prompt = system_prompt
self.event_handlers = defaultdict(list)
self.session_config = {
    "modalities": ["text", "audio"],
    "instructions": self.system_prompt,
    "voice": "shimmer",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {"model": "whisper-1"},
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500,
        # "create_response": True,  ## do not enable this attribute, since it prevents function calls from being detected
    },
    "tools": tools_list,
    "tool_choice": "auto",
    "temperature": 0.8,
    "max_response_output_tokens": 4096,
}

Handling audio conversation
1. Capture user voice input
Chainlit provides events to capture the user voice input from the microphone.

@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.InputAudioChunk):
    openai_realtime: RTWSClient = cl.user_session.get("openai_realtime")
    if openai_realtime:
        if openai_realtime.is_connected():
            await openai_realtime.append_input_audio(chunk.data)
        else:
            print("RealtimeClient is not connected")

2. Process the user voice input
a) Convert the audio input received in the previous step to a base64 encoded string. Send the Client event input_audio_buffer.append to the Server, with this audio payload.

async def append_input_audio(self, array_buffer):
    # Check if the array buffer is not empty and send the audio data to the input buffer
    if len(array_buffer) > 0:
        await self.send(
            "input_audio_buffer.append",
            {
                "audio": array_buffer_to_base64(np.array(array_buffer)),
            },
        )

b) Once the Server is done receiving the audio chunks, it sends an input_audio_buffer.committed event. Once this event is picked up in the receive function,
c) send a Client Event response.create to the Server to elicit a response.

async def receive(self):
    async for message in self.ws:
        event = json.loads(message)
        ................................
        elif event["type"] == "input_audio_buffer.committed":
            # user has stopped speaking. The audio delta input from the user captured till now should now be processed by the server.
            # Hence we need to send a 'response.create' event to signal the server to respond
            await self.send("response.create", {"response": self.response_config})
        .................................

3. Receiving the response audio
Once the response audio events start flowing in from the server:
Handle the Server event response.audio.delta by converting the audio chunks from a base64 encoded string to bytes. Relay this to the UI to play the audio chunks over the speaker. The dispatch function is used to raise this event (see snippet below).

async def receive(self):
    async for message in self.ws:
        event = json.loads(message)
        ............................
        if event["type"] == "response.audio.delta":
            # response audio delta events received from server that need to be relayed
            # to the UI for playback
            delta = event["delta"]
            array_buffer = base64_to_array_buffer(delta)
            append_values = array_buffer.tobytes()
            _event = {"audio": append_values}
            # send event to chainlit UI to play this audio
            self.dispatch("conversation.updated", _event)
        elif event["type"] == "response.audio.done":
            # server has finished sending back the audio response to the user query
            # let the chainlit UI know that the response audio has been completely received
            self.dispatch("conversation.updated", event)
        ..........................

Play the received audio chunks
The Chainlit UI then plays this audio out over the speaker.

async def handle_conversation_updated(event):
    """Used to play the response audio chunks as they are received from the server."""
    _audio = event.get("audio")
    if _audio:
        await cl.context.emitter.send_audio_chunk(
            cl.OutputAudioChunk(
                mimeType="pcm16", data=_audio, track=cl.user_session.get("track_id")
            )
        )

Handling text conversation
1. Capture user text input
Apart from handling audio conversations, we can handle the associated transcripts from the audio response, so that the user has a multimodal way of interacting with the AI Assistant. Chainlit provides events to capture the user input from the chat interface.

@cl.on_message
async def on_message(message: cl.Message):
    openai_realtime: RTWSClient = cl.user_session.get("openai_realtime")
    if openai_realtime and openai_realtime.is_connected():
        await openai_realtime.send_user_message_content(
            [{"type": "input_text", "text": message.content}]
        )
    else:
        await cl.Message(
            content="Please activate voice mode before sending messages!"
        ).send()

2. Process the user text input
With the user text input received above:
1. Send a Client Event conversation.item.create to the Server with the user text input in the payload.
2. Follow that up with a Client Event response.create to the Server to elicit a response.
3. Raise a custom event 'conversation.interrupted' to the UI so that it can stop playing any audio response from the previous user query.

async def send_user_message_content(self, content=[]):
    if content:
        await self.send(
            "conversation.item.create",
            {
                "item": {
                    "type": "message",
                    "role": "user",
                    "content": content,
                }
            },
        )
        # this is the trigger to the server to start responding to the user query
        await self.send("response.create", {"response": self.response_config})
        # raise this event to the UI to pause the audio playback, in case it is doing so already,
        # when the user submits a query in the chat interface
        _event = {"type": "conversation_interrupted"}
        # signal the UI to stop playing audio
        self.dispatch("conversation.interrupted", _event)

3. Receiving the text response
Use the Server Event response.audio_transcript.delta to get the stream of the text data response.
This is a transcription of what is already playing as audio on the UI. Relay this data to the UI through a custom event, to populate the chat conversation. The response text gets streamed and displayed in the Chainlit UI.

async def receive(self):
    async for message in self.ws:
        ..................................
        elif event["type"] == "response.audio_transcript.delta":
            # this event is received when the transcript of the server's audio response to the user has started to come in.
            # send this to the UI to display the transcript in the chat window, even as the audio of the response gets played
            delta = event["delta"]
            item_id = event["item_id"]
            _event = {"transcript": delta, "item_id": item_id}
            # signal the UI to display the transcript of the response audio in the chat window
            self.dispatch("conversation.text.delta", _event)
        elif (
            event["type"] == "conversation.item.input_audio_transcription.completed"
        ):
            ...............................

Handling Tool calling
As a part of the Session Update event discussed earlier, we pass a payload of the tools (functions) that this Assistant has access to. In this application, I am using a search function implemented using Tavily.

self.session_config = {
    "modalities": ["text", "audio"],
    "instructions": self.system_prompt,
    "voice": "shimmer",
    .....................
    "tools": tools_list,
    "tool_choice": "auto",
    "temperature": 0.8,
    "max_response_output_tokens": 4096,
}

The function definition and implementation used in this sample application:

tools_list = [
    {
        "type": "function",
        "name": "search_function",
        "description": "call this function to bring up-to-date information on the user's query when it pertains to current affairs",
        "parameters": {
            "type": "object",
            "properties": {"search_term": {"type": "string"}},
            "required": ["search_term"],
        },
    }
]

# Function to perform search using Tavily
def search_function(search_term: str):
    print("performing search for the user query > ", search_term)
    return TavilySearchResults().invoke(search_term)

available_functions = {"search_function": search_function}

Handling the response from tool calling
When a user request entails a function call, the Server Event response.done does not return audio. It instead returns the functions that match the intent, along with the arguments to invoke them. In the receive function:
Check for function call hints in the response
Get the function name and arguments from the response
Invoke the function and get the response
Send Client Event conversation.item.create to the server with the function call output
Follow that up with Client Event response.create to elicit a response from the Server that will then be played out as audio and text.

async def receive(self):
    async for message in self.ws:
        ...........................................................
        elif event["type"] == "response.done":
            ...........................................
            if "function_call" == output_type:
                function_name = (
                    event.get("response", {})
                    .get("output", [{}])[0]
                    .get("name", None)
                )
                arguments = json.loads(
                    event.get("response", {})
                    .get("output", [{}])[0]
                    .get("arguments", None)
                )
                tool_call_id = (
                    event.get("response", {})
                    .get("output", [{}])[0]
                    .get("call_id", None)
                )
                function_to_call = available_functions[function_name]
                # invoke the function with the arguments and get the response
                response = function_to_call(**arguments)
                print(
                    f"called function {function_name}, and the response is:",
                    response,
                )
                # send the function call response to the server(model)
                await self.send(
                    "conversation.item.create",
                    {
                        "item": {
                            "type": "function_call_output",
                            "call_id": tool_call_id,
                            "output": json.dumps(response),
                        }
                    },
                )
                # signal the model(server) to generate a response based on the function call output sent to it
                await self.send(
                    "response.create", {"response": self.response_config}
                )
        ...............................................

Reference links:
Watch a short video of this sample application here
The Documentation on the Realtime API is available here
The GitHub Repo for the application in this post is available here
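One helper that the snippets above call repeatedly but never show is send(). As a rough sketch of how such a helper could look (an assumption for illustration, not the author's actual implementation): Realtime API client events are JSON messages whose "type" field names the event, with the remaining payload fields merged alongside it. The method below would live on the same client class, with json imported at module level.

async def send(self, event_type, payload=None):
    # Assumed helper: wrap the client event as JSON ({"type": ..., **payload})
    # and forward it over the open WebSocket connection.
    event = {"type": event_type}
    if payload:
        event.update(payload)
    await self.ws.send(json.dumps(event))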
Exciting Update: Abstractive Summarization in Azure AI Language Now Powered by Phi-3.5-mini! 🎉
We're thrilled to announce that the summarization capability in the Azure AI Language service has started transitioning to widely accepted Small Language Models (SLMs) and Large Language Models (LLMs), starting with moving document and text abstractive summarization to Phi-3.5-mini!

Why It Matters
This upgrade marks a significant advancement in fully embracing rapidly evolving Gen AI technologies. It is aligned with our commitment to enabling customers to concentrate on their core business needs while delegating the complex tasks of model maintenance, fine-tuning, and engineering to our services, and to interact with our service through strongly typed APIs. We remain committed to updating the base models and ensuring our customers always receive the best performance for summarization tasks, including GPT-4o models and OpenAI's o3 models. We are confident that this transition will offer significant advantages, and we eagerly anticipate your feedback as we continue to enhance our services.

What Does This Mean for You
Enhanced Performance: A fine-tuned Phi-3.5-mini model boosted production performance by 9%. Key highlights include enhanced understanding capability with improved common-sense reasoning, as well as smoother and more reliable summary generation.
Resource Optimization: By leveraging the compact yet powerful Phi-3.5-mini, we ensure better utilization of resources while maintaining quality, along with availability in containers.
More Enterprise and Compliance Features: lower hallucinations, stronger Responsible AI, enhanced scaling support, etc. Notably, the new production model reduced hallucinations by 78%.
Stay tuned for more updates as we continue this transition to all summarization tasks and bring additional enhancements to our services!

How To Utilize This Advancement
In the cloud, you don't need to do anything special to benefit from this advancement. Your document and text abstractive summarization requests will be served with the fine-tuned Phi-3.5-mini model (a minimal SDK sketch appears at the end of this section). If you are using the container, please download the latest version from the mcr.microsoft.com container registry. The fully qualified container image name is mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization.
Thank you for your continued trust in our products, and we welcome your feedback as we strive to continuously improve our services. For more details and resources, please explore the following links:
- Learn more about our summarization solution in Documentation
- Get started with the summarization container by visiting Documentation
- Try it out with AI Foundry for a code-free experience
- Explore Azure AI Language and its various capabilities in Documentation
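To illustrate the cloud usage mentioned above, here is a minimal sketch that requests abstractive summarization through the azure-ai-textanalytics SDK (version 5.3.0 or later). The endpoint and key are placeholders for your Azure AI Language resource, and the result handling is kept deliberately simple.

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),  # placeholder
)

documents = [
    "Azure AI Language abstractive summarization now runs on a fine-tuned Phi-3.5-mini model, "
    "with improved common-sense reasoning and fewer hallucinations in generated summaries."
]

# Abstractive summarization is a long-running operation; poll until it completes
poller = client.begin_abstract_summary(documents)
for result in poller.result():
    if not result.is_error:
        for summary in result.summaries:
            print(summary.text)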
Dify work with Microsoft AI Search
Please refer to my repo to get more AI resources; welcome to star it: https://github.com/xinyuwei-david/david-share.git This article is from one of my repos: https://github.com/xinyuwei-david/david-share/tree/master/LLMs/Dify-With-AI-Search

Dify work with Microsoft AI Search
Dify is an open-source platform for developing large language model (LLM) applications. It combines the concepts of Backend as a Service (BaaS) and LLMOps, enabling developers to quickly build production-grade generative AI applications. Dify offers various types of tools, including first-party and custom tools. These tools can extend the capabilities of LLMs, such as web search, scientific calculations, image generation, and more. On Dify, you can create more powerful AI applications, like intelligent assistant-type applications, which can complete complex tasks through task reasoning, step decomposition, and tool invocation.

Dify works with AI Search Demo
As of now, Dify cannot integrate with Microsoft AI Search directly via the default Dify web portal. Let me show how to achieve it. Please click the picture below to see my demo video on YouTube: https://www.youtube.com/watch?v=20GjS6AtjTo

Dify works with AI Search: Configuration steps
Configure on AI Search
Create an index and make sure you can get results from the AI Search index (a small verification sketch appears at the end of this section):

Run Dify on a VM via Docker:
root@a100vm:~# docker ps |grep -i dify
5d6c32a94313   langgenius/dify-api:0.8.3       "/bin/bash /entrypoi…"   3 months ago   Up 3 minutes             5001/tcp   docker-worker-1
264e477883ee   langgenius/dify-api:0.8.3       "/bin/bash /entrypoi…"   3 months ago   Up 3 minutes             5001/tcp   docker-api-1
2eb90cd5280a   langgenius/dify-sandbox:0.2.9   "/main"                  3 months ago   Up 3 minutes (healthy)              docker-sandbox-1
708937964fbb   langgenius/dify-web:0.8.3       "/bin/sh ./entrypoin…"   3 months ago   Up 3 minutes             3000/tcp   docker-web-1

Create a custom tool in the Dify portal and set its schema:
Schema details:
{
  "openapi": "3.0.0",
  "info": {
    "title": "Azure Cognitive Search Integration",
    "version": "1.0.0"
  },
  "servers": [
    { "url": "https://ai-search-eastus-xinyuwei.search.windows.net" }
  ],
  "paths": {
    "/indexes/wukong-doc1/docs": {
      "get": {
        "operationId": "getSearchResults",
        "parameters": [
          {
            "name": "api-version",
            "in": "query",
            "required": true,
            "schema": { "type": "string", "example": "2024-11-01-preview" }
          },
          {
            "name": "search",
            "in": "query",
            "required": true,
            "schema": { "type": "string" }
          }
        ],
        "responses": {
          "200": {
            "description": "Successful response",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "@odata.context": { "type": "string" },
                    "value": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "@search.score": { "type": "number" },
                          "chunk_id": { "type": "string" },
                          "parent_id": { "type": "string" },
                          "title": { "type": "string" },
                          "chunk": { "type": "string" },
                          "text_vector": { "type": "SingleCollection" }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Set the AI Search API key:
Do a search test. Input words:
Create a workflow on Dify:
Check the AI Search stage:
Check the LLM stage:
Run the workflow:
Get the workflow result:
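Before wiring the tool into a workflow, it can help to sanity-check the index outside of Dify. The sketch below calls the same Azure AI Search REST endpoint that the custom tool schema above targets; the service URL, index name, and API version mirror the schema, while the query key and search text are placeholders.

import requests

service = "https://ai-search-eastus-xinyuwei.search.windows.net"
index = "wukong-doc1"
params = {
    "api-version": "2024-11-01-preview",
    "search": "<your test query>",  # placeholder query text
    "$top": 3,
}
headers = {"api-key": "<your-query-key>"}  # placeholder query key

# Same GET /indexes/{index}/docs call that the Dify custom tool performs
resp = requests.get(f"{service}/indexes/{index}/docs", params=params, headers=headers)
resp.raise_for_status()
for doc in resp.json()["value"]:
    print(doc["@search.score"], doc.get("title"))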
Introducing Azure AI Agent Service
Introducing Azure AI Agent Service at Microsoft Ignite 2024
Discover how Azure AI Agent Service is revolutionizing the development and deployment of AI agents. This service empowers developers to build, deploy, and scale high-quality AI agents tailored to business needs within hours. With features like rapid development, extensive data connections, flexible model selection, and enterprise-grade security, Azure AI Agent Service sets a new standard in AI automation.

Announcing Data Zones for Azure OpenAI Service Batch
In Nov 2024, we announced Data Zones on Azure OpenAI Service. Today, we're excited to expand that support to Azure OpenAI Service Batch with Data Zone Batch deployments. They enable you to utilize Azure's global infrastructure to dynamically route traffic to data centers within the Microsoft-defined data zones, ensuring optimal availability for each request.
Azure OpenAI Data Zones is a new deployment option that provides enterprises with even more flexibility and control over their data privacy and residency needs. Tailored for organizations in the United States and European Union, Data Zones allow customers to process and store their data within specific geographic boundaries, ensuring compliance with regional data residency requirements while maintaining optimal performance. By spanning multiple regions within these areas, Data Zones offer a balance between the cost-efficiency of global deployments and the control of regional deployments, making it easier for enterprises to manage their AI applications without sacrificing security or speed.

Models supported at launch
Model          Version
gpt-4o         2024-08-06
gpt-4o-mini    2024-07-18
Support for newer models will be continuously added.

Pricing
Data Zone Batch has a 50% discount on Data Zone Standard pricing.

Get started
Ready to try Data Zone support in the Azure OpenAI Service Batch API? Take it for a spin here (a rough submission sketch follows at the end of this section).

Learn more
Deployment types in Azure OpenAI Service
Azure OpenAI Service Batch
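As a rough illustration of what a batch submission could look like against a Data Zone Batch deployment, here is a sketch using the OpenAI Python SDK. The endpoint, API version, deployment name, and input file are placeholders, and the exact endpoint path string expected by the Batch API is an assumption to verify against the Azure OpenAI Batch documentation.

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-10-21",                                   # placeholder
)

# requests.jsonl holds one chat-completions request per line, e.g.:
# {"custom_id": "task-1", "method": "POST", "url": "/chat/completions",
#  "body": {"model": "<your-data-zone-batch-deployment>",
#           "messages": [{"role": "user", "content": "Hello"}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/chat/completions",   # assumption: path string used by the Batch API
    completion_window="24h",
)
print(batch.id, batch.status)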
Unlock Multimodal Data Insights with Azure AI Content Understanding: New Code Samples Available
We are excited to share code samples that leverage the Azure AI Content Understanding service to help you extract insights from your images, documents, videos, and audio content. These code samples are available on GitHub and cover the following:

Azure AI integrations
Visual Document Search: Leverage Azure Document Intelligence, Content Understanding, Azure Search, and Azure OpenAI to unlock natural language search of document contents for a complex document with pictures of charts and diagrams.
Video Chapter Generation: Generate video chapters using Azure Content Understanding and Azure OpenAI. This allows you to break long videos into smaller, labeled parts with key details, making it easier to find, share, and access the most relevant content.
Video Content Discovery: Learn how to use Content Understanding, Azure Search, and Azure OpenAI models to process videos and create a searchable index for AI-driven content discovery.

Content Understanding Operations
Analyzer Templates: An Analyzer enables you to tailor Content Understanding to extract valuable insights from your content based on your specific needs. Start quickly with these ready-made templates.
Content Extraction: Learn how the Content Understanding API can extract semantic information from various files, including performing OCR to recognize tables in documents, transcribing audio files, and analyzing faces in videos.
Field Extraction: This example demonstrates how to extract specific fields from your content. For instance, you can identify the invoice amount in a document, capture names mentioned in an audio file, or generate a summary of a video.
Analyzer Training: For document scenarios, you can further enhance field extraction performance by providing a few labeled samples.
Analyzer Management: Create a minimal analyzer, list all analyzers in your resource, and delete any analyzers you no longer need.

Azure AI Content Understanding: Turn Multimodal Content into Structured Data
Azure AI Content Understanding is a cutting-edge Azure AI offering designed to help businesses seamlessly extract insights from various content types. Built with and for Generative AI, it empowers organizations to seamlessly develop GenAI solutions using the latest models, without needing advanced AI expertise. Content Understanding simplifies the processing of unstructured data stores of documents, images, videos, and audio, transforming them into structured, actionable insights. It is versatile and adaptable across numerous industries and use-case scenarios, offering customization and support for input from multiple data types. Here are a few example use cases:
Retrieval Augmented Generation (RAG): Enhance and integrate content from any format to power effective content searches or provide answers to frequent questions in scenarios like customer service or enterprise-wide data retrieval.
Post-call analytics: Organizations use Content Understanding to analyze call center or meeting recordings, extracting insights like sentiment, speaker details, and topics discussed, including names, companies, and other relevant data.
Insurance claims processing: Automate time-consuming processes like analyzing and handling insurance claims or other low-latency batch processing tasks.
Media asset management and content creation: Extract essential features from images and videos to streamline media asset organization and enable entity-based searches for brands, settings, key products, and people.
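For a feel of what calling the service looks like, here is a heavily hedged sketch of invoking an analyzer over REST and polling for the result. The endpoint, analyzer id, request path, and api-version below are assumptions based on the preview REST API and may differ from the current release; treat the linked GitHub samples as the authoritative reference.

import time
import requests

endpoint = "https://<your-ai-services-resource>.services.ai.azure.com"  # placeholder
key = "<your-key>"                                                      # placeholder
analyzer_id = "prebuilt-documentAnalyzer"   # assumption: a prebuilt analyzer id
api_version = "2024-12-01-preview"          # assumption: preview API version

# Submit a publicly reachable file URL for analysis
resp = requests.post(
    f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}:analyze",
    params={"api-version": api_version},
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"url": "https://example.com/sample-invoice.pdf"},
)
resp.raise_for_status()
operation_url = resp.headers["Operation-Location"]

# Analysis is a long-running operation; poll until it finishes
while True:
    result = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key}).json()
    if result.get("status") in ("Succeeded", "Failed"):
        break
    time.sleep(2)
print(result)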
Resources & Documentation
To begin extracting valuable insights from your multimodal content, explore the following resources:
Azure Content Understanding Overview
Azure Content Understanding in Azure AI Foundry
FAQs
Want to get in touch? We'd love to hear from you! Send us an email at cu_contact@microsoft.com

Enhancing Workplace Safety and Efficiency with Azure AI Foundry's Content Understanding
Discover how Azure AI Foundry's Content Understanding service, featuring the Video Shot Analysis template, revolutionizes workplace safety and efficiency. By leveraging Generative AI to analyze video data, businesses can gain actionable insights into worker actions, posture, safety risks, and environmental conditions. Learn how this cutting-edge tool transforms operations across industries like manufacturing, logistics, and healthcare.

From Vector Databases to Integrated Vector Databases: Revolutionizing AI-Powered Search
Semantic Search and Vector Search have been pivotal capabilities powering AI Assistants driven by Generative AI. They excel when dealing with unstructured data, such as PDF documents, text files, or Word documents, where embeddings can unlock contextually rich and meaningful search results. But what happens when the data ecosystem is more complex? Imagine structured data like customer feedback ratings for timeliness, cleanliness, and professionalism intertwined with unstructured textual comments. To extract actionable insights, such as identifying service quality improvements across centers, traditional vector search alone won't suffice. Enter Integrated Vector Databases.

What Makes Integrated Vector Databases a Game-Changer?
Unlike traditional vector databases that require frequent incremental updates of indexes stored separately from the original data, integrated vector databases seamlessly combine structured and unstructured data within the same environment. This integration eliminates the need for periodic indexing runs, enabling real-time search and analytics with reduced overhead. Furthermore, data and embeddings co-reside, streamlining workflows and improving query performance. Major cloud providers, including Azure, now offer managed Integrated Vector Databases such as Azure SQL Database, Azure PostgreSQL Database, and Azure Cosmos DB. This evolution is critical for scenarios that require hybrid search capabilities across both structured and unstructured data.

A Real-World Scenario: Hybrid Feedback Analysis
To showcase the power of Integrated Vector Databases, let's dive into a practical application: customer feedback analysis for a service business. Here's what this entails:
Structured Data: Ratings on aspects like overall work quality, timeliness, politeness, and cleanliness.
Unstructured Data: Free-flowing textual feedback from customers.
Using Python, the feedback is stored in an Azure SQL Database, with embeddings generated for the textual comments via Azure OpenAI's embedding model. The data is then inserted into the database using a stored procedure, combining the structured ratings with vectorized embeddings for efficient retrieval and analysis.

Key Code Highlights
1. Generating Embeddings: The get_embedding function interfaces with Azure OpenAI to convert the customer's textual feedback into vector embeddings:

def get_embedding(text):
    url = f"{az_openai_endpoint}openai/deployments/{az_openai_embedding_deployment_name}/embeddings?api-version=2023-05-15"
    response = requests.post(
        url,
        headers={"Content-Type": "application/json", "api-key": az_openai_key},
        json={"input": text},
    )
    if response.status_code != 200:
        raise Exception("Embedding failed")
    return response.json()["data"][0]["embedding"]

2. Storing Feedback: A stored procedure inserts both structured ratings and text embeddings into the database:

# Call the stored procedure
stored_procedure = """
    EXEC InsertServiceFeedback ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
"""
cursor.execute(
    stored_procedure,
    (
        schedule_id,
        customer_id,
        feedback_text,
        json.dumps(json.loads(str(get_embedding(feedback_text)))),
        rating_quality_of_work,
        rating_timeliness,
        rating_politeness,
        rating_cleanliness,
        rating_overall_experience,
        feedback_date,
    ),
)
connection.commit()
print("Feedback inserted successfully.")
response_message = (
    "Service feedback captured successfully for the schedule_id: "
    + str(schedule_id)
)

Building an Autonomous Agent with LangGraph
The next step is building an intelligent system that automates operations based on customer input. Here's where LangGraph, a framework for Agentic Systems, shines. The application we're discussing empowers customers to:
View available service appointment slots.
Book service appointments.
Submit feedback post-service.
Search for information using an AI-powered search index over product manuals.

What Makes This Agent Special?
This agent exhibits autonomy through:
Tool Calling: Based on customer input and context, it decides which tools to invoke without manual intervention.
State Awareness: The agent uses a centralized state object to maintain context (e.g., customer details, past service records, current datetime) for dynamic tool execution.
Natural Interactions: Customer interactions are processed naturally, with no custom logic required to integrate data or format inputs.
For example, when a customer provides feedback, the agent autonomously:
Prompts for all necessary details.
Generates embeddings for textual feedback.
Inserts the data into the Integrated Vector Database after confirming the input.

Code Walkthrough: Creating the Agent
1. Define Tools: Tools are the building blocks of the agent, enabling operations like fetching service slots or storing feedback:

tools = [
    store_service_feedback,
    fetch_customer_information,
    get_available_service_slots,
    create_service_appointment_slot,
    perform_search_based_qna,
]

2. Define State: State ensures the agent remembers user context across interactions:

class State(TypedDict):
    messages: list[AnyMessage]
    customer_info: str
    current_datetime: str

# fetch the customer information from the database and load that into the context in the State
def customer_info(state: State):
    if state.get("customer_info"):
        return {"customer_info": state.get("customer_info")}
    else:
        state["customer_info"] = fetch_customer_information.invoke({})
        return {"customer_info": state.get("customer_info")}

3. Build the Graph: LangGraph's state graph defines how tools, states, and prompts interact:

builder = StateGraph(State)
builder.add_node("chatbot", Assistant(service_scheduling_runnable))
builder.add_node("fetch_customer_info", customer_info)
builder.add_edge("fetch_customer_info", "chatbot")
builder.add_node("tools", tool_node)
builder.add_edge(START, "fetch_customer_info")
builder.add_edge("tools", "chatbot")
graph = builder.compile()

There is no custom code required to invoke the tools. It is done automatically based on the intent in the customer input.
4. Converse with the Agent: The application seamlessly transitions between tools based on user input and state:

def stream_graph_updates(user_input: str):
    events = graph.stream(
        {"messages": [("user", user_input)]},
        config,
        subgraphs=True,
        stream_mode="values",
    )
    l_events = list(events)
    msg = list(l_events[-1])
    r1 = msg[-1]["messages"]
    # response_to_user = msg[-1].messages[-1].content
    print(r1[-1].content)


while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ["quit", "exit", "q"]:
            print("Goodbye!")
            break
        stream_graph_updates(user_input)
    except Exception as e:
        print("An error occurred:", e)
        traceback.print_exc()
        # stream_graph_updates(user_input)
        break

Agent Demo
See a demo of this app in action here:
The source code of this Agent App is available in this GitHub Repo

Conclusion
The fusion of Integrated Vector Databases with LangGraph's agentic capabilities unlocks a new era of AI-powered applications. By unifying structured and unstructured data in a single system and empowering agents to act autonomously, organizations can streamline workflows and gain deeper insights from their data. This approach demonstrates the power of evolving from simple vector search to hybrid, integrated systems, paving the way for smarter, more autonomous AI solutions.
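To complement the insertion flow shown above with the retrieval side, here is a minimal sketch of a vector similarity query. It assumes the feedback table stores its embeddings in Azure SQL Database's native vector type (currently in preview), that a table and columns named as below exist (they are hypothetical; adapt them to your schema), and it reuses the get_embedding helper from earlier in the post.

import json
import pyodbc

connection = pyodbc.connect("<your-azure-sql-connection-string>")  # placeholder
cursor = connection.cursor()

question = "Were customers unhappy with cleanliness at any service center?"
question_vector = json.dumps(get_embedding(question))  # embed the question with the same model

# Rank stored feedback rows by cosine distance to the question embedding
cursor.execute(
    """
    SELECT TOP 5 feedback_text, rating_cleanliness,
           VECTOR_DISTANCE('cosine', feedback_embedding, CAST(? AS VECTOR(1536))) AS distance
    FROM ServiceFeedback
    ORDER BY distance
    """,
    question_vector,
)
for feedback_text, rating_cleanliness, distance in cursor.fetchall():
    print(round(distance, 4), rating_cleanliness, feedback_text)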