
Azure AI Foundry Blog

Memory Management for AI Agents

Faris_A_Akbar
Microsoft
Apr 22, 2025

When we think about how humans function daily, memory plays a critical role beyond mere cognition. The brain has two primary types of memory: short-term and long-term. Short-term memory allows us to temporarily hold onto information, such as conversations or names, while long-term memory is where important knowledge and skills—like learning to walk or recalling a conversation from two weeks ago—are stored.

Memory operates by strengthening neural connections between events, facts, or concepts. These connections are reinforced by relevance and frequency of use, making frequently accessed memories easier to recall. Over time, we might forget information we no longer use because the brain prunes unused neural pathways, prioritizing the memories we frequently rely on. This can explain why recalling long-forgotten details can sometimes feel like an uphill battle.

With that in mind, when we discuss building intelligent agents that can converse with humans, it quickly becomes apparent that for such agents to provide an acceptable level of intelligence, they need some form of memory. This is primarily why today's conversational implementations maintain context and send it back to a language model. Language models on their own are stateless: they have no knowledge of prior messages unless we explicitly provide the conversation history along with the latest user message. That sounds like a solution, then: just keep appending past messages to every request to give the language model memory, no? While in theory that works, it doesn't take many experiments to realize why it quickly becomes an issue. Here are some challenges with this approach:

1) Context length limits

The first issue you'll run into is that, after some time, the length of past conversations will exceed the model's context length limit, at which point the model won't be able to consume any more conversation history. Instead, you'll have to implement a rolling buffer that drops the oldest messages as you approach the model's context length. Even ignoring all other factors, this approach prevents you from maintaining long-term memory, because old conversations are constantly being dropped.
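
As a rough illustration, here is a minimal rolling-buffer sketch. The token budget, message structure, and the use of tiktoken for counting are assumptions made for the example, not requirements:

import tiktoken

def trim_history(messages, max_tokens=8000, model="gpt-4o"):
    """Drop the oldest non-system messages until the history fits the budget."""
    encoding = tiktoken.encoding_for_model(model)

    def tokens(message):
        return len(encoding.encode(message["content"]))

    trimmed = list(messages)
    total = sum(tokens(m) for m in trimmed)
    # Keep the first message (typically the system prompt); drop from the front
    while total > max_tokens and len(trimmed) > 1:
        total -= tokens(trimmed.pop(1))
    return trimmed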

2) Model understanding of context decreases with long context

The second issue you'll quickly notice if you stack all past conversations into a request's context is the model's inability to interpret every detail from those conversations. While models are engineered to support a certain context length, research has shown that each model has a true context length that is far smaller than its advertised limit. The true context length is the maximum length at which a model can still maintain a full understanding of the context provided, and it is typically between 32k and 64k tokens, depending on the model. Thus, if you provide context beyond a model's true context length, you'll start to observe a decrease in the quality of its memory.

3) Cost

If you've managed to work around the first two challenges, it won't take long to see request costs pile up as your context grows. Imagine each request using 32k tokens of context: every request would then cost around 10 cents with GPT-4o, which becomes a cost burden very quickly.
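
To make the arithmetic concrete, here is the back-of-the-envelope calculation. The per-token price below is an assumption for illustration; check current Azure OpenAI pricing, which changes over time:

# Back-of-the-envelope input cost per request (assumed rate for illustration;
# check current Azure OpenAI pricing, which changes over time)
PRICE_PER_1M_INPUT_TOKENS = 2.50  # USD, assumed GPT-4o input rate
context_tokens = 32_000
cost_per_request = context_tokens * PRICE_PER_1M_INPUT_TOKENS / 1_000_000
print(f"${cost_per_request:.2f} per request")  # ~$0.08 at the assumed rate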

The Solution

To avoid the challenges discussed above, the best approach is to create an implementation that extracts key information from past conversations and stores it for future retrieval, creating an efficient memory. Such an implementation should have the following capabilities (a toy sketch follows the list):

  • Extract key information from past interactions and conversations
  • Avoid duplication of items stored in memory
  • Augment stored memories with new facts
  • Update/change information stored in memory based on recent interactions
  • Prioritize memory based on frequency of access
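
To make these capabilities concrete, below is a toy, self-contained sketch of such a store. It uses simple string similarity in place of embeddings and skips LLM-based fact extraction entirely; real systems (including Mem0) use an LLM plus a vector store for this, so treat it purely as an illustration of the upsert-and-prioritize idea:

from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class MemoryItem:
    text: str
    access_count: int = 0

class ToyMemoryStore:
    """Illustrative in-memory store; not how Mem0 is actually implemented."""

    def __init__(self, similarity_threshold=0.8):
        self.items = []
        self.threshold = similarity_threshold

    def _most_similar(self, text):
        # Cheap stand-in for vector similarity
        scored = [(SequenceMatcher(None, i.text, text).ratio(), i) for i in self.items]
        return max(scored, key=lambda pair: pair[0], default=(0.0, None))

    def upsert(self, fact):
        score, match = self._most_similar(fact)
        if match is not None and score >= self.threshold:
            match.text = fact                    # update a near-duplicate in place
            match.access_count += 1
        else:
            self.items.append(MemoryItem(fact))  # store a genuinely new fact

    def search(self, query, limit=3):
        ranked = sorted(
            self.items,
            key=lambda i: (SequenceMatcher(None, i.text, query).ratio(), i.access_count),
            reverse=True,
        )
        for item in ranked[:limit]:
            item.access_count += 1               # frequently recalled memories rank higher
        return ranked[:limit]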

This is where memory management frameworks come in handy, and there are a handful of them with varying specialties around short-term and long-term memory. In this blog we'll focus on Mem0, which handles all aspects of memory management and provides the features discussed above, all while integrating directly with Azure AI Search and Azure OpenAI. Mem0 takes care of all the LLM and search requests required to store data in memory and retrieve it later, making it very simple to manage memory for multiple users and agents in one place. Let's take a look at how to get Mem0 working with Azure.

Setting Up Mem0

In this blog, I'm going to show you how to quickly set up Mem0 with Azure OpenAI and Azure AI Search so you can start experimenting on your own. For more details on setting up Mem0, my colleague Farzad Sunavala wrote a great, detailed article on setting up Mem0 with Azure AI Search that goes as far as building a travel planning assistant with memory. Check out his blog here if you're looking for a full code sample.

Configuring Azure environment variables

Ensure that your Azure OpenAI and Azure AI Search keys are stored as system environment variables, and make sure each variable name matches the name used in your code. Next, update your other configuration variables and create an Azure OpenAI client so that you're set to get started with AOAI and Azure AI Search. Since memory operations add latency, you'll want to use a small model such as GPT-4.1-nano or GPT-4o-mini for AOAI.

import os
from openai import AzureOpenAI

# Load Azure OpenAI configuration
AZURE_OPENAI_ENDPOINT = "INSERT AOAI Endpoint"
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_KEY")
AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME = "INSERT AOAI Language Model Name"
AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME = "INSERT AOAI Embedding Model Name"
AZURE_OPENAI_API_VERSION = "2024-10-21"

# Load Azure AI Search configuration
SEARCH_SERVICE_ENDPOINT = "INSERT AI Search Service Endpoint"
SEARCH_SERVICE_API_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")
SEARCH_SERVICE_NAME = "INSERT AI Search Service Name"

# Create Azure OpenAI client
azure_openai_client = AzureOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY,
    api_version=AZURE_OPENAI_API_VERSION
)
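
As an optional sanity check before wiring up Mem0, you can confirm the client and deployment name work with a minimal request (this assumes the chat deployment name above is correct):

# Optional: verify the client and deployment respond before wiring up Mem0
response = azure_openai_client.chat.completions.create(
    model=AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME,
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
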
Configuring Mem0 with Azure AI Search

Mem0 requires 3 things:

  1. Embedder - to create embeddings (vector representations) of the memory to be stored
  2. Vector Store - where the embeddings will be stored
  3. LLM - which it uses for language understanding of new and existing memory

The code snippet below configures all three.

from mem0 import Memory

# Configure Mem0 with Azure AI Search
memory_config = {
    "vector_store": {
        "provider": "azure_ai_search",
        "config": {
            "service_name": SEARCH_SERVICE_NAME,
            "api_key": SEARCH_SERVICE_API_KEY,
            "collection_name": "memories",
            "embedding_model_dims": 1536,
        },
    },
    "embedder": {
        "provider": "azure_openai",
        "config": {
            "model": AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME,
            "embedding_dims": 1536,
            "azure_kwargs": {
                "api_version": "2024-10-21",
                "azure_deployment": AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME,
                "azure_endpoint": AZURE_OPENAI_ENDPOINT,
                "api_key": AZURE_OPENAI_API_KEY,
            },
        },
    },
    "llm": {
        "provider": "azure_openai",
        "config": {
            "model": AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME,
            "temperature": 0.1,
            "max_tokens": 2000,
            "azure_kwargs": {
                "azure_deployment": AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME,
                "api_version": AZURE_OPENAI_API_VERSION,
                "azure_endpoint": AZURE_OPENAI_ENDPOINT,
                "api_key": AZURE_OPENAI_API_KEY,
            },
        },
    },
    "version": "v1.1",
}

# Initialize memory
memory = Memory.from_config(memory_config)
print("Mem0 initialized with Azure AI Search")

 

Using Mem0

Storing Memories

You can store any sentence in a conversation as a memory and optionally add metadata to the stored memory.

memory.add(
    "I have 4 individuals in my household that need internet, 2 of which are students.",
    user_id="demo_user",
    metadata={"category": "personal_profile"},
)

You can also store entire conversations, as follows:

conversation = [
    {"role": "user", "content": "I'm planning a trip to Syria this summer, what are my options?"},
    {"role": "assistant", "content": "You have the option to fly in through a layover in Abu Dhabi, Doha, and Cairo, are any of these options suitable?"},
    {"role": "user", "content": "I always prefer flying through Abu Dhabi or Dubai"}
]

memory.add(conversation, user_id="demo_user")
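
To inspect everything stored so far for a user, you can list all memories. In recent Mem0 versions (with the v1.1 output format configured above), the result is a dictionary with a "results" list:

all_memories = memory.get_all(user_id="demo_user")
for item in all_memories["results"]:
    print(item["memory"])
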
Searching Memories

search_results = memory.search(
    "Does demo_user have students in the household?",
    user_id="demo_user",
    limit=3,
)

for i, result in enumerate(search_results["results"], 1):
    print(f"{i}. {result['memory']} (Score: {result['score']:.4f})")
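
To close the loop, the retrieved memories can be injected into a model request so the agent actually uses them. Here is a minimal sketch reusing the client from earlier; the query and prompt format are illustrative choices, not Mem0 requirements:

# Retrieve relevant memories and inject them into the model request
relevant = memory.search("internet plan recommendation", user_id="demo_user", limit=3)
memory_context = "\n".join(f"- {r['memory']}" for r in relevant["results"])

response = azure_openai_client.chat.completions.create(
    model=AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME,
    messages=[
        {
            "role": "system",
            "content": f"You are a helpful assistant. Known facts about the user:\n{memory_context}",
        },
        {"role": "user", "content": "Which internet plan should I get?"},
    ],
)
print(response.choices[0].message.content)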

Conclusion

Effective memory management is crucial for building intelligent agents that can provide meaningful and contextually relevant interactions. By understanding the limitations of context length, model comprehension, and cost, we can develop more efficient solutions. Mem0 offers a robust approach to memory management by extracting key information from past interactions, avoiding duplication, and updating stored information based on recent interactions. Its direct integration with Azure AI Search and Azure OpenAI simplifies the process, making it easier to manage memory for multiple users and agents.

As we continue to advance in the field of AI, tools like Mem0 will play a pivotal role in enhancing the capabilities of conversational agents, ensuring they can maintain context and deliver high-quality responses. By leveraging these technologies, we can create more intuitive and responsive AI systems that better serve our needs.

Thank you for reading, and I hope this guide helps you in setting up and utilizing Mem0 for your projects. If you have any questions or need further assistance, feel free to reach out. Happy experimenting!
