Forum Discussion

Cwill83247
Copper Contributor
May 30, 2024

Managing Data for RAG Chatbot

I have a POC for a RAG chatbot up and running, and the data has been indexed and vectorized.

 

If I need to remove data because the information is now outdated, for example when products, prices, or descriptions change, what is the best way to do this?

Would I need to manually search through the vector DB and remove the entries?

Then add my new data?

Then would I have to reindex all of my data?

 

Any help or pointers would be much appreciated.

I'm also unsure how the chunk overlap will be affected, in terms of context, if I remove a chunk of data.
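One common way to handle this kind of update, sketched below, is to tag every chunk with the ID of the source document it came from. When a product or price changes, all chunks for that document are deleted and the whole document is re-chunked and re-added, so the overlap between neighbouring chunks is rebuilt consistently and nothing else needs reindexing. This is only a minimal sketch under stated assumptions: `InMemoryVectorStore`, `chunk_text`, and `replace_document` are hypothetical names, the store is a plain dictionary standing in for a real vector DB, and the embedding step is left out.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB: chunk ID -> (doc ID, text)."""
    chunks: dict = field(default_factory=dict)

    def delete_document(self, doc_id: str) -> int:
        """Remove every chunk that belongs to doc_id; return how many were removed."""
        stale = [cid for cid, (owner, _) in self.chunks.items() if owner == doc_id]
        for cid in stale:
            del self.chunks[cid]
        return len(stale)

    def upsert(self, chunk_id: str, doc_id: str, text: str) -> None:
        # A real store would also compute and save an embedding here.
        self.chunks[chunk_id] = (doc_id, text)


def chunk_text(text: str, size: int, overlap: int) -> list:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def replace_document(store, doc_id: str, text: str, size: int = 200, overlap: int = 50):
    """Drop the old chunks of one document, then re-chunk and re-add it.

    Returns (chunks_removed, chunks_added). Other documents are untouched.
    """
    removed = store.delete_document(doc_id)
    pieces = chunk_text(text, size, overlap)
    for n, piece in enumerate(pieces):
        store.upsert(f"{doc_id}#{n}", doc_id, piece)
    return removed, len(pieces)


if __name__ == "__main__":
    store = InMemoryVectorStore()
    print(replace_document(store, "product-42", "Old description of the product."))
    print(replace_document(store, "product-42", "New description and price."))
```

Because deletion is keyed on the document ID rather than on individual chunks, there is no need to search the store by hand, and the overlap question goes away: a document's chunks are always replaced as a set.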

1 Reply

  • PeterMc
    Copper Contributor

    Cwill83247 I have the exact same question. I have a scheduled job that updates my index by referencing a data store that points to blob storage. I see that it adds data to the index when I add documents to blob storage. However, when I remove documents (or a folder and all its documents) from blob storage, the data remains in the index. This is unexpected behavior. It seems the index is cumulative.

     

    I do not want to create a new index whenever there is a change, so I am not sure what to do.
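
If the scheduled job only ever adds documents, one possible workaround that avoids rebuilding the index is a reconciliation pass: compare the document IDs currently in the index with the IDs still present in storage, and delete only the ones that have disappeared. The sketch below shows the idea with plain Python collections. The names `find_stale_ids` and `reconcile` are hypothetical, the `index` dictionary is a stand-in for a real search index, and listing blobs or calling the search service's delete API is not shown.

```python
def find_stale_ids(indexed_doc_ids, source_doc_ids):
    """Return IDs that are in the index but no longer in the source, sorted."""
    return sorted(set(indexed_doc_ids) - set(source_doc_ids))


def reconcile(index: dict, source_doc_ids) -> list:
    """Delete stale documents from a dict-based index (doc ID -> chunks).

    `index` is a hypothetical in-memory stand-in; with a real service the
    `del` below would be a delete request for that document's entries.
    """
    stale = find_stale_ids(index.keys(), source_doc_ids)
    for doc_id in stale:
        del index[doc_id]
    return stale


if __name__ == "__main__":
    index = {
        "docs/a.pdf": ["chunk 0", "chunk 1"],
        "docs/b.pdf": ["chunk 0"],
        "old/c.pdf": ["chunk 0"],
    }
    still_in_storage = ["docs/a.pdf", "docs/b.pdf", "docs/new.pdf"]
    print(reconcile(index, still_in_storage))
    print(sorted(index))
```

Running this pass after each scheduled update keeps the existing index in place and removes only the entries whose source files are gone. New files such as `docs/new.pdf` in the example are left for the normal indexing job to pick up.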