Managing Data for RAG Chatbot
I have a POC for a RAG chatbot up and running, and the data has been indexed/vectorized.
If I need to remove data because the information is now outdated, for example when products, prices, or descriptions change, what is the best way to do this?
Would I need to manually search through the vector DB and remove the entries?
Then add my new data?
Then would I have to reindex all of my data?
Any help/pointers would be much appreciated.
I am also unsure how the chunk overlap will be affected in terms of context if I remove a chunk of data.
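One pattern that may help here, sketched under assumptions: tag every chunk with the ID of its source document, so that an update becomes "delete all chunks for that document, then re-chunk and re-add it". Because the whole document is re-chunked, the overlap between neighbouring chunks stays consistent, and other documents do not need reindexing. The `InMemoryIndex` class, `chunk_text`, and the `doc_id#n` ID scheme below are hypothetical stand-ins for a real vector DB, and embedding is left out entirely. Many vector stores offer some form of delete by ID or by metadata filter, but the exact calls depend on the product.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # ID of the source document this chunk came from
    text: str


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


class InMemoryIndex:
    """Hypothetical stand-in for a vector DB; maps chunk ID -> chunk."""

    def __init__(self) -> None:
        self.chunks: dict[str, Chunk] = {}

    def delete_document(self, doc_id: str) -> int:
        """Remove every chunk that belongs to doc_id; return how many were removed."""
        stale = [cid for cid, c in self.chunks.items() if c.doc_id == doc_id]
        for cid in stale:
            del self.chunks[cid]
        return len(stale)

    def upsert_document(self, doc_id: str, text: str,
                        size: int = 20, overlap: int = 5) -> None:
        """Replace a document: drop its old chunks, then re-chunk and re-add it."""
        self.delete_document(doc_id)
        for n, piece in enumerate(chunk_text(text, size, overlap)):
            self.chunks[f"{doc_id}#{n}"] = Chunk(doc_id, piece)
```

With this layout, a price change in one product sheet only touches the chunks carrying that sheet's `doc_id`; a real store would also need the new chunks embedded before they are written.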
1 Reply
- PeterMcCopper Contributor
Cwill83247, I have the exact same question. I have a job scheduled to update my index by referencing a data store that points to blob storage. I can see that it adds data to the index when I add documents to blob storage. However, when I remove documents (or a folder and all its documents) from blob storage, the data remains in the index. This is unexpected behavior. It seems the index is cumulative.
I do not want to create a new index whenever there is a change, so I am not sure what to do.
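A minimal sketch of how a scheduled job could reconcile the index with storage instead of only appending, assuming you can list document IDs together with a content fingerprint on both sides. `plan_sync` and `fingerprint` are hypothetical helpers, not part of any indexing service, and the actual listing and deletion calls depend on the index you use.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Stable hash of a document's bytes, used to detect changed content."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(source: dict[str, str], indexed: dict[str, str]) -> dict[str, set[str]]:
    """Compare doc ID -> fingerprint maps and return what the job should do.

    source:  documents currently in storage
    indexed: documents currently represented in the index
    """
    return {
        # In storage but not in the index yet.
        "add": source.keys() - indexed.keys(),
        # Still in the index but gone from storage (the orphans).
        "delete": indexed.keys() - source.keys(),
        # Present on both sides, but the content has changed.
        "update": {d for d in source.keys() & indexed.keys()
                   if source[d] != indexed[d]},
    }


# Example: "d" was removed from storage and "c" was edited.
storage = {"a": fingerprint(b"new"), "b": fingerprint(b"same"), "c": fingerprint(b"edited")}
index = {"b": fingerprint(b"same"), "c": fingerprint(b"original"), "d": fingerprint(b"gone")}
plan = plan_sync(storage, index)
```

Only the IDs under `delete` and `update` would need their chunks removed from the index, so the index itself does not have to be recreated on every change.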