adrianhara the main point of the Retrieval-Augmented Generation pattern discussed in this post and implemented in the linked sample is to work around context length limits. Instead of fine-tuning models, we combine the model with retrieval: a retriever pulls a small subset of the knowledge base, and only that subset is fed into the prompt. This lets us keep an arbitrarily large knowledge base and still use the model to answer questions. Of course, the quality of the answer now also depends on the quality of the retriever and its ranking steps.

I don't think fine-tuning is a practical approach here, since the data changes often and you'll want to see those changes reflected quickly, and it would be hard to enforce other constraints such as not everyone being allowed to see all the documents (i.e. doc-level access control). Fine-tuning is useful in different scenarios, such as when you want to teach the model certain interaction patterns or specialize it for a very specific domain.
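To make the retrieve-then-prompt flow concrete, here is a minimal sketch. A toy keyword-overlap scorer stands in for a real retriever and its ranking steps, and the function and variable names (`score`, `retrieve`, `build_prompt`, `knowledge_base`) are illustrative, not from the linked sample:

```python
def score(query: str, doc: str) -> int:
    # Toy ranking: count query terms that appear in the document.
    # A real retriever would use a search index or vector similarity.
    terms = set(query.lower().split())
    return sum(1 for t in terms if t in doc.lower())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Keep only the top-k matching documents so the prompt stays
    # small regardless of how large the knowledge base grows.
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:k] if score(query, d) > 0]

def build_prompt(query: str, docs: list[str]) -> str:
    # Only the retrieved subset is placed in the prompt, which is
    # how the pattern works around context length limits.
    sources = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )

knowledge_base = [
    "Employees accrue 20 vacation days per year.",
    "The VPN requires multi-factor authentication.",
    "Expense reports are due by the 5th of each month.",
]

print(build_prompt("How many vacation days do employees get?", knowledge_base))
```

Doc-level access control fits naturally into this shape: you filter the candidate documents by the caller's permissions before (or during) retrieval, which is not something a fine-tuned model can enforce.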