Large Language Models (LLMs) now power everything from search engines and chatbots to financial and medical platforms, acting almost like digital experts. However, since they rely on static training data, their responses can become outdated. Retrieval-Augmented Generation (RAG) addresses this by letting LLMs access up-to-date information from sources like PDFs, websites, and internal databases—much like a student checking fresh notes instead of relying solely on memory. While building RAG systems is one task, evaluating their output is the real challenge: How do we judge accuracy, spot hallucinations, and assess quality when answers may not be clear-cut? This blog provides a simple, step-by-step guide to evaluating LLM-based RAG systems, covering why their assessment is complex, the unique challenges of RAG, key metrics, practical tools, and the future possibilities of AI validation.
🔍 1. Why Evaluating LLM Responses is Hard
In classical programming, correctness is binary.
| Input | Expected | Result |
|---|---|---|
| 2 + 2 | 4 | ✔ Correct |
| 2 + 2 | 5 | ✘ Wrong |
Software is deterministic — same input → same output.
LLMs are probabilistic. The same prompt can produce many valid responses, each built from different word choices and sentence structures.
Example:
Prompt:
"Explain gravity like I'm 10"
Possible responses:
| Response A | Response B |
|---|---|
| Gravity is a force that pulls everything to Earth. | Gravity bends space-time causing objects to attract. |
Both are correct.
Which is better? Depends on audience.
So evaluation needs to look beyond text similarity. We must check:
✔ Is the answer meaningful?
✔ Is it correct?
✔ Is it easy to understand?
✔ Does it follow prompt intent?
Testing LLMs is like grading essays — not checking numeric outputs.
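The four checks above can be sketched as a tiny rubric-style grader. This is purely illustrative (`grade_answer` and its keyword heuristics are made up for this post; real systems use embeddings or an LLM judge, not keyword matching):

```python
# Toy rubric-style grader for the four checks above.
# Illustrative only: real evaluators use embeddings or an LLM judge,
# not crude keyword heuristics like these.

def grade_answer(question: str, answer: str, required_terms: list[str]) -> dict:
    """Score an answer on a few coarse, rubric-style checks."""
    words = answer.split()
    return {
        "meaningful": len(words) >= 5,                       # not empty or trivial
        "covers_key_terms": all(t.lower() in answer.lower()  # crude correctness proxy
                                for t in required_terms),
        "readable": len(words) / max(answer.count("."), 1) <= 25,  # short sentences
        "follows_intent": any(w.lower() in answer.lower()    # shares question vocabulary
                              for w in question.split()),
    }

result = grade_answer(
    "Explain gravity like I'm 10",
    "Gravity is a force that pulls everything toward the Earth.",
    required_terms=["gravity", "force"],
)
print(result)  # all four checks pass for this answer
```

Note how little a checklist like this actually proves: Response B about space-time would fail `covers_key_terms` despite being correct. That gap is exactly why the rest of this post exists.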
🧠 2. Why RAG Evaluation is Even Harder
RAG introduces an additional layer — retrieval.
The model no longer answers from memory; it must first read context, then summarise it.
Evaluation now spans multiple dimensions:
| Evaluation Layer | What we must verify |
|---|---|
| Retrieval | Did we fetch the right documents? |
| Understanding | Did the model interpret context correctly? |
| Grounding | Is the answer based on retrieved data? |
| Generation Quality | Is final response complete & clear? |
A simple story makes this intuitive:
Teacher asks student to explain Photosynthesis.
Student goes to library → selects a book → reads → writes explanation.
We must evaluate:
- Did they pick the right book? → Retrieval
- Did they understand the topic? → Reasoning
- Did they copy facts correctly without inventing? → Faithfulness
- Is written explanation clear enough for another child to learn from? → Answer Quality
A failure at any one layer → total failure.
🧩 3. Two Types of Evaluation
🔹 Intrinsic Evaluation — Quality of the Response Itself
Here we judge the answer, ignoring real-world impact.
We check:
✔ Grammar & coherence
✔ Completeness of explanation
✔ No hallucination
✔ Logic flow & clarity
✔ Semantic correctness
This is similar to checking how well the essay is written.
An answer can look polished yet fail to solve the user's real problem. That is why intrinsic evaluation alone is not enough.
🔹 Extrinsic Evaluation — Did It Achieve the Goal?
This measures task success.
If a customer support bot writes a beautifully worded paragraph, but the user still doesn’t get their refund — it failed extrinsically.
Examples:
| System Type | Extrinsic Goal |
|---|---|
| Banking RAG Bot | Did user get correct KYC procedure? |
| Medical RAG | Was advice safe & factual? |
| Legal search assistant | Did it return the right section of the law? |
| Technical summariser | Did summary capture key meaning? |
Intrinsic = writing quality.
Extrinsic = impact quality.
A production-grade RAG system must satisfy both.
📏 4. Core RAG Evaluation Metrics (Explained with Very Simple Analogies)
| Metric | Meaning | Analogy |
|---|---|---|
| Relevance | Does answer match question? | Ask who invented C++? → model talks about Java ❌ |
| Faithfulness | No invented facts | Book says started 2004, response says 1990 ❌ |
| Groundedness | Answer traceable to sources | Claims facts that don’t exist in context ❌ |
| Completeness | Covers all parts of question | User asks Windows vs Linux → only explains Windows |
| Context Recall / Precision | Correct docs retrieved & used | Student opens wrong chapter |
| Hallucination Rate | Degree of made-up info | “Taj Mahal is in London” 😱 |
| Semantic Similarity | Meaning-level match | “Engine died” = “Car stopped running” |
💡 Good evaluation doesn’t check exact wording.
It checks meaning + truth + usefulness.
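To make "hallucination rate" concrete, here is a deliberately naive sketch: flag answer sentences whose words barely overlap with the retrieved context. The function name and the 0.8 threshold are inventions for this post; production evaluators use an LLM or an entailment model, not word overlap:

```python
# Toy hallucination-rate check: flag answer sentences whose words
# barely overlap with the retrieved context. Real evaluators use an
# LLM or entailment model; this overlap heuristic is only a sketch.

def hallucination_rate(answer: str, context: str, min_overlap: float = 0.8) -> float:
    context_words = set(context.lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    unsupported = 0
    for sentence in sentences:
        words = set(sentence.lower().split())
        overlap = len(words & context_words) / len(words)
        if overlap < min_overlap:
            unsupported += 1   # likely not grounded in the context
    return unsupported / max(len(sentences), 1)

context = "the taj mahal is a marble mausoleum located in agra india"
good = "The Taj Mahal is located in Agra India"
bad = "The Taj Mahal is located in London England"
print(hallucination_rate(good, context))  # 0.0: fully grounded
print(hallucination_rate(bad, context))   # 1.0: 'London England' is invented
```

Even this toy version shows why thresholds matter: the bad answer shares most of its words with the context, and only a strict cutoff catches the invented location.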
🛠 5. Tools for RAG Evaluation
🔹 1. RAGAS — Foundation for RAG Scoring
RAGAS evaluates responses based on:
✔ Faithfulness
✔ Relevance
✔ Context recall
✔ Answer similarity
Think of RAGAS as a teacher grading with a rubric.
It reads both answer + source documents, then scores based on truthfulness & alignment.
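In practice, RAGAS consumes evaluation records with a fixed shape. The field names below follow the RAGAS docs at the time of writing (check your installed version), and actual scoring needs a configured LLM backend, so this sketch only assembles and sanity-checks the dataset:

```python
# Shape of an evaluation record as RAGAS expects it (field names per
# the RAGAS docs at the time of writing; verify against your version).
# Scoring requires an LLM backend, so here we only build and validate.

REQUIRED_FIELDS = {"question", "answer", "contexts", "ground_truth"}

records = [
    {
        "question": "When was the company founded?",
        "answer": "The company was founded in 2004.",
        "contexts": ["The company was founded in 2004 in California."],
        "ground_truth": "2004",
    },
]

def validate(record: dict) -> bool:
    """Check a record has every field RAGAS needs, with contexts as a list."""
    return REQUIRED_FIELDS <= record.keys() and isinstance(record["contexts"], list)

assert all(validate(r) for r in records)

# With an LLM backend configured, scoring looks roughly like:
#   from datasets import Dataset
#   from ragas import evaluate
#   from ragas.metrics import faithfulness, answer_relevancy
#   scores = evaluate(Dataset.from_list(records),
#                     metrics=[faithfulness, answer_relevancy])
```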
🔹 2. LangChain Evaluators
LangChain offers multiple evaluation types:
| Type | What it checks |
|---|---|
| String or regex | Basic keyword presence |
| Embedding based | Meaning similarity, not text match |
| LLM-as-a-Judge | AI evaluates AI (deep reasoning) |
LangChain = testing toolbox
RAGAS = grading framework
Together they form a complete QA ecosystem.
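The simplest row of the table above, the string/regex evaluator, fits in a few lines. This is a pure-stdlib stand-in, not LangChain's actual API (its evaluators live behind `load_evaluator()` and often need a model), but the idea is identical:

```python
# Minimal stand-in for the "string or regex" evaluator type above.
# Not LangChain's API; just the underlying idea in stdlib Python.
import re

def regex_evaluator(answer: str, patterns: list[str]) -> dict:
    """Pass if every required pattern appears somewhere in the answer."""
    misses = [p for p in patterns if not re.search(p, answer, re.IGNORECASE)]
    return {"passed": not misses, "missing": misses}

verdict = regex_evaluator(
    "Bjarne Stroustrup created C++ at Bell Labs in the early 1980s.",
    patterns=[r"stroustrup", r"c\+\+"],
)
print(verdict)  # {'passed': True, 'missing': []}
```

Keyword checks are brittle (they would fail the "Engine died" vs "Car stopped running" case), which is precisely why the embedding-based and LLM-as-a-Judge rows of the table exist.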
🔹 3. PyTest + CI for Automated LLM Testing
Instead of manually validating outputs, we automate:
- Feed preset questions to RAG
- Capture answers
- Run RAGAS/LangChain scoring
- Fail the test if the hallucination rate exceeds a threshold
This brings AI closer to software-engineering discipline.
RAG systems stop being experiments —
they become testable, trackable, production-grade products.
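The four steps above can be sketched as a PyTest-style check. Everything here is hypothetical: `ask_rag` stands in for your pipeline's entry point (stubbed so the example runs offline), and `grounded_fraction` is a toy placeholder for a real RAGAS or LangChain score:

```python
# Sketch of an automated check following the steps above.
# ask_rag() is a hypothetical pipeline entry point, stubbed here;
# in CI you would import the real one and swap the toy score for a
# RAGAS/LangChain metric.

def ask_rag(question: str) -> dict:
    # Stub: a real pipeline would retrieve documents and call the LLM.
    return {
        "answer": "Refunds are processed within 5 business days.",
        "contexts": ["Refunds are processed within 5 business days of approval."],
    }

def grounded_fraction(answer: str, contexts: list[str]) -> float:
    """Toy faithfulness stand-in: fraction of answer words found in context."""
    context_words = set(" ".join(contexts).lower().split())
    words = answer.lower().replace(".", "").split()
    return sum(w in context_words for w in words) / len(words)

def test_refund_answer_is_grounded():
    result = ask_rag("How long do refunds take?")
    score = grounded_fraction(result["answer"], result["contexts"])
    assert score >= 0.8, f"possible hallucination, score={score}"

test_refund_answer_is_grounded()  # pytest would collect this automatically
```

Run a file of such tests on every commit and a regression in retrieval or prompting fails the build, exactly like any other software defect.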
🚀 6. The Future: LLM-as-a-Judge
The future of evaluation is simple:
LLMs will evaluate other LLMs.
One model writes an answer.
Another model checks:
✔ Was it truthful?
✔ Was it relevant?
✔ Did it follow context?
This enables:
| Benefit | Why it matters |
|---|---|
| Scalable evaluation | No humans needed for every query |
| Continuous improvement | Model learns from mistakes |
| Real-time scoring | Detect errors before user sees them |
This is like autopilot for AI systems —
not only navigating, but self-correcting mid-flight.
And that is where enterprise AI is headed.
🎯 Final Summary
Evaluating LLM responses is not a matter of checking whether strings match.
It is checking whether the machine:
✔ Understood the question
✔ Retrieved relevant knowledge
✔ Avoided hallucination
✔ Provided complete, meaningful reasoning
✔ Grounded the answer in real source text
RAG evaluation demands multi-layer validation —
retrieval, reasoning, grounding, semantics, safety.
Frameworks like RAGAS + LangChain evaluators + PyTest pipelines are shaping the discipline of measurable, reliable AI — pushing LLM-powered RAG from cool demo → trustworthy enterprise intelligence.
Useful Resources
- [What is Retrieval-Augmented Generation (RAG)](https://azure.microsoft.com/en-in/resources/cloud-computing-dictionary/what-is-retrieval-augmented-generation-rag/)
- [Retrieval-Augmented Generation concepts (Azure AI)](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/concepts/retrieval-augmented-generation)
- [RAG with Azure AI Search – Overview](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview)
- [Evaluate Generative AI Applications (Microsoft Learn – Learning Path)](https://learn.microsoft.com/en-us/training/paths/evaluate-generative-ai-apps/)
- [Evaluate Generative AI Models in Microsoft Foundry Portal](https://learn.microsoft.com/en-us/training/modules/evaluate-models-azure-ai-studio/)
- [RAG Evaluation Metrics (Relevance, Groundedness, Faithfulness)](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-evaluators/rag-evaluators)
- [RAGAS – Evaluation Framework for RAG Systems](https://docs.ragas.io/)