When we build apps on top of Large Language Models, we need to evaluate the app's responses for quality and safety.
When we evaluate the quality of an app, we're making sure that it provides answers that are coherent, clear, aligned to the user's needs, and, for many applications, factually accurate. I'v...