Building Safer AI Applications: A Practical Approach

Copper Contributor

Nov 13, 2024

Building AI-powered applications requires careful attention to responsible development practices. This blog shares our experience implementing AI safety measures while developing a hotel search application with Microsoft Azure services, highlighting practical approaches for developers.

RoomRadar.Ai aims to provide an AI-powered search experience for hotels and was built as part of a UCL industry exchange network (IXN) project with Microsoft (Figure 1). It was built upon Microsoft Azure AI services including the Azure OpenAI Service, Azure AI Search and Azure AI Content Safety Services, and it aims to showcase how developers can build around Microsoft Azure tooling. Throughout the process of building this application, we aimed to implement strategies to adhere to the Microsoft Responsible AI Principles - which provide a foundation for those building AI-enabled applications like ours. This blog will highlight the steps we took to implement responsible AI within our application, and the key learnings that can be followed by software developers building around AI.

Figure 1: RoomRadar.Ai main interface.

Implementation and Validation

We integrated Azure AI Content Safety Prompt Shields and Text Moderation services into the chatbots featured within the application.

The Text Moderation service analyzes user inputs across four harm categories, assigning severity scores that enable targeted content filtering, and outputs metrics identifying categories of harmful output. It enables users to set acceptable thresholds to appropriately filter harmful content inputted so that it does not reach the AI model itself. Through iterative testing, we calibrated these thresholds for the filters to effectively block harmful content while maintaining the system's ability to handle legitimate hotel-related queries. This calibration process highlighted the need to balance robust safety measures and preserving the system's core functionality as a helpful travel assistant, which required finding an acceptable level of false positives / negatives when it comes to AI implementation.

The Prompt Shields service, by contrast, acts to prevent malign attempts to modify model behavior (‘jailbreak’ attempts) from reaching the model itself. An example refusal due to Prompt Shields detecting a potential jailbreak can be seen in Figure 2 below.

Figure 2: An example refusal of a malicious jailbreak attempt.

In addition to integrating responsible AI services available from the Azure platform, we also implemented further mitigations directed at the AI model itself in order to facilitate responsible human-AI interaction. The AI Concierge was instructed by Microsoft's comprehensive safety system message, establishing clear guidelines around harmful content prevention, avoiding incorrect information, groundedness, and copyright respect.

Transparent user communication was incorporated, including clear labeling of AI-generated content and explicit acknowledgment of system limitations ensuring that users know when they are interacting with AI-generated content. Furthermore, strategies were adopted to enhance the groundedness of output, thus reducing the incidence of falsehoods. For example, when the system cannot confidently answer a query about hotel details, it defaults to directing users to contact the hotel directly rather than risk providing inaccurate information.

The model's output quality and safety underwent iterative validation through hands-on, manual testing by our development team. This qualitative evaluation process involved challenging the model with test cases to assess its performance and behavior. While we used manual testing for this project, developers can also leverage tools like the Azure AI Evaluation SDK for a more automated, systematic, metrics-driven assessment of their applications.

Learnings and Reflections

There are learnings that can be drawn from this project to be taken forward for developers looking to implement responsible AI principles when building in this domain. Our red-teaming process was non-structured and iterative for this small-scale application. Developers working on large-scale, production-ready applications may look to implement more systematic strategies and technologies for red-teaming such as Azure's PyRIT system. For more information on incorporating red-teaming into your applications, see the Microsoft Learn article on red-teaming.

Closely related to red-teaming is model evaluation, which instead aims to provide metrics of AI model output. For evaluation of potential model harm, risk and safety metrics are available within the Azure AI Studio, which acts as an off-the-shelf-solution, and enables evaluation of model output across a variety of domains (Figure 3).

Figure 3: The Automated evaluations interface within Azure AI Studio.

Mitigation is where the RoomRadar.Ai responsible AI development time was focused, with the integration of Azure AI Content safety services such as Prompt Shields and Text Moderation, as well as the Microsoft safety system message. In future, I would also look to adopt Image Moderation where AI models featured within applications included the input or output of image content.

Although red-teaming, evaluation and mitigation steps do help minimize harmful output, unusual edge cases and expert malicious users may still be able to find routes to bypass the safeguards you have constructed. If RoomRadar.Ai or a similar application was deployed in production, robust monitoring procedures would be necessary. Continuous monitoring within the Azure OpenAI Service is available for the monitoring of applications in production - and should be considered by developers.

For AI applications in production, a structured approach to building generative AI applications responsibly is essential. The generative AI development lifecycle shown in Figure 4 offers a framework that complements the mitigation strategies discussed in this blog. By iteratively incorporating red-teaming, measurement, mitigation, and operational oversight, this approach can help manage risks effectively.

Figure 4: The generative AI development lifecycle incorporates red-teaming, measurement, mitigation techniques and operation.

Conclusion

As AI technologies continue to advance, they will become increasingly embedded in the software we use every day. As a result, responsible AI principles and practices will become increasingly crucial for developing trustworthy systems that users can rely on. Through careful attention to safety, transparency, and ethical considerations at each stage of the development process, developers can build AI systems that both innovate and protect users from harm. Developers should look to adopt a systematic approach grounded in the iterative AI governance cycle to successfully navigate these challenges.

Blog Post

Building Safer AI Applications: A Practical Approach

Building AI-powered applications requires careful attention to responsible development practices. This blog shares our experience implementing AI safety measures while developing a hotel search application with Microsoft Azure services, highlighting practical approaches for developers.

Implementation and Validation

Learnings and Reflections

Conclusion

Further Reading