AI Platform Blog

The Evolution of GenAI Application Deployment Strategy: From PoC to MVP

arung (Microsoft)
Jun 06, 2024

Initiating the Minimum Viable Product Phase:

Proof of Concept (PoC), Proof of Value (PoV), and Proof of Technology (PoT) exercises allow customers to validate and demonstrate the suitability and adaptability of GenAI use cases for their business. However, transitioning from an initial experiment, i.e. building a custom co-pilot (using either a low-code or code-first approach) as a PoC, to the go-to-market phase requires building a Minimum Viable Product (MVP). The MVP takes the core functionality of the PoC as its foundation and adds further layers, enabled by Microsoft community-built accelerators.

Microsoft offers a variety of accelerators that aid the development of GenAI-powered applications, effectively addressing use cases and delivering business value. However, it’s crucial to understand that code originating from these accelerators in the PoC phase is not production-ready. Additional safeguards, known as production guardrails, need to be incorporated to protect the application or product.

These extra layers of components or services are necessary to ensure governance, security, continuous development, and configuration. A practical strategy is to augment the accelerators used during the PoC with these layers. The enhancements can take the form of new features, customized or improved user interfaces, security measures (such as enhanced authentication and authorisation), suitable infrastructure and network topology, content filters, and comprehensive logging and monitoring.

 

MVP Approach:

Let’s delve into the methodology for constructing a conceptual architecture that can progress from the PoC deployment to the MVP stage. The initial custom co-pilot’s (low-code or hybrid) conceptual architecture, based on our accelerator from the PoC phase, looks like this:

The above Proof of Concept (PoC) reference design includes the basic foundational components needed to demonstrate value during the PoC phase, but it does not incorporate all of the elements essential for a live deployment: all components and services are deployed as a single unified entity that interacts with the relevant LLM models.

Now, let’s refine the PoC reference design into a Minimum Viable Product (MVP) by expanding it and incorporating additional layers:

·     Components:

The first step involves identifying and logically grouping components that offer similar services. This is a crucial starting point. Let’s begin at the network level. For example (a provisioning sketch follows this list):

  • Apps VNet: a virtual network that houses application-related services, providing isolation from other services.
  • Backend VNet: a virtual network dedicated to backend business orchestration, system integration, and workload management.
  • Data VNet: responsible for managing data storage and access to production-grade source data.
  • Service VNet: connects and interfaces with LLM and other AI services (such as AI Search, Cognitive Services, and Computer Vision). Orchestration frameworks like LangChain, Semantic Kernel, or AutoGen can be employed to abstract the configuration and manage the services based on the use case.
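To make the grouping concrete, here is a minimal sketch that provisions these virtual networks with the Azure SDK for Python. The subscription ID, resource group, region, and address spaces are illustrative assumptions, not values taken from any accelerator.

```python
# Minimal sketch: provisioning the logically grouped VNets with the Azure SDK.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "rg-genai-mvp"         # hypothetical resource group
LOCATION = "eastus"                     # assumed single MVP region

# Illustrative address spaces, one per logical grouping.
VNETS = {
    "apps-vnet":    "10.1.0.0/16",   # application-facing services
    "backend-vnet": "10.2.0.0/16",   # orchestration and integration
    "data-vnet":    "10.3.0.0/16",   # production-grade source data
    "service-vnet": "10.4.0.0/16",   # LLM and AI service access
}

client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for name, prefix in VNETS.items():
    poller = client.virtual_networks.begin_create_or_update(
        RESOURCE_GROUP,
        name,
        {
            "location": LOCATION,
            "address_space": {"address_prefixes": [prefix]},
        },
    )
    print(f"Created {poller.result().name}")
```

Peering, subnets, and private endpoints between these networks would follow the same pattern but are omitted for brevity.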

LLMOps:

At the MVP stage, it’s advisable to incorporate Large Language Model Operations (LLMOps) across the entire lifecycle from the outset. LLMOps refers to the specialized practices and workflows that facilitate the development, deployment, and management of AI models, specifically large language models (LLMs). The LLMOps approach articulates how these practices can be accommodated in the development lifecycle.

 

The key to success when working with Large Language Models (LLMs) lies in effective management of prompts, the application of prompt engineering techniques, and the tracking of prompt versions. Tools like PromptFlow facilitate this process by providing features for prompt management and version control.
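PromptFlow provides this tooling out of the box; purely as an illustration of the underlying idea (and not the PromptFlow API), the sketch below pins each prompt template to a content-addressed version id, so every model run can be traced back to the exact prompt that produced it.

```python
# Illustrative prompt-versioning sketch (not the PromptFlow API):
# a tiny registry that maps name -> {version: template}.
import hashlib
from dataclasses import dataclass, field

@dataclass
class PromptRegistry:
    _versions: dict = field(default_factory=dict)

    def register(self, name: str, template: str) -> str:
        """Store a template and return its content-addressed version id."""
        version = hashlib.sha256(template.encode()).hexdigest()[:8]
        self._versions.setdefault(name, {})[version] = template
        return version

    def get(self, name: str, version: str) -> str:
        return self._versions[name][version]

registry = PromptRegistry()
v1 = registry.register(
    "summarise",
    "Summarise the following document in three bullet points:\n{document}",
)
# Log `v1` alongside every completion so outputs are reproducible.
prompt = registry.get("summarise", v1).format(document="...")
```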

 

·     LLM Models:

The Proof of Concept (PoC) stage allows us to verify the suitability of LLM models (type and version) for our needs. At the MVP stage, however, we have the opportunity to explore beyond the base model (the LLM base model is the same for everyone). We can fine-tune the models on our domain-specific dataset, which can enhance performance and optimize interactions. Additionally, we have the option of considering other open-source models through the Azure AI Studio model catalog.
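As a hedged sketch of what fine-tuning might look like, the snippet below submits a fine-tuning job to Azure OpenAI using the openai Python SDK. The endpoint, API version, training file, and base model name are all assumptions; check which base models support fine-tuning in your region.

```python
# Hedged sketch: submitting an Azure OpenAI fine-tuning job.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<api-key>",
    api_version="2024-02-01",  # assumed API version
)

# Upload a JSONL file of {"messages": [...]} chat examples drawn from
# your domain-specific dataset.
training_file = client.files.create(
    file=open("domain_training.jsonl", "rb"), purpose="fine-tune"
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo-0613",  # assumed fine-tunable base model
)
print(job.id, job.status)
```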

 

·     Orchestration Framework:

The selection of orchestration, development tools, and their respective frameworks is contingent on the customer’s preferences (tech stack) and the capabilities required to address specific use cases. While the orchestration approach may need to be tailored for different use cases, the underlying hosting infrastructure can be reused.
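One way to keep the hosting infrastructure reusable while the orchestration varies per use case is to code the hosting layer against a small interface and wrap each framework in an adapter. The sketch below is illustrative only; the adapter and chain names are hypothetical.

```python
# Sketch: the hosting layer depends on a minimal Orchestrator interface,
# so LangChain, Semantic Kernel, or AutoGen can be swapped behind it.
from typing import Protocol

class Orchestrator(Protocol):
    def run(self, user_input: str) -> str: ...

class LangChainOrchestrator:
    """Hypothetical adapter wrapping a LangChain runnable/chain."""
    def __init__(self, chain):
        self._chain = chain
    def run(self, user_input: str) -> str:
        return self._chain.invoke({"input": user_input})

def handle_request(orchestrator: Orchestrator, user_input: str) -> str:
    # The hosting infrastructure (API layer, scaling, logging) sees only
    # the Orchestrator interface, so it can be reused across use cases.
    return orchestrator.run(user_input)
```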

 

·     Infrastructure & Environment:

During the MVP phase, you could transition from a development to a production subscription, implementing a single-region rollout to cater to a select group of users, both internal and external. To enhance the efficiency of the overall CI/CD process, you might want to consider adopting an Infrastructure as Code (IaC) approach for deployment automation.
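As an illustrative IaC-flavoured sketch, the snippet below deploys an ARM template (for example, one compiled from Bicep) into the production subscription using the Azure SDK for Python. The file name, resource group, and parameters are assumptions; in practice this step would typically run inside a CI/CD pipeline rather than by hand.

```python
# Sketch: deploying an ARM template per environment with the Azure SDK.
import json
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "<production-subscription-id>"  # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

with open("main.json") as f:  # ARM template, e.g. compiled from Bicep
    template = json.load(f)

deployment = client.deployments.begin_create_or_update(
    "rg-genai-mvp",        # hypothetical resource group
    "mvp-single-region",   # deployment name for the single-region rollout
    {
        "properties": {
            "template": template,
            "parameters": {"location": {"value": "eastus"}},
            "mode": "Incremental",
        }
    },
).result()
print(deployment.properties.provisioning_state)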

 

·     Additional Supporting Services:

You might want to incorporate security measures such as managed identities, role management, access controls, content monitoring and moderation, content safety, and prompt shields to ensure responsible AI usage.
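For example, here is a hedged sketch of screening user input with the Azure AI Content Safety Python SDK before it reaches the model. The endpoint, key, and severity threshold are assumptions; Prompt Shields (jailbreak and prompt-injection detection) is exposed through a separate API and is not shown here.

```python
# Hedged sketch: blocking unsafe input with Azure AI Content Safety.
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    "https://<your-resource>.cognitiveservices.azure.com",  # placeholder
    AzureKeyCredential("<content-safety-key>"),
)

def is_safe(user_prompt: str, max_severity: int = 2) -> bool:
    """Reject input if any harm category exceeds the severity threshold."""
    result = client.analyze_text(AnalyzeTextOptions(text=user_prompt))
    return all((c.severity or 0) <= max_severity
               for c in result.categories_analysis)

if not is_safe("user question here"):
    print("Blocked by content safety policy.")
```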

 

·     Patterns:

Azure accelerators offer a comprehensive set of patterns and tools for working with Large Language Models (LLMs). These accelerators cover a wide range of use cases and provide valuable support for technical solutions.

Keep in mind that moving from a PoC to an MVP entails improving and fine-tuning both the requirements and the product, ensuring they align with market demands and end-user expectations.

You should now contemplate the level of services for the LLM Model that will facilitate deployment and support for your use case. Azure AI essentially offers two distinct levels of service:

  1. Pay As You Go (PAYG): This model allows customers to pay based on their actual usage. It’s ideal for Proof of Concept (PoC) scenarios, latency-tolerant situations, and non-critical use cases.
  2. Provisioned Throughput Unit (PTU): This is a fixed-term commitment pricing model. PTU is recommended for low-latency and critical use cases, and is also suggested for MVP workloads.

You can also manage your application traffic according to your business requirements by combining PAYG and PTU. For instance, if your traffic exceeds the provisioned peak and becomes unpredictable, you can divert the excess to PAYG, as sketched below. See the references at the end for more detail.
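A minimal sketch of that spillover pattern, assuming two Azure OpenAI resources and hypothetical deployment names: requests go to the PTU deployment first and fall back to PAYG when PTU capacity returns a rate-limit error. The aoai-smart-loadbalancing repository in the references implements a fuller version of this idea.

```python
# Illustrative PTU-first spillover; endpoints and names are assumptions.
from openai import AzureOpenAI, RateLimitError

ptu = AzureOpenAI(azure_endpoint="https://<ptu-resource>.openai.azure.com",
                  api_key="<ptu-key>", api_version="2024-02-01")
payg = AzureOpenAI(azure_endpoint="https://<payg-resource>.openai.azure.com",
                   api_key="<payg-key>", api_version="2024-02-01")

def chat(messages):
    try:
        return ptu.chat.completions.create(
            model="gpt-4-ptu", messages=messages)      # PTU deployment name
    except RateLimitError:
        # PTU capacity exhausted: divert the excess traffic to PAYG.
        return payg.chat.completions.create(
            model="gpt-4-payg", messages=messages)     # PAYG deployment name
```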

Regardless of the level of service you choose, it’s vital to analyse the volumetrics of your workload to accurately estimate costs. At this stage, it’s also crucial to gather feedback from a wider user base, including internal users and a limited number of external users.
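To make the volumetrics point concrete, a back-of-the-envelope estimate can be as simple as the sketch below. All token counts, request volumes, and prices are illustrative assumptions; substitute current Azure OpenAI pricing for your model.

```python
# Back-of-the-envelope PAYG volumetrics; every number is an assumption.
AVG_PROMPT_TOKENS = 1_200
AVG_COMPLETION_TOKENS = 300
REQUESTS_PER_DAY = 5_000

PRICE_PER_1K_PROMPT = 0.005       # assumed $/1K input tokens
PRICE_PER_1K_COMPLETION = 0.015   # assumed $/1K output tokens

daily_cost = REQUESTS_PER_DAY * (
    AVG_PROMPT_TOKENS / 1000 * PRICE_PER_1K_PROMPT
    + AVG_COMPLETION_TOKENS / 1000 * PRICE_PER_1K_COMPLETION
)
print(f"Estimated PAYG cost: ${daily_cost:,.2f}/day, "
      f"${daily_cost * 30:,.2f}/month")
# Compare this run rate against PTU commitment pricing to pick a service level.
```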

 

MVP Criteria: Here are some critical factors to consider during the development of an MVP:

  • PoC Outcome: Evaluate the success of the Proof of Concept (PoC) and confirm the adoption of LLM Models in the MVP phase.
  • Actual Production Dataset: Use a valid, curated dataset that reflects real-world conditions.
  • Use Case: Comprehend the specific requirements, such as queries, analysis, contextual and search criteria.
  • Feedback Loop: Collect user feedback on features, improvements, and limitations.
  • LLM Model Accuracy and Performance: Involve your business Subject Matter Expert (SME) to review and validate the LLM model’s output against the actual dataset. Consistent outcomes from the LLM can be achieved by adopting best practices such as prompt engineering, prompt versioning, prompt tuning, and dataset curation.
  • Token Management (Token Per Minute): Assess and manage the token sizes for efficient processing.
  • Infrastructure: Ensure the availability of the appropriate infrastructure and supporting components.
  • Security: Incorporate strong security measures, apply Responsible AI practices, and address security threats specific to GenAI apps (jailbreak, prompt injection).
  • Business Continuity: Plan for continuity during deployment, such as redundant region-level or cross-region deployment, in order to scale deployments and workloads.
  • Governance: Implement governance practices for end-to-end monitoring and logging.
  • Responsible AI: Monitor and manage AI products in a responsible manner.

References:

  • Smart load balancing for Azure OpenAI endpoints: Azure/aoai-smart-loadbalancing (github.com)
  • Scaling AOAI deployments using PTU and PAYG: Azure/aoai-apim - Scaling AOAI using APIM, PTUs and TPMs (github.com)
  • Prompt engineering techniques: Azure PromptFlow tool
  • Benchmarking AOAI loads: Azure/azure-openai-benchmark - Azure OpenAI benchmarking tool (github.com)

Conclusion:

Transitioning from the PoC to the MVP stage allows you to demonstrate and validate business value and to define the key criteria for determining the path to live deployment. This helps you identify both business and technical dependencies and requirements, and prepares your organization to embrace the new wave of AI, enhancing your competitiveness in your industry.

 

Series: The next article will discuss the approach for moving a GenAI application from MVP to production.

 

Paolo Colecchia, StephenMS, Taonga_Banda, renbafa, Morgan Gladwell

 
Updated Jun 13, 2024
Version 2.0