Co-Authors (Prakash, Prabhjot)
The purpose of this blog is to cover the concepts related to Azure OpenAI in an easy-to-understand, concise format for anyone with little or no ML background.
Let's begin by understanding the fundamental components of Azure OpenAI solutions, the tools and patterns involved, and how they are distinct from Azure OpenAI itself.
OpenAI is an independent research organization focused on artificial intelligence (AI) that, in addition to research, develops various generative pre-trained transformer (GPT) and related models such as GPT-4, GPT-4V, DALL-E 3, and Whisper. Primary use cases for GPT models include natural language processing tasks such as language translation, text summarization, and Q&A. These models can be combined with enterprise data and additional domain-specific models using various patterns and techniques.
Azure OpenAI refers to the collaboration between OpenAI and Microsoft Azure. Under this partnership, OpenAI's AI models and technologies are hosted by Microsoft in Azure, making them accessible to developers and organizations through the Azure platform. The Azure OpenAI service automatically encrypts any data that persists in the cloud, including training data and fine-tuned models. This encryption helps protect the data and ensures that it meets organizational security and compliance requirements. Although Azure OpenAI is designed to meet data protection, privacy, and security standards, it is your responsibility to use the technology in compliance with applicable laws and regulations and in a manner that aligns with your specific business needs.
Components of Azure OpenAI
Azure OpenAI offers a ready-to-use service with finely tuned capabilities, accessible via an API (Model as a Service). The key assets contributing to a Generative AI solution include LLMs, agents, plugins, prompts, chains, and APIs.
Fundamentals of Utilizing Azure OpenAI:
We query Azure OpenAI using prompts (fig 1). A prompt has three core components.
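As a concrete illustration, here is a minimal sketch of sending a prompt to the service with the openai Python package (v1.x); the endpoint, API key, and deployment name are placeholders you would replace with your own values.

```python
from openai import AzureOpenAI

# Placeholders: substitute your own resource endpoint, API key, and deployment name.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="<your-gpt-deployment>",  # the *deployment* name, not the model family name
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Answer concisely."},
        {"role": "user", "content": "Summarize the benefits of hosting OpenAI models on Azure."},
    ],
)

print(response.choices[0].message.content)
```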
LangChain and Semantic Kernel have some similarities, but each has its own unique features and use cases.
LangChain
LangChain is a modular framework, with support for both Python and JavaScript/TypeScript, for developing applications that involve natural language processing (NLP) tasks. It streamlines development by breaking complex tasks down into a sequence of manageable components such as Model I/O, Retrieval, Chains, Agents, Memory, and Response, which simplifies the development process and makes debugging and maintenance easier.
The use of Chains to construct sequences of calls suggests a workflow-oriented approach, where developers can organize tasks into a logical sequence. Agents add another layer of abstraction by enabling chains to choose tools based on high-level directives, potentially increasing adaptability and efficiency.
The inclusion of Memory for persisting application state between runs of a chain indicates support for stateful processing, which can be crucial for certain types of applications where context needs to be maintained across interactions.
Overall, LangChain is a strong choice for building applications that involve NLP tasks, offering modularity, flexibility, and support for multiple programming languages. Its components give developers a structured approach to building complex applications while streamlining the development process.
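As a brief illustration, here is a minimal sketch of a LangChain chain against an Azure OpenAI deployment, assuming the langchain-openai and langchain-core packages; parameter names reflect recent releases and may differ in older versions, and the deployment name is a placeholder.

```python
from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Assumes AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set in the environment;
# "<your-gpt-deployment>" is a placeholder for your model deployment name.
llm = AzureChatOpenAI(azure_deployment="<your-gpt-deployment>", api_version="2024-02-01")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You translate technical text into plain English."),
    ("user", "{text}"),
])

# A chain composes Model I/O components into a sequence: prompt -> model -> output parser.
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "Embeddings map data into a high-dimensional vector space."}))
```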
Semantic Kernel
Semantic Kernel is an open-source software development kit (SDK) that simplifies the process of building agents that can activate your existing code. It is a highly adaptable SDK compatible with models from OpenAI, Azure OpenAI, Hugging Face, and beyond. By combining your existing C#, Python, or Java code with these models, you can create agents that are proficient in answering questions and automating tasks.
Empowering Developers with Semantic Kernel:
Beyond Simple Chat Applications: While modern AI models are adept at generating messages and images, constructing fully autonomous AI agents that can automate business operations and enhance user productivity requires more. A framework that can interpret model responses and utilize them to trigger existing code is essential for productive tasks.
Semantic Kernel fulfills this need by providing an SDK that enables you to describe your existing code to AI models, allowing them to request its execution. Semantic Kernel then converts the model's response into an actionable call to your code.
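To illustrate, here is a minimal sketch using the Python semantic-kernel package. Semantic Kernel's API has changed rapidly across releases, so the class, decorator, and method names below reflect recent 1.x versions and should be treated as assumptions; OrderPlugin is a hypothetical stand-in for your own code.

```python
import asyncio

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import kernel_function


# Hypothetical business code, described to the model as a plugin.
class OrderPlugin:
    @kernel_function(name="get_order_status", description="Look up the status of an order by id.")
    def get_order_status(self, order_id: str) -> str:
        # In practice this would call your own order system.
        return f"Order {order_id}: shipped"


async def main() -> None:
    kernel = sk.Kernel()
    kernel.add_service(AzureChatCompletion(
        deployment_name="<your-gpt-deployment>",  # placeholder values
        endpoint="https://<your-resource>.openai.azure.com/",
        api_key="<your-api-key>",
    ))
    kernel.add_plugin(OrderPlugin(), plugin_name="orders")

    # The kernel converts a model's request into an actionable call to this function.
    result = await kernel.invoke(plugin_name="orders",
                                 function_name="get_order_status",
                                 order_id="12345")
    print(result)


asyncio.run(main())
```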
To summarize, LangChain is a powerful framework with more out-of-the-box tools and integrations, whereas Semantic Kernel is more lightweight. Both frameworks support a wide range of use cases, making them versatile tools for developers. Whether you choose LangChain or Semantic Kernel will depend on the languages your team supports and on which features and integrations you need out of the box.
Samples: semantic-kernel/dotnet/samples at main · microsoft/semantic-kernel (github.com)
Vectors and Embeddings
A vector representation captures the essential characteristics of an item in numerical format. An embedding is a special type of vector representation of data that LLMs can use.
A vector database is a storage system engineered to house and handle vector embeddings, which are numerical representations of complex data within a multi-dimensional space. Each dimension in this space is associated with a particular attribute of the data, and sophisticated data can be represented using tens of thousands of dimensions. The position of a vector within this space signifies its distinct characteristics. Various types of data, including words, phrases, documents, images, and audio, can be converted into vector form. These embeddings are crucial for functions such as similarity searches, multi-modal searches, recommendation systems, and large language models (LLMs), among others.
In a vector database, embeddings are indexed and queried through vector search algorithms based on their vector distance or similarity.
Some widely used vector databases and vector-capable stores include Azure AI Search, Azure Cosmos DB, PostgreSQL with pgvector, Pinecone, Weaviate, Milvus, Chroma, and Redis.
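To make this concrete, here is a small sketch that generates embeddings through an Azure OpenAI embedding deployment (a placeholder name, mirroring the client setup from the earlier example) and ranks documents by cosine similarity, the measure most vector searches rely on; a real system would delegate storage and search to one of the databases above.

```python
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                    # placeholder
    api_version="2024-02-01",
)

def embed(text: str) -> np.ndarray:
    # "<your-embedding-deployment>" is a placeholder, e.g. a text-embedding-ada-002 deployment.
    resp = client.embeddings.create(model="<your-embedding-deployment>", input=text)
    return np.array(resp.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity of two vectors by the angle between them; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = [
    "Azure OpenAI hosts GPT models as a service.",
    "Vector databases index embeddings for similarity search.",
    "Whisper transcribes audio to text.",
]
query = embed("How are embeddings stored and searched?")
# Rank documents by similarity to the query, as a vector search would.
print(sorted(docs, key=lambda d: cosine_similarity(query, embed(d)), reverse=True)[0])
```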
Retrieval Augmented Generation (RAG)
Retrieval augmented generation (RAG) is an essential element of using Generative AI, particularly in enterprise contexts. This approach involves retrieving domain-specific knowledge and integrating it with the initial prompt (refer to figure 2) to improve the precision and relevance of the results produced by Azure OpenAI. The 'Bring Your Own Data' feature is a unique capability that facilitates implementing RAG, and Azure AI Studio simplifies its application for straightforward scenarios.
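Below is a minimal sketch of the RAG flow, reusing the embed, cosine_similarity, and client helpers from the previous example; the in-memory knowledge base and the Contoso policy snippets are hypothetical stand-ins for a real vector database and enterprise documents.

```python
# Hypothetical enterprise snippets standing in for documents in a vector database.
knowledge_base = [
    "Contoso's return window is 30 days from delivery.",
    "Contoso ships internationally except to embargoed regions.",
]

def answer(question: str) -> str:
    # 1. Retrieve: find the most relevant snippet (a vector database would do this at scale).
    context = max(knowledge_base,
                  key=lambda doc: cosine_similarity(embed(question), embed(doc)))
    # 2. Augment: ground the prompt with the retrieved context.
    messages = [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"},
    ]
    # 3. Generate: the model answers from enterprise data rather than memory alone.
    resp = client.chat.completions.create(model="<your-gpt-deployment>", messages=messages)
    return resp.choices[0].message.content

print(answer("How long do customers have to return an item?"))
```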
Responsible AI (RAI)
Built-in features in Azure OpenAI Studio: every Azure OpenAI request passes through an RAI ensemble of AI models that filter inputs and outputs for sexual, hate, violence, and self-harm content. These filters are configurable by customers.
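When a prompt trips these filters, the service rejects the request with a content_filter error code; here is a hedged sketch of handling that with the Python SDK, reusing the client from the earlier examples.

```python
import openai  # the same package that provides the AzureOpenAI client above

user_input = "Some user-supplied text."

try:
    resp = client.chat.completions.create(
        model="<your-gpt-deployment>",  # placeholder
        messages=[{"role": "user", "content": user_input}],
    )
    print(resp.choices[0].message.content)
except openai.BadRequestError as err:
    # Azure OpenAI rejects filtered prompts with HTTP 400 and the code "content_filter".
    if "content_filter" in str(err):
        print("The request was blocked by the responsible AI content filters.")
    else:
        raise
```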
RAI Toolbox GitHub repository
Tools: the tools that can be used to develop Azure OpenAI based solutions include Azure AI Studio, the Azure OpenAI APIs and SDKs, and orchestration frameworks such as LangChain and Semantic Kernel.
Small Language Models (SLMs)
Compact language models, such as Microsoft's Phi family and offerings from other providers, have capabilities similar to larger Generative AI models but require significantly fewer resources. They can run on commodity GPU hardware, which allows Small Language Models (SLMs) to be deployed in diverse settings. SLMs demonstrate considerable proficiency in areas such as common-sense reasoning and language comprehension; however, they may not match larger models in breadth of world knowledge due to their size constraints.
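As an illustration, here is a sketch of running Microsoft's Phi-2 locally with the Hugging Face transformers library; the prompt format and generation settings are illustrative only.

```python
from transformers import pipeline

# Phi-2 is a 2.7B-parameter SLM that fits on a single consumer GPU (or a CPU, more slowly).
# Recent transformers releases include the Phi architecture natively; older ones may
# additionally require trust_remote_code=True.
generator = pipeline("text-generation", model="microsoft/phi-2", device_map="auto")

out = generator(
    "Instruct: Explain why the sky is blue in one sentence.\nOutput:",
    max_new_tokens=60,
    do_sample=False,
)
print(out[0]["generated_text"])
```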
Azure OpenAI Approved as a Service within the FedRAMP High Authorization for Azure Commercial
Microsoft’s Azure OpenAI service is now included within the US Federal Risk and Authorization Management Program (FedRAMP) High Authorization for Azure Commercial. This Provisional Authorization to Operate (P-ATO) within the existing FedRAMP High Azure Commercial environment was approved by the FedRAMP Joint Authorization Board (JAB). This milestone follows our previously announced solution enabling Azure Government customers to access Azure OpenAI Service in the commercial environment. With this latest update, agencies requiring FedRAMP High can directly access Azure OpenAI from Azure Commercial.
Challenges
Challenges faced by early adopters are being addressed through ongoing efforts. Patterns and approaches such as Azure API Management (APIM) or AI landing zones can mitigate some of the following issues:
Model Updates: Frequent modifications to the underlying Large Language Models (LLMs) can pose operational challenges.
Multilingual Scenarios: In applications supporting multiple languages, the accuracy of responses may decline, with LLMs potentially delivering mixed-language content.
Performance, HA/DR: Ensuring consistent performance, high availability, and disaster recovery in production applications that use Azure OpenAI can be challenging, with possible increased latency.
Secure Sensitive Information: To secure sensitive data, enterprises must work closely with their Office of Responsible AI during the project qualification stage, especially for sensitive AI use cases. Strict adherence to their advice on managing sensitive or explicit content is imperative. Organizations are required to follow established security principles and apply data classification labels, known as sensitivity labels, to protect documents, emails, PDFs, Teams meetings, and chats.
Cost Management: A strategy employed by customers involves using an orchestrator to determine which GPT model to invoke based on the query. Not all queries necessitate GPT-4; many can be adequately addressed with GPT-3.5, thus managing costs effectively.
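To illustrate the orchestrator idea, here is a hedged sketch in which a hypothetical heuristic routes short, simple queries to a cheaper GPT-3.5 deployment and reserves GPT-4 for complex ones, reusing the client from the earlier examples; a production router would use something more robust, such as a trained classifier.

```python
# Hypothetical router: pick a deployment by query complexity to control cost.
COMPLEX_HINTS = ("analyze", "compare", "step by step", "reason", "plan")

def pick_deployment(query: str) -> str:
    # Illustrative heuristic only; production systems often use a trained classifier.
    if len(query) > 400 or any(hint in query.lower() for hint in COMPLEX_HINTS):
        return "<gpt-4-deployment>"        # placeholder deployment names
    return "<gpt-35-turbo-deployment>"

def ask(query: str) -> str:
    resp = client.chat.completions.create(
        model=pick_deployment(query),
        messages=[{"role": "user", "content": query}],
    )
    return resp.choices[0].message.content

print(ask("What is RAG?"))  # short query, routed to the cheaper deployment
```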
In conclusion, this article should help you quickly understand the opportunities and the landscape for enabling Azure OpenAI in your applications.