Forum Discussion
Azure OpenAI Service - Features Overview and Key Concepts
Azure artificial intelligence services include a variety of services related to language and language processing (speech recognition, speech synthesis, translation), text recognition, and image and character recognition.
What is Azure OpenAI Service?
Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series.
Azure OpenAI Models
Azure OpenAI provides access to many different models, grouped by family and capability. A model family typically associates models by their intended task.
Azure OpenAI Service Model capabilities
Each model family has a series of models that are further distinguished by capability. These capabilities are typically identified by names, and the alphabetical order of these names generally signifies the relative capability and cost of that model within a given model family.
Azure OpenAI models fall into a few main families:
- GPT-4: A set of models that improve on GPT-3.5 and can understand as well as generate natural language and code.
- GPT-3.5: A set of models that improve on GPT-3 and can understand as well as generate natural language and code.
- Embeddings: A set of models that can convert text into numerical vector form to facilitate text similarity comparisons.
- DALL-E: A series of models that can generate original images from natural language.
Key concepts:
The completions endpoint is the core component of the API service. This API provides access to the model's text-in, text-out interface. Users provide an input prompt containing a natural language instruction (for example, in English), and the model generates a text completion.
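To make the text-in, text-out interface concrete, here is a minimal sketch that builds (but does not send) a completions request using only the Python standard library. The URL shape, the `api-key` header, and the `api-version` value follow the public REST reference as I understand it and should be checked against the current documentation. The resource name, deployment name, and key are placeholders, and `build_completion_request` is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request


def build_completion_request(resource, deployment, api_key, prompt,
                             max_tokens=50, api_version="2023-05-15"):
    """Build (but do not send) a POST request for the completions endpoint."""
    # Assumed URL layout: one endpoint per resource, one path per deployment.
    url = (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/completions?api-version={api_version}"
    )
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_completion_request(
    resource="my-resource",      # hypothetical resource name
    deployment="my-deployment",  # hypothetical deployment name
    api_key="<your-api-key>",    # placeholder, never hard-code a real key
    prompt="Write a tagline for an ice cream shop.",
)

# Sending is left out on purpose. With real values it would look like:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

Keeping the request construction separate from the network call makes it easy to inspect the prompt and parameters before anything is billed.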
Azure OpenAI processes text by breaking it down into tokens. Tokens can be words or just chunks of characters. For example, the word “hamburger” gets broken up into the tokens “ham”, “bur” and “ger”.
The total number of tokens processed in a given request depends on the length of your input, output and request parameters. The quantity of tokens being processed will also affect your response latency and throughput for the models.
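As a rough illustration of how input length and request parameters add up, the sketch below estimates an upper bound on tokens for one request. It assumes the common rule of thumb of about four characters per token for English text, which is only an approximation; exact counts come from the model's tokenizer. `max_tokens` and `n` are request parameters of the completions API, while the helper names are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate (assumption: ~4 characters per token)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_request_tokens(prompt, max_tokens, n=1):
    """Upper bound: prompt tokens plus up to max_tokens for each of n completions."""
    return estimate_tokens(prompt) + max_tokens * n


prompt = "Summarize the following paragraph in one sentence."
print(estimate_request_tokens(prompt, max_tokens=64, n=2))
```

An estimate like this is useful for budgeting and for staying under a model's context limit, not for exact billing.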
Azure OpenAI is a new product offering on Azure. You get started with Azure OpenAI the same way as with any other Azure product: you create a resource, or instance of the service, in your Azure subscription. You can read more about Azure's resource management design.
Once you create an Azure OpenAI Resource, you must deploy a model before you can start making API calls and generating text. This action can be done using the Deployment APIs. These APIs allow you to specify the model you wish to use.
The models used by Azure OpenAI use natural language instructions and examples provided during the generation call to identify the task being asked and skill required. When you use this approach, the first part of the prompt includes natural language instructions and/or examples of the specific task desired. The model then completes the task by predicting the most probable next piece of text. This technique is known as "in-context" learning.
There are three main approaches for in-context learning:
- Few-shot: In this case, a user includes several examples in the call prompt that demonstrate the expected answer format and content.
- One-shot: This case is the same as the few-shot approach except only one example is provided.
- Zero-shot: In this case, no examples are provided to the model and only the task request is provided.
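The three approaches differ only in how many examples are placed in the prompt before the actual request. A minimal sketch, assuming a simple "Input:/Output:" layout (the layout and the `build_prompt` helper are illustrative choices, not a format required by the service):

```python
def build_prompt(instruction, examples, query):
    """Join an instruction, zero or more (input, output) examples, and the new query."""
    parts = [instruction]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The prompt ends with an open "Output:" for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


instruction = "Classify the sentiment of each input as Positive or Negative."
examples = [
    ("I loved this film.", "Positive"),
    ("The service was terrible.", "Negative"),
]

zero_shot = build_prompt(instruction, [], "The food was great.")
one_shot = build_prompt(instruction, examples[:1], "The food was great.")
few_shot = build_prompt(instruction, examples, "The food was great.")

print(few_shot)
```

More examples usually make the expected format clearer to the model, at the cost of more prompt tokens per call.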
The service provides users access to several different models. Each model provides a different capability and price point.
- GPT-4 models are the latest available models. Due to high demand, access to this model series is currently available only by request.
- The GPT-3 base models are known as Davinci, Curie, Babbage, and Ada in decreasing order of capability and increasing order of speed.
- The Codex series of models is a descendant of GPT-3 and has been trained on both natural language and code to power natural language to code use cases.
Use cases: GPT-3.5
- Generate natural language for chatbots and virtual assistants, taking into account the chat history supplied with each request
- Power chatbots that handle customer inquiries, provide assistance, and converse; the model itself keeps no memory of conversations
- Automatically summarize lengthy texts
- Assist writers by suggesting synonyms, correcting grammar and spelling errors, and even generating entire sentences or paragraphs
- Help researchers by quickly processing large amounts of data and generating insights, summaries, and visualizations to aid in analysis
- Generate good quality code based on natural language
Use cases: GPT-4
- Generate and understand natural language for customer service interactions, chatbots, and virtual assistants; the model itself keeps no memory of conversations
- Generate high-quality code from natural language input
- Provide accurate translations between languages
- Improve text summarization and content generation
- Support multi-modal interaction (text and images)
- Substantially reduce hallucinations
- Produce highly consistent output across runs
Multi-Modal Transformer Architecture
Multi-modal models combine text with other types of input (such as graphics and images) and are more task-specific. One multi-modal model in the collection has not been pre-trained in the same self-supervised manner.
These models have achieved state-of-the-art results on tasks including visual question answering, image captioning, and speech recognition.
Pricing
Pricing is based on a pay-as-you-go consumption model with a price per unit for each model, similar to other Azure Cognitive Services pricing models. Prices are listed separately for these model categories:
- Language models
- Image models
- Fine-tuned models
- Embedding models
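As a back-of-the-envelope illustration of per-unit pricing, the sketch below assumes the unit for language models is 1,000 tokens; other categories may be billed in a different unit. The price used is a made-up placeholder, not a real rate, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(total_tokens, price_per_1k_tokens):
    """Pay-as-you-go estimate: tokens consumed times the per-1,000-token price."""
    return total_tokens / 1000 * price_per_1k_tokens


# HYPOTHETICAL price, for illustration only; look up the real rate for your model.
EXAMPLE_PRICE_PER_1K = 0.002

cost = estimate_cost(total_tokens=2500, price_per_1k_tokens=EXAMPLE_PRICE_PER_1K)
print(f"Estimated cost: {cost:.4f}")
```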
DALL-E
- Generating images
- Editing an image
- Creating variations of an image
Embedding models
An embedding is an information-dense representation of the semantic meaning of a piece of text. Microsoft currently offers three families of embeddings models for different functionalities:
- Similarity embeddings: good at capturing semantic similarity between two or more pieces of text.
- Text search embeddings: help measure whether long documents are relevant to a short query.
- Code search embeddings: useful for embedding code snippets and natural language search queries.
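Once texts are converted to vectors, similarity and search come down to comparing those vectors. A minimal sketch, assuming cosine similarity as the comparison measure (a common choice, not one stated above) and using tiny made-up vectors in place of real embeddings, which have far more dimensions:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (made up for illustration).
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 0.8]
doc_far = [0.0, 1.0, 0.0]

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    [("doc_close", doc_close), ("doc_far", doc_far)],
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

For text search, the same ranking step would be applied to the embedding of a short query against the embeddings of longer documents.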