Welcoming Mistral, Phi, Jais, Code Llama, NVIDIA Nemotron, and more to the Azure AI Model Catalog
We are excited to announce the addition of several new foundation and generative AI models to the Azure AI model catalog. From Hugging Face, we have onboarded a diverse set of Stable Diffusion models, Falcon models, CLIP, Whisper V3, BLIP, and SAM models. In addition to the Hugging Face models, we are adding Code Llama and Nemotron models from Meta and NVIDIA respectively. We are also introducing our cutting-edge Phi models from Microsoft Research. These exciting additions bring 40 new models and 4 new modalities to the catalog, including text-to-image and image embedding.

Today, we are also pleased to announce Models as a Service. Pro developers will soon be able to easily integrate the latest AI models, such as Llama 2 from Meta, Command from Cohere, Jais from G42, and premium models from Mistral, into their applications as API endpoints. They can also fine-tune these models with their own data without needing to set up and manage GPU infrastructure, eliminating the complexity of provisioning resources and managing hosting. Below is additional information about the new models we are bringing to Models as a Service and to the Azure AI model catalog.

New Models in Models as a Service (MaaS)

Command (Coming Soon)

Command is Cohere's premier text generation model, designed to respond effectively to user commands and instantly cater to practical business applications. It offers a range of default functionalities but can be customized for specific company language or advanced use cases. Command's capabilities include writing product descriptions, drafting emails, suggesting press release examples, categorizing documents, extracting information, and answering general queries. We will soon support Command in MaaS.

Jais (Coming Soon)

Jais is a 13-billion-parameter model developed by G42 and trained on a diverse 395-billion-token dataset comprising 116 billion Arabic and 279 billion English tokens. Notably, Jais was trained on Condor Galaxy 1, a multi-exaFLOP AI supercomputer co-developed by G42 and Cerebras Systems. This model represents a significant advancement for the Arabic world in AI, offering over 400 million Arabic speakers the opportunity to explore the potential of generative AI. Jais will also be offered in MaaS as inference APIs and hosted fine-tuning.

Mistral

Mistral 7B is a large language model with 7.3 billion parameters, trained to generate coherent text and perform a wide variety of natural language processing tasks. It is a significant leap from previous models and outperforms many existing AI models on a variety of benchmarks. One of the key features of the Mistral 7B model is its use of grouped-query attention and sliding-window attention, which allow for faster inference and longer response sequences. The Azure AI model catalog will soon offer Mistral's premium models in Models as a Service (MaaS) through inference APIs and hosted fine-tuning.

Mistral-7B-V01
Mistral-7B-Instruct-V01
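As an illustration of what API-based access through MaaS can look like, here is a minimal sketch that calls a provisioned endpoint over REST. The endpoint URL, key, and chat-completions-style payload shown are assumptions for illustration; use the values and request schema from your own deployment's documentation.

```python
import requests

# Placeholder values; substitute the endpoint URL and key from your deployment.
ENDPOINT_URL = "https://<your-endpoint>.<region>.inference.ai.azure.com/v1/chat/completions"
API_KEY = "<your-api-key>"

# Chat-completions-style payload; the exact schema depends on the model provider.
payload = {
    "messages": [
        {"role": "user", "content": "Summarize the benefits of Models as a Service."}
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```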
New Models in Azure AI Model Catalog

Phi

Phi-1.5 is a Transformer with 1.3 billion parameters. It was trained on the same data sources as Phi-1, augmented with a new data source consisting of various synthetic NLP texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 demonstrates nearly state-of-the-art performance among models with fewer than 10 billion parameters. Phi-1.5 can write poems, draft emails, create stories, summarize texts, and write Python code (such as downloading a Hugging Face transformer model). Phi-2 is a Transformer with 2.7 billion parameters that shows dramatic improvement in reasoning capabilities and safety measures compared to Phi-1.5, while remaining relatively small compared to other transformers in the industry. With the right fine-tuning and customization, these SLMs are incredibly powerful tools for applications both in the cloud and on the edge.

Phi-1.5
Phi-2

Whisper V3

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2. The models were trained on either English-only or multilingual data. The English-only models were trained on the task of speech recognition; the multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the same language as the audio. For speech translation, the model predicts transcriptions in a different language from the audio.

OpenAI-Whisper-Large-V3

BLIP

BLIP (Bootstrapping Language-Image Pre-training) is a model that can perform various multimodal tasks, including visual question answering (VQA), image-text retrieval (image-text matching), and image captioning. Created by Salesforce, the BLIP model is based on the concept of vision-language pre-training (VLP), which combines pre-trained vision models and large language models (LLMs) for vision-language tasks. BLIP effectively utilizes noisy web data by bootstrapping the captions: a captioner generates synthetic captions and a filter removes the noisy ones. It achieves state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval, image captioning, and VQA. The following variants are available in the model catalog:

Salesforce-BLIP-VQA-Base
Salesforce-BLIP-Image-Captioning-Base
Salesforce-BLIP-2-OPT-2-7b-VQA
Salesforce-BLIP-2-OPT-2-7b-Image-To-Text

CLIP

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of image-text pairs, created by OpenAI for efficiently learning visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the "zero-shot" capabilities of GPT-2 and GPT-3. CLIP can also be used to extract visual and text embeddings for use in downstream tasks such as information retrieval. This model joins our ever-growing list of OpenAI models available in the model catalog, including GPT and DALL-E. Like the other Azure Machine Learning curated models, these are thoroughly tested. The available CLIP variants include:

OpenAI-CLIP-Image-Text-Embeddings-ViT-Base-Patch32
OpenAI-CLIP-ViT-Base-Patch32
OpenAI-CLIP-ViT-Large-Patch14
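To make the zero-shot idea concrete, here is a minimal sketch of CLIP zero-shot classification using the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint (which corresponds to the catalog's ViT-Base-Patch32 variant); the image URL and candidate labels are arbitrary examples.

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Public Hugging Face checkpoint mirroring the catalog's ViT-Base-Patch32 variant.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Any image works; this COCO URL is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Zero-shot classification: the candidate categories are plain natural-language prompts.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, normalized into per-label probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.3f}")
```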
Code Llama

As a result of the partnership between Microsoft and Meta, we are delighted to offer the new Code Llama model and its variants in the Azure AI model catalog. Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. Code Llama is state-of-the-art among LLMs on code tasks and has the potential to make workflows faster and more efficient for current developers and to lower the barrier to entry for people who are learning to code. It can serve as a productivity and educational tool to help programmers write more robust, well-documented software. The available Code Llama variants include:

CodeLlama-34b-Python
CodeLlama-34b-Instruct
CodeLlama-13b
CodeLlama-13b-Python
CodeLlama-13b-Instruct
CodeLlama-7b
CodeLlama-7b-Python
CodeLlama-7b-Instruct

Falcon models

The next set of models was created by the Technology Innovation Institute (TII). Falcon-7B and Falcon-40B are causal decoder-only large language models with 7 billion and 40 billion parameters, trained on 1,500 billion and 1 trillion tokens of the RefinedWeb dataset respectively, enhanced with curated corpora. The models are available under the Apache 2.0 license, outperform comparable open-source models, and feature an architecture optimized for inference. The Falcon variants include:

Falcon-40b
Falcon-40b-Instruct
Falcon-7b-Instruct
Falcon-7b

NVIDIA Nemotron

Another addition is the new NVIDIA AI collection of models and registry. Through this partnership, NVIDIA is launching Nemotron-3, a new 8B LLM, in three variants: pretrained, chat, and Q&A. Nemotron-3 is a family of enterprise-ready, GPT-based, decoder-only generative text models compatible with the NVIDIA NeMo Framework.

Nemotron-3-8B-Base-4k
Nemotron-3-8B-Chat-4k-SFT
Nemotron-3-8B-Chat-4k-RLHF
Nemotron-3-8B-Chat-4k-SteerLM
Nemotron-3-8B-QA-4k

SAM

The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting to new image segmentation tasks without prior task-specific training. Created by Meta, SAM matches or exceeds the performance of prior fully supervised models.

Facebook-Sam-Vit-Large
Facebook-Sam-Vit-Huge
Facebook-Sam-Vit-Base

Stable Diffusion Models

The latest additions to the model catalog include Stable Diffusion models for text-to-image and inpainting tasks, developed by Stability AI and CompVis. These cutting-edge models offer a remarkable advancement in generative AI, providing greater robustness and consistency in generating images from text descriptions. By incorporating these Stable Diffusion models into our catalog, we are enhancing the diversity of available modalities and models, enabling users to access state-of-the-art capabilities that open new possibilities for creative content generation, design, and problem-solving. The addition of Stable Diffusion models to the Azure AI model catalog reflects our commitment to offering the most advanced and stable AI models to empower data scientists and developers in their machine learning projects, apps, and workflows. The available Stable Diffusion models include:

Stable-Diffusion-V1-4
Stable-Diffusion-2-1
Stable-Diffusion-V1-5
Stable-Diffusion-Inpainting
Stable-Diffusion-2-Inpainting
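For local experimentation, the same Stable Diffusion checkpoints can be driven through the diffusers library. The sketch below uses the public stabilityai/stable-diffusion-2-1 checkpoint that the catalog's Stable-Diffusion-2-1 entry corresponds to, and assumes a CUDA-capable GPU.

```python
import torch
from diffusers import StableDiffusionPipeline

# Public checkpoint corresponding to the catalog's Stable-Diffusion-2-1 entry.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("astronaut.png")
```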
Curated Model Catalog Inference Optimizations

In addition to the curated AI models above, we have also improved the overall user experience by optimizing the catalog and its features in meaningful ways. Models in the Azure AI model catalog are powered by a custom inferencing container built to meet the growing demand for high-performance inference and serving of foundation models. The container comes equipped with multiple backend inferencing engines, including vLLM, DeepSpeed-FastGen, and Hugging Face, to cover a wide variety of model architectures. Our default choice for serving models is vLLM, which provides high throughput and efficient memory management through continuous batching and PagedAttention. We are also excited to support DeepSpeed-FastGen, the latest offering from the DeepSpeed team, which introduces a Dynamic SplitFuse technique for even higher throughput. You can try out the alpha version of DeepSpeed-FastGen with our Llama 2 family of models. Learn more about DeepSpeed-FastGen here: https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen. For models that cannot be served by vLLM or DeepSpeed-FastGen, the container falls back to the Hugging Face engine.

To further maximize GPU utilization and achieve even higher throughput, the container strategically deploys multiple model replicas based on the available hardware and routes incoming requests to available replicas, allowing efficient serving of even more concurrent user requests. Additionally, we integrated Azure AI Content Safety to streamline the detection of potentially harmful content in AI-generated applications and services. This integration aims to enforce responsible AI practices and the safe usage of our advanced AI models. You can start seeing the benefits of our container with the Llama 2 family of models, and we plan to extend support to more models, including other modalities such as Stable Diffusion and Falcon. To get started with model inferencing, view the "Learn more" section below. https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/inference/text-generation/llama-safe-online-deployment.ipynb

Fine-tuning Optimizations

Training larger LLMs, such as those with 70B parameters and above, requires a great deal of GPU memory: fine-tuning can run out of memory, and when GPU memory is small, sometimes even loading the model is not possible. This is exacerbated in most real-life use cases, where we need context lengths as close as possible to the model's maximum allowed context length, which pushes memory requirements even further. To solve this problem, we are excited to provide users with some of the latest optimizations for fine-tuning: Low-Rank Adaptation (LoRA), DeepSpeed ZeRO, and gradient checkpointing.

Gradient checkpointing lowers the GPU memory requirement by storing only select activations computed during the forward pass and recomputing the rest during the backward pass. This is known to reduce GPU memory by a factor of sqrt(n) (where n is the number of layers) while adding a modest computational cost for recomputing some activations. LoRA freezes most of the model parameters in the pretrained model during fine-tuning and only modifies a small fraction of weights (the LoRA adapters). This reduces the GPU memory required and also reduces fine-tuning time. LoRA reduces the number of trainable parameters by orders of magnitude without much impact on the quality of the fine-tuned model. DeepSpeed's Zero Redundancy Optimizer (ZeRO) achieves the merits of data and model parallelism while alleviating the limitations of both. DeepSpeed ZeRO has three stages, which partition the model states (parameters, gradients, and optimizer states) across GPUs and use a dynamic communication schedule to share the necessary model states between GPUs. A sketch of combining two of these optimizations is shown below.
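As a sketch of how LoRA and gradient checkpointing combine in practice, the snippet below uses the Hugging Face peft library; the base model name and adapter hyperparameters are illustrative choices, not the exact configuration used by the curated fine-tuning pipelines.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any causal LM checkpoint works the same way.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

# Gradient checkpointing: keep only select activations from the forward pass
# and recompute the rest during the backward pass.
base.gradient_checkpointing_enable()

# LoRA: freeze the pretrained weights and train only low-rank adapter matrices.
lora_config = LoraConfig(
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Typically reports trainable params well under 1% of the total.
model.print_trainable_parameters()
```

DeepSpeed ZeRO is then layered on top at training time, typically by passing a DeepSpeed configuration to the trainer, so that optimizer and gradient states are partitioned across GPUs.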
These GPU memory reductions allow users to fine-tune LLMs like Llama-2-70B on a single node of 8x V100s for the sequence lengths typically encountered in many use cases. All of these optimizations are orthogonal and can be used together in any combination, empowering our customers to train large models on multi-GPU clusters with mixed precision to get the best fine-tuning accuracy.

AI Safety and Responsible AI

Responsible AI is at the heart of Microsoft's approach to AI and how we partner. For years we have invested heavily in making Azure the place for responsible, cutting-edge AI innovation, whether customers are building their own models or using pre-built and customizable models from Microsoft, Meta, OpenAI, and the open-source ecosystem. We are thrilled to announce that Stable Diffusion models now support Azure AI Content Safety, which detects harmful user-generated and AI-generated content in applications and services. Content Safety includes text and image APIs for detecting harmful material, along with an interactive Content Safety Studio that allows you to view, explore, and try out sample code for detecting harmful content across different modalities. You can learn more from the links below. We cannot wait to witness the incredible applications and solutions our users will create using these state-of-the-art models.

Explore SDK and CLI examples for foundation models in the azureml-examples GitHub repo!

SDK: azureml-examples/sdk/python/foundation-models/system at main · Azure/azureml-examples (github.com)
CLI: azureml-examples/cli/foundation-models/system at main · Azure/azureml-examples (github.com)

Learn more!

Get started with new vision models in Azure AI Studio and Azure Machine Learning
Sign up for Azure AI and start exploring vision-based models in the Azure Machine Learning model catalog
Announcing Foundation Models in Azure Machine Learning
Explore documentation for the model catalog in Azure AI
Learn more about generative AI in Azure Machine Learning
Learn more about Azure AI Content Safety - Azure AI Content Safety – AI Content Moderation | Microsoft Azure
Get started with model inferencing

Introducing Meta Llama 3 Models on Azure AI Model Catalog
Unveiling the next generation of Meta Llama models on Azure AI: Meta Llama 3 is here! With new capabilities, including improved reasoning, and Azure AI Studio integrations, Microsoft and Meta are pushing the frontiers of innovation. Dive into enhanced contextual understanding, tokenizer efficiency, and a diverse model ecosystem, ready for you to build and deploy generative AI models and applications across your organization. Explore Meta Llama 3 now through Azure AI Models as a Service and the Azure AI Model Catalog, where next-generation models scale on Azure's trusted, sustainable, and AI-optimized high-performance infrastructure.

Mistral Large, Mistral AI's flagship LLM, debuts on Azure AI Models-as-a-Service
Microsoft is partnering with Mistral AI to bring its large language models (LLMs) to Azure. Mistral AI's OSS models, Mixtral-8x7B and Mistral-7B, were added to the model catalog last December. Today, we are excited to announce the addition of Mistral AI's new flagship model, Mistral Large, to the Mistral AI collection of models in the Azure AI model catalog. Mistral Large will be available through Models-as-a-Service (MaaS), which offers API-based access and token-based billing for LLMs, making it easy to build generative AI apps. You can provision an API endpoint in a matter of seconds and try out the model in the Azure AI Studio playground, or use it with popular LLM app development tools like Azure AI prompt flow and LangChain. The APIs support two layers of safety: first, the model has built-in support for a "safe prompt" parameter; second, Azure AI Content Safety screens for harmful content generated by the model, enabling developers to build safe and trustworthy applications.
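As an illustration of the second safety layer, Azure AI Content Safety can also be called directly from application code to screen model output before it reaches users. The sketch below uses the azure-ai-contentsafety Python package; the resource endpoint and key are placeholders for your own Content Safety resource.

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for your Content Safety resource.
client = ContentSafetyClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    AzureKeyCredential("<your-key>"),
)

# Screen a model completion before showing it to the user.
model_output = "...text generated by the LLM..."
result = client.analyze_text(AnalyzeTextOptions(text=model_output))

# Each analyzed category (hate, self-harm, sexual, violence) carries a severity score.
for category in result.categories_analysis:
    print(category.category, category.severity)
```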
Elevate Your LLM Applications to Production via LLMOps

Discover the future of LLM application development with Azure Machine Learning prompt flow. Dive into our latest blog, "Elevate Your LLM Applications to Production via LLMOps," where we unveil groundbreaking insights and tools to transform your AI development journey. Learn how to build, evaluate, and deploy LLM applications not just efficiently, but with newfound confidence. Get ready to explore the realm of possibilities in AI development; your next big leap starts here! #LLMOps #AIDevelopment #LLMApplications

Meta's new Llama 3.2 SLMs and image reasoning models now available on Azure AI Model Catalog
Exciting news! In collaboration with Meta, Microsoft is thrilled to announce that Meta's latest Llama 3.2 models are now available in the Azure AI Model Catalog! Starting today, developers can access the Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct models (Meta's first multimodal models), along with the Llama 3.2 1B Instruct and Llama 3.2 3B Instruct SLMs for local on-device mobile/edge use cases, via managed compute. Coming soon: serverless inferencing with Models-as-a-Service APIs.

Introducing NVIDIA Nemotron-3 8B LLMs on the Model Catalog
We are excited to announce that we are expanding our partnership with NVIDIA to bring the best of NVIDIA AI software to Azure. This includes a new family of large language models (LLMs) called NVIDIA Nemotron-3 8B, the Triton TensorRT-LLM server for inference, and the NeMo framework for training. The NVIDIA Nemotron-3 8B family of models joins a growing list of LLMs in the AI Studio model catalog. AI Studio enables generative AI developers to build LLM applications by offering access to hundreds of models in the model catalog and a comprehensive set of tools for prompt engineering, fine-tuning, evaluation, retrieval-augmented generation (RAG), and more.

The NVIDIA Nemotron-3 8B family includes a pre-trained base model and multiple variants of chat and question-answering models built on NVIDIA NeMo, a framework to build, customize, and deploy generative AI models that offers built-in parallelism across GPUs for distributed training and supports RLHF, p-tuning, prompt learning, and more for customizing models. Models trained with NeMo can be served with the Triton Inference Server using the TensorRT-LLM backend, which generates GPU-specific optimizations to achieve a multi-fold increase in inference performance.

"Our partnership with NVIDIA represents a significant enhancement to Azure AI, particularly with the addition of the Nemotron-3 8B models to our model catalog," said John Montgomery, Corporate Vice President, Azure AI Platform at Microsoft. "This integration not only expands our range of models but also assures our enterprise customers of immediate access to cutting-edge generative AI solutions that are ready for production environments."

NVIDIA Nemotron-3 8B Models

Nemotron-3-8B Base Model: This is a foundational model with 8 billion parameters. It enables customization, including parameter-efficient fine-tuning and continuous pre-training, for domain-adapted LLMs.

Nemotron-3-8B Chat Models: These chatbot-focused models target LLM-powered chatbot interactions. Designed for global enterprises, they are proficient in 53 languages and trained on 37 different coding languages. There are three chat model versions:

Nemotron-3-8B-Chat-SFT: A building block for instruction tuning of custom models and user-defined alignment, such as RLHF or SteerLM models.
Nemotron-3-8B-Chat-RLHF: Built from the SFT model; achieves the highest MT-Bench score within the 8B category for chat quality.
Nemotron-3-8B-Chat-SteerLM: Offers flexibility for customizing and steering LLMs at inference time, allowing users to define attributes on the fly.

Nemotron-3-8B Question-and-Answer (QA) Model: The Nemotron-3-8B-QA model is a question-and-answer model fine-tuned on a large amount of data focused on the target use case.

The Nemotron-3 8B models are curated by Microsoft in the 'nvidia-ai' Azure Machine Learning (AzureML) registry and show up in the model catalog under the NVIDIA collection (Fig 1). Explore the model card to learn more about the model architecture, use cases, and limitations.

Fig 1. Discover Nemotron-3 models in the Azure AI model catalog

High-performance inference with Managed Online Endpoints in Azure AI and AzureML Studio

The model catalog in AI Studio makes it easy to discover and deploy the Nemotron-3 8B models along with the Triton TensorRT-LLM inference containers on Azure. Filter the list of models by the newly added NVIDIA collection and select one of the models. You can review the model card and code samples, and deploy the model to Online Endpoints using the deployment wizard.
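The same deployment can also be scripted. Below is a minimal sketch using the azure-ai-ml Python SDK; the registry model ID, endpoint name, and GPU SKU are illustrative placeholders, so check the model card for the exact asset path and supported instance types.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Illustrative model asset ID from the 'nvidia-ai' registry; verify the
# actual name and version on the model card.
model_id = "azureml://registries/nvidia-ai/models/Nemotron-3-8B-Chat-4k-SFT/versions/1"

endpoint = ManagedOnlineEndpoint(name="nemotron-chat", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name="nemotron-chat",
    model=model_id,
    instance_type="Standard_ND96amsr_A100_v4",  # illustrative GPU SKU
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```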
The Triton TensorRT-LLM container is curated as an AzureML environment in the 'nvidia-ai' registry and passed as the default environment in the deployment flow. Once the deployment is complete, you can use the Triton client library to score the models.

Fig 2. Deploy Nemotron-3 models from the Azure AI model catalog

Prompt-tune the Nemotron-3 Base Model in AzureML Studio

You can tune the parameters of the Nemotron-3-8B-Base-4k model to perform well on a specific domain using prompt-tuning (p-tuning). This is supported for the text-generation task and curated for our users as an AzureML component. Customers can perform p-tuning on the Nemotron-3 base model in AzureML Studio today in two ways: a code-first approach that leverages the notebook samples in the model card, or a no-code approach using the drag-and-drop experience in AzureML pipelines. The NeMo p-tuning components are available in the 'nvidia-ai' registry. You can build custom p-tuning pipelines as shown in Fig 3, configure input parameters such as learning rate and max_steps in the p-tuning component, and submit the pipeline job, which outputs the p-tuned model weights.

Fig 3. Author custom pipelines to p-tune Nemotron-3-8B-Base-4k using AzureML Designer
Fig 4. Sample p-tuning and evaluation pipeline for the text-generation task

Evaluate the Nemotron-3 Base Model

The Nemotron-3-Base-4k model can be evaluated on a set of tasks such as text generation, text classification, translation, and question answering. AzureML offers curated evaluation pipelines that evaluate the Nemotron-3 base model by performing batch inference to generate predictions on the test data, then using those predictions to compute task-based performance metrics such as perplexity and GPT evaluation metrics (coherence, fluency, groundedness, and relevance). The curated evaluation components are available in the 'nvidia-ai' registry for users to consume when authoring pipelines in AzureML Designer, with an experience similar to Fig 3. Sample notebooks for evaluation are available on the model card.

Fig 5. Out-of-the-box evaluation component for the QA task

Note about the license

Users are responsible for compliance with the terms of the NVIDIA AI Product Agreement for the use of Nemotron-3 models.

Conclusion

"We are excited to team with Microsoft to bring NVIDIA AI Enterprise, NeMo, and our Nemotron-3 8B to Azure AI," said Manuvir Das, VP, Enterprise Computing, NVIDIA. "This gives enterprise developers the easiest path to train and tune LLMs and deploy them at scale on Azure cloud."

We want Azure AI to be the best platform for generative AI developers. The integration with NVIDIA AI Enterprise software advances our goal of integrating with the best models and tools for generative AI development and enables you to build your own delightful LLM applications on Azure.

Secure Model Deployments with Microsoft Entra and Managed Online Endpoints
With Microsoft Entra and Azure Machine Learning managed online endpoints, you can consume multiple endpoints using a single token, with full RBAC support, and streamline control plane and data plane operations.
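A minimal sketch of that pattern: acquire one Microsoft Entra token with the azure-identity package and reuse it across several endpoints. The endpoint URLs and payload are placeholders, and the token scope shown assumes the Azure Machine Learning audience; confirm the scope your endpoints expect.

```python
import requests
from azure.identity import DefaultAzureCredential

# One Microsoft Entra token for all endpoints; the scope is assumed to be
# the Azure Machine Learning audience.
credential = DefaultAzureCredential()
token = credential.get_token("https://ml.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# Placeholder endpoint URLs; RBAC determines which of these the caller may invoke.
endpoints = [
    "https://endpoint-a.eastus2.inference.ml.azure.com/score",
    "https://endpoint-b.eastus2.inference.ml.azure.com/score",
]
for url in endpoints:
    response = requests.post(url, headers=headers, json={"input_data": "hello"})
    print(url, response.status_code)
```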
AI Innovation Continues: Introducing Mistral Large 2 and Mistral Nemo in Azure

Exciting news! We're expanding our partnership with Mistral AI by introducing the Mistral Large 2 and Mistral Nemo models to Azure AI, offering state-of-the-art reasoning, multilingual support, and coding capabilities.

Fine-tune FLUX.1 LORA with your own images and Deploy using Azure Machine Learning
The landscape of artificial intelligence and machine learning continues to evolve rapidly, with significant advancements in generative AI models. One notable development comes from Black Forest Labs with their FLUX.1 suite of models, which pushes the boundaries of text-to-image synthesis, offering unparalleled image detail, prompt adherence, and style diversity. In this blog, we delve into the process of fine-tuning the FLUX model using Dreambooth, a method that has gained traction for its effectiveness in producing high-quality, customized AI-generated content.
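As a preview of the end result, here is a minimal sketch that loads a FLUX.1 base checkpoint with the diffusers library and applies Dreambooth-trained LoRA weights. The adapter path and prompt identifier are placeholders for your own fine-tuning output, and the base checkpoint is gated on Hugging Face, so you need to accept its license first.

```python
import torch
from diffusers import FluxPipeline

# Base FLUX.1 checkpoint from Black Forest Labs (gated; license acceptance required).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder path to the LoRA weights produced by your Dreambooth fine-tuning job.
pipe.load_lora_weights("./outputs/flux-dreambooth-lora")

# 'sks' stands in for the rare-token subject identifier chosen during training.
image = pipe(
    "a photo of sks dog in a bucket",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_lora_sample.png")
```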