mlops
Harness the power of Large Language Models with Azure Machine Learning prompt flow
Unlock the full potential of your AI solutions with our latest blog on prompt flow! Discover how to effectively assess and refine your prompts and flows, leading to production-ready, impactful LLM-infused applications. Don't miss out on these game-changing insights!

Introducing NVIDIA Nemotron-3 8B LLMs on the Model Catalog
We are excited to announce that we are expanding our partnership with NVIDIA to bring the best of NVIDIA AI software to Azure. This includes a new family of large language models (LLMs) called NVIDIA Nemotron-3 8B, the Triton TensorRT-LLM server for inference, and the NeMo framework for training. The NVIDIA Nemotron-3 8B family of models joins a growing list of LLMs in the AI Studio model catalog. AI Studio enables Generative AI developers to build LLM applications by offering access to hundreds of models in the model catalog and a comprehensive set of tools for prompt engineering, fine-tuning, evaluation, retrieval-augmented generation (RAG), and more.

The NVIDIA Nemotron-3 8B family includes a pre-trained base model and multiple variants of chat and question-answering models, all built on NVIDIA NeMo, a framework to build, customize, and deploy generative AI models. NeMo offers built-in parallelism across GPUs for distributed training and supports RLHF, p-tuning, prompt learning, and more for customizing models. Models trained with NeMo can be served with the Triton Inference Server using the TensorRT-LLM backend, which generates GPU-specific optimizations to achieve a multi-fold increase in inference performance.

“Our partnership with NVIDIA represents a significant enhancement to Azure AI, particularly with the addition of the Nemotron-3 8B models to our model catalog,” said John Montgomery, Corporate Vice President, Azure AI Platform at Microsoft. “This integration not only expands our range of models but also assures our enterprise customers of immediate access to cutting-edge generative AI solutions that are ready for production environments.”

NVIDIA Nemotron-3 8B Models

- Nemotron-3-8B Base Model: A foundational model with 8 billion parameters. It enables customization, including parameter-efficient fine-tuning and continuous pre-training for domain-adapted LLMs.
- Nemotron-3-8B Chat Models: Chatbot-focused models that target LLM-powered chatbot interactions. Designed for global enterprises, these models are proficient in 53 languages and trained on 37 coding languages. There are three chat model versions:
  - Nemotron-3-8B-Chat-SFT: A building block for instruction tuning of custom models and user-defined alignment, such as RLHF or SteerLM models.
  - Nemotron-3-8B-Chat-RLHF: Built from the SFT model; achieves the highest MT-Bench score within the 8B category for chat quality.
  - Nemotron-3-8B-Chat-SteerLM: Offers flexibility for customizing and training LLMs at inference time, allowing users to define attributes on the fly.
- Nemotron-3-8B-Question-and-Answer (QA) Model: A question-and-answer model fine-tuned on a large amount of data focused on the target use case.

The Nemotron-3 8B models are curated by Microsoft in the ‘nvidia-ai’ Azure Machine Learning (AzureML) registry and show up in the model catalog under the NVIDIA Collection [Fig 1]. Explore the model card to learn more about the model architecture, use cases, and limitations.

Fig 1. Discover Nemotron-3 models in Azure AI Model catalog

High performance inference with Managed Online Endpoints in Azure AI and Azure ML Studio

The model catalog in AI Studio makes it easy to discover and deploy the Nemotron-3 8B models along with the Triton TensorRT-LLM inference containers on Azure. Filter the list of models by the newly added NVIDIA collection and select one of the models. You can review the model card and code samples, then deploy the model to Online Endpoints using the deployment wizard, or from code as sketched below.
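For those who prefer a code-first path, here is a minimal sketch of the same deployment using the azure-ai-ml Python SDK. The model asset name, version, and GPU SKU below are placeholder assumptions, not confirmed values; copy the real ones from the model card's code samples.

```python
# Minimal sketch: deploy a Nemotron-3 8B model from the 'nvidia-ai' registry
# to a managed online endpoint. Angle-bracket values are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Create the endpoint that will host the model.
endpoint = ManagedOnlineEndpoint(name="nemotron-8b-chat", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy the registry model; the curated Triton TensorRT-LLM environment is
# used as the default environment for these models.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model="azureml://registries/nvidia-ai/models/<nemotron-model-name>/versions/<version>",
    instance_type="Standard_NC24ads_A100_v4",  # placeholder GPU SKU
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```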
The Triton TensorRT-LLM container is curated as an AzureML environment in the ‘nvidia-ai’ registry and is passed as the default environment in the deployment flow. Once the deployment is complete, you can use the Triton Client library to score the model, as sketched below.

Fig 2. Deploy Nemotron-3 models in Azure AI Model Catalog
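As an illustration, here is a hedged sketch of scoring the endpoint with the Triton HTTP client (pip install tritonclient[http]). The endpoint host, key, model name, and tensor names are placeholder assumptions; the sample notebook on the model card shows the exact values for each model.

```python
# Sketch: score a Triton-served deployment over HTTPS. All angle-bracket
# values and the tensor/model names are placeholders from assumptions,
# not confirmed names for the Nemotron models.
import numpy as np
import tritonclient.http as httpclient

# Host only (no scheme), e.g. "myendpoint.westus2.inference.ml.azure.com".
client = httpclient.InferenceServerClient(url="<endpoint-host>", ssl=True)
auth_headers = {"Authorization": "Bearer <endpoint-key>"}

# BYTES tensors are passed as numpy object arrays of bytes.
prompt = np.array([[b"What is Azure Machine Learning?"]], dtype=np.object_)
text_input = httpclient.InferInput("text_input", prompt.shape, "BYTES")
text_input.set_data_from_numpy(prompt)

result = client.infer(
    model_name="<model-name>",
    inputs=[text_input],
    outputs=[httpclient.InferRequestedOutput("text_output")],
    headers=auth_headers,
)
print(result.as_numpy("text_output"))
```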
Prompt-tune Nemotron-3 Base Model in AzureML Studio

You can tune the parameters of the Nemotron-3-8B-Base-4k model to perform well on a specific domain using prompt-tuning (P-tuning). P-tuning is supported for the text-generation task and is curated for users as an AzureML component. There are two ways customers can P-tune the Nemotron-3 base model in AzureML Studio today: a code-first approach using the notebook samples on the model card, or a no-code approach using the drag-and-drop experience of AzureML pipelines. The NeMo P-tuning components are available in the ‘nvidia-ai’ registry. You can build custom P-tuning pipelines as shown in Fig 3, configure input parameters such as the learning rate and max_steps in the P-tuning component, and submit the pipeline job, which outputs the P-tuned model weights. A code-first version of the same pipeline is sketched after the figures below.

Fig 3. Author custom pipelines to P-tune Nemotron-3-8B-Base-4k using AzureML Designer

Fig 4. Sample P-tuning and Evaluation pipeline for Text-Generation task
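Below is a sketch of the code-first path. The component name and its parameter and output names are hypothetical; browse the ‘nvidia-ai’ registry or the model card notebooks for the real ones.

```python
# Sketch: author a custom P-tuning pipeline from a curated registry
# component. "<nemo-ptuning-component>" and its inputs/outputs are
# hypothetical placeholders.
from azure.ai.ml import Input, MLClient
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

# One client scoped to the registry hosting the NeMo components,
# one scoped to the workspace where the job will run.
registry = MLClient(DefaultAzureCredential(), registry_name="nvidia-ai")
workspace = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace>"
)

ptune = registry.components.get(name="<nemo-ptuning-component>", label="latest")

@pipeline(description="P-tune Nemotron-3-8B-Base-4k on domain data")
def ptuning_pipeline(train_data: Input):
    step = ptune(train_data=train_data, learning_rate=1e-4, max_steps=1000)
    return {"ptuned_weights": step.outputs.output_model}  # output name assumed

job = ptuning_pipeline(train_data=Input(type="uri_folder", path="<path-to-training-data>"))
workspace.jobs.create_or_update(job, experiment_name="nemotron-ptuning")
```

The curated evaluation components described in the next section live in the same registry and can be chained after the P-tuning step in exactly the same way, as Fig 4 shows.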
Evaluate Nemotron-3 Base Model

The Nemotron-3-Base-4k model can be evaluated on tasks such as text generation, text classification, translation, and question answering. AzureML offers curated evaluation pipelines that evaluate the Nemotron-3 base model by performing batch inference to generate predictions on the test data, then using those predictions to compute task-based performance metrics such as perplexity and GPT evaluation metrics (Coherence, Fluency, Groundedness, and Relevance). The curated evaluation components are available in the 'nvidia-ai' registry for users to consume while authoring pipelines in AzureML Designer, with an experience similar to Fig 3. Sample notebooks for evaluation are available on the model card.

Fig 5. Out-of-the-box evaluation component for QA task

Note about the license

Users are responsible for compliance with the terms of the NVIDIA AI Product Agreement when using the Nemotron-3 models.

Conclusion

“We are excited to team with Microsoft to bring NVIDIA AI Enterprise, NeMo, and our Nemotron-3 8B to Azure AI,” said Manuvir Das, VP, Enterprise Computing, NVIDIA. “This gives enterprise developers the easiest path to train and tune LLMs and deploy them at scale on Azure cloud.”

We want Azure AI to be the best platform for Generative AI developers. The integration with NVIDIA AI Enterprise software advances our goal of integrating the best models and tools for Generative AI development and enables you to build your own delightful LLM applications on Azure.

Deploying Hugging Face Hub models in Azure Machine Learning

Microsoft has partnered with Hugging Face to bring open-source models to Azure Machine Learning. Hugging Face is the creator of Transformers, a widely popular library for working with over 200,000 open-source models hosted on the Hugging Face hub. Thanks to this partnership, you can now find thousands of transformer models in the new Hugging Face collection in the Azure ML model catalog and deploy them in just a few clicks on managed endpoints running on secure and scalable Azure infrastructure.
An Enterprise Design for Azure Machine Learning - An Architect's Viewpoint

This article provides an opinionated design for an enterprise-level data science capability, implemented within an Azure data platform. The guidance provides a starting point for the design of an ML platform that fits your business requirements.

Secure Model Deployments with Microsoft Entra and Managed Online Endpoints

With Microsoft Entra and Azure Machine Learning managed online endpoints, you can consume multiple endpoints using a single token with full RBAC support and streamline control plane and data plane operations.

Continuously Monitor the Performance of your AzureML Models in Production
We are thrilled to announce the public preview of Azure Machine Learning model monitoring, which lets you effortlessly monitor the overall health of your deployed models. Model monitoring is an essential part of the cyclical machine learning lifecycle, encompassing both the data science and operational aspects of tracking model performance in production. Changes in data and consumer behavior can influence your model, causing your AI systems to become outdated. This may result in reduced model performance in production, adversely affecting business outcomes and potentially leading to compliance concerns in highly regulated environments. With AzureML model monitoring, you can receive timely alerts about critical issues, analyze results for model enhancement, and minimize the many risks inherent in deploying ML models.

Capabilities of AzureML model monitoring

AzureML model monitoring provides the following capabilities:

- Simple model monitoring configuration with AzureML online endpoints. If you deploy your model to production with AzureML online endpoints, AzureML collects production inference data automatically and uses it for continuous model monitoring, giving you an easy configuration process.
- Pre-configured and customizable monitoring signals. Model monitoring supports a variety of configurable monitoring signals for tabular datasets, including data drift, prediction drift, data quality, and feature attribution drift. You can choose your preferred metric(s) and adjust alert thresholds for each signal. If the pre-configured signals don't suit your needs, you can create a custom monitoring signal component tailored to your business scenario.
- Use of recent past production data or training data as the comparison baseline dataset. For monitoring signals and metrics, AzureML lets you set either of these datasets as the baseline for comparison, enabling you to monitor for both drift and skew.
- Monitoring of data drift or data quality based on feature importance. If you use training data as your comparison baseline, you can define data drift or data quality signals that monitor only the features most important for your predictions, saving costs.
- Analysis of monitoring metrics from a comprehensive UI. View changes in drift metrics over time, see which features are violating defined thresholds, and analyze your baseline and production feature distributions side by side within a comprehensive monitoring UI.

AzureML model monitoring signals

Evaluating the performance of a production ML system requires examining various signals, including data drift, model prediction drift, data quality, and feature attribution drift. Such shifts can lead to outdated models; by identifying them, organizations can proactively implement measures like model retraining to maintain optimal model performance and minimize the risks associated with outdated or mismatched data.

- Data drift: Monitoring data drift is vital for maintaining the accuracy and performance of machine learning models in production. AzureML allows you to detect changes in data distributions, mitigating risks associated with outdated or mismatched data.
- Prediction drift: Significant changes in a model's prediction distribution may indicate prediction drift, which can result from shifts in data or code. AzureML's proactive monitoring of model outputs helps you identify issues within the model as it responds to these data shifts.
- Data quality: Maintaining data quality is essential, as errors in upstream data processing can lead to unexpected model behavior. Changes in data sources, schemas, logging, or upstream features generated by other ML models can significantly impact your model. AzureML detects data issues such as null values, range violations, or type mismatches, ensuring optimal performance and enabling you to proactively fix issues.
- Feature attribution drift: Changes in feature importance distributions between training and production may signify feature attribution drift, potentially indicating unexpected model behavior. AzureML helps you evaluate each feature's influence on predictions by tracking contributions over time and detecting shifts in feature importance, which helps identify unexpected behavior and potential accuracy impacts.

For a complete overview of AzureML model monitoring signals and metrics, take a look at this document.
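To build intuition for what a drift signal computes, here is a small, self-contained illustration. This is not AzureML code; it is a standard two-sample test of the kind drift metrics are built on, comparing a baseline (e.g., training-time) feature distribution against recent production data.

```python
# Conceptual illustration only: detect a shift in one feature's distribution
# between baseline and production samples with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)    # feature at training time
production = rng.normal(loc=0.4, scale=1.0, size=5_000)  # same feature, shifted in production

statistic, p_value = ks_2samp(baseline, production)
print(f"KS statistic={statistic:.3f}, p={p_value:.2e}")

# Example threshold; in AzureML you tune the alert threshold per metric.
if p_value < 0.05:
    print("Distribution shift detected -- a monitor would raise an alert here.")
```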
How to enable AzureML model monitoring

Take the following steps to enable model monitoring in AzureML:

1. Enable production inference data collection. If you deploy a model to an AzureML online endpoint, you can enable production inference data collection by using the AzureML Model Data Collector. If you deploy your model to an AzureML batch endpoint or outside of AzureML, you are responsible for collecting your own production inference data, which can then be used for AzureML model monitoring.
2. Configure model monitoring. You can use AzureML's SDK, CLI, or the Studio UI to set up model monitoring. During setup, you can specify your preferred monitoring signals, configure your desired metrics, and set the alert threshold for each metric; a code-first sketch follows this list.
3. View and analyze model monitoring results. Once model monitoring is configured, a monitoring job is scheduled; it calculates and evaluates metrics for all selected monitoring signals and triggers alert notifications whenever a specified threshold is exceeded. You can follow the link in the alert notification to your AzureML workspace to view and analyze monitoring results.
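As a sketch of the second step, the following shows an out-of-box monitoring setup with the azure-ai-ml Python SDK, assuming a model already deployed to a managed online endpoint with data collection enabled. Names in angle brackets are placeholders, and the entity classes and parameter values should be checked against the current SDK reference.

```python
# Sketch: schedule a daily out-of-box model monitor for an online deployment.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    AlertNotification,
    MonitorDefinition,
    MonitoringTarget,
    MonitorSchedule,
    RecurrenceTrigger,
    ServerlessSparkCompute,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace>"
)

monitor = MonitorDefinition(
    # Monitoring jobs run on serverless Spark; SKU/runtime here are examples.
    compute=ServerlessSparkCompute(instance_type="standard_e4s_v3", runtime_version="3.3"),
    monitoring_target=MonitoringTarget(
        ml_task="classification",
        endpoint_deployment_id="azureml:<endpoint-name>:<deployment-name>",
    ),
    alert_notification=AlertNotification(emails=["<you>@<company>.com"]),
)

schedule = MonitorSchedule(
    name="<model-name>-monitor",
    trigger=RecurrenceTrigger(frequency="day", interval=1),  # run daily
    create_monitor=monitor,
)
ml_client.schedules.begin_create_or_update(schedule)
```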
AzureML model monitoring best practices

Each machine learning model and its use cases are unique; therefore, model monitoring is unique to each situation. The following is a list of recommended best practices for model monitoring:

- Start monitoring your model as soon as it is deployed to production. The sooner you begin monitoring your production model, the sooner you can identify issues and resolve them.
- Work with data scientists who are familiar with the model to set up model monitoring. These data scientists have insight into the model and its use cases, and are best positioned to recommend the monitoring signals, metrics, and alert thresholds to use, thereby reducing alert fatigue.
- Include multiple monitoring signals in your monitoring setup. With multiple monitoring signals, you get both a broad view of your model's health and granular insights into model performance. For example, you can combine data drift and feature attribution drift signals to get an early warning about a model performance issue.
- Use model training data as the baseline dataset. For the comparison baseline, AzureML allows you to use recent past production data or historical data (such as training data or validation data). For a meaningful comparison, we recommend using the training data as the baseline for data drift and data quality, and the validation data as the baseline for prediction drift.
- Specify the monitoring frequency based on how your production data changes over time. For example, if your production model has heavy daily traffic and the daily data accumulation is sufficient to monitor, configure your model monitor to run daily. Otherwise, consider a weekly or monthly monitoring frequency based on the growth of your production data over time.
- Monitor the top N important features or a subset of features. If you use training data as your comparison baseline, AzureML by default monitors data drift or data quality for the top 10 important features. For models with a large number of features, consider monitoring a subset of those features to reduce both computation costs and monitoring noise.

Get started with AzureML model monitoring today

Get started with AzureML model monitoring today! You can find more information here: https://aka.ms/azureml-momo/doc

To learn more about AzureML model monitoring, watch these Microsoft Build 2023 breakout sessions:
- Breakout: Practical deep-dive into machine learning techniques and MLOps
- Breakout: Build and maintain your company Copilot with Azure Machine Learning and GPT-4