Azure Machine Learning
When you start using Azure Machine Learning (AML), you have several options to choose from. One is Azure Machine Learning designer, a drag-and-drop tool that doesn't require coding. Another is automated machine learning (AutoML), which you can drive from Python. You can also work in Azure Machine Learning studio, which provides a graphical user interface. If you prefer open-source frameworks like scikit-learn or TensorFlow, you can integrate them into AML. AML also supports MLflow, which helps you track and monitor your models. If you want to deploy a custom model, AML provides tools and resources to make it easier. These choices let you pick the approach that best fits your needs and preferences.
Azure Machine Learning Designer
Let's start with Azure Machine Learning designer, a visual tool for data science. It has a simple interface where you can drag and drop datasets from different sources like Azure Blob Storage, Azure Data Lake Storage, Azure SQL, or local files. You can also preview and visualize the data with just one click. It offers various built-in modules to preprocess data and do feature engineering. You can build and train machine learning models using algorithms for computer vision, text analytics, recommendations, and anomaly detection. You have the option to use pre-built models or customize them with Python and R code. You can execute machine learning pipelines interactively, cross-validate models, and visualize performance. Troubleshooting and debugging are made easier with graphs, log previews, and outputs. Deploying models for real-time and batch inferencing is streamlined, and models and assets are stored in a central registry for tracking and lineage. Together, these features cover the path from raw data to a deployed model without requiring you to write code.
AML Designer Model Selection
When it comes to data science, one common question is, "Which machine learning algorithm should I use?" The answer depends on two important factors. First, you need to understand what you want to achieve with your data. This means identifying the business question you want to answer by analyzing historical data. Second, you need to evaluate the specific requirements of your data science scenario. This includes considering accuracy, training time, linearity, number of parameters, and number of features that your solution can handle. By considering these factors, you can make a well-informed decision about the best machine learning algorithm for your situation.
To help you figure out what you want to do with your data, you can use the Azure Machine Learning Algorithm Cheat Sheet. It helps you find the right machine learning model for your predictive analytics solution. Azure Machine Learning designer offers a wide range of algorithms, such as Multiclass Decision Forest, recommendation systems, Neural Network Regression, Multiclass Neural Network, and K-Means Clustering. Each algorithm is designed to solve a specific kind of machine learning problem. You can find a comprehensive list of these algorithms, with detailed documentation on how they work and how to tune their parameters, in the Machine Learning designer algorithm and component reference.
https://learn.microsoft.com/en-us/azure/machine-learning/media/algorithm-cheat-sheet/machine-learnin...
| Algorithm | Accuracy | Training time | Linearity | Parameters | Notes |
| --- | --- | --- | --- | --- | --- |
| Classification family | | | | | |
| | Good | Fast | Yes | 4 | |
| | Excellent | Moderate | No | 5 | Shows slower scoring times. Suggest not working with One-vs-All Multiclass. |
| | Excellent | Moderate | No | 6 | Large memory footprint |
| | Good | Moderate | No | 8 | |
| | Good | Moderate | Yes | 4 | |
| | Good | Fast | Yes | 5 | Good for large feature sets |
| | Good | Fast | Yes | 4 | |
| | Excellent | Moderate | No | 5 | Shows slower scoring times |
| | Excellent | Moderate | No | 6 | Tends to improve accuracy with some small risk of less coverage |
| | Good | Moderate | No | 8 | |
| | - | - | - | - | See properties of the two-class method selected |
| Regression family | | | | | |
| | Good | Fast | Yes | 4 | |
| | Excellent | Moderate | No | 5 | |
| | Excellent | Moderate | No | 6 | Large memory footprint |
| | Good | Moderate | No | 8 | |
| Clustering family | | | | | |
| | Excellent | Moderate | Yes | 8 | A clustering algorithm |
Azure Machine Learning Designer Evaluation
In this section, we will provide an overview of the metrics available for evaluating different types of models in the Evaluate Model framework. This includes classification models, regression models, and clustering models. By understanding the specific metrics for each model type, we can better assess their performance. Whether you need to evaluate the accuracy of a regression model, the precision and recall of a classification model, or the clustering quality of a clustering model, this section will help you effectively analyze and evaluate your model results.
Evaluation of Classification Models
When evaluating binary classification models, a set of key metrics is reported. Accuracy measures the proportion of correct predictions (true positives plus true negatives) out of the total number of cases examined. Precision is the ratio of true positives to all predicted positives, which shows how often a positive prediction is correct. Recall is the fraction of actual positive instances that the model correctly identifies. The F1 score is the harmonic mean of precision and recall; it ranges from 0 to 1, with 1 being the ideal value. Additionally, the area under the curve (AUC) measures the area under the curve plotted with the true positive rate on the y axis and the false positive rate on the x axis. This metric is useful because it provides a single number that lets you compare models of different types. AUC is classification-threshold-invariant: it assesses the predictive quality of the model regardless of the chosen classification threshold. Together, these metrics give a well-rounded picture of a binary classification model's performance.
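To make these definitions concrete, here is a minimal sketch in plain Python. It is not the Evaluate Model implementation; the function names `binary_metrics` and `auc` are made up for illustration, and the labels and scores are toy values. The AUC function uses the equivalent ranking view: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. No threshold is involved,
    which is why AUC is threshold-invariant.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5

print(binary_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

Changing the 0.5 threshold changes accuracy, precision, recall, and F1, but leaves the AUC value untouched, since AUC only depends on how the scores are ordered.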
Evaluation of Regression Models
When assessing regression models, the metrics returned are designed to estimate the amount of error. A well-fitted model shows a small difference between observed and predicted values. Beyond the summary numbers, examining the residuals (the difference between each predicted point and its corresponding actual value) can reveal potential bias in the model.
Evaluating regression models involves several key metrics. The mean absolute error (MAE) measures how close the predictions are to the actual outcomes, so a lower score is better. The root mean squared error (RMSE) condenses the overall error into a single value; because the differences are squared, it disregards the distinction between over-prediction and under-prediction. Relative absolute error (RAE) is the total absolute error divided by the total absolute difference between the actual values and their arithmetic mean, which makes it a relative measure. Similarly, the relative squared error (RSE) normalizes the total squared error of the predicted values by dividing it by the total squared difference between the actual values and their mean.
Furthermore, the coefficient of determination, commonly referred to as R2, indicates the model's predictive power, typically as a value between 0 and 1. A value of 0 means the model explains nothing (it does no better than always predicting the mean), while a value of 1 means a perfect fit. Interpret R2 values with caution, though: low values can be entirely normal and high values can be suspect. Taken together, these metrics let you judge both the size of a regression model's errors and its predictive capability.
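The regression metrics above can be sketched in a few lines of plain Python. This uses the common textbook definitions, which are assumed (not confirmed here) to match what Evaluate Model reports; the function name `regression_metrics` and the sample values are illustrative only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, RAE, RSE, and R2 for paired actual/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n

    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Baseline errors: how far the actual values sit from their own mean.
    abs_dev = sum(abs(t - mean_true) for t in y_true)
    sq_dev = sum((t - mean_true) ** 2 for t in y_true)

    rse = sq_err / sq_dev
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / abs_dev,
        "rse": rse,
        "r2": 1.0 - rse,  # R2 is one minus the relative squared error
    }


# Toy data (illustrative only).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
for name, value in regression_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

Note that under these definitions a model that does worse than always predicting the mean gets an RSE above 1 and therefore a negative R2, which is one reason to treat the 0 to 1 range as typical rather than guaranteed.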
Evaluation of Clustering Models
When it comes to clustering models, they exhibit notable distinctions from classification and regression models, which is why Evaluate Model provides a distinct set of statistics tailored specifically for clustering models.
The statistics furnished for clustering models offer valuable insights into various aspects, including the allocation of data points to each cluster, the degree of separation between clusters, and the compactness of data points within each cluster.
These statistics are computed by averaging over the entire dataset and are accompanied by additional rows that present cluster-specific statistics.
The evaluation of clustering models entails the consideration of the following metrics:
Azure Machine Learning AutoML
Automated machine learning, also known as automated ML or AutoML, automates the time-consuming, iterative tasks of machine learning model development. It lets data scientists, analysts, and developers build ML models with high scale, efficiency, and productivity without compromising model quality. Automated ML in Azure Machine Learning is based on work from Microsoft Research. Because the repetitive parts of model building are automated, you can focus on higher-level tasks.
In Azure Machine Learning, the training process incorporates the creation of multiple pipelines in parallel, where each pipeline explores different algorithms and parameters. These iterations involve pairing ML algorithms with feature selections, resulting in models that produce training scores. The model's fitness to the data is determined by the score of the desired metric, with higher scores indicating better performance. The training process continues until the experiment's defined exit criteria are met.
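The search loop described above can be illustrated with a toy sketch. This is not how automated ML is implemented: it runs candidates one after another rather than in parallel, the three "algorithms" are deliberately trivial, and every name here (`search`, `fit_mean`, `fit_median`, `fit_line`, `max_trials`, `target_score`) is hypothetical. It only shows the shape of the idea: try candidates, score each on a chosen metric, keep the best, and stop when an exit criterion is met.

```python
import statistics


def fit_mean(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line through the training data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_median(xs, ys):
    """Candidate 3: always predict the median of the training targets."""
    m = statistics.median(ys)
    return lambda x: m


def r2_score(ys, preds):
    """Metric to maximize: coefficient of determination."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def search(candidates, train, valid, max_trials, target_score):
    """Try candidates in order; stop on a trial budget or a good-enough score."""
    best_name, best_score, history = None, float("-inf"), []
    for trial, (name, fit) in enumerate(candidates, start=1):
        model = fit(*train)
        score = r2_score(valid[1], [model(x) for x in valid[0]])
        history.append((name, score))
        if score > best_score:
            best_name, best_score = name, score
        # Exit criteria: trial budget exhausted or target score reached.
        if trial >= max_trials or best_score >= target_score:
            break
    return best_name, best_score, history


# Toy data (illustrative only): y = 2x.
train = ([1, 2, 3, 4], [2, 4, 6, 8])
valid = ([5, 6], [10, 12])
candidates = [("mean", fit_mean), ("line", fit_line), ("median", fit_median)]

best_name, best_score, history = search(
    candidates, train, valid, max_trials=5, target_score=0.95
)
print(best_name, best_score, history)
```

In this run the third candidate is never trained, because the second one already meets the target score. That is the role exit criteria play in a real experiment: they bound how much compute the search is allowed to spend.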
To conduct automated ML training experiments using Azure Machine Learning, the following steps can be followed:
AutoML Classification
Classification is a fundamental aspect of supervised learning, where models are trained on existing data and utilize that knowledge to make predictions on new data. Azure Machine Learning offers specialized featurizations designed specifically for classification tasks, such as deep neural network text featurizers tailored to enhance the accuracy of classification models. Exploring the available featurization options can provide valuable insights into optimizing the performance of classification algorithms. Additionally, AutoML in Azure Machine Learning supports a wide range of algorithms for classification tasks, offering flexibility and versatility in model development. Classification models have a wide array of applications, including fraud detection, handwriting recognition, and object detection, among others. To gain a practical understanding of classification and automated machine learning, you can refer to a Python notebook that provides an illustrative example and further exploration of these concepts. By delving into the world of classification and automated machine learning, you can unlock powerful techniques to make accurate predictions and drive impactful outcomes in diverse domains.
AutoML Regression
Regression tasks, like classification, are a common supervised learning task. Azure Machine Learning provides featurization techniques tailored specifically for regression problems, and getting familiar with these options can help you improve regression models. AutoML in Azure Machine Learning also supports a diverse range of algorithms for regression tasks. Unlike classification, where predicted values are categorical, regression models predict numerical output values based on independent predictors. The objective of regression is to establish the relationship between those predictors and the target by estimating how each predictor influences the predicted value. For instance, a regression model can predict automobile prices based on features such as gas mileage, safety ratings, and more.
AutoML Time-Series Forecasting
Forecasting plays an important role in the operations of any business, whether it involves predicting revenue, inventory levels, sales, or customer demand. With automated ML, you can combine several techniques and approaches to obtain a recommended, high-quality time-series forecast. The Azure Machine Learning documentation lists the algorithms that automated ML supports for forecasting.
In the context of automated time-series experiments, a multivariate regression framework is employed. Historical time-series values are transformed into additional dimensions for the regressor, alongside other predictors. This approach offers a significant advantage over classical time series methods as it naturally incorporates multiple contextual variables and their interrelationships during the training process. Automated ML constructs a single model, often with internal branching, to accommodate all items and prediction horizons within the dataset. This approach allows for a more robust estimation of model parameters, enabling better generalization to unseen series.
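To see what "historical values become additional dimensions for the regressor" means in practice, here is a minimal sketch of lag featurization in plain Python. It is an illustration of the general idea, not AutoML's featurizer; the function name `make_lag_features` and the sample series are made up.

```python
def make_lag_features(series, lags):
    """Turn a 1-D series into (features, target) rows for a regressor.

    For each time step t, the features are the values at t - lag for
    every requested lag, and the target is the value at t. Early steps
    that lack enough history are skipped.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows


# Toy demand series (illustrative only).
demand = [10, 12, 13, 15, 18]
for features, target in make_lag_features(demand, lags=[1, 2]):
    print(features, "->", target)
# [12, 10] -> 13
# [13, 12] -> 15
# [15, 13] -> 18
```

Once the series is in this tabular form, other predictors (price, promotions, holidays) can be appended as extra columns, which is what lets a regression model use contextual variables alongside the series' own history.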
Furthermore, advanced forecasting configurations include holiday detection and featurization. Time-series and deep neural network (DNN) learners, including Auto-ARIMA, Prophet, and ForecastTCN, provide options for different forecasting needs. Many-models support is available through grouping, and rolling-origin cross-validation enables robust model evaluation. Configurable lags and rolling window aggregate features further extend the forecasting capabilities of automated ML.
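Rolling-origin cross-validation is easy to picture with a small sketch. The version below uses an expanding training window and is only one common variant; it is not taken from the AutoML source, and the function and parameter names (`rolling_origin_splits`, `initial_train`, `horizon`, `step`) are hypothetical.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step):
    """Yield (train_indices, test_indices) pairs that respect time order.

    Each fold trains on everything before the origin and tests on the
    next `horizon` observations; the origin then moves forward by `step`.
    """
    splits = []
    origin = initial_train
    while origin + horizon <= n_obs:
        train_idx = list(range(0, origin))
        test_idx = list(range(origin, origin + horizon))
        splits.append((train_idx, test_idx))
        origin += step
    return splits


for train_idx, test_idx in rolling_origin_splits(
    n_obs=10, initial_train=6, horizon=2, step=2
):
    print("train:", train_idx, "test:", test_idx)
# train: [0, 1, 2, 3, 4, 5] test: [6, 7]
# train: [0, 1, 2, 3, 4, 5, 6, 7] test: [8, 9]
```

Unlike ordinary k-fold cross-validation, no fold ever trains on observations that come after its test window, so the evaluation mimics how the forecast would be used in production.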
By utilizing these advanced configurations and techniques, businesses can harness the power of automated ML to generate accurate and insightful time-series forecasts. This empowers decision-makers to make informed choices, optimize operations, and drive success in a wide range of industries.
AutoML Computer Vision
The support for computer vision tasks in Azure Machine Learning offers a seamless and efficient approach to generating models trained on image data, catering to various scenarios such as image classification and object detection.
By leveraging this capability, users can effortlessly integrate with the data labeling feature in Azure Machine Learning. This integration enables the utilization of labeled data to train and generate accurate image models. Additionally, the performance of these models can be optimized by specifying the desired model algorithm and fine-tuning the hyperparameters to achieve the best possible results.
Once the model generation process is complete, users have the flexibility to download the resulting model or deploy it as a web service within Azure Machine Learning. This allows for easy access and utilization of the generated models in practical applications.
To ensure scalability and operational efficiency, Azure Machine Learning provides MLOps and ML Pipelines capabilities. These powerful features enable users to operationalize their computer vision models at scale, streamlining the deployment and management processes.
The authoring of AutoML models for vision tasks is facilitated through the Azure Machine Learning Python SDK, providing a user-friendly and intuitive development experience. Furthermore, the Azure Machine Learning studio UI allows easy access to experimentation jobs, models, and outputs, providing a comprehensive and accessible interface for managing and analyzing computer vision projects.
Automated ML supports the following computer vision tasks: multi-class image classification, multi-label image classification, object detection, and instance segmentation.
In multi-class image classification, the goal is to classify an image into a single label from a predefined set of classes. For example, an image can be classified as a 'cat,' 'dog,' or 'duck' based on its content.
On the other hand, multi-label image classification involves assigning multiple labels to an image from a given set of labels. This means an image can be labeled as both a 'cat' and a 'dog' simultaneously, based on its characteristics.
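The difference between the two classification tasks shows up in how per-class scores are turned into labels. The sketch below assumes a model has already produced a score per class (the scores are invented for illustration, and the function names and the 0.5 threshold are assumptions, not AutoML defaults).

```python
def multiclass_label(scores):
    """Multi-class: pick exactly one label, the highest-scoring class."""
    return max(scores, key=scores.get)


def multilabel_labels(scores, threshold=0.5):
    """Multi-label: keep every class whose score clears the threshold."""
    return sorted(label for label, s in scores.items() if s >= threshold)


# Hypothetical per-class scores for one image.
scores = {"cat": 0.7, "dog": 0.6, "duck": 0.1}

print(multiclass_label(scores))   # cat
print(multilabel_labels(scores))  # ['cat', 'dog']
```

The same scores give one answer in the multi-class setting and two in the multi-label setting, so the task type you choose should reflect whether your images can legitimately carry more than one label.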
For object detection tasks, the objective is to identify and locate objects within an image by drawing bounding boxes around them. For instance, an algorithm can be used to detect and locate all instances of 'dogs' and 'cats' within an image, outlining each object with a bounding box.
Lastly, instance segmentation tasks involve identifying objects within an image at the pixel level. This means drawing a polygon around each object present in the image, providing a more precise delineation of their boundaries.
By harnessing the support for computer vision tasks in Azure Machine Learning, users can unlock the potential of image data and build powerful models for image classification and object detection. This empowers businesses to leverage visual information for enhanced decision-making, automation, and transformative experiences across various domains.
AutoML Natural Language Processing
The support for natural language processing (NLP) tasks in automated ML within Azure Machine Learning offers a seamless and efficient approach to generating models trained on text data. This capability caters to various scenarios, including text classification and named entity recognition.
By leveraging automated ML, users can easily author and train NLP models using the Azure Machine Learning Python SDK. This user-friendly development experience enables the creation of powerful NLP models for a wide range of applications.
The resulting experimentation jobs, models, and outputs can be conveniently accessed and managed through the Azure Machine Learning studio UI. This intuitive interface provides a comprehensive overview of NLP projects, allowing users to analyze and optimize their models effectively.
The NLP capability within Azure Machine Learning encompasses several key features. It supports end-to-end deep neural network training with the latest pre-trained BERT models, ensuring state-of-the-art performance in NLP tasks. Additionally, seamless integration with Azure Machine Learning data labeling simplifies the process of using labeled data for generating NLP models. The NLP capability also offers multi-lingual support, with the ability to process text in 104 different languages. Finally, distributed training with Horovod enables efficient and scalable NLP model training.
By leveraging the NLP support in Azure Machine Learning, businesses can unlock the power of text data and build sophisticated models for tasks such as text classification and named entity recognition. This empowers organizations to extract valuable insights from textual information, automate processes, and make informed decisions in a wide range of industries and domains.
Open Source Models
Once you have selected a model that suits your needs, Azure Machine Learning offers a suite of tools to support model development and deployment. You can use popular frameworks such as scikit-learn, TensorFlow, PyTorch, and XGBoost to create and train your own custom models.
Whether you are starting from scratch and training a machine learning model using scikit-learn, or you already have an existing model that you want to bring into the cloud, Azure Machine Learning provides the infrastructure to scale out your training jobs using elastic cloud compute resources. This ensures that you can efficiently handle large-scale training tasks and take full advantage of the cloud's scalability.
Furthermore, Azure Machine Learning enables you to build, deploy, version, and monitor production-grade models seamlessly. With robust tools and functionalities, you can ensure the smooth transition from model development to deployment, and effectively manage and monitor your models in a production environment.
By leveraging Azure Machine Learning's support for custom models and its comprehensive set of tools, you can accelerate your model development process, improve scalability, and seamlessly deploy and manage your models in production. This empowers data scientists and developers to deliver cutting-edge machine learning solutions and drive impactful outcomes in various industries and domains.
https://github.com/sqlshep/PredictiveMaintenance/tree/main/XGBoost
https://github.com/sqlshep/PredictiveMaintenance/tree/main/LSTM
Open Source and MLflow
Managing the complete lifecycle of machine learning models can be a complex task. However, with MLflow, an open-source framework, this process becomes much more streamlined. MLflow offers a comprehensive solution for efficiently managing models across various platforms, ensuring a consistent set of tools regardless of where your experiments are running.
One of the standout features of MLflow is its ability to train and serve models on different platforms. Whether you're conducting experiments on your local computer, a remote compute target, a virtual machine, or an Azure Machine Learning compute instance, MLflow allows you to utilize the same set of tools. This flexibility enables you to focus on your tasks without worrying about the underlying platform.
For users of Azure Machine Learning workspaces, MLflow compatibility is a game-changer. It means that you can leverage Azure Machine Learning workspaces just as you would an MLflow server. This compatibility offers several advantages:
https://learn.microsoft.com/en-us/azure/machine-learning/concept-mlflow?view=azureml-api-2
Sample Notebooks using MLflow
Training and tracking an XGBoost classifier with MLflow https://github.com/Azure/azureml-examples/blob/main/sdk/python/using-mlflow/train-and-log/xgboost_cl...
Hyper-parameter optimization using HyperOpt and nested runs in MLflow https://github.com/Azure/azureml-examples/blob/main/sdk/python/using-mlflow/train-and-log/xgboost_ne...
Logging models with MLflow https://github.com/Azure/azureml-examples/blob/main/sdk/python/using-mlflow/train-and-log/logging_an...
Manage runs and experiments with MLflow https://github.com/Azure/azureml-examples/blob/main/sdk/python/using-mlflow/runs-management/run_hist...