With the announcements we made last week at Ignite 2021, data teams now have a handful of new ways to drive value with machine learning built directly into their Apache Spark pools in Azure Synapse Analytics.
With the general availability of our machine learning library for Apache Spark on Azure Synapse, data teams now have expanded access to both code-first and code-free ML tools for forecasting, model training, and pre-built AI. This library provides both familiar open-source tools such as LightGBM and proprietary solutions, delivering a comprehensive, streamlined approach to ML workloads. Updates include PREDICT, a new keyword that supports scoring AzureML and MLFlow models directly in Azure Synapse, and integration with Azure Cognitive Services, now generally available.
Simplified scalable batch scoring with PREDICT and MLFlow
We are simplifying the way customers can enrich their data with predictive models for batch scoring at scale in Azure Synapse. The PREDICT keyword on Spark, now in public preview, drastically simplifies the handoff between an ML model producer and the person operationalizing the model for batch scoring (the model consumer) by allowing you to point to MLFlow-packaged and registered models in Azure Machine Learning directly from your Synapse workspace. This way, you can easily run predictions at large scale on Spark in Synapse using a variety of MLFlow model flavors, streamlining and simplifying the batch scoring process. PREDICT also supports referencing models stored in Azure Data Lake Storage Gen2.
The model flavors supported in this public preview are Sklearn, PyTorch, and TensorFlow, and we are continuously working to expand this list. PyFunc models can also be used with PREDICT. As a user, all you need to do is point to your model and your data within the secured boundaries of Azure Synapse. Information such as the model type and the expected inputs and outputs is picked up automatically from the MLFlow packaging format, which the model producer captured at the time of training the model.
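The producer/consumer handoff described above can be illustrated with a plain-Python sketch: the producer packages a model together with the metadata (flavor, expected inputs, output type) that MLFlow's packaging format records at training time, and the consumer loads the package and batch-scores without needing to know the model's internals. All names here are illustrative stand-ins, not the PREDICT API itself.

```python
import pickle

# --- Model producer: train and package a model with its metadata ---
class TinyLinearModel:
    """Stand-in for a trained model (e.g. an Sklearn estimator)."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        # Score each input row: dot product of weights and features, plus bias.
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias
                for row in rows]

# Metadata mimicking what the MLFlow packaging format captures at training
# time: the model flavor and the input/output signature.
package = {
    "flavor": "sklearn",                    # illustrative flavor name
    "inputs": ["temperature", "pressure"],  # expected input columns
    "output": "double",
    "model": TinyLinearModel(weights=[0.5, 2.0], bias=1.0),
}
blob = pickle.dumps(package)

# --- Model consumer: load the package and run batch scoring ---
# The consumer reads the expected inputs from the metadata rather than from
# out-of-band communication with the producer.
loaded = pickle.loads(blob)
batch = [[10.0, 1.0], [20.0, 2.0]]          # rows matching loaded["inputs"]
scores = loaded["model"].predict(batch)
print(scores)  # [8.0, 15.0]
```

In Synapse, PREDICT plays the consumer role at Spark scale: the scoring job resolves the registered model by reference and applies it to a DataFrame, with the signature validation handled by the MLFlow packaging rather than hand-written glue code.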
PREDICT simplifies calling ML models for processing and analysis
Pre-built AI on Spark with Azure Cognitive Services
Data professionals in Synapse can leverage pre-built AI directly from Spark using a built-in integration with Azure Cognitive Services, now generally available in Azure Synapse Analytics. This includes support for a wide range of Cognitive Services capabilities, with Translator and Form Recognizer among the newly added services. Authentication is now supported through linked services as well as in data exfiltration protected workspaces.
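To make the Translator integration concrete, here is a minimal stdlib sketch that builds (but does not send) a request in the shape the Cognitive Services Translator v3.0 REST API expects. The endpoint, headers, and `api-version` follow the public Translator documentation; the key and region values are placeholders. In Synapse itself you would not hand-roll this request: the built-in integration and linked services handle the endpoint and authentication for you.

```python
import json
from urllib import parse, request

# Placeholder credentials -- in Synapse these come from a linked service.
ENDPOINT = "https://api.cognitive.microsofttranslator.com"
KEY = "<your-cognitive-services-key>"
REGION = "<your-resource-region>"

def build_translate_request(texts, to_lang):
    """Build a Translator v3.0 batch request object (not sent here)."""
    params = parse.urlencode({"api-version": "3.0", "to": to_lang})
    # Translator accepts a JSON array of {"Text": ...} objects per request.
    body = json.dumps([{"Text": t} for t in texts]).encode("utf-8")
    return request.Request(
        f"{ENDPOINT}/translate?{params}",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": KEY,
            "Ocp-Apim-Subscription-Region": REGION,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_translate_request(["Hello, world"], to_lang="de")
print(req.full_url)
```

Batching several strings per request, as the function above allows, is what makes this pattern a natural fit for enriching a Spark DataFrame column at scale.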
AI solutions for streamlining retail product recommendations
Industry solutions in Azure Synapse Analytics make it easy to get started with solving common industry problems. As of Ignite, customers can now use a retail AI solution to streamline building a recommendation engine in Spark on Azure Synapse. This solution can be deployed on a database modeled using Azure Synapse database templates, also announced at Ignite. Read our blog from last week to learn more.
A rich ecosystem of ML tools on Spark
Our machine learning library for Apache Spark on Azure Synapse makes it possible for data engineers and data scientists to further simplify and streamline machine learning in Azure Synapse. This Spark library contains both familiar open source and new proprietary machine learning tools available in every Azure Synapse workspace. Whether you are a data engineer looking to enrich your data with pre-trained ML models or a data scientist looking to develop high-performing distributed ML models, this library simplifies machine learning on Apache Spark and shortens the time to value from your data.
Our customers can now easily embed state-of-the-art machine learning systems such as Azure Cognitive Services, LightGBM, Vowpal Wabbit, MLFlow, and AzureML into their existing Spark workflows. These tools enable new classes of powerful and scalable machine learning solutions across a variety of business use cases such as predictive maintenance, cybersecurity, computer vision, retail, and more. Moreover, these capabilities are available to developers across the many different language surfaces of Spark, including Python, Scala, Java, and R.