Deriving advanced insights with Artificial Intelligence using Azure Machine Learning and Snowflake
Published Sep 16 2022

Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle. Machine learning professionals, data scientists, and engineers can use it in their day-to-day workflows to train and deploy models and manage MLOps. Azure Machine Learning can be used to create advanced models with open-source technologies, and its MLOps tools help monitor, retrain, and deploy those models. The Snowflake Data Cloud supports advanced workloads such as artificial intelligence and machine learning, giving enterprises a single place to instantly access all relevant data through a single, globally trusted network, with native support for structured, semi-structured (JSON, Avro, ORC, Parquet, or XML), and unstructured data.


Customers can leverage the power of Azure Machine Learning with Snowflake, utilizing Snowpark to support ML-driven data science use cases such as forecasting and prediction. By combining the best of both technologies, customers can develop and deploy ML models in a secure, enterprise-ready, collaborative environment. Azure ML can interface with Snowflake from both notebooks and pipelines: notebooks for ad-hoc analytics, and pipelines for production ML workflows.


Connecting via Azure Data Factory

Azure Data Factory can be leveraged to connect to Snowflake, pull data from Snowflake, and stage it into an Azure ML datastore (data lake). The staged data can then be registered as a dataset and used in Azure ML. Azure Data Factory orchestrates the flow and can trigger Azure ML notebooks and pipelines. It can also be used to push the processed data back into Snowflake.
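The Copy activity for this flow can be sketched as the JSON fragment below. The dataset reference names, table name, and sink format are illustrative placeholders; the Snowflake source in Azure Data Factory stages data out using Snowflake's COPY command, expressed through the export settings.

```json
{
  "name": "CopySnowflakeToDataLake",
  "type": "Copy",
  "inputs": [ { "referenceName": "SnowflakeBankTable", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "DataLakeStagingParquet", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": {
      "type": "SnowflakeSource",
      "query": "SELECT * FROM BANK_DATA",
      "exportSettings": { "type": "SnowflakeExportCopyCommand" }
    },
    "sink": { "type": "ParquetSink" }
  }
}
```

The Parquet files landed in the data lake by this activity are what get registered as an Azure ML dataset in the next step.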



Connecting via Snowflake Python Connector

The Snowflake Python Connector can be used to connect to Snowflake directly and consume the data in Azure ML. The connector supports pushdown, executing queries directly in Snowflake using Snowflake's compute. It can be used in notebooks as well as pipelines, for both real-time and batch endpoints.
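A minimal sketch of this pattern, assuming credentials are supplied via environment variables and that the warehouse, database, and table names shown are placeholders. The query text is built locally, but the query itself runs on Snowflake's compute; only the result set comes back to the Azure ML notebook as a pandas DataFrame.

```python
import os

def build_query(table: str, columns=None, limit=None) -> str:
    """Build the SELECT statement that Snowflake will execute.
    With pushdown, the filtering/projection runs on Snowflake's compute."""
    cols = ", ".join(columns) if columns else "*"
    query = f"SELECT {cols} FROM {table}"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query

def fetch_dataframe(table: str, limit=None):
    """Run the query in Snowflake and return the result as a pandas DataFrame."""
    import snowflake.connector  # pip install "snowflake-connector-python[pandas]"
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="COMPUTE_WH",   # placeholder
        database="BANK_DB",       # placeholder
        schema="PUBLIC",
    )
    try:
        cur = conn.cursor()
        cur.execute(build_query(table, limit=limit))
        return cur.fetch_pandas_all()  # needs the connector's pandas extra
    finally:
        conn.close()
```

For example, `fetch_dataframe("BANK_DATA", limit=10_000)` would pull a sample of the banking table into the notebook for EDA.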



Using Snowpark with Azure ML

Snowpark provides the ability to execute ML workloads in Snowflake. It supports pushdown for all operations, including Snowflake UDFs, so all computation is done within Snowflake. Azure ML can leverage Snowpark to deploy ML models into Snowflake: a model is prepared and trained in Azure ML, deployed into Snowflake using Snowpark, and the resulting Snowpark function can then be triggered by Azure ML for processing.
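A sketch of the deployment step, assuming an existing Snowpark `Session` and a fitted scikit-learn model. The UDF name, stage name, and the four feature columns (age, balance, duration, campaign) are illustrative assumptions, not the article's exact schema.

```python
def make_predict_fn(model):
    """Wrap a fitted model in the plain function Snowpark will register as a
    UDF: one row of features in, one predicted class label out."""
    def predict(age: int, balance: float, duration: float, campaign: int) -> int:
        # Placeholder feature columns from the banking dataset
        return int(model.predict([[age, balance, duration, campaign]])[0])
    return predict

def deploy_udf(session, model):
    """Register the model as a permanent Snowflake UDF via Snowpark.
    The stage (@ML_MODELS) and UDF name are placeholders."""
    from snowflake.snowpark.types import FloatType, IntegerType
    session.add_packages("scikit-learn", "pandas")
    session.udf.register(
        func=make_predict_fn(model),
        name="PREDICT_TERM_DEPOSIT",
        stage_location="@ML_MODELS",
        input_types=[IntegerType(), FloatType(), FloatType(), IntegerType()],
        return_type=IntegerType(),
        is_permanent=True,
        replace=True,
    )
```

Once registered, the UDF can be invoked from plain SQL inside Snowflake, e.g. `SELECT PREDICT_TERM_DEPOSIT(AGE, BALANCE, DURATION, CAMPAIGN) FROM BANK_DATA`, which is what makes batch inference run entirely on Snowflake's compute.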




The complete picture

We have created a simple Azure ML notebook to demonstrate these integration capabilities. The notebook uses a simple banking dataset and leverages an Azure ML notebook and compute to connect to Snowflake, train a model on the data, and deploy it as a Snowpark UDF to run batch inference.


The notebook uses a classification model and follows the data science process lifecycle. The data source is Snowflake tables accessed using the Python Snowflake Connector. Exploratory data analysis (EDA) is then applied to get a sense of the data, and feature engineering transforms the data into features the model can process. The transformed features are written back to a Snowflake table so they can be consumed by the model. The model is trained using the Random Forest classifier from scikit-learn and evaluated using a confusion matrix. The required Python libraries are imported into Snowflake via a stage, and the model is deployed into Snowflake as a UDF to run batch inference on a random set of records. The predicted data is stored back in Snowflake. The notebook can be accessed from here.
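The train-and-evaluate step can be sketched as below. Synthetic data stands in for the engineered banking features (the real notebook reads them from Snowflake); the model and metric match the ones named above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features pulled from Snowflake
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # binary target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)  # rows: actual class, columns: predicted class
```

The fitted `clf` is what gets serialized, staged into Snowflake, and wrapped as the UDF for batch inference.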


Version history
Last update: Sep 19 2022 11:05 AM