Understanding your R strategy options on the Azure AI Platform
Published Jul 03 2019 06:51 AM 11.5K Views
Microsoft

The R and Python programming languages are primary citizens for data science on the Azure AI Platform. These are the most common languages for performing data preparation, transformation, training and operationalization of machine learning models; the core components for one’s digital transformation leveraging AI. Yet they are fundamentally different in many aspects, directly affecting not only deployed solutions IT architectures but also but also corporate strategies for developer skills and product supportability.

 

This series of articles is designed help you understand the options your company and customers have to support and evolve their R strategy.

 

The articles in this series are:

  • Understanding your R strategy options on the Azure AI Platform (this post)
  • A deeper dive on R operationalization options
  • Big Data architectural topologies with R
  • Orchestrating mixed R and Python Machine Learning pipelines with AML Services

As with all of our blog posts, please share your questions, comments, and feedback.

 

Python vs R... or both?

 

Programming languages flame wars are common within the technology community. They arise from habit and trauma and influence team strategies, product placement and solution feasibility. Addressing coexistence and integration under your solution platform is key to enabling productivity and optimizing tool usage. They should not compete; they should work together to achieve more in a sustainable way.

 

Both Python and R have unbelievable strengths: great community; astonishing number of libraries; big data processing capabilities and many integration patterns. To exemplify trends on the latter: reticulate library to the rescue; Apache Spark and Apache Arrow on big data, MLS and many other possibilities.

 

Python has been playing the major role in deep learning methods: it is a full object oriented programming language, streamlines end-to-end DevOps workflows and is the go-to resource for data engineers, AI developers and deep learning research.

 

R is a very accomplished language for statistical and mathematical research, has unmatched libraries for data transformation and visualization and has a very concise experience for exploration and experimentation, one of the reasons it is often used in research to push the boundaries of the domain.

 

Here are some key realities about R:

  • R has plenty of support for scalable computations, the “capped by local resources” is long gone.
  • Python integration is possible and a common place using reticulate, feather and others
  • It is totally possible to use Keras, Tensorflow under R with full GPU support.
  • Both easily can and should leverage SQL-based platforms. Proficiency in SQL enables you to avoid unnecessary data movement.
  • Using ONNX to manage neural network models is a key pattern for integrating open platforms.
  • MLFlow has R support.
  • R has user interfaces for every user. RStudio is a first-class IDE, and Notebook based interfaces have long standing R support.
  • DevOps/MLOps pipelines and infrastructure as code are abstracted as API calls and JSON objects moving around, patterns that can be addressed by both languages.
  • Containers platforms (including AKS and ACI) can be used to deploy exposed models, usually using the libraries Shiny or plumber.

 

Azure Machine Learning Service to the rescue

 

This all is easier said than done. Aligning multiple technology roadmaps and having a streamlined experience is challenging. Azure Machine Learning Studio had the concept nailed with its canvas based experimentation interface, as you could visually place and connect Python and R steps into the workflow and generate a REST API for you models easily. The service was an easy fit for small teams and quick prototyping, but there were major limitations on compute scalability and platform experience for a large-scale corporate deployment.

 

The Azure Machine Learning service is a fully managed cloud service used to train, deploy, and manage machine learning models at scale. It has all the streamlined workflow experience and technological requirements and integrations needed for supporting these workloads. It is bringing over the same canvas-based experience, has integrated top-notch AutoML and makes the integration between Jupyter based interfaces and distinct compute targets a breeze. The service only supports Python as of today, but that is about to change. Microsoft is committed to bring R support to Azure Machine Learning Service.

 

 

A review of your topology options

 

The Azure Data Architecture guide provides a comprehensive landscape of the Machine Learning products to be used to choose what makes sense for your current scenario. Be it on-premises, cloud, hybrid, Python, R and both; there is an architecture pattern and product to support your solutions.

 

Incorporating a hybrid approach is feasible and encouraged. R can be successfully deployed and managed under many scenarios. The following decision-making patterns come to mind while we are engineering R solutions to our customers and partners:

 

  • When tackling cloud solutions:
  • When dealing with on-premises and hybrid approaches:
    • SQL Server Machine Learning Services
    • Machine Learning Server
    • Spark on AKS
    • Spark on-premises
  • When choosing interfaces: RStudio, Jupyter notebooks; VSCode; Visual Studio and others are all welcome.
  • When considering I/O and performance: R can read and write to pretty much everything (just like python):
  • When considering operationalization: Docker and Kubernetes are the standards to pursuit. This is maturing fast.

 

Conclusion

 

The goal of this post was to give you an updated take on Microsoft’s current support for R, and the forthcoming capabilities to support R and Python-based deployments. Deployment and sustainability of corporate solutions can be challenging, but R is a strong choice both a programming language and a platform for machine learning solutions. Today those challenges are around streamlined orchestration, model management, devops and operationalization for R, and soon the Azure Machine Learning Service will elegantly address all those challenges and enable smoother end to end hybrid Python and R solutions.

 

Please leave your comments! The other articles of this series are on the way.

 

 

1 Comment
Version history
Last update:
‎Jul 08 2019 07:01 AM
Updated by: