Developers Introduction to Data Science (video series)
Published Jul 14 2020 01:51 PM 17.3K Views

Data science is about extracting knowledge from data. It is an important area of study because data scientists rely on it to gain insights from data and to prepare that data for the machine learning modeling phase. By “doing data science”, data scientists apply techniques such as data pre-processing and cleaning, feature engineering, and descriptive statistics to their data in order to understand it and start building AI solutions.

 

In this sense, data science has become an area of study that universities and companies should look at as a first step to start their machine learning journey:

 

 

[Image: Picture1.png]

To learn more about Machine Learning versus AI and Deep Learning, please visit: https://www.aka.ms/DLvsML

 

Sarah Guthals, PhD (@sarahguthals) and Francesca Lazzeri, PhD (@frlazzeri) have released “A Developer's Introduction to Data Science”, a 28-part video series that focuses on how to use data science to build machine learning solutions. The series is now live on both Channel 9 and YouTube.

 

The content of this series is structured as follows:

 

Video Title

Video Description

Introduction to the Developer's Intro to Data Science Video Series

In this 28-video series, you will learn important concepts and technologies to build your end-to-end machine learning applications on Azure. To learn more, check out: http://www.aka.ms/DevIntroDS_GitHub and http://www.aka.ms/DevIntroDS_Learn

 

What is the Data Science Lifecycle?

In this video you will learn what the Data Science Lifecycle is and how you can use it to design your data science solutions.

How do you define your business goal and scope your data science solution?

In this video, Sarah describes the problem that she is facing where she thinks data science methods might be able to help her improve her business goals.

What is Machine Learning?

What is Machine Learning? In this video you will learn what machine learning, supervised learning and unsupervised learning are and how you can use the model development cycle to build, train, test and deploy your machine learning models.

Which Machine Learning Algorithm Should You Use?

Which machine learning algorithm should I use? In this video you will learn how to select the right machine learning algorithm for your data science scenario and how to answer different questions with different machine learning approaches.

What is AutoML?

In this video, you will learn how you can use Automated machine learning (Automated ML) to accelerate the data science life cycle.

How do you create a machine learning resource in Azure?

In this video, you will learn how to create a Machine Learning resource inside of Azure. By using Azure for your machine learning toolset, you're able to create the storage account, application insights, key vault and container registry (all resources that will support your machine learning work) in just a matter of minutes.

How do you setup your local environment for data exploration?

In this video, you will learn how to set up your local environment for data exploration. Specifically, you will set up Visual Studio Code to be able to run Python Jupyter Notebooks and connect to your Azure Machine Learning resource.

How do Jupyter notebooks work in Visual Studio Code?

In this video, you will get an introduction to how Jupyter Notebooks work inside Visual Studio Code, install the Python packages useful for this data science project, and make sure you have access to the AzureML SDK.

How do you connect your Azure Machine Learning resources to your local Visual Studio Code environment?

In this video, you will learn how to connect the Machine Learning resource that you created in Azure to your local Visual Studio Code environment. This allows you to run your machine learning experiments on the cloud instead of locally.

How do you prepare your data for a time series forecast?

In this video you will learn how to prepare your data to be effectively run through machine learning algorithms. Then, you will learn how to upload your data from your local computer into your Azure Machine Learning resource (specifically the datastore resource) and how to

Why do you split data into testing and training data in data science?

In this video, you will learn why you split your data into training and testing data. Then you will learn how to split your data by date into two separate pandas DataFrames using Python in Visual Studio Code.
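As a rough illustration of a date-based split, the sketch below divides a small DataFrame into training and testing sets at a cutoff date. The column names (`timestamp`, `demand`), the sample values, and the cutoff date are all made up for this example and are not taken from the video series.

```python
import pandas as pd

# Hypothetical daily data; the column names and values are illustrative only.
df = pd.DataFrame({
    "timestamp": pd.date_range("2020-01-01", periods=10, freq="D"),
    "demand": [10, 12, 11, 13, 15, 14, 16, 18, 17, 19],
})

# Hypothetical cutoff: rows strictly before this date are used for training,
# rows on or after it are held out for testing.
split_date = pd.Timestamp("2020-01-08")

train = df[df["timestamp"] < split_date].copy()
test = df[df["timestamp"] >= split_date].copy()

print(len(train), "training rows;", len(test), "testing rows")
```

Splitting on a date rather than at random keeps every test row later in time than every training row, which is generally what you want when the goal is a forecast.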

What is an AutoML Config file?

In this video you will learn how to run a machine learning experiment with Automated ML and how to create your AutoMLConfig file to submit an automated ML experiment in Azure Machine Learning.

What should your parameters be when creating an AutoML Config file?

See how Sarah and Francesca configure and run an AutoMLConfig file to submit an automated ML experiment in Azure Machine Learning.
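To give a feel for what such a configuration might contain, here is a minimal sketch of forecasting settings written as a plain Python dictionary. The keys mirror keyword arguments that the Azure Machine Learning Python SDK (v1) accepted for `AutoMLConfig` around the time of this series, but the values, the column names, and the `missing_settings` helper are assumptions made for illustration. How time series settings are passed has varied between SDK versions, so check the reference for the version you use.

```python
# Settings as they might be passed to AutoMLConfig as keyword arguments.
# Every value and column name here is hypothetical.
automl_settings = {
    "task": "forecasting",
    "primary_metric": "normalized_root_mean_squared_error",
    "experiment_timeout_hours": 0.3,
    "label_column_name": "demand",      # hypothetical target column
    "time_column_name": "timestamp",    # hypothetical time column
    "n_cross_validations": 3,
    "enable_early_stopping": True,
}


def missing_settings(settings, required=("task", "primary_metric", "label_column_name")):
    """Illustrative helper (not part of the SDK): list required keys that are absent."""
    return [key for key in required if key not in settings]


print(missing_settings(automl_settings))

# With the SDK installed, the dictionary could then be unpacked roughly as:
#   AutoMLConfig(training_data=..., compute_target=..., **automl_settings)
```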

How do you create an AutoML Config file and run your data science experiments on the cloud?

In this video, you will actually put your data through AutoML in Azure to train and test with a number of machine learning algorithms that Azure supports.

What is Azure Machine Learning?

Azure Machine Learning is a cloud-based environment that you can use to train, deploy, automate, manage, and track your machine learning models.

How can you collaborate on Jupyter Notebooks using Azure Machine Learning studio?

In this video, you will see the Azure Machine Learning Studio and learn how to create a Jupyter Notebook in the cloud. By doing this, you ensure you have access to your code anywhere.

How do you choose the best model and perform feature engineering?

In this video you will learn how to use Automated ML to select your best model and perform feature engineering. Automated ML is the process of automating the time-consuming, iterative tasks of machine learning model development, and it can help you optimize when developing end-to-end applications on Azure.

How do you use Azure ML for best model selection and featurization?

During training, the Azure Machine Learning service creates a number of pipelines in parallel that try different algorithms and parameters. When configuring your experiments, you can enable the advanced featurization setting, which can help you with automatic data cleansing, preparation, and transformation to generate synthetic features.

How do you evaluate and retrieve a time series forecast from Azure Machine Learning?

In this video, you will learn how to use an external Python function to run your data through a forecast evaluation. Once Python files are uploaded to the cloud environment in Azure Machine Learning Studio, you can call the functions they contain from Jupyter Notebooks in the same cloud environment.

How do you score your machine learning model on accuracy?

In this video, you will use the root mean squared error, mean absolute error, and mean absolute percentage error to score the accuracy of your model. You will then learn how to visualize the predictions of your model with scatter plots in a Jupyter Notebook within the Azure Machine Learning studio cloud environment.
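The three metrics named above can be sketched in plain Python as follows. This is a minimal illustration using the standard textbook definitions, not the evaluation code used in the series, and the sample values are invented.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so this sketch assumes none are.
    """
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Invented sample values for illustration.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print("RMSE:", rmse(actual, predicted))
print("MAE: ", mae(actual, predicted))
print("MAPE:", mape(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses the error relative to the size of the actual values, which is why the three are often reported together.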

How do you deploy a machine learning model as a web service within Azure?

In this video, you will gather all of the important pieces of your model to be able to deploy it as a web service on Azure so that your other applications can call it on the fly.

What have you learned from deploying a machine learning model as a web service?

In this video, Sarah summarizes all of the learnings from measuring the accuracy of the machine learning model used in this series. Sarah also revisits the business goal to determine whether the effort would actually provide valuable information for her business.

What is the importance of model deployment in machine learning?

In this video, Francesca summarizes the most important steps to deploy your machine learning models with Azure Machine Learning. Model deployment is the method by which you integrate a machine learning model into an existing production environment.

How do you select the right machine learning algorithm?

A common question in data science is “Which machine learning algorithm should I use?”. In this video you will learn how the algorithm you select depends primarily on two different aspects of your data science scenario:
1) What do you want to do with your data? Specifically, what is the business question you want to answer by learning from your past data?
2) What are the requirements of your data science scenario? Specifically, what accuracy, training time, linearity, number of parameters, and number of features does your solution need to support?

How does ethics play a role in data science?

In this video, Sarah challenges you to think about where ethics plays a role in all data science problems. Regardless of the type of data analysis or machine learning model you are using, your questions, data, and parameters to your algorithms might introduce bias and actually cause harm.

What is model interpretability and how can you incorporate it into your data science solutions?

Interpretability is critical for data scientists, auditors, and business decision makers alike to ensure compliance with company policies, industry standards, and government regulations. In this video you will learn how to use the Model Interpretability toolkit to explain your models.

Concluding the Developer's Intro to Data Science Video Series

In this 28-video series, you learnt important concepts and technologies to build your end-to-end machine learning applications on Azure: Sarah and Francesca guided you through the data science process, from understanding your data, to applying machine learning algorithms and deploying your models on Azure.

 

 Additional resources:

More videos coming:

The month of July 2020 is Data month for the Microsoft Reactors global live streams on Twitch! For the middle two weeks of July, you will get to dive even deeper on these and similar concepts with Sarah, Francesca, and other Cloud Advocates and Microsoft employees!

Last update: Aug 04 2020 06:02 AM