
Educator Developer Blog

Data Science and Machine Learning using regression models

bethanyjep
Microsoft
Jul 05, 2022

We have covered many Machine Learning concepts and techniques, and now it is time to put what we have learned into practice. But first, a recap: 

 

In 30DaysOfLearning, we covered different Machine Learning techniques that equip you, as a Data Scientist, with the knowledge to build, train and evaluate your models. The full curriculum can be found at: https://aka.ms/30DL-DSMLPage. In this blog, we summarize the machine learning concepts we covered and link to the resources for each. 

Access to the required curriculum and content 

Understanding Machine Learning 

Machine learning is often the foundation of an AI system: it is the way we "teach" a computer model to make predictions and draw conclusions from data. 

Machine learning automates pattern discovery by finding meaningful insights in real-world or generated data. Below is a summary of the Machine learning process: 
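Most models in this series follow the same loop: prepare data, hold some of it back, train, evaluate, then predict. The following is a minimal sketch of that loop, assuming scikit-learn and its bundled wine dataset as a stand-in for your own data; neither the dataset nor the decision tree is prescribed by the post, they simply keep the example self-contained.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Prepare data (a small dataset bundled with scikit-learn, standing in for your own)
X, y = load_wine(return_X_y=True)

# 2. Hold back part of the data so evaluation uses examples the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3. Train: the model learns a mapping from features to labels
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 4. Evaluate on the held-out data
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")

# 5. Use the trained model to predict labels for new observations
predictions = model.predict(X_test[:5])
print(predictions)
```

Swapping step 3 for a regression, classification or clustering estimator is what most of the sections below amount to.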

 

 Regression Models 

If you want to predict the probable height for a person of a given age, you'd use linear regression, as you're seeking a numeric value.  
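A minimal linear regression sketch, assuming scikit-learn. The ages and heights below are made-up, perfectly linear numbers chosen only so the fitted line is easy to check; they are not real growth data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data for illustration only (not real growth data)
ages = np.arange(2, 11).reshape(-1, 1)  # years; scikit-learn expects a 2-D feature array
heights = 80 + 6 * ages.ravel()         # cm, an exact straight line

model = LinearRegression().fit(ages, heights)

# The output is a number on a continuous scale, which is what makes this regression
predicted = model.predict([[7.5]])[0]
print(f"slope={model.coef_[0]:.2f} cm/year, intercept={model.intercept_:.2f} cm")
print(f"Predicted height at 7.5 years: {predicted:.1f} cm")
```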

If you're interested in discovering whether a type of cuisine should be considered vegan or not, you're looking for a category assignment so you would use logistic regression. 
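By contrast, logistic regression returns a category. A minimal sketch, assuming scikit-learn and two hypothetical ingredient flags per dish (`contains_meat`, `contains_dairy`); the features and labels are invented for illustration, not taken from the course's cuisine dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary features per dish: [contains_meat, contains_dairy]
X = np.array([[0, 0]] * 4 + [[1, 0]] * 2 + [[0, 1]] * 2 + [[1, 1]])
# Hypothetical labels: 1 = vegan, 0 = not vegan
y = np.array([1] * 4 + [0] * 5)

clf = LogisticRegression().fit(X, y)

# predict() returns a class label; predict_proba() returns a probability per class
new_dishes = np.array([[0, 0], [1, 1]])
print(clf.predict(new_dishes))
print(clf.predict_proba(new_dishes)[:, 1].round(2))  # column 1 = probability of "vegan"
```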

You can watch the recording on Regression and find all the resources used during the session at: https://aka.ms/30DL-RegressionRe    

Before you go to the next section, make sure you complete the Regression module: Train and evaluate regression models (Microsoft Learn). 

Deploying your Machine Learning models

You have successfully built your first model, so how do you share it? How do you use the models you have built in applications? One of the topics we covered was how to deploy your models. Using the UFO dataset, we “pickled” our trained model and deployed it in a Flask application. Watch the session video, Deploy Your ML Model Using Flask Framework, on YouTube, and find all additional resources to guide you here. 
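“Pickling” means serializing the trained model object to a file so another program, such as a web app, can load it without retraining. A minimal sketch using Python's standard `pickle` module, assuming a hypothetical toy scikit-learn model in place of the UFO model from the session; the file name `model.pkl` is an arbitrary example.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy model standing in for the model trained in the session
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Serialize ("pickle") the trained model to a file; the name is an arbitrary example
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, for example once when a web app starts, load it back and predict
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded.predict([[2.5]]))
```

In a Flask app, the load step would typically run once at startup, with `loaded.predict(...)` called inside a route that reads its input from the request. Only unpickle files you trust, because loading a pickle can execute arbitrary code.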

Classification Models 

 

 

Classification is a form of supervised learning that has a lot in common with regression techniques. It generally falls into two groups: binary classification and multiclass classification. A simple machine learning classifier looks like this: 

 

 

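import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# cuisines_feature_df (features) and cuisines_label_df (labels) are prepared earlier in the session
X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)

# One-vs-rest logistic regression; multi_class is deprecated in recent scikit-learn releases
lr = LogisticRegression(multi_class='ovr', solver='liblinear')
model = lr.fit(X_train, np.ravel(y_train))

accuracy = model.score(X_test, np.ravel(y_test))
print(f"Accuracy is {accuracy}")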
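The cuisines data frames above come from the session and are not included here. As a self-contained sketch of the same multiclass, one-vs-rest idea, assuming scikit-learn's bundled iris dataset as a substitute, `OneVsRestClassifier` makes the one-binary-classifier-per-class structure explicit, and a confusion matrix shows which classes get mixed up.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Iris stands in for the cuisines data: 3 classes, 4 numeric features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# One binary logistic regression per class ("this class" vs "all other classes")
model = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Accuracy is {accuracy:.2f}")
print(confusion_matrix(y_test, y_pred))       # rows: true class, columns: predicted class
print(classification_report(y_test, y_pred))  # per-class precision, recall and F1
```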

 

 

Before you go to the next section, make sure you complete the classification module: Train and evaluate classification models (Microsoft Learn). 

Clustering Models 

Clustering helps you make sense of chaos and is a form of unsupervised learning. In a professional setting, clustering can be used for market segmentation, for example to determine which age groups buy which items. Another use is anomaly detection, such as detecting fraud in a dataset of credit card transactions. You might also use clustering to detect tumors in a batch of medical scans. You can watch the session on clustering and find all the resources here. 
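A minimal clustering sketch, assuming scikit-learn's `KMeans` on synthetic, unlabelled points from `make_blobs`. The distance-to-nearest-centre rule at the end is one simple illustrative heuristic for the anomaly-detection use case, not a production fraud detector, and the 99th-percentile cut-off is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic points with three natural groups; the true labels are discarded (unsupervised)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Silhouette score ranges from -1 to 1; higher means tighter, better-separated clusters
score = silhouette_score(X, labels)
print("Silhouette score:", round(score, 2))

# Heuristic anomaly check: flag points unusually far from their nearest cluster centre
distances = kmeans.transform(X).min(axis=1)
threshold = np.percentile(distances, 99)
outliers = np.where(distances > threshold)[0]
print("Points flagged as unusual:", outliers)
```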

Before you go to the next section, make sure you complete the clustering module: Train and evaluate clustering models (Microsoft Learn). 

 

Time Series Forecasting 

Time series forecasting is a sort of 'crystal ball': based on the past behavior of a variable such as price, you can predict its likely future value. Using time series, you can predict trends, understand seasonality, detect outliers and more.  

Time series data is a list of observations ordered in time; unlike the data typically analyzed with linear regression, the order of the observations matters. One of the most common time series models is ARIMA, an acronym that stands for "Autoregressive Integrated Moving Average". You can read more on time series here. 
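To make the "AR" and "I" parts concrete, here is a hand-written sketch, assuming NumPy only: it differences the series once (Integrated), fits an AR(1) model with a constant to the differences by least squares (Autoregressive), and undoes the differencing to forecast. This is roughly an ARIMA(1,1,0) with a constant and has no moving-average term; for real work, statsmodels provides a full implementation in `statsmodels.tsa.arima.model.ARIMA`. The series is made up and noise-free so the fitted coefficients are easy to check.

```python
import numpy as np


def forecast_ar1_on_differences(series, steps):
    """Forecast `steps` values with an AR(1) model fitted to first differences.

    Simplified sketch: no moving-average term, no seasonality, no model selection.
    """
    series = np.asarray(series, dtype=float)
    diffs = np.diff(series)  # "Integrated": model the changes, not the levels

    # "Autoregressive": regress each difference on the previous one, plus an intercept
    X = np.column_stack([np.ones(len(diffs) - 1), diffs[:-1]])
    y = diffs[1:]
    (intercept, phi), *_ = np.linalg.lstsq(X, y, rcond=None)

    forecasts = []
    last_value, last_diff = series[-1], diffs[-1]
    for _ in range(steps):
        last_diff = intercept + phi * last_diff  # next predicted change
        last_value = last_value + last_diff      # undo the differencing
        forecasts.append(last_value)
    return np.array(forecasts), intercept, phi


# Made-up series whose changes follow d_t = 1 + 0.5 * d_(t-1), starting at 100
diffs = [0.0]
for _ in range(11):
    diffs.append(1.0 + 0.5 * diffs[-1])
series = 100 + np.concatenate([[0.0], np.cumsum(diffs)])

forecasts, intercept, phi = forecast_ar1_on_differences(series, steps=3)
print(f"intercept={intercept:.2f}, phi={phi:.2f}")
print("Next 3 values:", forecasts.round(2))
```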

 

Natural Language Processing (NLP) 

Natural Language Processing generally involves working with text data. Using NLP, you can determine sentiment (how people feel about a particular topic or subject) and detect whether a piece of text is spam. 
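A minimal spam-detection sketch, assuming scikit-learn: text is turned into word counts with `CountVectorizer` and classified with a multinomial Naive Bayes model. The six training messages are invented for illustration and far too few for real use; the session's own dataset is linked below.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical, tiny training set for illustration only
messages = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "can you review my code",
    "lunch tomorrow at noon",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Convert text to word counts, then fit a Naive Bayes classifier on those counts
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

print(spam_filter.predict(["free prize", "review the code tomorrow"]))
```

The same pipeline shape works for sentiment analysis by swapping the labels for, say, "positive" and "negative".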

 

Dataset to evaluate: https://aka.ms/30DL-NLPData   

Before you go to the next section, make sure you complete the deep learning module: Train and evaluate deep learning models (Microsoft Learn). 

Computer Vision on the Cloud 

Last on Machine Learning is how to leverage the cloud and low-code tools to build, train and consume models. In our last session, we used Custom Vision to train a model to tell the difference between humans and horses. After the model was trained, we tested its API, and it correctly classified an image. 
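Consuming the trained model means sending an image to its prediction endpoint and reading back tag probabilities. The sketch below uses only the Python standard library and assumes the Custom Vision prediction REST API conventions: a `Prediction-Key` header, an `application/octet-stream` image body, and a JSON response with a `predictions` list of `tagName`/`probability` entries. Verify these against the Prediction URL page of your own project. The URL, key and sample response are hypothetical placeholders, and the request is built but not sent.

```python
import json
import urllib.request


def build_prediction_request(prediction_url, prediction_key, image_bytes):
    """Build (but do not send) a POST request to a published image classifier.

    prediction_url and prediction_key are placeholders: copy the real values
    from your Custom Vision project's Prediction URL page.
    """
    return urllib.request.Request(
        prediction_url,
        data=image_bytes,
        headers={
            "Prediction-Key": prediction_key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


def top_prediction(response_body):
    """Return (tag_name, probability) for the most likely tag in a JSON response."""
    predictions = json.loads(response_body)["predictions"]
    best = max(predictions, key=lambda p: p["probability"])
    return best["tagName"], best["probability"]


# Hypothetical response for illustration; tag names depend on how you labelled your images
sample_response = json.dumps({
    "predictions": [
        {"tagName": "horse", "probability": 0.97},
        {"tagName": "human", "probability": 0.03},
    ]
})
print(top_prediction(sample_response))
```

To call a real endpoint, pass the built request to `urllib.request.urlopen(...)` and feed the bytes returned by `.read()` to `top_prediction`.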

Watch the on-demand session: https://aka.ms/30DL-MLSumProject   


 

 

Updated Jul 07, 2022
Version 3.0