A beginner’s guide to using Jupyter Notebooks on the Microsoft Azure Notebooks service to visualise data
By T.T. Ouzounellis Kavlakonis, Microsoft Student Partner at the University of Cambridge
About Me:
Hello planet Earth! Welcome to my first blog post. My name is Theo and I am a first-year Engineering student at Trinity College, University of Cambridge. I have always been very curious about how things work and how they are made, so I have always liked tinkering with stuff. I love learning about new technologies and experimenting with new ideas. I am always up for a fun challenge, which is why I very much enjoy going to a lot of hackathons!
A bit of background
In today’s blog I will be sharing my experience of using Azure/Jupyter notebooks to visualize climate data.
But what exactly is a Jupyter notebook?
“A Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text”. In practice, this lets you run code in the browser while keeping ordinary text and document-style attachments alongside it. This is very convenient for enthusiasts and professionals alike.
Excited to dive into it? Let’s go!
Do you need to be an expert to go through all this? Well, let’s just say that in less than 45 minutes you can become an expert yourself.
1. First things first: Setting up:
All we need to set up is a Microsoft account, so if you do not have one please create one to proceed. You can do so at: https://account.microsoft.com/account .
With that set up, all you need to do now is log into Azure Notebooks at https://notebooks.azure.com . It is also assumed that you have downloaded the resource files, which can be found on the tutorial page.
2. The lab experience: Put on your lab coats and get to work!
2.1: Setting up the notebook:
The lab itself was fairly short and its instructions self-explanatory, so the main focus here will be on the more interesting, discussion-friendly bits.
After we set up our Notebook as instructed, we select Python 3.5 as our programming language.
Python is what we call a high-level programming language, and its main advantage is that it has quite a natural syntax (the equivalent of grammar in programming). This makes it quite easy to understand, especially for someone with no programming experience.
2.2: The code:
As with everything in real life, when we want to build something we need our toolbox. In programming, the tools are called libraries. Libraries provide certain functions (our tools, if you like) that make the process of achieving our goal easier. As such, we don’t have to reinvent the wheel!
Hence we start off by importing the libraries we will be using in this laboratory.
These are:
· Matplotlib.pyplot: used to create axes and, in general, to plot data in different formats and draw fitted curves over it.
· Numpy: a very standard library in Python. It is used for numerically manipulating large sets of numbers. It is very efficient and thus preferred.
· Sklearn: this library provides simple tools for data mining and data analysis via the use of machine learning.
· Seaborn: this is based upon matplotlib and provides a high-level interface for drawing attractive statistical graphics.
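As a rough sketch, the import cell for these four libraries could look like the one below. The aliases (plt, np, sns) are common community conventions rather than something the lab requires, and importing only LinearRegression from scikit-learn is an assumption based on the linear fit used later on.

```python
# Conventional import aliases for the four libraries used in this lab.
import matplotlib.pyplot as plt                     # plotting
import numpy as np                                  # numerical arrays
from sklearn.linear_model import LinearRegression   # only the linear model
import seaborn as sns                               # statistical plots

# In a notebook, the "%matplotlib inline" magic is often run first so that
# plots appear directly under the cell that creates them.
```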
After importing the .csv files we proceed to create a scatter plot of the data using the simple functions provided by the matplotlib library. The result?
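To give a feel for this step, here is a minimal sketch of loading comma-separated data and drawing a scatter plot. The numbers are made up for illustration and are read from an in-memory string instead of the lab’s actual resource files, and the variable names (years, temps) are my own assumptions.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

# Stand-in for a downloaded .csv file: a year column and a temperature column.
# These values are invented for illustration only.
csv_text = """1990,0.10
1995,0.18
2000,0.24
2005,0.35
2010,0.41
"""

# np.loadtxt also accepts a file name such as "my_data.csv" (hypothetical name).
data = np.loadtxt(io.StringIO(csv_text), delimiter=",")
years = data[:, 0]
temps = data[:, 1]

# Draw the scatter plot and label it.
fig, ax = plt.subplots()
ax.scatter(years, temps)
ax.set_title("Illustrative temperature data")
ax.set_xlabel("Year")
ax.set_ylabel("Temperature")
# In a notebook with inline plotting, the figure appears below the cell.
```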
Now that we have our scatter plot we need to analyse the data. A very useful technique, used extensively in predictive modelling, is curve fitting. This is a method by which we try to identify a trend that matches a specific profile, such as a straight line. In this lab we implement a linear regression fit in three different ways.
Method 1: Matplotlib
This method uses the matplotlib library to draw a fitted line on the scatter plot data. The fit itself is done by first calling the polyfit() function, which comes from NumPy and fits a polynomial of a chosen degree (here 1, i.e. a straight line, as specified in the parameter) to the data. Also, in order to draw the line seen on the plot we have to declare a separate “line function”.
Hence, we can see that this approach suits generally simple cases rather than very complex ones. It is still a very powerful tool to use though.
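A minimal sketch of this first method is shown below, assuming the same made-up data as before rather than the lab’s real files. The helper name line() is my own choice for the “line function”.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented sample data for illustration only.
years = np.array([1990.0, 1995.0, 2000.0, 2005.0, 2010.0])
temps = np.array([0.10, 0.18, 0.24, 0.35, 0.41])

# Degree 1 asks polyfit for a straight line; it returns the slope first,
# then the intercept.
slope, intercept = np.polyfit(years, temps, 1)


def line(x):
    """The separate 'line function': evaluate the fitted line at x."""
    return slope * x + intercept


# Scatter plot of the data with the fitted line drawn on top.
fig, ax = plt.subplots()
ax.scatter(years, temps)
ax.plot(years, line(years), color="red")
```

Note that the fit comes from NumPy and matplotlib only does the drawing, which is why the line has to be evaluated by hand before it can be plotted.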
Method 2: Scikit-learn:
The advantage of scikit-learn is that it contains many different models for fitting data, which are usually convenient to use. These machine learning models are built into the library itself. In this example, since we wanted a linear fit, we imported just the linear model. The code is pretty simple, with fewer steps to write by hand than in the matplotlib example. One can therefore see how scikit-learn can be applied to data analysis more universally and more easily than the first method when it comes to more complex cases. As expected, the result is exactly the same as the previous one.
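Here is a minimal sketch of the scikit-learn version, again on made-up data and with variable names of my own choosing, so treat it as an illustration and not as the lab’s exact code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented sample data for illustration only.
years = np.array([1990.0, 1995.0, 2000.0, 2005.0, 2010.0])
temps = np.array([0.10, 0.18, 0.24, 0.35, 0.41])

# scikit-learn expects a 2-D feature array: one row per sample,
# one column per feature.
X = years.reshape(-1, 1)

# Create the linear model and fit it to the data.
model = LinearRegression()
model.fit(X, temps)

slope = model.coef_[0]
intercept = model.intercept_
predicted = model.predict(X)  # fitted values, ready to pass to a plot call
```

The slope and intercept match the ones from polyfit(), which is why the plotted line comes out the same.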
Method 3: Seaborn
Seaborn is an open source library specific to statistical visualization, and it is built for convenience. This can easily be seen from how little code is needed to do the same thing as in the previous two methods. A shaded band was also added around the line. In Seaborn this band shows a confidence interval for the regression line itself, i.e. how uncertain the fit is, rather than the region where individual points are expected to lie.
2.3: Presentation
An important aspect of the Microsoft Azure environment and the Microsoft philosophy is collaboration. In today’s fast-changing world, collaboration helps us move forward. To reflect that, Azure Notebooks can convert your notebook into an easily shareable link, or into a presentation if needed. This is done very easily by clicking on the following:
And then setting each cell to be either a Fragment (Code) or a Slide (Text). Clicking slideshow after having done that will start a slide show of your notebook. This is very convenient for professional presentations which involve the use of such technologies.
2.4: Sharing is Caring
Additionally, as mentioned, you can also share the notebook with your peers and/or collaborators by clicking the share button on the notebooks page.
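For comparison, a minimal sketch of the Seaborn version follows, using the same made-up data. I am assuming the regplot() function here, which draws the points, the fitted line and the shaded band in a single call.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Invented sample data for illustration only.
years = np.array([1990.0, 1995.0, 2000.0, 2005.0, 2010.0])
temps = np.array([0.10, 0.18, 0.24, 0.35, 0.41])

fig, ax = plt.subplots()

# regplot draws the scatter points, the regression line and a shaded
# confidence band in one call. ci=95 (a 95% interval) is the default value
# and is written out here only to make it visible.
sns.regplot(x=years, y=temps, ci=95, ax=ax)
```

The whole fit and plot take one function call, at the cost of not getting the slope and intercept back as numbers.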
3. The experience/reflection
This lab exercise was a very nice experience, as it highlights the core principles that we all share as software enthusiasts and hobbyists:
Principle 1: There are always many different approaches and solutions to each problem. This was demonstrated by the different methods that we used to create the linear regression for the scatter plot.
This is a core principle, as it shows that different people come up with different solutions, and the goal is for everyone to improve and to discuss with one another to find the best one.
Principle 2: Following on from the first principle, finding the best solution requires people to collaborate and exchange ideas with their peers. Different people with different experiences think about a problem in different ways and contribute in their own unique way towards the solution. As such, collaboration and teamwork are crucial values in today’s professional environment. Microsoft has been committed to that idea and encourages collaboration and the exchange of ideas through ease of sharing, as demonstrated in the laboratory with Azure Notebooks.
Finally, I hope that the skills cultivated in this lab have made you eager to explore more of the world of machine learning and Python programming, as the applications are endless. Here are some interesting projects to try:
Project 1: Create a model to predict the trend of a stock price over a month and compare the performance of different models.
Project 2: Create a Jupyter Notebook application to investigate how population is changing over the years in developed vs. developing countries.
Got a project idea of your own? Definitely do that! Do not feel restricted to these projects: pursue your curiosity and embrace it. Please feel free to share with the rest of the community!
4. Conclusion
I hope that you, fellow lifelong learner, have finished this blog feeling empowered and ready to take on a new challenge even harder than the one just completed. I would like to say a very big thank you for reading my blog, and I really hope to see you back for my next one!
This is the link to my completed library: https://notebooks.azure.com/theo8299/libraries/notebook-hol
Lots of engineering love,
Theo
5. Bibliography:
[1] "Web App Service - Microsoft Azure". Microsoft.
[2] "Microsoft Azure Machine Learning combines power of comprehensive machine learning with benefits of cloud". blogs.microsoft.com. June 16, 2014.
[3] https://seaborn.pydata.org/introduction.html