Educator Developer Blog

Example of using Data Science techniques to analyse popular TV

Lee_Stott
Jun 26, 2019

Guest blog by Amy Boyd Cloud Advocate

Within the UK, Love Island (https://www.itv.com/loveisland) has been a huge TV success over the last 4 series (2016–2019). Hundreds of thousands have been watching, and Amy and her data science colleagues wanted to see what interesting insights could be found.


In this article, the team wants to share the Data Science approach and technologies used to create the findings in the article Are you Love Island’s “Type on Paper”.


The key steps to Data Science in the real world are:

1. Gathering the Data

2. Using Azure Notebooks to Explore the Data

3. Which Python Packages supported Data Analysis

4. How Power BI brought the Data to Life

5. Enhancing the Visuals with Azure Storage Support

Gathering the Data

 

Each series has a Wikipedia page containing information about the contestants, show details, and the couplings and events such as dates, challenges and exits. We took this data and collated all public information into an Excel file, starting with a tab for each series and adding information as we found it.
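As a rough sketch of how a workbook with one tab per series could be stacked into a single table, the snippet below uses made-up rows in place of the real tabs. The function name `combine_series_sheets`, the tab names and the column names are all assumptions for illustration; with a real workbook, `pd.read_excel(path, sheet_name=None)` returns a dictionary of the same shape.

```python
import pandas as pd


def combine_series_sheets(sheets):
    """Stack one DataFrame per series into a single table.

    `sheets` maps a tab name to that tab's rows, which is the shape
    pd.read_excel(path, sheet_name=None) returns for a whole workbook.
    """
    frames = []
    for tab_name, frame in sheets.items():
        tagged = frame.copy()
        tagged["Series"] = tab_name  # remember which tab each row came from
        frames.append(tagged)
    return pd.concat(frames, ignore_index=True)


# Made-up rows standing in for two workbook tabs (hypothetical names and ages).
example_sheets = {
    "Series 1": pd.DataFrame({"Name": ["Contestant A", "Contestant B"], "Age": [24, 27]}),
    "Series 2": pd.DataFrame({"Name": ["Contestant C"], "Age": [22]}),
}

combined = combine_series_sheets(example_sheets)
print(combined)
```

Keeping the tab name as its own column means the combined table can still be filtered or grouped by series later on.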

 

 

I have created this collated dataset from public information and my own viewing of YouTube, news articles, Wikipedia and TV.

 

What could you find in the data? Let us know and contribute to our GitHub repository.

 

Using Azure Notebooks to Explore the Data

 

 

I read the data from the CSV files into a Pandas data frame structure, so it was easy to start accessing rows and columns of the matrix given certain values contained in the data.

 

For example, accessing all the winners in the dataset:

winners = data[data['OUTCOME'].values == 'WINNER']

And building graphs to visually explore similarities and correlations in the data:

import matplotlib.pyplot as plt
areaofuk = data['Area of UK'].value_counts()
destination = data['From'].value_counts()
plt.pie(areaofuk, labels = areaofuk.index)
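To make the two ideas above concrete, here is a minimal, self-contained sketch of boolean filtering and `value_counts()`. The rows are made up, and the `DUMPED` outcome value is an assumption for illustration; only the `OUTCOME` and `Area of UK` column names come from the snippets above.

```python
import pandas as pd

# Made-up rows; the values are placeholders, not taken from the real dataset.
data = pd.DataFrame({
    "Name": ["Contestant A", "Contestant B", "Contestant C", "Contestant D"],
    "OUTCOME": ["WINNER", "DUMPED", "WINNER", "DUMPED"],
    "Area of UK": ["North", "South", "South", "South"],
})

# Boolean mask: True for each row whose OUTCOME is WINNER.
is_winner = data["OUTCOME"] == "WINNER"
winners = data[is_winner]

# value_counts() tallies each distinct value, most frequent first.
areaofuk = data["Area of UK"].value_counts()

print(winners["Name"].tolist())
print(areaofuk.to_dict())
```

The counts returned by `value_counts()` are indexed by the distinct values, which is why `areaofuk.index` can be passed straight to a pie chart as its labels.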

 

If you want to ‘Get to know’ the code a bit better, check out the full notebook here: https://github.com/amynic/love-island-project/tree/master/code

 

You can easily create a new project in Azure Notebooks by cloning a GitHub repository, so give this one a try and see what you can find within the historical dataset.

 

As this dataset is hundreds rather than millions of rows of data, I was able to use the Azure Notebooks Free Compute offering to run my Jupyter instance in the cloud. However, if I need more powerful compute in the future (for example, GPU compute), I can also leverage this by creating a Data Science Virtual Machine in the Azure Cloud and pointing my Jupyter instance towards it.

 

Which Python Packages supported Data Analysis

 

The main Python packages to mention are Pandas and Matplotlib. These packages made loading in datasets, manipulating them and visualising them simpler.

 

For example, loading a simple CSV file into a Pandas Data frame structure for manipulation is an import statement and one line of code:

import pandas as pd
data = pd.read_csv('love-island-historical-dataset.csv')

The data frame is a tabular structure with labelled rows and columns. You can access columns/rows using the labels and apply operations to these dimensions. Within Python this is like manipulating a dictionary object. A data frame is a well-used data structure.
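The dictionary comparison can be seen in a small sketch like the one below. The rows and the `Age next year` column are made up for illustration; `Age` and `From` are column names used elsewhere in this article.

```python
import pandas as pd

# Made-up rows, labelled by a placeholder contestant name.
data = pd.DataFrame(
    {"Age": [24, 27, 22], "From": ["Essex", "Cardiff", "Leeds"]},
    index=["Contestant A", "Contestant B", "Contestant C"],
)

ages = data["Age"]               # look up a column by label, like dict["key"]
row = data.loc["Contestant B"]   # look up a row by its label

# Assigning to a new label adds a column, like adding a dictionary entry.
data["Age next year"] = data["Age"] + 1

# Membership tests check the column labels, as they check keys in a dictionary.
print("Age" in data)
print(row["From"])
```

Operations such as `data["Age"] + 1` apply to the whole column at once, which is where a data frame goes beyond a plain dictionary.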

 

For more information on manipulating data frames, I found this article by Analytics Vidhya (25/06/2019), which lists some good techniques people use to manipulate data.

 

The Matplotlib package is very useful. It is accessible and well documented, with many examples produced by the community and shared online.

 

I created Python plots within my Azure Notebook experience and was able to view the distribution of columns in the dataset as well as explore possible correlations between columns (some showing positive correlation, and other hypotheses not showing as strong a correlation as I may have expected).

age = data['Age']
numcouples = data['Number of Couples']
plt.figure(figsize=(10,10))
plt.scatter(age,numcouples)
plt.xlabel("Age")
plt.ylabel("Number of couples")
plt.title('A graph to show how age relates to Number of couples across the show')
plt.show()
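A scatter plot shows a relationship visually; one way to put a number on it is a correlation coefficient. The sketch below is not from the original notebook: it uses made-up values and the pandas `Series.corr` method, which computes the Pearson correlation by default.

```python
import pandas as pd

# Made-up values for illustration only; not the real contestant data.
data = pd.DataFrame({
    "Age": [21, 23, 25, 27, 29],
    "Number of Couples": [1, 2, 2, 3, 4],
})

# Pearson correlation: +1 is a perfect positive linear relationship,
# 0 is no linear relationship, -1 is a perfect negative one.
r = data["Age"].corr(data["Number of Couples"])
print(round(r, 2))
```

A value close to 0 on the real data would match the case described above, where a hypothesis did not show the strong correlation that was expected.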

Data exploration is a key first step to understand your dataset and be able to analyse and build upon your hypothesis.

How Power BI brought the Data to Life

 

Creating reports and dashboards using data visualisation/business intelligence services allows you to quickly create stories of your data. I was able to share my findings with my team (both technical and non-technical) and allow them to explore the data for themselves by selecting graphs to filter them and asking natural language questions of the dataset behind the visuals.

 
 

 

Handy Tip -> Who uses what tool?

 [End users, Technical and Non-Technical] Power BI Service, access via a web browser. View and explore reports

 [Report creator, Technical] Power BI Desktop tool, Build data models and reports/visuals that tell stories. Download for Free

You want every data visualisation you create to match the theme or style of the project/brand/product you’re working on. You can do this in Power BI by creating a Power BI Theme file. This is a simple JSON file, which I created using Visual Studio Code. Into the JSON I added a list of colour hex codes that matched the bright summer colours within Love Island images.
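For a sense of what such a theme file contains, the sketch below builds a minimal one in Python. The theme name and the hex codes are placeholders I have chosen for illustration, not the palette used in the report; `name` and `dataColors` are the basic keys of a Power BI report theme.

```python
import json
import re

# Hypothetical palette: placeholder hex codes, not the colours from the report.
theme = {
    "name": "Summer Brights",
    "dataColors": ["#FF4F79", "#FFC857", "#2EC4B6", "#3A86FF"],
}

# Check each entry is a six-digit hex colour before saving the file.
hex_pattern = re.compile(r"^#[0-9A-Fa-f]{6}$")
all_valid = all(hex_pattern.match(colour) for colour in theme["dataColors"])

theme_json = json.dumps(theme, indent=2)
print(theme_json)
```

Saving the printed text as a `.json` file gives something that can be imported as a theme in Power BI Desktop.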

 

 

I have not finished building great visuals in Power BI yet; there are so many different things you can do within the tooling. If you want to ‘get to know’ the tooling better, I recommend looking at:

Enhancing the Visuals with Azure Storage Support

 

We accessed both the datasets and the accompanying images within an Azure Blob Storage account. Azure Blob Storage is great for storing unstructured data objects which you need access to. You can set up a storage account using the Azure Portal UI (user interface) or via the command line.

 

Beyond just storing the data, many Microsoft and other services have an Azure Blob Storage connector, meaning you can access your data within those tools or via a REST request (API). I was able to access contestant images to enhance my data visualisation reports within Power BI.

 

First, I created a new column in the dataset. I pointed the column calculation at the Azure Blob Storage account and appended the contestant name from another column in the dataset. Finally, I set the data type in Power BI to Image URL, so the service knows to render the image.
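The column above was built inside Power BI, but the same idea can be sketched in pandas. The storage account name, container name and `.jpg` extension below are all assumptions for illustration; only the general Blob Storage URL shape (`https://<account>.blob.core.windows.net/<container>/<blob>`) is standard.

```python
from urllib.parse import quote

import pandas as pd

# Hypothetical storage account and container names.
BASE_URL = "https://examplestorage.blob.core.windows.net/contestant-images"

# Made-up contestant names standing in for the real column.
data = pd.DataFrame({"Name": ["Contestant A", "Contestant B"]})


def image_url(name):
    """Build the blob URL for one contestant, assuming one .jpg per name."""
    # quote() percent-encodes characters such as spaces so the URL stays valid.
    return f"{BASE_URL}/{quote(name)}.jpg"


data["Image URL"] = data["Name"].map(image_url)
print(data["Image URL"].tolist())
```

This only works if each image is named consistently after the contestant, so the name column and the blob names need to match exactly.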

 
 

 

Handy Tip -> Azure Storage Explorer

• I use the Azure Storage Explorer tool as an easy way to upload, download and manage my storage accounts and data sources within the cloud

• Available on Windows, macOS and Linux

Where is your head at?

 

Want to learn more about the technologies mentioned above?

Updated Jun 26, 2019
Version 3.0