By following the step-by-step guide in this article, you will be able to build the stunning dashboard shown in the image above.
- The Workflow
- Data Gathering and Transformation
- Sentiment Analysis
- Data Modelling in Power BI
- Data Analysis
As a data analyst, you will encounter scenarios where your data comes from secondary sources, e.g. social media (Twitter). Data like this is obtained through web scraping. A simple use case: a business wants to understand its customers' perceptions and emotions about its brand based on their activity on Twitter. To get the data for the analysis, you first have to scrape it, then clean it, analyze it, and use a visualization tool to present it to the business.
This project is a collaboration between myself and Flora Oladipupo (@Flora_Oladipupo). We are both Beta Microsoft Learn Student Ambassadors. This article contains embedded links that lead to Part 1 of this work (scraping and transforming the Twitter data using Python).
This analysis is not meant to predict the outcome of the Nigeria 2023 presidential election; it only shows details and sentiments surrounding the election, based solely on people’s tweets about the top 3 presidential candidates.
Identifying the process and the steps taken at each stage is very important for producing a meaningful, useful report that delivers interactive insights.
The Documentation includes:
- Data Gathering and Transformation
- Data Modelling
Data Gathering and Transformation
This was accomplished by my collaboration partner, @Flora_Oladipupo.
Check out this article, https://aka.ms/twitterdataanalysispart1, for a full guide on how to scrape and clean the Twitter data using Python. You will also learn how to export the data to Excel/CSV for visualization.
The cleaned data has 58,633 records. The columns include Date, ID, URL, Username, Location, Tweet, Source, Number of Likes, Number of Retweets, Processed Tweets, Sentiment, the names of the candidates (Peter Obi, Tinubu, and Atiku), and the names of the parties (PDP, APC, and LP), which allow me to properly visualize and dimension the data.
Because the data had already been pre-processed with Python, I only had to remove duplicate tweets, create some additional columns, and connect the data to a date table.
The sentiment analysis was already done by my partner, @Flora_Oladipupo, using the TextBlob library with Python. Check it out here: https://aka.ms/twitterdataanalysispart1. The dataset I received has a column called Sentiment, which holds the type of sentiment expressed in each tweet, i.e., positive, negative, or neutral. Positive represents a good sentiment, negative represents a bad sentiment, while neutral indicates no clear leaning either way. A donut chart was plotted to represent the sentiment analysis.
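To make the three labels concrete, here is a minimal sketch of how a polarity score could be mapped to a sentiment label. The function name `label_sentiment` and the cut-off at zero are assumptions for illustration; the exact rule used in Part 1 may differ. With TextBlob installed, the score itself could come from `TextBlob(text).sentiment.polarity`.

```python
def label_sentiment(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Assumption: scores above 0 are Positive, scores below 0 are Negative,
    and a score of exactly 0 is Neutral. Part 1 may use other cut-offs.
    """
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


# Hypothetical polarity scores, one per tweet.
sample_scores = [0.6, -0.25, 0.0]
labels = [label_sentiment(score) for score in sample_scores]
print(labels)
```

The resulting labels are what ends up in the Sentiment column that Power BI later summarizes in the donut chart.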
Watch this video to see how to import the data into Power BI Desktop.
Summary of the Transformations Carried Out, as Seen in the Video
After extracting the data set, a lot of work went into removing outliers and duplicates to support proper visualizations and insights. Here are some of the steps taken to get good results.
- Removing the blank spaces found in location and renaming them to Unknown: A total of 16,603 empty values were found in the Location column. These were replaced with Unknown since the users didn’t provide their location.
- Extracting new columns: New columns were extracted from the existing Date column. The extracted columns include:
- Year and day: These were created by adding a new column and using the Date column provided initially to extract the year and the day.
- Month: The month was also extracted from the Date column, but it came out as a number. A new column was created to map each month number to its name, and this was done with a conditional column.
Steps Taken in the Video
STEP 1: The same steps used above for year and day were also used for month. After this, the month needed to be renamed, and this was done using a conditional column, shown below.
STEP 2: The if statement was used to rename the month numbers, as represented above.
STEP 3: Shows the outcome of the conditional column represented above.
- Time: The time was also extracted from the Date column by creating a new column.
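The transformations summarized above were done in Power BI, but the same logic can be sketched in Python for readers who prefer to prepare the data before importing it. The sample rows, the date format, and the `transform` helper are assumptions for illustration; only the column names come from the data set described earlier.

```python
from datetime import datetime

# Month names indexed by month number minus one, mirroring the
# conditional column that renames month numbers.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical sample rows; the real export has many more columns.
rows = [
    {"Date": "2022-08-14 15:04:09", "Location": "Lagos"},
    {"Date": "2022-09-01 08:30:00", "Location": "   "},
]


def transform(row):
    """Fill blank locations and derive Year, Day, Month, and Time."""
    # Assumption: dates are stored as "YYYY-MM-DD HH:MM:SS".
    parsed = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S")
    return {
        **row,
        "Location": row["Location"].strip() or "Unknown",
        "Year": parsed.year,
        "Day": parsed.day,
        "Month": MONTH_NAMES[parsed.month - 1],
        "Time": parsed.strftime("%H:%M:%S"),
    }


transformed = [transform(row) for row in rows]
for row in transformed:
    print(row)
```

Keeping the month as a name rather than a number makes the axis labels readable without further formatting in the report.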
After the data was transformed in Power BI Desktop, the design assets needed for the visualizations were downloaded. A new date table was created in order to build a relationship with the available data set. After creating the relationship, I proceeded to build my dashboard. Green and white are Nigeria's national colors, which is why they were used on the dashboard.
The total number of likes, retweets, and tweets, and the total number of times Labour Party, PDP, and APC appeared in a tweet, were all visualized using cards.
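Each card shows a single aggregate. As a rough sketch of what those aggregates compute, the following uses a handful of hypothetical rows; the field names `likes`, `retweets`, and `party` are assumptions, not the actual column names in the data set.

```python
from collections import Counter

# Hypothetical sample of tweets tagged with the party they mention.
tweets = [
    {"likes": 12, "retweets": 3, "party": "LP"},
    {"likes": 5, "retweets": 1, "party": "PDP"},
    {"likes": 0, "retweets": 0, "party": "LP"},
    {"likes": 7, "retweets": 2, "party": "APC"},
]

total_likes = sum(tweet["likes"] for tweet in tweets)
total_retweets = sum(tweet["retweets"] for tweet in tweets)
total_tweets = len(tweets)
party_mentions = Counter(tweet["party"] for tweet in tweets)

print(total_likes, total_retweets, total_tweets, dict(party_mentions))
```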
Steps taken for data modelling are as follows:
Step One: Create a new table in the Data view.
Step Two: Rename the table and write out your formula (Date = CALENDARAUTO()). It automatically fills in the dates for you, from the start date to the end date.
Step Three: Enter the formula; the result is shown below.
Step Four: Create the model and relationship. The model looks like this:
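To illustrate what the date table contains, here is a minimal Python sketch that approximates CALENDARAUTO with its default settings: one row per day, from January 1 of the earliest year in the data to December 31 of the latest year. The function name `calendar_auto` and the sample dates are assumptions for illustration.

```python
from datetime import date, timedelta


def calendar_auto(dates):
    """Return every day from Jan 1 of the earliest year to Dec 31 of the latest."""
    start = date(min(dates).year, 1, 1)
    end = date(max(dates).year, 12, 31)
    number_of_days = (end - start).days + 1
    return [start + timedelta(days=offset) for offset in range(number_of_days)]


# Hypothetical tweet dates standing in for the Date column.
tweet_dates = [date(2022, 8, 14), date(2022, 9, 1)]
date_table = calendar_auto(tweet_dates)

print(date_table[0], date_table[-1], len(date_table))
```

Because the table has no gaps, every tweet date finds a matching row when the relationship is created, which is what lets visuals group tweets by day, month, or year.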
Visualizing the Data in Power BI
Below are the steps taken to visualize the data.
Tweet by Month
Results of the Analysis
A visualization of the top 3 tweet sources shows that 69.68% of tweets came from Android users, followed by iPhone users with 20.24%, while Twitter for web accounted for 10.09%.
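The percentages in this visual are each source's share among the top 3 sources. A minimal sketch of that calculation, using a hypothetical list of source labels (the label strings and counts are made up for illustration):

```python
from collections import Counter

# Hypothetical source labels, one per tweet.
sources = (
    ["Twitter for Android"] * 7
    + ["Twitter for iPhone"] * 2
    + ["Twitter Web App"] * 1
)

top_three = Counter(sources).most_common(3)
top_total = sum(count for _, count in top_three)

# Share of each source among the top 3, as a percentage.
shares = {source: round(100 * count / top_total, 2) for source, count in top_three}
print(shares)
```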
Most talked-about words: This visualization shows the most talked-about words in the tweets. Peter Obi and Labour Party have the highest word counts. Stop words were used to remove some words so that the meaningful words remain visible.
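The idea behind stop words can be sketched in a few lines of Python: count every word in the tweets, but skip the common filler words. The stop-word set, the sample tweets, and the `word_counts` helper below are assumptions for illustration, not the list used in the dashboard.

```python
import re
from collections import Counter

# Hypothetical stop words; a real list would be much longer.
STOP_WORDS = {"the", "is", "a", "and", "for", "to", "of"}

# Hypothetical tweets.
tweets = [
    "Peter Obi is the candidate for the Labour Party",
    "Obi and the Labour Party",
]


def word_counts(texts, stop_words):
    """Count lowercase words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


counts = word_counts(tweets, STOP_WORDS)
print(counts.most_common(3))
```

Without the filter, words such as "the" would dominate the visual and crowd out the candidate and party names.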
Also, a visual of the most talked-about candidates shows that Peter Obi has the highest count with a total of 67k, while Atiku has a total of 7k, followed by Tinubu with a total of 2k.
An analysis of tweets by time was also done to find out when users tweet the most. The result shows that most users tweet at 3 pm in the afternoon, 9 pm in the evening, and 8 am in the morning.
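A tweets-by-time analysis boils down to counting tweets per hour of the day. Here is a minimal sketch with hypothetical timestamps; the date format is an assumption.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet timestamps.
timestamps = [
    "2022-08-14 15:04:09",
    "2022-08-14 15:40:00",
    "2022-08-15 21:10:00",
    "2022-08-16 08:05:00",
    "2022-08-16 15:59:59",
]

# Count tweets by hour of the day (0 to 23).
hour_counts = Counter(
    datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").hour for stamp in timestamps
)
busiest_hour, busiest_count = hour_counts.most_common(1)[0]
print(busiest_hour, busiest_count)
```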
Visualization of the top 5 locations the tweets came from: Location Unknown has the highest count but was excluded because it gives no precise location. Lagos has the highest count, followed by people who filled in Nigeria as their location, which was also excluded. Abuja was second, followed by Port Harcourt and then United Kingdom, which was also excluded. Enugu was the next location, and Ibadan was the last.
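The exclusion logic described here can be sketched as a filter applied before ranking. The sample locations below are hypothetical; the excluded values are the ones named in the paragraph above.

```python
from collections import Counter

# Locations that are too imprecise to rank, as described above.
EXCLUDED = {"Unknown", "Nigeria", "United Kingdom"}

# Hypothetical location values, one per tweet.
locations = [
    "Unknown", "Lagos", "Lagos", "Nigeria", "Abuja",
    "Lagos", "Abuja", "Port Harcourt", "United Kingdom", "Enugu",
]

city_counts = Counter(loc for loc in locations if loc not in EXCLUDED)
top_locations = [loc for loc, _ in city_counts.most_common(5)]
print(top_locations)
```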
Based on the results of the sentiment analysis, people are encouraged to talk more positively about the election, and they should not be indifferent to it since the election will impact them. The sentiment analysis shows that 51.95% of tweets were positive, 20.69% were neutral, and 17.35% were negative.
This tutorial shows the impact of sentiment analysis in politics. The use case extends beyond politics, however: it can be applied in business to determine customer sentiment from reviews, letting business owners know how their business is perceived by customers. Using the right tool to analyze sentiment is as important as getting the intended result, and you can't go wrong when you combine Python with Power BI to accomplish that.