Have you ever wanted to scrape data from a website and create a dashboard with the data? Maybe you want to get insights from the metrics? Or maybe you just want to make pretty visuals?
Well, so did we :D!
We wanted to track metrics from the Healthcare and Life Sciences blog . Ideally, we wanted a dashboard showing at least views but possibly other KPIs. And what do you know! Each blog conveniently shows how many views we have per blog and we wanted to capitalize on that feature.
So Write Code?
Initially we thought about screen scrape-ing, using open source screen scraping libraries. The problems with this approach:
- Time to result: we probably wouldn’t get to our end goal (a pretty dashboard) in a day.
- Liability: we would be stuck maintaining code until we didn’t need the dashboard
- Hosting: even if we created a SPA we’d still need to host it, and share links etc
- Authorization: we expected we would come to a point where we could not make all the data we were collecting public- so we’d need to add Authorization
With all these problems even finding a library to start from seemed like a daunting task.
Hello, Power BI!
This is where Power BI (desktop) comes in. It has built in intelligence to scrape a web page and it tries to guess different tablature data that exist on the page. (And it does a pretty decent job at it, too!)
This is our final pie chart- showing views per author, filtered for the last 45 days.
It will take you less than 10 minutes from here to get to this chart. Shall we get started?
Let's start the magic:
First open Power BI. Click ‘GetData’. Search ‘Web’ in the modal dialog and click ‘Connect’:
Enter in the URL you want Power BI to scrape:
Since this is a public site, select ‘Anonymous’
This will now show all the possible tables that exist on the page:
For our example, we will select Table 1. (This table includes the information we need, such as, author, views, post date and etc.). Check the checkbox next to Table 1 (or relevant table you need) and click ‘Load’
Now, it’s all just aesthetics and grabbing and filtering on the correct fields.
Beautifying our data
Let’s go and change the column names for our sanity :D. As of now, it’s just Column 1, 2, 3, and etc.
Click on the second icon (one that looks like a table) and once the data loads, right click on the columns and ‘rename’:
Go back to the dashboard view:
Choose your preferred visual, then fields:
Oh, you want more?
You can use FILTERS to hone in and get more detailed metrics (ie. Our final result). We dragged the Post Date column to ‘Filter on this page’ section. We then filtered the chart to only show view counts in the last 45 days.
Note: As you might have figured, the notable limitation to this ‘cheat’ is how much data the website actually offers for the viewers. For our usage, it was a perfect use case to leverage an existing feature on an existing solution without having to reinvent the wheel :smiling_face_with_smiling_eyes:. We still believe this is a great trick to keep it in your sleeves as there are plethora of sites where you can extract views, likes, dates, and descriptions right from the page!
Thanks for reading :)