How can I pull ratings data from IMDB and place in my excel spreadsheet please?

Copper Contributor

Hi everyone,

 

I have an Excel spreadsheet of all my films I own and I wanted to create extra columns for year and IMDB rating. 

 

Is there a quick (and easy) way of scraping the year and rating from IMDB and placing it in my spreadsheet without the need to visit each film and manually type it in. 

 

The rating is the main goal here, but having the year also would be a bonus. 


I am not that knowledgeable about Xcel. I know the basics well but am not too advanced so it would need to be relatively easy to do. i.e. through the excel menus. 

 

Thanks very much.

5 Replies

@JANNER66 

There are some different approaches you can try.

  1. Web Scraping (Not Recommended)
  2. API (Probably there is some free API available)
  3. Download the database straight from the IMDB website.

For the last option, that I recommend you just need to import it to your excel file.

Information about the data format and types (https://datasets.imdbws.com/)

Datasets https://datasets.imdbws.com/

 

 

 

Thanks Luthius@Juliano-Petrukio 

 

I had actually found the databases and I downloaded the 

name.basics.tsv.gz

title.basics.tsv.gz

title.ratings.tsv.gz

 

I'm not really sure how to import these files into the spreadsheet in a meaningful way. 

 

I went to DATA - FROM WEB and then located these files. It just populated some random words here and there though. 

I can't see how else you would import these files. 

@JANNER66 

.gz files are UNIX archives, first you shall extract files from such archives, on Windows using 7zip or like utility.

Within it will be .tsv files (tab separated values) which you could import into Excel with Power Query or by legacy From text utility using tab as separator (in general it shall be recognized automatically).

Hello all~
did anyone ever figure out how to achieve this task? I found a page on the IMDB website that will create an HTML Plugin for any movie on said site. Does anybody know how to use an HTML Plugin from within Excel???
Thanks~ Joey

@JetsetJoey69 

I was able to do this with Access VBA. Excel VBA should be similar. No plugins are needed, only built-in VBA libraries and functions. The technique I used was web-scraping. Downloading the entire IMDb database like the other user did won't work because (1) IMDb data keep changing, and (2) IMDb DB is huge (gigabytes), so it is better to just get the info you need.

 

STEP ONE: Declare the URLDownloadToFile function in VBA as shown here.  Use it to download the source page of an IMDb title *reference* page (e.g. https://www.imdb.com/title/tt0102926/reference ) to a text file.

 

STEP TWO: Configure the text file for utf-8 encoding with the VBA code below.  Utf-8 is needed for handling many non-English language text (if you deal with a lot of foreign films), otherwise you may get gibberish text in your data.

 

Set fs = CreateObject("ADODB.Stream")
fs.Charset = "utf-8"
fs.Open
fs.LoadFromFile "C:\MyFolder\DownloadedTextFile.txt"
MyText = fs.ReadText()

 

 

STEP THREE: This is the actual web-scraping work.  Write a procedure that reads the text file and looks for HTML tags that contain IMDb data.  For instance, to look for the IMDb rating, look for this HTML tag:

 

<span class="ipl-rating-star__rating">8.6</span>

You can use the InStr function or whatever to read what is between the <span> tag and the </span> tag.  And that is how you find the rating: 8.6.

 

This method can be used to obtain just about any data on IMDb: title, year, country, run time, cast & crew, URL of title picture, etc.