What causes this process of extracting webpage data run so slow?

Copper Contributor

Hello my dear life savers! :rose:

 

I've created an Excel file with a macro which works as I want, but for processes on larger data it performs slow (or not as fast as I need so I must give up), and I need your help to find out if this is normal or not and if it's just a system limit or the nature of this operation is time consuming. And if it's not natural, how can I improve it?

 

The macro, opens a webpage including results as separate items, downloads the HTML document then navigates through the results, extracts each item and it's corresponding pieces of data and inserts them into a table.

 

I'm not an expert on VBA or Excel but the areas of code I suppose might have the potential to cause problem are:

1- Memory issues due to opening objects for IE. I've read that you must clear the objects at the end and I've tried to do so. Does this make the process slow if not addressed properly?

2- The way I extract data: Now I extract data for an item then insert that item into the cells, then extract the next item and insert it's data into the cells and so on. Does it make a difference to extract an individual item and insert it into cells then go to the next item OR extract all items, assign them all into an array, then insert items from array into cells at once?

 

The maximum number of items on each page is 25 but there can be more than one page and the macro loops through the available pages. If the number of available items is about 100 it's not a big deal, but sometimes there's around thousands of items and I don't believe running at such slow speed is the power of the CPU! And I don't think the time needed to download the document is problematic because as I monitor the data insertion into the cells, after a page is loaded and the macro starts inserting the data into the cells it moves slowly from one cell to the next. So I suppose low speed is in the process of finding elements and inserting their data into the cells not getting the document itself.

 

The code is rather long so I'm not sure if it's appropriate to post it here. But if you can do me a favor and check the code please tell me to post the code for you.

 

If this data extraction can run fast It really streamlines one of my tasks at my work.

I'm very grateful for your time and help. :folded_hands::folded_hands::folded_hands::folded_hands:

4 Replies

@VahidPouyafar Perhaps you are trying to re-invent the wheel. Excel has a very powerful tool built-in called PowerQuery. Among many sources, it can connect to web pages though it depends on how these are built. Which page are you connecting to?

Hello, but I don't think so @Riny_van_Eekelen. I searched on the internet and came to this solution. The website is an an online directory you search for businesses. The macro takes in a query string, generated a URL using that (to be able to get the whole results you need to be logged in). Then the page provides results (with pagination). I'm not just trying to open a webpage and download a simple table. I need to get data from each result some of which are in hidden menus (i.e. you need to click on a button) so I need to look for particular HTML elements in each shown result's data structure and get their innerHTML. Now it works and it's only a matter of speed. Do you think the powerQuery would help?

@VahidPouyafar Can't tell? Which site? Can you give an example of a search string and the end result it should produce?

Hi @Riny_van_Eekelen I'm really thankful for your attention. I searched for power query and while trying to implement the given example for my own case I realized the Web.Page function of power query doesn't get all elements of the page and some ones are missing and this is a weakness of this function. Web.BrowserContents can do better but it's not supported in Excel but Power BI. But I'm trying to accomplish this in Excel. So that would be so better if I could increase the current extraction speed in Excel. The website is an online directory you can search for businesses. I've uploaded a short video so you can see what I mean by being slow! Each row of the table in the video represents a single result.