Power Query get data from pdf

Question

Latest version of Excel via Office 365 on WINDOWS.

I have a Pdf that has a single table split into dozens of tables by page breaks.

This is the first time I have used Get Data and Power Query. I selected the Pdf file and then the pages I wanted. As it is a single data table over several pages, the columns on each page are the same. However, individual tables may have columns where no data is present. Transform or Load are removing these columns when empty so different page loads have different numbers if columns, depending on whether that column has no values on a particular page. This means I cannot merge tables into one Excel worksheet. Hiw do I ensure all columns are represented on every table?

As mentioned earlier, I am new to Power Query so please do not assume I have detailed knowledge of this functionality.

Cheers

sergeibaklan · Answer

Stain555&nbsp;
Nobody have detailed knowledge, if only&nbsp;Guy Hunkin&nbsp;. In general&nbsp;Pdf.Tables - PowerQuery M | Microsoft Docs&nbsp;has an option to combine tables from different pages which shall work by default

However, I assume that's only if you have headers for both parts of the table. If your parts returned as

and

you may append one to another to another to receive resulting table. Yes, that's required some manual work.

stain555 · Answer

I actually discovered, by saving the Pdf as a word document, that tables had been saved in the pdf with tabs that bypassed null columns, so, although there were column headers, Power Query was always going to build the columns from actual data. So I need to manipulate extracted data myself to insert columns where missing. That's life.

Forum Discussion

Power Query get data from pdf

2 Replies