SOLVED

Power Query and PDF files

Copper Contributor

Hi

 

I receive financial accounts information, Nominal Ledger Activity Report by individual account code, from clients in PDF format, which can run to 100's of pages, example below:

 

NL Doc Sample.jpg

 

I use Power Query to load elements into Excel for analysis - which works great when the block of information being loaded is on a single sheet on the PDF file, but when the block of information I need runs onto several pages, I end up with multiple worksheets (as PQ is treating each page as a separate query) in Excel when I've loaded it.

 

I've tried to transform the data, by appending the data together - issues with column headings missing after the first page.  Also, after appending, when I close and load, I cannot just load the appended query, it loads everything into multiple Excel worksheets - all the individual pages of the PDF.

 

Any help would be gratefully received.

 

Thanks.

2 Replies
best response confirmed by Keith Farmery (Copper Contributor)
Solution

Hi @Keith Farmery 

...when the block of information I need runs onto several pages, I end up with multiple worksheets (as PQ is treating each page as a separate query) in Excel when I've loaded it

If you get n queries it's probably because you select multiple Tables/Pages in the Navigator, ex.:

Sample.png

Instead, you should uncheck Select multiple items > Click on the file name (what I squared in green), then at the bottom of the Navigator click Transform Data. You then get a single query with all the Tables/Pages found in the PDF, ex.:

Sample2.png

From there Combine & Transform as you expect

 

Re. issues with column headings missing after the first page 

This is +/- expected, this mainly depends on how the PDF is structured. In principal that's manageable...

 

Difficult to say more without an actual PDF sample. Hope this helps a bit

Hi L z.

Thank you this, I'll try you suggestion - Thanks again.
1 best response

Accepted Solutions
best response confirmed by Keith Farmery (Copper Contributor)
Solution

Hi @Keith Farmery 

...when the block of information I need runs onto several pages, I end up with multiple worksheets (as PQ is treating each page as a separate query) in Excel when I've loaded it

If you get n queries it's probably because you select multiple Tables/Pages in the Navigator, ex.:

Sample.png

Instead, you should uncheck Select multiple items > Click on the file name (what I squared in green), then at the bottom of the Navigator click Transform Data. You then get a single query with all the Tables/Pages found in the PDF, ex.:

Sample2.png

From there Combine & Transform as you expect

 

Re. issues with column headings missing after the first page 

This is +/- expected, this mainly depends on how the PDF is structured. In principal that's manageable...

 

Difficult to say more without an actual PDF sample. Hope this helps a bit

View solution in original post