Forum Discussion

Keith Farmery's avatar
Keith Farmery
Copper Contributor
May 12, 2023
Solved

Power Query and PDF files

Hi

 

I receive financial accounts information, Nominal Ledger Activity Report by individual account code, from clients in PDF format, which can run to 100's of pages, example below:

 

 

I use Power Query to load elements into Excel for analysis - which works great when the block of information being loaded is on a single sheet on the PDF file, but when the block of information I need runs onto several pages, I end up with multiple worksheets (as PQ is treating each page as a separate query) in Excel when I've loaded it.

 

I've tried to transform the data, by appending the data together - issues with column headings missing after the first page.  Also, after appending, when I close and load, I cannot just load the appended query, it loads everything into multiple Excel worksheets - all the individual pages of the PDF.

 

Any help would be gratefully received.

 

Thanks.

  • Hi Keith Farmery 

    ...when the block of information I need runs onto several pages, I end up with multiple worksheets (as PQ is treating each page as a separate query) in Excel when I've loaded it

    If you get n queries it's probably because you select multiple Tables/Pages in the Navigator, ex.:

    Instead, you should uncheck Select multiple items > Click on the file name (what I squared in green), then at the bottom of the Navigator click Transform Data. You then get a single query with all the Tables/Pages found in the PDF, ex.:

    From there Combine & Transform as you expect

     

    Re. issues with column headings missing after the first page 

    This is +/- expected, this mainly depends on how the PDF is structured. In principal that's manageable...

     

    Difficult to say more without an actual PDF sample. Hope this helps a bit

2 Replies

  • Lorenzo's avatar
    Lorenzo
    Silver Contributor

    Hi Keith Farmery 

    ...when the block of information I need runs onto several pages, I end up with multiple worksheets (as PQ is treating each page as a separate query) in Excel when I've loaded it

    If you get n queries it's probably because you select multiple Tables/Pages in the Navigator, ex.:

    Instead, you should uncheck Select multiple items > Click on the file name (what I squared in green), then at the bottom of the Navigator click Transform Data. You then get a single query with all the Tables/Pages found in the PDF, ex.:

    From there Combine & Transform as you expect

     

    Re. issues with column headings missing after the first page 

    This is +/- expected, this mainly depends on how the PDF is structured. In principal that's manageable...

     

    Difficult to say more without an actual PDF sample. Hope this helps a bit

    • Keith Farmery's avatar
      Keith Farmery
      Copper Contributor
      Hi L z.

      Thank you this, I'll try you suggestion - Thanks again.

Resources