Forum Discussion

samuel_kodjoe's avatar
samuel_kodjoe
Copper Contributor
Jun 28, 2021

how do i extract data from a pdf file using power query in excel?

hi team,

 

Please i would like to find out if there is a way to extract data from a pdf file using power query in excel?

 

regards!!

16 Replies

  • I have a similar question to read a PDF.  The PDF is link to a PDF file share to the public (no login required); however, I not figure out how to get Power Query to read that publicly shared PDF file.

     

    Is there a trick related to this using the same method described or is there another set of instructions associated with reading a PDF file referenced as a URL?

     

    I tried the Power Query get from web, but it seems to think it needs to authenticate to access the file even though authentication is not required.

    • Lorenzo's avatar
      Lorenzo
      Silver Contributor

      Russ_MS_Commmunity 

      The following works with a PDF stored on a Personal OneDrive:

      - From a Web Browser, download the PDF file
      - Open the Downloads page of the Web Browser (pic. from MSFT Edge below)
      - Copy the Download link:

      In Excel:
      - Data (tab) > From Web > Paste the Download link > OK > ...


      If you want to try with the PDF I shared, a Download link is:

       

       

      https://public.am.files.1drv.com/y4mp7lEIREyPqh2W_UW3tzsSdWp6Kpim9Hpj2b2Sp-OcoTenAZ-33ASspv_OnvEkzD3RpLWfT4ftsglbRAfyHuRz5AIZq3FNL8HRX8-n6-0eV7bV1nTQkSrT77dOVHFIwewQYfCE0hNshQGrbt1JW4dVZuYOMEJy0yN4R7_3CweweFjm6wPeZWFYJxC8QSrloB5Fk9lsiU4a2RjqmGAMdxThaztoypmCrUsT8j7kO3H35E?AVOverride=1
      • Russ_MS_Commmunity's avatar
        Russ_MS_Commmunity
        Copper Contributor
        It did not seem to resolve the problem.

        Going to Data >> From Web >> Pasting in the download link is the process I originally used. However, it interprets the OneDrive address as something that requires an authentication.

        It comes back with an error message "We could not authenticate with the credentials authenticated. Please try again." There is no authentication. required to view and download the file, but when you try to put the URL in power query it doesn't seem to understand that.
  • samuel_kodjoe 

    If you are on Windows, on Excel 365 and not on semi-annual channel you shall have this connector

    Perhaps these days availability bit wider, not sure. But in any case that's for subscription model.

    • samuel_kodjoe's avatar
      samuel_kodjoe
      Copper Contributor
      wow.....so there is no ad ins i can install to access this with 2019 version?
      • Lorenzo's avatar
        Lorenzo
        Silver Contributor

        samuel_kodjoe 

        Copy/paste the following code in a new blank query using the Advanced Editor

        let
            Source = Record.ToTable(
                Record.RemoveFields(#shared,
                    Record.FieldNames(#sections[Section1])
                )
            ),
            SelectedTypeFunction = Table.SelectRows(Source, each [Value] is function),
            RemovedValue = Table.SelectColumns(SelectedTypeFunction, {"Name"}),
            SortedByName = Table.Sort(RemovedValue, {{"Name", Order.Ascending}})
        in
            SortedByName

         

        then load the result of that query on a sheet. If you find Pdf.Tables in column [Name] the function is available in your version of Excel/PQ. Per its name the function is looking for table(s) in the PDF...

Resources