Forum Discussion
samuel_kodjoe
Jun 28, 2021Copper Contributor
how do i extract data from a pdf file using power query in excel?
hi team,
Please i would like to find out if there is a way to extract data from a pdf file using power query in excel?
regards!!
16 Replies
Sort By
- Russ_MS_CommmunityCopper Contributor
I have a similar question to read a PDF. The PDF is link to a PDF file share to the public (no login required); however, I not figure out how to get Power Query to read that publicly shared PDF file.
Is there a trick related to this using the same method described or is there another set of instructions associated with reading a PDF file referenced as a URL?
I tried the Power Query get from web, but it seems to think it needs to authenticate to access the file even though authentication is not required.
- LorenzoSilver Contributor
The following works with a PDF stored on a Personal OneDrive:
- From a Web Browser, download the PDF file
- Open the Downloads page of the Web Browser (pic. from MSFT Edge below)
- Copy the Download link:In Excel:
- Data (tab) > From Web > Paste the Download link > OK > ...
If you want to try with the PDF I shared, a Download link is:https://public.am.files.1drv.com/y4mp7lEIREyPqh2W_UW3tzsSdWp6Kpim9Hpj2b2Sp-OcoTenAZ-33ASspv_OnvEkzD3RpLWfT4ftsglbRAfyHuRz5AIZq3FNL8HRX8-n6-0eV7bV1nTQkSrT77dOVHFIwewQYfCE0hNshQGrbt1JW4dVZuYOMEJy0yN4R7_3CweweFjm6wPeZWFYJxC8QSrloB5Fk9lsiU4a2RjqmGAMdxThaztoypmCrUsT8j7kO3H35E?AVOverride=1
- Russ_MS_CommmunityCopper ContributorIt did not seem to resolve the problem.
Going to Data >> From Web >> Pasting in the download link is the process I originally used. However, it interprets the OneDrive address as something that requires an authentication.
It comes back with an error message "We could not authenticate with the credentials authenticated. Please try again." There is no authentication. required to view and download the file, but when you try to put the URL in power query it doesn't seem to understand that.
If you are on Windows, on Excel 365 and not on semi-annual channel you shall have this connector
Perhaps these days availability bit wider, not sure. But in any case that's for subscription model.
- samuel_kodjoeCopper Contributorwow.....so there is no ad ins i can install to access this with 2019 version?
- LorenzoSilver Contributor
Copy/paste the following code in a new blank query using the Advanced Editor
let Source = Record.ToTable( Record.RemoveFields(#shared, Record.FieldNames(#sections[Section1]) ) ), SelectedTypeFunction = Table.SelectRows(Source, each [Value] is function), RemovedValue = Table.SelectColumns(SelectedTypeFunction, {"Name"}), SortedByName = Table.Sort(RemovedValue, {{"Name", Order.Ascending}}) in SortedByName
then load the result of that query on a sheet. If you find Pdf.Tables in column [Name] the function is available in your version of Excel/PQ. Per its name the function is looking for table(s) in the PDF...