Jun 28 2021 12:42 PM
hi team,
Please i would like to find out if there is a way to extract data from a pdf file using power query in excel?
regards!!
Jun 28 2021 01:39 PM
If you are on Windows, on Excel 365 and not on semi-annual channel you shall have this connector
Perhaps these days availability bit wider, not sure. But in any case that's for subscription model.
Jun 29 2021 12:53 AM
Jun 29 2021 01:35 AM
Copy/paste the following code in a new blank query using the Advanced Editor
let
Source = Record.ToTable(
Record.RemoveFields(#shared,
Record.FieldNames(#sections[Section1])
)
),
SelectedTypeFunction = Table.SelectRows(Source, each [Value] is function),
RemovedValue = Table.SelectColumns(SelectedTypeFunction, {"Name"}),
SortedByName = Table.Sort(RemovedValue, {{"Name", Order.Ascending}})
in
SortedByName
then load the result of that query on a sheet. If you find Pdf.Tables in column [Name] the function is available in your version of Excel/PQ. Per its name the function is looking for table(s) in the PDF...
Jun 29 2021 03:07 AM
Jun 29 2021 04:03 AM
Given you don't have the corresponding wizard in Excel user interface, you have to code it yourself. You start with a new blank query and in the formula bar, something like:
= Pdf.Tables(File.Contents("FolderPath\Example.pdf"), [Implementation="1.2"])
Then, assuming the function finds a Table in your PDF, click on it in the [Data] column:
NB: re. [Implementation=x.y] the Pdf.Tables doc. says:
Jun 29 2021 06:44 AM
please this is what i get trying to implement for 1.2. the other values are giving me invalid answers
Jun 29 2021 07:18 AM
Jun 29 2021 02:15 PM
Small comment, 1.3 is available now. Not sure about exact PQ version.
Jun 29 2021 11:47 PM
Jun 30 2021 02:19 PM
Yes, that's PQ 2.95.223.0 on insiders channel.
Feb 22 2023 02:51 PM
I have a similar question to read a PDF. The PDF is link to a PDF file share to the public (no login required); however, I not figure out how to get Power Query to read that publicly shared PDF file.
Is there a trick related to this using the same method described or is there another set of instructions associated with reading a PDF file referenced as a URL?
I tried the Power Query get from web, but it seems to think it needs to authenticate to access the file even though authentication is not required.
Feb 23 2023 05:07 AM - edited Feb 25 2023 02:14 AM
The following works with a PDF stored on a Personal OneDrive:
- From a Web Browser, download the PDF file
- Open the Downloads page of the Web Browser (pic. from MSFT Edge below)
- Copy the Download link:
In Excel:
- Data (tab) > From Web > Paste the Download link > OK > ...
If you want to try with the PDF I shared, a Download link is:
https://public.am.files.1drv.com/y4mp7lEIREyPqh2W_UW3tzsSdWp6Kpim9Hpj2b2Sp-OcoTenAZ-33ASspv_OnvEkzD3RpLWfT4ftsglbRAfyHuRz5AIZq3FNL8HRX8-n6-0eV7bV1nTQkSrT77dOVHFIwewQYfCE0hNshQGrbt1JW4dVZuYOMEJy0yN4R7_3CweweFjm6wPeZWFYJxC8QSrloB5Fk9lsiU4a2RjqmGAMdxThaztoypmCrUsT8j7kO3H35E?AVOverride=1
Feb 24 2023 03:21 PM
Feb 25 2023 03:01 AM
When OneDrive isn't unresponsive this works no problem here. The attached sample query contains a download link and the expected dataset is returned to Excel
After putting in place the query, if I clear to Permissions to OneDrive (the one highlighted below that matches the download link - https://public.am.files.1drv.com...):
of course, I'm asked to Authenticate (even as Anonymous) the next time I refresh the query
IMHO you should have a look at your Data Source Settings and in case this doesn't help provide more information regarding your context, i.e. where the PDF is actually stored (ex. OneDrive Business, SharePoint...)
Mar 30 2023 08:38 AM
Providing no feedback, good or bad, doesn't help people who Search (you did - thanks)
Mar 30 2023 08:54 AM
@L z. I have not been able to get the above solution proposed to work successfully. In every attempt on multiple computers and multiple networks it is still seeking a login.