Jun 28 2021 12:42 PM
hi team,
Please i would like to find out if there is a way to extract data from a pdf file using power query in excel?
regards!!
Jun 28 2021 01:39 PM
If you are on Windows, on Excel 365 and not on semi-annual channel you shall have this connector
Perhaps these days availability bit wider, not sure. But in any case that's for subscription model.
Jun 29 2021 12:53 AM
Jun 29 2021 01:35 AM
Copy/paste the following code in a new blank query using the Advanced Editor
let
Source = Record.ToTable(
Record.RemoveFields(#shared,
Record.FieldNames(#sections[Section1])
)
),
SelectedTypeFunction = Table.SelectRows(Source, each [Value] is function),
RemovedValue = Table.SelectColumns(SelectedTypeFunction, {"Name"}),
SortedByName = Table.Sort(RemovedValue, {{"Name", Order.Ascending}})
in
SortedByName
then load the result of that query on a sheet. If you find Pdf.Tables in column [Name] the function is available in your version of Excel/PQ. Per its name the function is looking for table(s) in the PDF...
Jun 29 2021 03:07 AM
Jun 29 2021 04:03 AM
Given you don't have the corresponding wizard in Excel user interface, you have to code it yourself. You start with a new blank query and in the formula bar, something like:
= Pdf.Tables(File.Contents("FolderPath\Example.pdf"), [Implementation="1.2"])
Then, assuming the function finds a Table in your PDF, click on it in the [Data] column:
NB: re. [Implementation=x.y] the Pdf.Tables doc. says:
Jun 29 2021 06:44 AM
please this is what i get trying to implement for 1.2. the other values are giving me invalid answers
Jun 29 2021 07:18 AM
Jun 29 2021 02:15 PM
Small comment, 1.3 is available now. Not sure about exact PQ version.
Jun 29 2021 11:47 PM
Jun 30 2021 02:19 PM
Yes, that's PQ 2.95.223.0 on insiders channel.
Feb 22 2023 02:51 PM
I have a similar question to read a PDF. The PDF is link to a PDF file share to the public (no login required); however, I not figure out how to get Power Query to read that publicly shared PDF file.
Is there a trick related to this using the same method described or is there another set of instructions associated with reading a PDF file referenced as a URL?
I tried the Power Query get from web, but it seems to think it needs to authenticate to access the file even though authentication is not required.
Feb 23 2023 05:07 AM - edited Feb 25 2023 02:14 AM
The following works with a PDF stored on a Personal OneDrive:
- From a Web Browser, download the PDF file
- Open the Downloads page of the Web Browser (pic. from MSFT Edge below)
- Copy the Download link:
In Excel:
- Data (tab) > From Web > Paste the Download link > OK > ...
If you want to try with the PDF I shared, a Download link is:
https://public.am.files.1drv.com/y4mp7lEIREyPqh2W_UW3tzsSdWp6Kpim9Hpj2b2Sp-OcoTenAZ-33ASspv_OnvEkzD3RpLWfT4ftsglbRAfyHuRz5AIZq3FNL8HRX8-n6-0eV7bV1nTQkSrT77dOVHFIwewQYfCE0hNshQGrbt1JW4dVZuYOMEJy0yN4R7_3CweweFjm6wPeZWFYJxC8QSrloB5Fk9lsiU4a2RjqmGAMdxThaztoypmCrUsT8j7kO3H35E?AVOverride=1
Feb 24 2023 03:21 PM
Feb 25 2023 03:01 AM
When OneDrive isn't unresponsive this works no problem here. The attached sample query contains a download link and the expected dataset is returned to Excel
After putting in place the query, if I clear to Permissions to OneDrive (the one highlighted below that matches the download link - https://public.am.files.1drv.com...):
of course, I'm asked to Authenticate (even as Anonymous) the next time I refresh the query
IMHO you should have a look at your Data Source Settings and in case this doesn't help provide more information regarding your context, i.e. where the PDF is actually stored (ex. OneDrive Business, SharePoint...)