Forum Discussion
how do i extract data from a pdf file using power query in excel?
If you are on Windows, on Excel 365 and not on semi-annual channel you shall have this connector
Perhaps these days availability bit wider, not sure. But in any case that's for subscription model.
- LorenzoJun 29, 2021Silver Contributor
Copy/paste the following code in a new blank query using the Advanced Editor
let Source = Record.ToTable( Record.RemoveFields(#shared, Record.FieldNames(#sections[Section1]) ) ), SelectedTypeFunction = Table.SelectRows(Source, each [Value] is function), RemovedValue = Table.SelectColumns(SelectedTypeFunction, {"Name"}), SortedByName = Table.Sort(RemovedValue, {{"Name", Order.Ascending}}) in SortedByName
then load the result of that query on a sheet. If you find Pdf.Tables in column [Name] the function is available in your version of Excel/PQ. Per its name the function is looking for table(s) in the PDF...
- samuel_kodjoeJun 29, 2021Copper Contributorhi Lz!
I saw Pdf Tables in column name after loading the code, how then do i locate it.
Thanks for the feedback.- LorenzoJun 29, 2021Silver Contributor
Given you don't have the corresponding wizard in Excel user interface, you have to code it yourself. You start with a new blank query and in the formula bar, something like:
= Pdf.Tables(File.Contents("FolderPath\Example.pdf"), [Implementation="1.2"])
Then, assuming the function finds a Table in your PDF, click on it in the [Data] column:
NB: re. [Implementation=x.y] the Pdf.Tables doc. says:
- The newest version should always give the best results
I've seen a couple of cases where this wasn't true so you have to test what gives you the best result based on your PDF - Valid values are "1.3", "1.2", "1.1", or null
1.3 isn't avail. yet. Please refer to DanielPerelman's reply on this thread
- The newest version should always give the best results