Forum Discussion
how do i extract data from a pdf file using power query in excel?
Copy/paste the following code in a new blank query using the Advanced Editor
let
Source = Record.ToTable(
Record.RemoveFields(#shared,
Record.FieldNames(#sections[Section1])
)
),
SelectedTypeFunction = Table.SelectRows(Source, each [Value] is function),
RemovedValue = Table.SelectColumns(SelectedTypeFunction, {"Name"}),
SortedByName = Table.Sort(RemovedValue, {{"Name", Order.Ascending}})
in
SortedByName
then load the result of that query on a sheet. If you find https://docs.microsoft.com/en-us/powerquery-m/pdf-tables in column [Name] the function is available in your version of Excel/PQ. Per its name the function is looking for table(s) in the PDF...
- samuel_kodjoeJun 29, 2021Copper Contributorhi Lz!
I saw Pdf Tables in column name after loading the code, how then do i locate it.
Thanks for the feedback.- LorenzoJun 29, 2021Silver Contributor
Given you don't have the corresponding wizard in Excel user interface, you have to code it yourself. You start with a new blank query and in the formula bar, something like:
= Pdf.Tables(File.Contents("FolderPath\Example.pdf"), [Implementation="1.2"])Then, assuming the function finds a Table in your PDF, click on it in the [Data] column:
NB: re. [Implementation=x.y] the https://docs.microsoft.com/en-us/powerquery-m/pdf-tables says:
- The newest version should always give the best results
I've seen a couple of cases where this wasn't true so you have to test what gives you the best result based on your PDF - Valid values are "1.3", "1.2", "1.1", or null
1.3 isn't avail. yet. Please refer to DanielPerelman's reply on https://docs.microsoft.com/en-us/answers/questions/382594/pdf-file-transformation.html
- SergeiBaklanJun 29, 2021Diamond Contributor
Small comment, 1.3 is available now. Not sure about exact PQ version.
- The newest version should always give the best results