Forum Discussion
Jeff_Lipschultz
Jun 19, 2023Copper Contributor
Converting PDF to Excel
I'm a struggling with the should-be-simple task of converting a pdf to Excel table. I can use the Get Data ->From PDF (Select Multiple Items) function to Query but it puts all the tables (there are ...
- Jun 19, 2023
Thanks for sharing a file. Combining all Tables in the Pdf you shared seems to do it (attached), just edit the query to change the path to the file - line File.Contents("ToBeReplaceWithYourPath")
let Source = Pdf.Tables( File.Contents("D:\Lorenzo\Downloads\2023_Participant_List_70.3_Waco_4.27.23.pdf"), [Implementation="1.3"] ), KindTable = Table.SelectRows(Source, each ([Kind] = "Table")), RemovedOtherColumns = Table.SelectColumns(KindTable,{"Data"}), CombinedTables = Table.Combine(RemovedOtherColumns[Data]), PromotedHeaders = Table.PromoteHeaders(CombinedTables, [PromoteAllScalars=true]), ChangedTypes = Table.TransformColumnTypes(PromotedHeaders, {{"LAST NAME", type text}, {"FIRST NAME", type text}, {"GENDER", type text}, {"AGE AS OF#(lf)12/31/2023", Int64.Type}, {"COUNTRY", type text}} ) in ChangedTypes
Jeff_Lipschultz
Jun 19, 2023Copper Contributor
Wow. Advanced.
Lorenzo
Jun 20, 2023Silver Contributor
Not avanced at all you can do it with the PQ User Interface only
The PDF you shared is really clean and doesn't require any specific transformation if you combine the Tables (not the Pages). The starting point is to select (in the PQ Navigator) the file name only and not Pages or Tables - as pictured in this thread