Forum Discussion
Converting PDF to Excel
I'm a struggling with the should-be-simple task of converting a pdf to Excel table. I can use the Get Data ->From PDF (Select Multiple Items) function to Query but it puts all the tables (there are 27) into separate sheets (by page in the PDF). I tried to consolidate the sheets and struggled with this too, unfortunately. There has got to be an easy way. The original file is on this web page: https://cdn1.sportngin.com/attachments/document/0b07-2896686/2023_Participant_List_70.3_Waco_4.27.23.pdf
Thanks for sharing a file. Combining all Tables in the Pdf you shared seems to do it (attached), just edit the query to change the path to the file - line File.Contents("ToBeReplaceWithYourPath")
let Source = Pdf.Tables( File.Contents("D:\Lorenzo\Downloads\2023_Participant_List_70.3_Waco_4.27.23.pdf"), [Implementation="1.3"] ), KindTable = Table.SelectRows(Source, each ([Kind] = "Table")), RemovedOtherColumns = Table.SelectColumns(KindTable,{"Data"}), CombinedTables = Table.Combine(RemovedOtherColumns[Data]), PromotedHeaders = Table.PromoteHeaders(CombinedTables, [PromoteAllScalars=true]), ChangedTypes = Table.TransformColumnTypes(PromotedHeaders, {{"LAST NAME", type text}, {"FIRST NAME", type text}, {"GENDER", type text}, {"AGE AS OF#(lf)12/31/2023", Int64.Type}, {"COUNTRY", type text}} ) in ChangedTypes
6 Replies
- LorenzoSilver Contributor
Thanks for sharing a file. Combining all Tables in the Pdf you shared seems to do it (attached), just edit the query to change the path to the file - line File.Contents("ToBeReplaceWithYourPath")
let Source = Pdf.Tables( File.Contents("D:\Lorenzo\Downloads\2023_Participant_List_70.3_Waco_4.27.23.pdf"), [Implementation="1.3"] ), KindTable = Table.SelectRows(Source, each ([Kind] = "Table")), RemovedOtherColumns = Table.SelectColumns(KindTable,{"Data"}), CombinedTables = Table.Combine(RemovedOtherColumns[Data]), PromotedHeaders = Table.PromoteHeaders(CombinedTables, [PromoteAllScalars=true]), ChangedTypes = Table.TransformColumnTypes(PromotedHeaders, {{"LAST NAME", type text}, {"FIRST NAME", type text}, {"GENDER", type text}, {"AGE AS OF#(lf)12/31/2023", Int64.Type}, {"COUNTRY", type text}} ) in ChangedTypes
- Jeff_LipschultzCopper ContributorWow. Advanced.
- LorenzoSilver Contributor
Not avanced at all you can do it with the PQ User Interface only
The PDF you shared is really clean and doesn't require any specific transformation if you combine the Tables (not the Pages). The starting point is to select (in the PQ Navigator) the file name only and not Pages or Tables - as pictured in this thread
- Patrick2788Silver ContributorThe PDF looks clean enough that you may want to try your luck with opening the PDF in Word which can convert it to a docx. Then you can copy/paste to Excel.
- Jeff_LipschultzCopper Contributor
Patrick2788 Great idea. But unfortunately, it did not copy paste into Word very cleanly (or do you have a hack for that, too?).
- Jeff_LipschultzCopper ContributorBut, you got me thinking! I cut and paste as a text file and I was able to import that TXT file cleanly into excel!