SOLVED

Convert External Web / PDF Table to Full Raw Excel Data Table

%3CLINGO-SUB%20id%3D%22lingo-sub-917670%22%20slang%3D%22en-US%22%3EConvert%20External%20Web%20%2F%20PDF%20Table%20to%20Full%20Raw%20Excel%20Data%20Table%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-917670%22%20slang%3D%22en-US%22%3E%3CP%3EI%20need%20some%20help%20with%20making%20a%20process%20much%20simpler.%20I'd%20like%20to%20take%20a%20table%20from%20an%20external%20source%20and%20turn%20it%20into%20full%2C%20raw%20data.%20The%20issue%20is%20the%20formatting%20from%20the%20source.%20Below%20is%20how%20it%20looks.%3C%2FP%3E%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-inline%22%20style%3D%22width%3A%20199px%3B%22%3E%3CIMG%20src%3D%22https%3A%2F%2Fgxcuf89792.i.lithium.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F138189i19EEC1E947D899B3%2Fimage-size%2Flarge%3Fv%3D1.0%26amp%3Bpx%3D999%22%20alt%3D%2210.17.19%20Excel%20Table.PNG%22%20title%3D%2210.17.19%20Excel%20Table.PNG%22%20%2F%3E%3C%2FSPAN%3E%3C%2FP%3E%3CP%3EI%20need%20raw%20data%20with%20all%20the%20combinations%20per%20item%20in%20%3CU%3E%3CSTRONG%3EHeader1%3C%2FSTRONG%3E%20%3C%2FU%3Ethat%20would%20look%20like%20this.%3C%2FP%3E%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-left%22%20style%3D%22width%3A%20199px%3B%22%3E%3CIMG%20src%3D%22https%3A%2F%2Fgxcuf89792.i.lithium.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F138190iF13EFDA16DF2BA10%2Fimage-dimensions%2F199x398%3Fv%3D1.0%22%20width%3D%22199%22%20height%3D%22398%22%20alt%3D%2210.17.19%20Excel%20Table%202.PNG%22%20title%3D%2210.17.19%20Excel%20Table%202.PNG%22%20%2F%3E%3C%2FSPAN%3E%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EIs%20there%20an%20efficient%20way%20to%20do%20this%20process%20with%20a%20much%20larger%20table%3F%20One%20that%20could%20be%20run%20as%20a%20macro%20would%20be%20amazing.%20Thanks!%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-LABS%20id%3D%22lingo-labs-917670%22%20slang%3D%22en-US%22%3E%3CLINGO-LABEL%3EExcel%3C%2FLINGO-LABEL%3E%3CLINGO-LABEL%3EFormulas%20and%20Functions%3C%2FLINGO-LABEL%3E%3CLINGO-LABEL%3EMacros%20and%20VBA%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E%3CLINGO-SUB%20id%3D%22lingo-sub-1038772%22%20slang%3D%22en-US%22%3ERe%3A%20Convert%20External%20Web%20%2F%20PDF%20Table%20to%20Full%20Raw%20Excel%20Data%20Table%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-1038772%22%20slang%3D%22en-US%22%3E%3CP%3E%3CA%20href%3D%22https%3A%2F%2Ftechcommunity.microsoft.com%2Ft5%2Fuser%2Fviewprofilepage%2Fuser-id%2F427596%22%20target%3D%22_blank%22%3E%40rlussky%3C%2FA%3E%26nbsp%3B%3C%2FP%3E%3CP%3EHello%2C%3C%2FP%3E%3CP%3EThis%20is%20a%20simple%20job%20with%20power%20query%3A%20see%20enclosed%20file%20with%20solution%3A%3C%2FP%3E%3CP%3E%3CU%3E%3CSTRONG%3EStep1.%3C%2FSTRONG%3E%3C%2FU%3E%3C%2FP%3E%3CP%3E-%20Load%20the%20starting%20table%20into%20power%20query%20as%20Table1%2C%20fill%20down%20the%20Header1%20column%2C%26nbsp%3B%3C%2FP%3E%3CP%3E-%20remove%20the%20column%20Header3%3C%2FP%3E%3CP%3E-%20remove%20the%20rows%20with%20null%20values%20in%20Header2%20column%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3E%3CSTRONG%3EPower%20Query%20steps%3A%3C%2FSTRONG%3E%3C%2FP%3E%3CP%3Elet%3CBR%20%2F%3ESource%20%3D%20Excel.CurrentWorkbook()%7B%5BName%3D%22Table1%22%5D%7D%5BContent%5D%2C%3CBR%20%2F%3E%23%22Changed%20Type%22%20%3D%20Table.TransformColumnTypes(Source%2C%7B%7B%22Header1%22%2C%20type%20text%7D%2C%20%7B%22Header2%22%2C%20type%20text%7D%2C%20%7B%22Header3%22%2C%20type%20text%7D%7D)%2C%3CBR%20%2F%3E%23%22Filled%20Down%22%20%3D%20Table.FillDown(%23%22Changed%20Type%22%2C%7B%22Header1%22%7D)%2C%3CBR%20%2F%3E%23%22Removed%20Columns%22%20%3D%20Table.RemoveColumns(%23%22Filled%20Down%22%2C%7B%22Header3%22%7D)%2C%3CBR%20%2F%3E%23%22Filtered%20Rows%22%20%3D%20Table.SelectRows(%23%22Removed%20Columns%22%2C%20each%20(%5BHeader2%5D%20%26lt%3B%26gt%3B%20null))%3CBR%20%2F%3Ein%3CBR%20%2F%3E%23%22Filtered%20Rows%22%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3E%3CU%3E%3CSTRONG%3EStep2.%3C%2FSTRONG%3E%3C%2FU%3E%3C%2FP%3E%3CP%3E-%20Load%20the%20starting%20table%20into%20power%20query%20as%20Table2%2C%20fill%20down%20the%20Header1%20column%2C%26nbsp%3B%3C%2FP%3E%3CP%3E-%20remove%20the%20column%20Header2%3C%2FP%3E%3CP%3E-%20remove%20the%20rows%20with%20null%20values%20in%20Header3%20column%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3E%3CSTRONG%3EPower%20Query%20steps%3A%3C%2FSTRONG%3E%3C%2FP%3E%3CP%3Elet%3CBR%20%2F%3ESource%20%3D%20Excel.CurrentWorkbook()%7B%5BName%3D%22Table1%22%5D%7D%5BContent%5D%2C%3CBR%20%2F%3E%23%22Changed%20Type%22%20%3D%20Table.TransformColumnTypes(Source%2C%7B%7B%22Header1%22%2C%20type%20text%7D%2C%20%7B%22Header2%22%2C%20type%20text%7D%2C%20%7B%22Header3%22%2C%20type%20text%7D%7D)%2C%3CBR%20%2F%3E%23%22Filled%20Down%22%20%3D%20Table.FillDown(%23%22Changed%20Type%22%2C%7B%22Header1%22%7D)%2C%3CBR%20%2F%3E%23%22Removed%20Columns%22%20%3D%20Table.RemoveColumns(%23%22Filled%20Down%22%2C%7B%22Header2%22%7D)%2C%3CBR%20%2F%3E%23%22Filtered%20Rows%22%20%3D%20Table.SelectRows(%23%22Removed%20Columns%22%2C%20each%20(%5BHeader3%5D%20%26lt%3B%26gt%3B%20null))%3CBR%20%2F%3Ein%3CBR%20%2F%3E%23%22Filtered%20Rows%22%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3E%3CU%3E%3CSTRONG%3EStep3.%3C%2FSTRONG%3E%3C%2FU%3E%3C%2FP%3E%3CP%3E-%20Merge%20Table1%20with%20Table2%20based%20on%20Header1%3C%2FP%3E%3CP%3E-%20Add%20Header3%20from%20merged%20table%3C%2FP%3E%3CP%3E-%20Sort%20the%20columns%20to%20obtain%20the%20requested%20order%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3E%3CSTRONG%3EPower%20Query%20steps%3A%3C%2FSTRONG%3E%3C%2FP%3E%3CP%3Elet%3CBR%20%2F%3ESource%20%3D%20Table.NestedJoin(Table1%2C%7B%22Header1%22%7D%2CTable2%2C%7B%22Header1%22%7D%2C%22Table2%22%2CJoinKind.LeftOuter)%2C%3CBR%20%2F%3E%23%22Expanded%20Table2%22%20%3D%20Table.ExpandTableColumn(Source%2C%20%22Table2%22%2C%20%7B%22Header3%22%7D%2C%20%7B%22Table2.Header3%22%7D)%2C%3CBR%20%2F%3E%23%22Sorted%20Rows%22%20%3D%20Table.Sort(%23%22Expanded%20Table2%22%2C%7B%7B%22Header1%22%2C%20Order.Ascending%7D%2C%20%7B%22Table2.Header3%22%2C%20Order.Ascending%7D%2C%20%7B%22Header2%22%2C%20Order.Ascending%7D%7D)%3CBR%20%2F%3Ein%3CBR%20%2F%3E%23%22Sorted%20Rows%22%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EFor%20more%20information%2C%20please%20do%20not%20hesitate%20to%20contact%20me.%3C%2FP%3E%3CP%3ERgds%2C%3C%2FP%3E%3CP%3EDirk%20Verliefden%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3C%2FLINGO-BODY%3E
Highlighted
New Contributor

I need some help with making a process much simpler. I'd like to take a table from an external source and turn it into full, raw data. The issue is the formatting from the source. Below is how it looks.

10.17.19 Excel Table.PNG

I need raw data with all the combinations per item in Header1 that would look like this.

10.17.19 Excel Table 2.PNG

 

Is there an efficient way to do this process with a much larger table? One that could be run as a macro would be amazing. Thanks!

2 Replies
Highlighted
Solution

@rlussky 

Hello,

This is a simple job with power query: see enclosed file with solution:

Step1.

- Load the starting table into power query as Table1, fill down the Header1 column, 

- remove the column Header3

- remove the rows with null values in Header2 column

 

Power Query steps:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header3"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header2] <> null))
in
#"Filtered Rows"

 

Step2.

- Load the starting table into power query as Table2, fill down the Header1 column, 

- remove the column Header2

- remove the rows with null values in Header3 column

 

Power Query steps:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header2"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header3] <> null))
in
#"Filtered Rows"

 

Step3.

- Merge Table1 with Table2 based on Header1

- Add Header3 from merged table

- Sort the columns to obtain the requested order

 

Power Query steps:

let
Source = Table.NestedJoin(Table1,{"Header1"},Table2,{"Header1"},"Table2",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Header3"}, {"Table2.Header3"}),
#"Sorted Rows" = Table.Sort(#"Expanded Table2",{{"Header1", Order.Ascending}, {"Table2.Header3", Order.Ascending}, {"Header2", Order.Ascending}})
in
#"Sorted Rows"

 

For more information, please do not hesitate to contact me.

Rgds,

Dirk Verliefden

 

Highlighted

@Dirk VERLIEFDEN - Thank you! This was really helpful in understanding the logic.