SOLVED

Convert External Web / PDF Table to Full Raw Excel Data Table

Copper Contributor

I need some help with making a process much simpler. I'd like to take a table from an external source and turn it into full, raw data. The issue is the formatting from the source. Below is how it looks.

10.17.19 Excel Table.PNG

I need raw data with all the combinations per item in Header1 that would look like this.

10.17.19 Excel Table 2.PNG

 

Is there an efficient way to do this process with a much larger table? One that could be run as a macro would be amazing. Thanks!

2 Replies
best response confirmed by rlussky (Copper Contributor)
Solution

@rlussky 

Hello,

This is a simple job with power query: see enclosed file with solution:

Step1.

- Load the starting table into power query as Table1, fill down the Header1 column, 

- remove the column Header3

- remove the rows with null values in Header2 column

 

Power Query steps:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header3"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header2] <> null))
in
#"Filtered Rows"

 

Step2.

- Load the starting table into power query as Table2, fill down the Header1 column, 

- remove the column Header2

- remove the rows with null values in Header3 column

 

Power Query steps:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header2"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header3] <> null))
in
#"Filtered Rows"

 

Step3.

- Merge Table1 with Table2 based on Header1

- Add Header3 from merged table

- Sort the columns to obtain the requested order

 

Power Query steps:

let
Source = Table.NestedJoin(Table1,{"Header1"},Table2,{"Header1"},"Table2",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Header3"}, {"Table2.Header3"}),
#"Sorted Rows" = Table.Sort(#"Expanded Table2",{{"Header1", Order.Ascending}, {"Table2.Header3", Order.Ascending}, {"Header2", Order.Ascending}})
in
#"Sorted Rows"

 

For more information, please do not hesitate to contact me.

Rgds,

Dirk Verliefden

 

@Dirk VERLIEFDEN - Thank you! This was really helpful in understanding the logic.

1 best response

Accepted Solutions
best response confirmed by rlussky (Copper Contributor)
Solution

@rlussky 

Hello,

This is a simple job with power query: see enclosed file with solution:

Step1.

- Load the starting table into power query as Table1, fill down the Header1 column, 

- remove the column Header3

- remove the rows with null values in Header2 column

 

Power Query steps:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header3"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header2] <> null))
in
#"Filtered Rows"

 

Step2.

- Load the starting table into power query as Table2, fill down the Header1 column, 

- remove the column Header2

- remove the rows with null values in Header3 column

 

Power Query steps:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header2"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header3] <> null))
in
#"Filtered Rows"

 

Step3.

- Merge Table1 with Table2 based on Header1

- Add Header3 from merged table

- Sort the columns to obtain the requested order

 

Power Query steps:

let
Source = Table.NestedJoin(Table1,{"Header1"},Table2,{"Header1"},"Table2",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Header3"}, {"Table2.Header3"}),
#"Sorted Rows" = Table.Sort(#"Expanded Table2",{{"Header1", Order.Ascending}, {"Table2.Header3", Order.Ascending}, {"Header2", Order.Ascending}})
in
#"Sorted Rows"

 

For more information, please do not hesitate to contact me.

Rgds,

Dirk Verliefden

 

View solution in original post