Forum Discussion
Convert External Web / PDF Table to Full Raw Excel Data Table
I need some help with making a process much simpler. I'd like to take a table from an external source and turn it into full, raw data. The issue is the formatting from the source. Below is how it looks.
I need raw data with all the combinations per item in Header1 that would look like this.
Is there an efficient way to do this process with a much larger table? One that could be run as a macro would be amazing. Thanks!
Hello,
This is a simple job with power query: see enclosed file with solution:
Step1.
- Load the starting table into power query as Table1, fill down the Header1 column,
- remove the column Header3
- remove the rows with null values in Header2 column
Power Query steps:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header3"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header2] <> null))
in
#"Filtered Rows"Step2.
- Load the starting table into power query as Table2, fill down the Header1 column,
- remove the column Header2
- remove the rows with null values in Header3 column
Power Query steps:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header2"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header3] <> null))
in
#"Filtered Rows"Step3.
- Merge Table1 with Table2 based on Header1
- Add Header3 from merged table
- Sort the columns to obtain the requested order
Power Query steps:
let
Source = Table.NestedJoin(Table1,{"Header1"},Table2,{"Header1"},"Table2",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Header3"}, {"Table2.Header3"}),
#"Sorted Rows" = Table.Sort(#"Expanded Table2",{{"Header1", Order.Ascending}, {"Table2.Header3", Order.Ascending}, {"Header2", Order.Ascending}})
in
#"Sorted Rows"For more information, please do not hesitate to contact me.
Rgds,
Dirk Verliefden
2 Replies
- Dirk VERLIEFDENCopper Contributor
Hello,
This is a simple job with power query: see enclosed file with solution:
Step1.
- Load the starting table into power query as Table1, fill down the Header1 column,
- remove the column Header3
- remove the rows with null values in Header2 column
Power Query steps:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header3"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header2] <> null))
in
#"Filtered Rows"Step2.
- Load the starting table into power query as Table2, fill down the Header1 column,
- remove the column Header2
- remove the rows with null values in Header3 column
Power Query steps:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header2"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header3] <> null))
in
#"Filtered Rows"Step3.
- Merge Table1 with Table2 based on Header1
- Add Header3 from merged table
- Sort the columns to obtain the requested order
Power Query steps:
let
Source = Table.NestedJoin(Table1,{"Header1"},Table2,{"Header1"},"Table2",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Header3"}, {"Table2.Header3"}),
#"Sorted Rows" = Table.Sort(#"Expanded Table2",{{"Header1", Order.Ascending}, {"Table2.Header3", Order.Ascending}, {"Header2", Order.Ascending}})
in
#"Sorted Rows"For more information, please do not hesitate to contact me.
Rgds,
Dirk Verliefden
- rlusskyCopper Contributor
Dirk VERLIEFDEN - Thank you! This was really helpful in understanding the logic.