Forum Discussion

rlussky's avatar
rlussky
Copper Contributor
Oct 17, 2019
Solved

Convert External Web / PDF Table to Full Raw Excel Data Table

I need some help with making a process much simpler. I'd like to take a table from an external source and turn it into full, raw data. The issue is the formatting from the source. Below is how it looks.

I need raw data with all the combinations per item in Header1 that would look like this.

 

Is there an efficient way to do this process with a much larger table? One that could be run as a macro would be amazing. Thanks!

  • rlussky 

    Hello,

    This is a simple job with power query: see enclosed file with solution:

    Step1.

    - Load the starting table into power query as Table1, fill down the Header1 column, 

    - remove the column Header3

    - remove the rows with null values in Header2 column

     

    Power Query steps:

    let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
    #"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header3"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header2] <> null))
    in
    #"Filtered Rows"

     

    Step2.

    - Load the starting table into power query as Table2, fill down the Header1 column, 

    - remove the column Header2

    - remove the rows with null values in Header3 column

     

    Power Query steps:

    let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
    #"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header2"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header3] <> null))
    in
    #"Filtered Rows"

     

    Step3.

    - Merge Table1 with Table2 based on Header1

    - Add Header3 from merged table

    - Sort the columns to obtain the requested order

     

    Power Query steps:

    let
    Source = Table.NestedJoin(Table1,{"Header1"},Table2,{"Header1"},"Table2",JoinKind.LeftOuter),
    #"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Header3"}, {"Table2.Header3"}),
    #"Sorted Rows" = Table.Sort(#"Expanded Table2",{{"Header1", Order.Ascending}, {"Table2.Header3", Order.Ascending}, {"Header2", Order.Ascending}})
    in
    #"Sorted Rows"

     

    For more information, please do not hesitate to contact me.

    Rgds,

    Dirk Verliefden

     

2 Replies

  • Dirk VERLIEFDEN's avatar
    Dirk VERLIEFDEN
    Copper Contributor

    rlussky 

    Hello,

    This is a simple job with power query: see enclosed file with solution:

    Step1.

    - Load the starting table into power query as Table1, fill down the Header1 column, 

    - remove the column Header3

    - remove the rows with null values in Header2 column

     

    Power Query steps:

    let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
    #"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header3"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header2] <> null))
    in
    #"Filtered Rows"

     

    Step2.

    - Load the starting table into power query as Table2, fill down the Header1 column, 

    - remove the column Header2

    - remove the rows with null values in Header3 column

     

    Power Query steps:

    let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Header1", type text}, {"Header2", type text}, {"Header3", type text}}),
    #"Filled Down" = Table.FillDown(#"Changed Type",{"Header1"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Header2"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Header3] <> null))
    in
    #"Filtered Rows"

     

    Step3.

    - Merge Table1 with Table2 based on Header1

    - Add Header3 from merged table

    - Sort the columns to obtain the requested order

     

    Power Query steps:

    let
    Source = Table.NestedJoin(Table1,{"Header1"},Table2,{"Header1"},"Table2",JoinKind.LeftOuter),
    #"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Header3"}, {"Table2.Header3"}),
    #"Sorted Rows" = Table.Sort(#"Expanded Table2",{{"Header1", Order.Ascending}, {"Table2.Header3", Order.Ascending}, {"Header2", Order.Ascending}})
    in
    #"Sorted Rows"

     

    For more information, please do not hesitate to contact me.

    Rgds,

    Dirk Verliefden

     

Resources