Forum Discussion

Hugh Barry's avatar
Hugh Barry
Copper Contributor
Feb 01, 2018
Solved

Splitting a file with get and transform

I am a G&T novice and just wondering if it is possible to use this function to split a file into two separate datasets.

 

I have a .txt file which has a set of headers at the top followed by data in rows.  Then at some point half way through the file there is a new set of headers followed by rows of different data.

 

In Excel I manually split these into two new worksheets but i am wondering if Get & Transform can help automate the process?

 

Thanks for any feedback in advance

  • Hi Huge,

     

    Yes, the idea is to define where is the row with your second header within the source table. You may add Index column, select the row with second header, define it's position based on Index and skip all rows before it to have second part of the table.

     

    Then duplicate the query and do exactly the same but keep only first rows before second header.

    Both queries to land as tables into separate sheets.

    Like in this script

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    
        // Select the row with second header
        SecondHeader = Table.SelectRows(AddIndex, each ([Column1] = "x")),
        // and find it's position
        RowToSplit = SecondHeader{0}[Index],
    
        // Based on it skip first part
        SecondPart = Table.Skip(Source,RowToSplit),
    
        // and cosmetic
        PromotedHeaders = Table.PromoteHeaders(SecondPart, [PromoteAllScalars=true]),
        ToNumber = Table.TransformColumnTypes(PromotedHeaders,{{"x", Int64.Type}, {"y", Int64.Type}})
    in
        ToNumber

    and in attached file.

    Details depends on how your initial data is structured.

4 Replies

  • SergeiBaklan's avatar
    SergeiBaklan
    Diamond Contributor

    Hi Huge,

     

    Yes, the idea is to define where is the row with your second header within the source table. You may add Index column, select the row with second header, define it's position based on Index and skip all rows before it to have second part of the table.

     

    Then duplicate the query and do exactly the same but keep only first rows before second header.

    Both queries to land as tables into separate sheets.

    Like in this script

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    
        // Select the row with second header
        SecondHeader = Table.SelectRows(AddIndex, each ([Column1] = "x")),
        // and find it's position
        RowToSplit = SecondHeader{0}[Index],
    
        // Based on it skip first part
        SecondPart = Table.Skip(Source,RowToSplit),
    
        // and cosmetic
        PromotedHeaders = Table.PromoteHeaders(SecondPart, [PromoteAllScalars=true]),
        ToNumber = Table.TransformColumnTypes(PromotedHeaders,{{"x", Int64.Type}, {"y", Int64.Type}})
    in
        ToNumber

    and in attached file.

    Details depends on how your initial data is structured.

    • Hugh Barry's avatar
      Hugh Barry
      Copper Contributor

      Sergei,

       

      On a related point, as I mentioned I am a novice with G&T and getting used to the idea of "connections" to data.  Once I perfect the query that transforms the data and splits the file I would like to be able to apply this to new output.

       

      Right now I am creating a new query that connects to and transforms e.g. "January.txt" but next month when I have "February.txt" it seems that i will have to connect to that file and re-create the query to apply to it.

       

      I am sure there is something I am missing as otherwise the approach I might take is have a connection to "Generic.txt" which never changes and then delete all data within and replace with new - then refresh.

       

      Hope I am clear and thanks for the feedback once again.

      Hugh

      • SergeiBaklan's avatar
        SergeiBaklan
        Diamond Contributor

        Hi Hugh,

         

        If I understood correctly you'd like to apply same transformation to data imported from January.txt, February.txt, etc.

         

        If so based on your initial query you may create the function for which input parameter will be the name of text file (how to create e.g. https://blog.learningtree.com/creating-functions-m-power-query-formula-language/, first i googled), after that your query will look like

        let
          Source = fnMyTransformation("January.txt")
        in
          Source

        Not to hardcode name of the file could be defined as parameter or picked-up from Excel cell.

         

    • Hugh Barry's avatar
      Hugh Barry
      Copper Contributor

      Thanks Sergei, really appreciate the feedback,

      Hugh