Forum Discussion

samuel_kodjoe's avatar
samuel_kodjoe
Copper Contributor
May 14, 2021

power query - data cleaning and analysis

Hi Team,

Please the attached excel workbook contains two sheet namely "system data" which was generated from a system and the second sheet "transformed" which was generated using "text to column". i would like to know if i can achieve the same transformation using power query.

    • samuel_kodjoe's avatar
      samuel_kodjoe
      Copper Contributor
      WOW!!! that was very impressive. How do i accurately specify the positions? it seems to me like a tedious task
      • SergeiBaklan's avatar
        SergeiBaklan
        MVP

        samuel_kodjoe 

        I did that by couple of iteration and approximately, since we use TRIM() exact position most probably is not required.

        However, if you are on Excel with dynamic arrays you add in couple of cells one under another

        =SEQUENCE(,LEN(A2),0)
        and
        =MID(A2,SEQUENCE(,LEN(A2)),1)

        It looks like

        Positions start from 0 and each next position is first space after the previous field. That's one time job if only your system generates the file always in exactly the same format.

         

        If format is different from time time I guess removing of excess spaces could be done by Power Query but that will require M-script coding.

    • noushad1ali2000's avatar
      noushad1ali2000
      Copper Contributor

      Greetings,

      I am regular watching your vlog related excel, and now i need your help to clean a complex type of data extracted from my company system which i need to clean and provide them in simple form to create some insight from that.

      1st sheet as sample extracted

           i have given two sets of data as sample, whereas in my co file having 25860 rows of data for almost 1185-person

      2nd sheets need to be normalized

      it would be much appreciated if you could help me out

      Thanks

      Naushad Ali

      1st table

      Extracted messy data11/11/2012           
      ID   818181   role   data practioner 
      school              
      grade 12            
      try              
                     
      fry              
      sum              
      code         title    
       121212      jsr      
                     
                     
                     
                     
                     
      ________________________________________________________________________________________________________   
      ID   919191   role   Sr Data player 
      school              
      grade 14            
      try              
                     
      fry              
      sum              
      code         title    
       121212      jsr      
       212121      kel      
                     
                     
                     
                     
      ________________________________________________________________________________________________________   
                     

       

      2nd table - needed

      IDRolegradecodetitle
      818181data practioner12121212jsr
      919191Sr Data player14121212jsr
         212121kel
      • SergeiBaklan's avatar
        SergeiBaklan
        MVP

        noushad1ali2000 

        With some assumptions regarding the logic in source data, and keeping the source in csv file, main query could be

        let
            BlockSize = 15,
            Source = Csv.Document(File.Contents("C:\Test\Source.csv"),[Delimiter=",", Columns=15, Encoding=65001, QuoteStyle=QuoteStyle.None]),
            RemoveDate = Table.Skip(Source,1),
            SplitSource = Table.Split( RemoveDate, BlockSize),
            Custom1 = Table.Combine( List.Transform( SplitSource, (t) => fnTransformID(t) ) )
        in
            Custom1

        which calls the function

        (tbl as table) =>
        
        let
            RenameColumns = Table.RenameColumns(
                tbl,
                {
                      {"Column2", "Code"}
                    , {"Column3", "Grade"}
                    , {"Column5", "ID"}
                    , {"Column9", "Title"}
                    , {"Column13", "Role"}
                }),
            names = {"ID", "Role", "Grade", "Code", "Title"},
            SelectColumns = Table.SelectColumns(RenameColumns,names),
            ReplaceSpace = Table.ReplaceValue(SelectColumns," ",null,Replacer.ReplaceValue,names),
            ReplaceEmpty = Table.ReplaceValue(ReplaceSpace,"",null,Replacer.ReplaceValue,names),
            ReplaceRoleWord = Table.ReplaceValue(ReplaceEmpty,"role",null,Replacer.ReplaceValue,{"Title"}),
            ToColumns = Table.ToColumns( ReplaceRoleWord ),
            RemoveNulls = List.Transform( ToColumns, (q) => List.RemoveNulls(q)),
            GetTable = Table.FromColumns( RemoveNulls, names )
        in
            GetTable

        Result is

        Please check attached file

Resources