Forum Discussion
power query - data cleaning and analysis
Hi Team,
Please the attached excel workbook contains two sheet namely "system data" which was generated from a system and the second sheet "transformed" which was generated using "text to column". i would like to know if i can achieve the same transformation using power query.
i am talking about the fixed width feature as shown belowSergeiBaklan
You may query the column, slit text by position, change types, return table into sheet and add totals.
Please check in attached.
- samuel_kodjoeCopper ContributorWOW!!! that was very impressive. How do i accurately specify the positions? it seems to me like a tedious task
I did that by couple of iteration and approximately, since we use TRIM() exact position most probably is not required.
However, if you are on Excel with dynamic arrays you add in couple of cells one under another
=SEQUENCE(,LEN(A2),0) and =MID(A2,SEQUENCE(,LEN(A2)),1)
It looks like
Positions start from 0 and each next position is first space after the previous field. That's one time job if only your system generates the file always in exactly the same format.
If format is different from time time I guess removing of excess spaces could be done by Power Query but that will require M-script coding.
- noushad1ali2000Copper Contributor
Greetings,
I am regular watching your vlog related excel, and now i need your help to clean a complex type of data extracted from my company system which i need to clean and provide them in simple form to create some insight from that.
1st sheet as sample extracted
i have given two sets of data as sample, whereas in my co file having 25860 rows of data for almost 1185-person
2nd sheets need to be normalized
it would be much appreciated if you could help me out
Thanks
Naushad Ali
1st table
Extracted messy data 11/11/2012 ID 818181 role data practioner school grade 12 try fry sum code title 121212 jsr ________________________________________________________________________________________________________ ID 919191 role Sr Data player school grade 14 try fry sum code title 121212 jsr 212121 kel ________________________________________________________________________________________________________ 2nd table - needed
ID Role grade code title 818181 data practioner 12 121212 jsr 919191 Sr Data player 14 121212 jsr 212121 kel With some assumptions regarding the logic in source data, and keeping the source in csv file, main query could be
let BlockSize = 15, Source = Csv.Document(File.Contents("C:\Test\Source.csv"),[Delimiter=",", Columns=15, Encoding=65001, QuoteStyle=QuoteStyle.None]), RemoveDate = Table.Skip(Source,1), SplitSource = Table.Split( RemoveDate, BlockSize), Custom1 = Table.Combine( List.Transform( SplitSource, (t) => fnTransformID(t) ) ) in Custom1
which calls the function
(tbl as table) => let RenameColumns = Table.RenameColumns( tbl, { {"Column2", "Code"} , {"Column3", "Grade"} , {"Column5", "ID"} , {"Column9", "Title"} , {"Column13", "Role"} }), names = {"ID", "Role", "Grade", "Code", "Title"}, SelectColumns = Table.SelectColumns(RenameColumns,names), ReplaceSpace = Table.ReplaceValue(SelectColumns," ",null,Replacer.ReplaceValue,names), ReplaceEmpty = Table.ReplaceValue(ReplaceSpace,"",null,Replacer.ReplaceValue,names), ReplaceRoleWord = Table.ReplaceValue(ReplaceEmpty,"role",null,Replacer.ReplaceValue,{"Title"}), ToColumns = Table.ToColumns( ReplaceRoleWord ), RemoveNulls = List.Transform( ToColumns, (q) => List.RemoveNulls(q)), GetTable = Table.FromColumns( RemoveNulls, names ) in GetTable
Result is
Please check attached file