SOLVED

HELP! How can I apply rule based filtering and row deletion in Power Query

Copper Contributor

Hey everyone! @Excel 

I am new to Power Query and need some help. I am trying to automate the data cleaning process as I have source data from a csv file that has repeated matching errors. I need to retrieve new updated sales data every week. Unfortunately, we don't have the ability to change the source data as it from a third party.

As the source data is so large, cleaning it manually over and over every week is not an option as it would take many hours on repeat. The current team have just left the errors and accepted the sales figures being slightly off, but I want to amend this.

Issue #1 - A variety of different products without a category assigned - Several hundred of the products don't have a category assigned to them. I want to replace the 'null' value with the correct category,  but there is 5 different categories depending on the group the product is in. I have about 20 wider sub-categories which are all correct, so in a normal excel table I would filter sub-category one by one and add the correct category to the visible products, then repeat until all the null values were filled. However, once I do this is Power Query, I lose all the other data, so only the filtered sub-category is available. Is there any way to replace 'null' values with a condition attached (e.g. replace 'null' with 'Dolls' if sub-category matches 'Barbies'? Or is there a way to stop the filter being irreversible?

Issue #2 - A variety of different products with no brand assigned - Most of the products have a brand in the brand column, but some also have a 'null' value. Every single product name is complete and has a brand within the name. Is there any way to do the replacement of 'null' values based on a partial match in the name column?

Issue #3 - Competitor products accidently included in the data - A handful of competitor products have accidently been included in the data. I would like to delete these as the make a small but false contribution to our sales summary. Is there any way to delete rows that include a certain value? (e.g. Delete rows that contain 'compname' in the brand column).

If none of the above is possible, I will do a work around by making 20 separate 'filtered' queries that combine together into a larger final data query (essentially break them up to automate the cleaning and then bring the cleaned data back together).

I am happy to commit the time upfront to do this, however I am concerned about the complexity and data loading time this would create.

If anyone has any creative solutions to avoid splitting out the query, I would be eternally grateful!! 

TIA! 

3 Replies
best response confirmed by Lauren_Michelle (Copper Contributor)
Solution

@Lauren_Michelle 

In the attached file you can enter data in the blue dynamic table. Then you can click in any cell of the green table and right-click with the mouse and select refresh to update the green result table.

products categories and brands.JPG

The Query deletes all the rows that contain "Compname" in the "Brand" column. The filter can be expanded in order to delete rows that contain "Compname 1", "Compname 2" or "Compname 3" in the "Brand" column.

 

The Query adds "Doll" in the "Category added" column if "Barbie" is the sub-category and if the category is null. This is currently done with an IF THEN ELSE statement which can be adapted for 20 sub-categories.

 

The Query adds the brand Alpha, Beta, Gamma, Delta, Lambda, Epsilon or Zeta in the "Brand completed" column if there is a partial match in the "Product name" column and if the "Brand" column is empty in the blue table. This is currently done with a nested IF THEN ELSE IF statement and can be adapted if there are e.g 30 brands. However there should be a more suitable solution if you have e.g. 400 brands.

Hi @Lauren_Michelle 

 

Issue #3 - Competitor products accidently included in the data
I would like to delete these...Is there any way to delete rows that include a certain value? (e.g. Delete rows that contain 'compname' in the brand column)

Not quite sure this is the best approach. Let's assume that in Week1 you have unexpected [Brand]s A, B and C and you setup your cleaining query to exclude A,B,C. In Week2 your next CSV brings unexpected [Brand]s A, C, E, F => The last 2 won't be excluded by your query

 

In an ideal world you should know what your [Brand]s are. With that assumption a possible approach would be:
- Setup a table with your [Brand]s
- Merge that table with the Table from CSV

 

Sample.png

 

In the attached example I used Table.FuzzyNestedJoin (instead of Table.NestedJoin) as the function has options that help you better control the matching (Power Query is case-sensitive: a <> A):

 

// CompetitorsRemoved
let
    Source = TableFromCSV,
    FuzzyMatch = Table.FuzzyNestedJoin(
        Source, {"Brand"},
        TableBrands, {"Brand"},
        "BRANDS", JoinKind.Inner,
        [IgnoreCase=true, IgnoreSpace=false, Threshold=1]
    ),
    RemovedColumn = Table.RemoveColumns(FuzzyMatch,{"BRANDS"})
in
    RemovedColumn


Hope this makes sense, helps a bit and don't confuse you too much

Wow thank you so much, you've saved me days of messing around this is brilliant 😊 I have managed to adapt and custom fit it to my data and it is working perfectly.

Thank you so much!
1 best response

Accepted Solutions
best response confirmed by Lauren_Michelle (Copper Contributor)
Solution

@Lauren_Michelle 

In the attached file you can enter data in the blue dynamic table. Then you can click in any cell of the green table and right-click with the mouse and select refresh to update the green result table.

products categories and brands.JPG

The Query deletes all the rows that contain "Compname" in the "Brand" column. The filter can be expanded in order to delete rows that contain "Compname 1", "Compname 2" or "Compname 3" in the "Brand" column.

 

The Query adds "Doll" in the "Category added" column if "Barbie" is the sub-category and if the category is null. This is currently done with an IF THEN ELSE statement which can be adapted for 20 sub-categories.

 

The Query adds the brand Alpha, Beta, Gamma, Delta, Lambda, Epsilon or Zeta in the "Brand completed" column if there is a partial match in the "Product name" column and if the "Brand" column is empty in the blue table. This is currently done with a nested IF THEN ELSE IF statement and can be adapted if there are e.g 30 brands. However there should be a more suitable solution if you have e.g. 400 brands.

View solution in original post