Forum Discussion
Andrew__K (Brass Contributor)
Dec 04, 2023
PowerQuery Remove Rows dependent on value in column.
Our labelling software generates a transaction report with one row per label generated, and I pull data from this report using PowerQuery. Each row is an individual label or transaction.
When an error is made during the labelling process and the operator deletes/reverses that label, the transaction report shows the original as a normal transaction with a 1 in the Transaction Type (Trx Type) column, followed by another row with the same information but a -1 as the transaction type to represent the deleted label.
Is there any way I can remove both of these rows in my query? I know I can filter out the -1 Trx Types, but this leaves the original transaction in place. There is also a serial number column for each transaction; the serial numbers match on the -1 and 1 rows, so could they potentially be used? Below is an example of the report.
Label Date | Product PPN | Description | Prod Date | Serial Number | Trx Type |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70001 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70002 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70003 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70004 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70005 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70006 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70007 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70008 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70009 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70010 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70011 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL(*Deleted*) | 2/11/2023 | 70011 | -1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70012 | 1 |
4/12/2023 | 99999 | PRODUCT LABEL | 2/11/2023 | 70013 | 1 |
Alternative:
- Group by [Serial Number]: Count Rows + (keep) All Rows
- Filter out [Count] > 1
- Combine tables (with Advanced Editor)
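The three steps above could be sketched in M roughly as follows. This is only an illustration, not the attached solution: the inline sample table, the unpadded column names, and the step names (Grouped, KeptGroups, Result) are assumptions, and it presumes a serial number appears more than once only when a label was reversed.

```powerquery
let
    // Hypothetical sample data standing in for the report (only two columns shown)
    Source = #table(
        type table [#"Serial Number" = Int64.Type, #"Trx Type" = Int64.Type],
        {{70010, 1}, {70011, 1}, {70011, -1}, {70012, 1}}
    ),
    // Step 1: group by serial, keeping a row count and the original rows of each group
    Grouped = Table.Group(
        Source,
        {"Serial Number"},
        {
            {"Count", each Table.RowCount(_), Int64.Type},
            {"All Rows", each _, type table [#"Serial Number" = Int64.Type, #"Trx Type" = Int64.Type]}
        }
    ),
    // Step 2: filter out groups with more than one row (the 1 / -1 pairs)
    KeptGroups = Table.SelectRows(Grouped, each [Count] = 1),
    // Step 3: combine the remaining nested tables back into one flat table
    Result = Table.Combine(KeptGroups[All Rows])
in
    Result
```

With this sample, serials 70010 and 70012 should remain and both 70011 rows should be dropped.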
As a variant:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Flagged = Table.SelectRows(Source, each ([Trx Type] = -1)),
    RemoveThem = Table.SelectRows(
        Source,
        each not List.Contains(Flagged[#" Serial Number "], [#" Serial Number "])
    ),
    DeclareType = Table.TransformColumnTypes(
        RemoveThem,
        {
            {"Label Date", type date},
            {" Product PPN ", Int64.Type},
            {"Description ", type text},
            {"Prod Date", type date},
            {" Serial Number ", Int64.Type},
            {"Trx Type", Int64.Type}
        }
    )
in
    DeclareType
- Lorenzo (Silver Contributor)
If your records are ordered as you shared, a possible optimization when grouping by [Serial Number] is:
GroupedSerial = Table.Group(
    Source,
    {" Serial Number "},
    {
        {"COUNT", each Table.RowCount(_), Int64.Type},
        {"DATA", each _, type table}
    },
    GroupKind.Local // Possible optimization
),
This is worthwhile when you have thousands of records; otherwise you'll likely not see the difference.
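To show why the ordering matters for this optimization, here is a small sketch comparing the two group kinds. The sample rows and step names are made up for illustration, and the column name is assumed to be unpadded. GroupKind.Local only groups consecutive rows with the same key, so a serial that reappears later starts a new group.

```powerquery
let
    // Hypothetical data: serial 70011 appears twice in a row, then once more later
    Source = #table(
        type table [#"Serial Number" = Int64.Type, #"Trx Type" = Int64.Type],
        {{70011, 1}, {70011, -1}, {70012, 1}, {70011, 1}}
    ),
    CountRows = {{"Count", each Table.RowCount(_), Int64.Type}},
    // Global (the default): one group per distinct serial
    //   70011 -> 3, 70012 -> 1
    GlobalGroups = Table.Group(Source, {"Serial Number"}, CountRows, GroupKind.Global),
    // Local: one group per run of consecutive equal serials
    //   70011 -> 2, 70012 -> 1, 70011 -> 1
    LocalGroups = Table.Group(Source, {"Serial Number"}, CountRows, GroupKind.Local),
    // Return both results in a record so they can be inspected side by side
    Result = [Global = GlobalGroups, Local = LocalGroups]
in
    Result
```

So the local variant gives the same counts as the global one only when a label and its reversal always sit next to each other, as they do in the sample report.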
(GroupKind.Type)
- Andrew__K (Brass Contributor)
Thank you! With your suggestion, I was able to find a solution.
I ended up creating a helper merged column with PPN-DATE-SERIAL, then grouped by this column and filtered out any groups with a count > 1. Then I just expanded the All Rows column and removed the helper column. Thank you!
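A rough sketch of what that helper-column approach might look like in M is below. The actual query was not posted, so everything here is an assumption: the sample rows, the unpadded column names, the step names, and the choice of Prod Date as the date part of the key (the post does not say which date column was used).

```powerquery
let
    // Hypothetical sample data with the columns needed for the key
    Source = #table(
        type table [#"Product PPN" = Int64.Type, #"Prod Date" = date, #"Serial Number" = Int64.Type, #"Trx Type" = Int64.Type],
        {
            {99999, #date(2023, 11, 2), 70010, 1},
            {99999, #date(2023, 11, 2), 70011, 1},
            {99999, #date(2023, 11, 2), 70011, -1},
            {99999, #date(2023, 11, 2), 70012, 1}
        }
    ),
    // Helper column: PPN-DATE-SERIAL joined with hyphens
    AddedKey = Table.AddColumn(
        Source,
        "Key",
        each Text.Combine(
            {Text.From([Product PPN]), Text.From([Prod Date]), Text.From([Serial Number])},
            "-"
        ),
        type text
    ),
    // Group by the helper column, keeping a count and the original rows
    Grouped = Table.Group(
        AddedKey,
        {"Key"},
        {{"Count", each Table.RowCount(_), Int64.Type}, {"All Rows", each _, type table}}
    ),
    // Drop any key that occurs more than once (label plus its reversal)
    Kept = Table.SelectRows(Grouped, each [Count] = 1),
    // Expand the original columns back out, then remove the helper and count columns
    Expanded = Table.ExpandTableColumn(
        Kept,
        "All Rows",
        {"Product PPN", "Prod Date", "Serial Number", "Trx Type"}
    ),
    Result = Table.RemoveColumns(Expanded, {"Key", "Count"})
in
    Result
```

Because the nested column is declared as a plain `type table`, the expanded columns may come back untyped, so a Table.TransformColumnTypes step would likely be needed afterwards.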
- Riny_van_Eekelen (Platinum Contributor)
Andrew__K, I would Group (Count Rows) by Serial Number and keep only the serial numbers that have a count of 1. Then merge that table with the table from the Source step (RightOuter) and clean up a bit.
See attached file.
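For readers without the attachment, the count-and-merge idea might look something like the sketch below. It is not the attached query: the sample data, unpadded column names, and step names are assumptions, and it swaps in an inner join rather than the RightOuter merge mentioned above, which should keep the same rows as long as every remaining serial exists in Source.

```powerquery
let
    // Hypothetical sample data standing in for the Source step
    Source = #table(
        type table [#"Serial Number" = Int64.Type, #"Trx Type" = Int64.Type],
        {{70010, 1}, {70011, 1}, {70011, -1}, {70012, 1}}
    ),
    // Count rows per serial number
    Counts = Table.Group(
        Source,
        {"Serial Number"},
        {{"Count", each Table.RowCount(_), Int64.Type}}
    ),
    // Keep only serial numbers that occur exactly once
    Singles = Table.SelectRows(Counts, each [Count] = 1),
    // Merge back onto Source so only rows with a surviving serial remain
    // (JoinKind.Inner is this sketch's substitution for the RightOuter merge)
    Merged = Table.NestedJoin(
        Source, {"Serial Number"},
        Singles, {"Serial Number"},
        "Singles",
        JoinKind.Inner
    ),
    // Clean up: the nested column was only needed for filtering
    Result = Table.RemoveColumns(Merged, {"Singles"})
in
    Result
```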