Mar 10 2021 01:03 PM
Dear colleagues,
I maintain a Covid test record list - name, result, date
Name | Test Result | Test date |
Alan | negative | 01.01.2021 |
Bred | negative | 01.01.2021 |
Alan | positive | 01.02.2021 |
Cindy | positive | 01.01.2021 |
Bred | negative | 02.03.2021 |
Alan | negative | 01.03.2021 |
Cindy | not tested | |
Bred | positive | 05.03.2021 |
The task is to extract an unique list of employees with their latest covid test.
The problem is, that the database contains records with "not tested" entries (the person already had Covid) and these records should be considered in the statistic.
I tried MAXIFS Date or Pivot Table, but the result is not 100%.
Power Query or helper column is eventually possible.
See attachment with details.
Thank You for a hint.
Zdenek Moravec
Cesky Krumlov, Czech Republic
Mar 10 2021 02:07 PM
Zdenec, if Power Query is an option that could be
let
Source = Excel.CurrentWorkbook(){[Name="T_Data"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Test date", type date}}
),
#"Fake date" = Table.ReplaceValue(
#"Changed Type",
null,#date(9000, 12, 31),
Replacer.ReplaceValue,{"Test date"}
),
#"Sorted Rows" = Table.Sort(
#"Fake date",
{{"Name", Order.Ascending}, {"Test date", Order.Descending}}
),
#"Fix in memory" = Table.AddIndexColumn(
#"Sorted Rows",
"Index", 0, 1, Int64.Type
),
#"Keep distinct names" = Table.Distinct(#"Fix in memory", {"Name"}),
#"Remove fake date" = Table.ReplaceValue(
#"Keep distinct names",
#date(9000, 12, 31),
null,Replacer.ReplaceValue,{"Test date"}
),
#"Remove Index" = Table.RemoveColumns(#"Remove fake date",{"Index"})
in
#"Remove Index"
The idea is to temporary replace no date for not tested by far away date.
Mar 10 2021 02:40 PM
Mar 11 2021 01:08 AM
SolutionThe purpose of Index column is to fix the table in memory. Alternatively you may wrap it by Table.Buffer(). The only small difference you may manipulate with Index from UI only, Table.Buffer() shall be added manually in editor. But result is the same.
Without fixing in memory Table.Distinct could give wrong result (could not) due to lazy evaluation performed by Power Query.
Mar 14 2021 06:46 AM
Hello@Sergei Baklan
The power query works OK.
I guess, it could be also possible with formulas similar to the Re: Count values with max date in filtered excel table topic, but I will go this way.
Thank You very much!
Zdenek
Mar 14 2021 08:17 AM
@Zdenek_Moravec , you are welcome.
I didn't try with formulas, perhaps. Just Power Query is more straightforward way.
Mar 14 2021 03:26 PM
If you wanted a formula it is a question of having the right version of Excel.
= UNIQUE(T_Data[Name])
= XLOOKUP(distinct#,T_Data[Name],T_Data[Test Result],,,-1)
= XLOOKUP(distinct#,T_Data[Name],T_Data[Test date],,,-1)
gives a solution in 3 parts. You could also return the entire table with one formula
= LET(
distinct, UNIQUE(T_Data[Name]),
r, XMATCH(distinct, T_Data[Name],,-1),
INDEX(T_Data, r, {1,2,3}) )
Sorting can be introduced but it is not needed in this case.
Mar 15 2021 04:07 AM
Hello @Peter Bartholomew
I love Power Query, but I see, it is time to learn again. We have M365 business and I see all the formulas. I got a new task to create a summary for my manager, so this time I can try to use LET instead of Power Query.
Can You please send me an attachment? When I copy the formula, I get errors
But the formula returns a result, only different:
Alan |
positive |
0 |
Thank you
Zdenek
Mar 15 2021 05:18 AM
Zdenek, here is the file with @Peter Bartholomew formula, I only added substituting of blank date on empty string.
Formula shall be converted into your locale automatically.
Mar 15 2021 06:36 AM
After opening the attachment, I see my mistake. I copied the original formula from the @Peter Bartholomew post and translated it in an online web translator. But the web transformed "\" to ";" in the INDEX Formula.
So I learned by te way, that ";" returns columns array and "\" returns rows array.
The LET works OK now, I am going to play with it.
Thank You both!
Zdenek
Mar 15 2021 07:22 AM
Hello@Peter Bartholomew
I am learning the LET formular on this example. I roughly understand the logic.
The target of this post is to find not the LAST row of each name, but the LATEST row of each name (means the row with max test date of each person). It doesn't need to be on the bottom of the table.
Would You implement the MAXIFS formula into the LET?
(a helper column can contain 31.12.2099 for the "not tested" records)
Thank You
Zdenek
Mar 15 2021 10:04 AM
As variant
= LET(
p, IF(
COUNTIFS(T_Data[Name],T_Data[Name], T_Data[Test date],""),
99999,
MAXIFS(T_Data[Test date],T_Data[Name],T_Data[Name])
)=IF(T_Data[Test date]="",99999,T_Data[Test date]),
r, FILTER(T_Data,p ),
IF(r=0,"",r))
Mar 15 2021 12:12 PM
Split it on parts to make easier in maintenance
= LET(
noDate, 99999,
namesWithNoDate, COUNTIFS(T_Data[Name],T_Data[Name], T_Data[Test date],""),
namesActualMaxDates, MAXIFS(T_Data[Test date],T_Data[Name],T_Data[Name]),
datesWithNoDate, IF(T_Data[Test date]="",noDate,T_Data[Test date]),
addFakeDate, IF( namesWithNoDate, noDate, namesActualMaxDates ),
conditionMaxDate, addFakeDate=datesWithNoDate,
filterOnMaxDate, FILTER(T_Data,conditionMaxDate ),
IF(filterOnMaxDate=0,"",filterOnMaxDate)
)
Mar 15 2021 01:18 PM
Please accept my apologies for causing confusion. I had implemented a version of the formula that would accommodate entries out of date sequence but, wrongly, came to the conclusion that the situation was not likely to occur. Had I realised you were in a position to try the formula, including the workbook with its additional formula would have been helpful.
= LET(
distinct, UNIQUE(T_Data[Name]),
sorted, SORT(T_Data, 3),
r, XMATCH(distinct, INDEX(sorted, ,1),,-1),
INDEX(sorted, r, {1,2,3}) )
The n/a is achieved by number formatting and the blank test date will always appear as the last date.
Mar 15 2021 05:06 PM
Hello both @Sergei Baklan and @Peter Bartholomew
Thank You for the learning material, I could understand the LET function on my example data.
The question from the subject is now answered.
Zdenek Moravec
Mar 16 2021 01:49 AM
Glad it helped. Solution suggested by @Peter Bartholomew is definitely the best, the rest could be used mainly for education purposes.
Mar 16 2021 07:07 AM
Sorry to butt in (I don't mean to hijack the thread), but I have a clarification request:
I noticed that the return value of the LET function is
INDEX(sorted, r, {1,2,3})
I imagine that the
{1,2,3}
is intended as an array covering all 3 columns in "sorted" as the [column_num] parameter of INDEX. Is there a way to achieve the same thing when the number of columns in the array is not known in advance or may dynamically change?
Mar 16 2021 08:02 AM
You could use SEQUENCE to generate the index array.
You might also need to account for the Date column moving by changing the sort.
= LET(
distinct, UNIQUE(T_Data[Name]),
sorted, SORTBY(T_Data, T_Data[Test date]),
r, XMATCH(distinct, INDEX(sorted, ,1),,-1),
c, SEQUENCE(1,COLUMNS(T_Data)),
INDEX(sorted, r, c))
Mar 16 2021 08:12 AM
Yes, I indeed forgot to account for the Date column moving about! Thanks!
Mar 11 2021 01:08 AM
SolutionThe purpose of Index column is to fix the table in memory. Alternatively you may wrap it by Table.Buffer(). The only small difference you may manipulate with Index from UI only, Table.Buffer() shall be added manually in editor. But result is the same.
Without fixing in memory Table.Distinct could give wrong result (could not) due to lazy evaluation performed by Power Query.