SOLVED

Get record with latest date by criteria from excel table and handle blanks

Brass Contributor

Dear colleagues,

I maintain a Covid test record list - name, result, date

NameTest ResultTest date
Alannegative01.01.2021
Brednegative01.01.2021
Alanpositive01.02.2021
Cindypositive01.01.2021
Brednegative02.03.2021
Alannegative01.03.2021
Cindynot tested 
Bredpositive05.03.2021

The task is to extract an unique list of employees with their latest covid test.

The problem is, that the database contains records with "not tested" entries (the person already had Covid) and these records should be considered in the statistic.

I tried MAXIFS Date or Pivot Table, but the result is not 100%.

Power Query or helper column is eventually possible.

See attachment with details.

Thank You for a hint.

Zdenek Moravec

Cesky Krumlov, Czech Republic

18 Replies

@Zdenek_Moravec 

Zdenec, if Power Query is an option that could be

let
    Source = Excel.CurrentWorkbook(){[Name="T_Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
        Source,
        {{"Test date", type date}}
    ),
    #"Fake date" = Table.ReplaceValue(
        #"Changed Type",
        null,#date(9000, 12, 31),
        Replacer.ReplaceValue,{"Test date"}
    ),
    #"Sorted Rows" = Table.Sort(
        #"Fake date",
        {{"Name", Order.Ascending}, {"Test date", Order.Descending}}
    ),
    #"Fix in memory" = Table.AddIndexColumn(
        #"Sorted Rows",
        "Index", 0, 1, Int64.Type
    ),
    #"Keep distinct names" = Table.Distinct(#"Fix in memory", {"Name"}),
    #"Remove fake date" = Table.ReplaceValue(
        #"Keep distinct names",
        #date(9000, 12, 31),
        null,Replacer.ReplaceValue,{"Test date"}
    ),
    #"Remove Index" = Table.RemoveColumns(#"Remove fake date",{"Index"})
in
    #"Remove Index"

The idea is to temporary replace no date for not tested by far away date.

 

Hello Sergei
Sort by name ascending, by date descending and remove duplicates - so easy is it!
The purpose of the Index column is to help Table.Distinct function to keep the first row? (I guess, because if I delete the index column from the process, the Table.Distinct function keeps the last row of each name ...). So how does the Table.Distinct work with the Index column?
Thank You
Zdenek
best response confirmed by Zdenek_Moravec (Brass Contributor)
Solution

@Zdenek_Moravec 

The purpose of Index column is to fix the table in memory. Alternatively you may wrap it by Table.Buffer(). The only small difference you may manipulate with Index from UI only, Table.Buffer() shall be added manually in editor. But result is the same.

 

Without fixing in memory Table.Distinct could give wrong result (could not) due to lazy evaluation performed by Power Query.

Hello@Sergei Baklan 

The power query works OK.

I guess, it could be also possible with formulas similar to the Re: Count values with max date in filtered excel table topic, but I will go this way.

Thank You very much!

Zdenek

@Zdenek_Moravec , you are welcome.

I didn't try with formulas, perhaps. Just Power Query is more straightforward way.

@Zdenek_Moravec 

If you wanted a formula it is a question of having the right version of Excel.

= UNIQUE(T_Data[Name])
= XLOOKUP(distinct#,T_Data[Name],T_Data[Test Result],,,-1)
= XLOOKUP(distinct#,T_Data[Name],T_Data[Test date],,,-1)

gives a solution in 3 parts.  You could also return the entire table with one formula

= LET(
  distinct, UNIQUE(T_Data[Name]),
  r, XMATCH(distinct, T_Data[Name],,-1),
  INDEX(T_Data, r, {1,2,3}) )

Sorting can be introduced but it is not needed in this case.

image.png

Hello @Peter Bartholomew

I love Power Query, but I see, it is time to learn again. We have M365 business and I see all the formulas. I got a new task to create a summary for my manager, so this time I can try to use LET instead of Power Query.

Can You please send me an attachment? When I copy the formula, I get errors

Zdenek_Moravec_1-1615806296402.png

Zdenek_Moravec_2-1615806325055.png

But the formula returns a result, only different:

Alan
positive
0

Thank you

Zdenek

@Zdenek_Moravec 

Zdenek, here is the file with @Peter Bartholomew formula, I only added substituting of blank date on empty string.

Formula shall be converted into your locale automatically.

@Sergei Baklan,

After opening the attachment, I see my mistake. I copied the original formula from the @Peter Bartholomew post and translated it in an online web translator. But the web transformed "\" to ";" in the INDEX Formula.

So I learned by te way, that ";" returns columns array and "\" returns rows array.

Zdenek_Moravec_0-1615815062659.png

The LET works OK now, I am going to play with it.

Thank You both!

Zdenek

Hello@Peter Bartholomew 

I am learning the LET formular on this example. I roughly understand the logic.

The target of this post is to find not the LAST row of each name, but the LATEST row of each name (means the row with max test date of each person). It doesn't need to be on the bottom of the table.

Would You implement the MAXIFS formula into the LET?

(a helper column can contain 31.12.2099 for the "not tested" records)

Thank You

Zdenek

@Zdenek_Moravec 

As variant

= LET(
   p, IF(
        COUNTIFS(T_Data[Name],T_Data[Name], T_Data[Test date],""),
        99999,
        MAXIFS(T_Data[Test date],T_Data[Name],T_Data[Name])
      )=IF(T_Data[Test date]="",99999,T_Data[Test date]),
  r, FILTER(T_Data,p ),
   IF(r=0,"",r))

@Zdenek_Moravec 

Split it on parts to make easier in maintenance

= LET(
   noDate,              99999,
   namesWithNoDate,     COUNTIFS(T_Data[Name],T_Data[Name], T_Data[Test date],""),
   namesActualMaxDates, MAXIFS(T_Data[Test date],T_Data[Name],T_Data[Name]),
   datesWithNoDate,     IF(T_Data[Test date]="",noDate,T_Data[Test date]),
   addFakeDate,         IF( namesWithNoDate, noDate, namesActualMaxDates ),
   conditionMaxDate,    addFakeDate=datesWithNoDate,
   filterOnMaxDate,     FILTER(T_Data,conditionMaxDate ),
   IF(filterOnMaxDate=0,"",filterOnMaxDate)
)

@Peter Bartholomew 

Please accept my apologies for causing confusion.  I had implemented a version of the formula that would accommodate entries out of date sequence but, wrongly, came to the conclusion that the situation was not likely to occur.  Had I realised you were in a position to try the formula, including the workbook with its additional formula would have been helpful.

= LET(
  distinct, UNIQUE(T_Data[Name]),
  sorted, SORT(T_Data, 3),
  r, XMATCH(distinct, INDEX(sorted, ,1),,-1),
  INDEX(sorted, r, {1,2,3}) )

 

image.png

The n/a is achieved by number formatting and the blank test date will always appear as the last date.

Hello both @Sergei Baklan and @Peter Bartholomew 

Thank You for the learning material, I could understand the LET function on my example data.

The question from the subject is now answered.

Zdenek Moravec

@Zdenek_Moravec 

Glad it helped. Solution suggested by @Peter Bartholomew is definitely the best, the rest could be used mainly for education purposes.

Sorry to butt in (I don't mean to hijack the thread), but I have a clarification request:
I noticed that the return value of the LET function is

 

 

INDEX(sorted, r, {1,2,3}) 

 

 


I imagine that the

{1,2,3}

is intended as an array covering all 3 columns in "sorted" as the [column_num] parameter of INDEX. Is there a way to achieve the same thing when the number of columns in the array is not known in advance or may dynamically change?

@JackTradeOne 

You could use SEQUENCE to generate the index array.

You might also need to account for the Date column moving by changing the sort.

= LET(
  distinct, UNIQUE(T_Data[Name]),
  sorted, SORTBY(T_Data, T_Data[Test date]),
  r, XMATCH(distinct, INDEX(sorted, ,1),,-1),
  c, SEQUENCE(1,COLUMNS(T_Data)),
  INDEX(sorted, r, c))

@Peter Bartholomew 

Yes, I indeed forgot to account for the Date column moving about! Thanks!

1 best response

Accepted Solutions
best response confirmed by Zdenek_Moravec (Brass Contributor)
Solution

@Zdenek_Moravec 

The purpose of Index column is to fix the table in memory. Alternatively you may wrap it by Table.Buffer(). The only small difference you may manipulate with Index from UI only, Table.Buffer() shall be added manually in editor. But result is the same.

 

Without fixing in memory Table.Distinct could give wrong result (could not) due to lazy evaluation performed by Power Query.

View solution in original post