Help Identifying Employees in a List

Copper Contributor

Hi!

 

So I have a list of about 80K rows.  It contains 80K file transfers that were either uploaded by employees, but many of them are downloaded anonymously.  I am trying to match the anonymous download to the employee who uploaded it.  Example

 

FileA shows it was downloaded anonymously and if I filter the File Column by the word "FileA" I will see that the file was downloaded, and I can also see the file was uploaded by John Smith (our employee)

 

Not every file in this list was downloaded anonymously.  That is the only criteria I am interested in.  The outcome of this list will be a report of employees (probably around 50 or so out of 80K lines) who use this system to upload files with the purpose of them being downloaded anonymously.  

 

So if there is a way to scan the entire list of File Names then generate the output of the employee name who uploaded it.  that would be great!

9 Replies

@PaulyCA 

=SORT(UNIQUE(FILTER(uploaded_by_column, downloaded_by_column="Anonymous")))

@Hans Vogelaar  - Thank you for your reply.  We're getting close.  Let's see if we can knock this out with more specifics

 

Here is an example of what the sheet would look like. 

 

This shows that FileA was downloaded by Anonymous, but actually uploaded by John Smith (The file name will match every time to whomever downloads the data anonymously, but only show a single line for an upload by a single employee.  That's who I am looking for. 

 

Same thing applies to File B -- that was downloaded several times by anonymous, but only uploaded by Jane

 

So the output I am looking at in the 80K records is the list of employees names in Column B who uploaded something, that was downloaded by User "Anonymous"

 

I know this is really hard to explain -- appreciate the help! 

 

Column AColumn B
FileAAnonymous
FileAAnonymous
FileAAnonymous
FileAAnonymous
FileAJohn Smith
FileBAnonymous
FileBAnonymous
File BJane Smith

@PaulyCA 

It was confusing since you had FileB and File B.

See the attached demo for a solution using an intermediate helper range that might be useful.

@PaulyCA If it's just a list of employee names you need returned for files that were downloaded anonymously, the following formula might do the trick:

 

=UNIQUE(TOCOL(MAP(A2:A20, B2:B20, LAMBDA(a,b, IFS(COUNTIFS(A2:A20, a, B2:B20, "Anonymous")*(b<>"Anonymous"), b))), 2))

 

Alternatively, if you want to return both the filenames and employee names, you could try a using a double-filter formula as follows:

 

=LET(
    anon_files, UNIQUE(FILTER(A2:A20, B2:B20="Anonymous")),
    FILTER(A2:B20, ISNUMBER(XMATCH(A2:A20, anon_files))*(B2:B20<>"Anonymous"))
)

 

Please adjust the range references to meet your needs. If performance is an issue with 80K rows of data, the Advanced Filter feature can also be used with the same logic as the first formula shown above.

 

Please see the attached workbook...

@PaulyCA  alternative single formula

=LET(Files, A2:A11, Names, B2:B11,
anonFiles,UNIQUE(FILTER(Files,Names="Anonymous","")),
UNIQUE(FILTER(Names, (Names<>"Anonymous")*ISNUMBER(XMATCH(Files,anonFiles)),"")))

@PaulyCA As an improvement to my previous post using the TOCOL / IFS method, the MAP function is not needed as COUNTIFS can spill its results for the entire dataset by passing the entire criteria_range1 to the criteria1 argument:

 

=UNIQUE(TOCOL(IFS(COUNTIFS(A2:A20, A2:A20, B2:B20, "Anonymous")*(B2:B20<>"Anonymous"), B2:B20), 2))

 

This method can also be combined with the WRAPROWS function to return both the filenames and employee names if desired (the FILTER function is not needed here):

 

=UNIQUE(WRAPROWS(TOCOL(IFS(COUNTIFS(A2:A20, A2:A20, B2:B20, "Anonymous")*(B2:B20<>"Anonymous"), A2:B20), 2), 2))

 

See attached...

@djclements I like your use of countifs to make the conditional but might suggest using FILTER as a more direct output:

=FILTER(A2:B20,COUNTIFS(A2:A20,A2:A20,B2:B20,"Anonymous")*(B2:B20<>"Anonymous"),"")

then B2:B20 would output just the User and A2:B20 (as shown above) outputs both

@m_tarler Fair point. I've been experimenting with TOCOL / IFS recently to see where it can be used. I've read that, in general, TOCOL is faster than FILTER on larger datasets; however, I just tested both formulas in this scenario with 80,000 rows of data, and they were both equally poor to extract the results (30 seconds each). The COUNTIFS function was to blame for the poor performance, though...

 

As it turns out, the double-filter formula I shared in my first post successfully processed 80,000 rows of data in a split second:

 

=LET(
    anon_files, UNIQUE(FILTER(A2:A80000, B2:B80000="Anonymous")),
    FILTER(A2:B80000, ISNUMBER(XMATCH(A2:A80000, anon_files))*(B2:B80000<>"Anonymous"))
)

 

I probably should've done a full test on a large dataset before posting to see which method was worth sharing... oh well.

 

BTW, not sure if you noticed, but the formula you shared in your first post was virtually identical in structure to mine, with just a slight variation to return the employee names only. ;)

good thing to note about how slow countifs is.
as for the similarity in formulas, I didn't notice but now you mention it yea, they are basically identical. I was looking at Hans' and your first formula. I guess I assumed your second would be similar to/ based on the first.
Maybe they will include in Labs a recommendation tool tip when you use 1 function and it can recommend another variant that is more efficient. Sounds like some cool AI.