SOLVED

Remove Duplicates from query

Occasional Contributor

I'm running the following Script to view USB file activity, however it produces filename duplicates, which I am having issues with removing, I have tried playing with the summarize arg_max() but had no luck, as it doesn't work with the Filename field. MountTime and Timestamp are fields I want to avoid.

 

Anyone know how I can work on the FileName field to have unique file names please?

 

let UsbDriveMount = DeviceEvents

| where ActionType=="UsbDriveMounted"

| extend ParsedFields=parse_json(AdditionalFields)

| project DeviceId, DeviceName, DriveLetter=ParsedFields.DriveLetter, MountTime=Timestamp,

ProductName=ParsedFields.ProductName,SerialNumber=ParsedFields.SerialNumber,Manufacturer=ParsedFields.Manufacturer

| order by DeviceId asc, MountTime desc;

let FileCreation = DeviceFileEvents

| where InitiatingProcessAccountName != "system"

| where ActionType == "FileCreated"

| where FolderPath !startswith "C:\\"

| where FolderPath !startswith "\\"

| where InitiatingProcessAccountDomain == "Contoso"

| project ReportId,DeviceId,InitiatingProcessAccountDomain,

InitiatingProcessAccountName,InitiatingProcessAccountUpn,

FileName, FolderPath, SHA256, Timestamp, SensitivityLabel, IsAzureInfoProtectionApplied

| order by DeviceId asc, Timestamp desc;

FileCreation | lookup kind=inner (UsbDriveMount) on DeviceId

| where FolderPath startswith DriveLetter

| where Timestamp >= MountTime

| order by DeviceId asc, Timestamp desc

2 Replies
best response confirmed by AmjadGov (Occasional Contributor)
Solution

@AmjadGov 

I believe you may need to use MountTime in a Arg_max for this. Arg_max removes duplicates and you need to use a number time or date for this unfortunately.

The Arg_max would work below, so it will show the last MountTime for that FileName going to a specific USB. You may see duplicate FileNames but thats because its being saved to multiple USB devices.

 

then create a project with the fields you want and in what order.

 

Snippet of code to replace your join in the code you pasted.

 

FileCreation | lookup kind=inner (UsbDriveMount) on DeviceId
| where FolderPath startswith DriveLetter
| where Timestamp >= MountTime
| summarize arg_max(MountTime, *) by FileName, tostring(ProductName), tostring(SerialNumber)
| project Timestamp, DeviceId, DeviceName, InitiatingProcessAccountDomain,
InitiatingProcessAccountName,InitiatingProcessAccountUpn,
FileName, FolderPath, SHA256, SensitivityLabel, IsAzureInfoProtectionApplied, DriveLetter, MountTime, ProductName, SerialNumber, Manufacturer
| order by DeviceId asc, Timestamp desc
 
Hope this helps, as I know you wanted it without using MountTime.
@MattBurrows
This is perfect and works a treat. Thank you so much.
The reason behind this was, due to the 10k limit in the CSV file, the duplicates were limiting my reporting. Your adjustment was the perfect solution, once again thank you.