Identify duplicate values based on another column

smc1906 · ‎Feb 07 2024

Hi,

I have a table that contains an export from three separate databases. The first column identifies the database (e.g. A, B or C). I need to identify where an email address exists in more than one of the databases. For example, in the table below I need to identify that the email "email1" appears in both database A and B. How can I go about that?

Thanks!

Number	Database	Client Name	Client Email
1	A		email1
2	A		email123
3	A		email456
4	B		email789
5	B		email1
6	C		emailxyz

OliverScheurich · ‎Feb 07 2024

@smc1906

=IF(COUNTA(FILTER($B$2:$B$7,$D$2:$D$7=D2))>1,TRANSPOSE(FILTER($B$2:$B$7,$D$2:$D$7=D2)),"")

If you have access to the FILTER function you can apply this formula. The formula is in cell F2 and filled down.

smylbugti222gmailcom · ‎Feb 07 2024

@smc1906

I'd be glad to assist you with identifying duplicate email addresses based on the database column in your table. Here are two effective methods you can use:

Method 1: Conditional Formatting

Select the entire table or the relevant columns (Database, Client Name, and Client Email).
Go to the Home tab in your spreadsheet software (e.g., Excel, Google Sheets).
Click Conditional Formatting.
Choose Highlight Cells Rules > Duplicate Values.
In the Format values where this formula is true box, enter the following formula, depending on your software and whether you want to include or exclude the first occurrence:
- Include the first occurrence: =COUNTIF($A$2:$A2, $A2)>1 (replace $A2 and $A2:$A2 with the appropriate ranges for your database column)
- Exclude the first occurrence: =COUNTIF($A$3:$A2, $A2)>1 (replace $A2, $A3, and $A2:$A2 with the appropriate ranges)
Click Format to choose a highlighting color for the duplicates.
Click OK.

This will highlight all cells in the Client Email column that appear in multiple rows with different database values.

Method 2: Pivot Tables

Create a pivot table from your data.
Place the Database column in the Rows area.
Place the Client Email column in the Values area.
Set the Value Field Settings to Count or Count of Unique Values.

This will create a table showing the number of times each email address appears in each database. You can then identify the duplicates by looking for values greater than 1.

Additional Tips:

If you have many rows in your table, conditional formatting might be slower than a pivot table.
You can adjust the criteria in the conditional formatting formula to suit your specific needs. For example, you could only highlight duplicates that appear in more than two databases.
If you're using a different spreadsheet software, the specific steps for conditional formatting and pivot tables might differ slightly. Consult the software's documentation for more details.

I hope these methods help you effectively identify duplicate email addresses based on the database column in your table!

smc1906 · ‎Feb 07 2024

Hi, thanks! This seems to work, but also identifies when there is a duplicate within the same database. I was hoping to identify when an email exists in at least two databases only.

For example if in database A an email exists twice, I don't need to know that. But if an email exists twice in database A and once in database C it should be highlighted.

smc1906 · ‎Feb 07 2024

@smc1906

=IF(COUNTA(UNIQUE(FILTER($B$2:$B$13,$D$2:$D$13=D2)))>1,TRANSPOSE(UNIQUE(FILTER($B$2:$B$13,$D$2:$D$13=D2))),"")

You are welcome. This formula should return the required result. For example email3 occurs two times in database A and isn't recognised as a duplicate. Email2 is twice in database A and once in database C as is returned as a duplicate.

smylbugti222gmailcom · ‎Feb 13 2024

@OliverScheurich

I'd be glad to help you with the formula and understand the email3 and email2 results you're seeing.

Formula Breakdown:

The formula you provided, =IF(COUNTA(UNIQUE(FILTER($B$2:$B$13,$D$2:$D$13=D2)))>1,TRANSPOSE(UNIQUE(FILTER($B$2:$B$13,$D$2:$D$13=D2))),""), is used to identify and list unique values in column B (range $B$2:$B$13) that appear more than once in rows where column D (range $D$2:$D$13) has the same value as the value in cell D2. Here's a step-by-step explanation:

FILTER($B$2:$B$13,$D$2:$D$13=D2): This part filters column B based on the condition that the corresponding values in column D are equal to the value in cell D2.
UNIQUE(...): This removes duplicate values from the filtered range.
COUNTA(...): This counts the number of unique values remaining after the UNIQUE function.
>1: This checks if the number of unique values is greater than 1 (meaning there are duplicates).
TRANSPOSE(...): If duplicates exist, this part transposes the unique values into a column format.
"": If no duplicates are found, an empty string ("") is returned.

Possible Explanation for email3 Not Being Recognized:

There are a few possibilities why email3 might not be considered a duplicate:

Case Sensitivity: If your data is case-sensitive (e.g., "Email3" is different from "email3"), the formula might not recognize them as duplicates. Ensure consistent casing throughout your data.
Special Characters: If email3 contains special characters that are not present in other email addresses, the formula might treat them as distinct values. Check for any inconsistencies in formatting or characters.
Data Range: Make sure the formula's range ($B$2:$B$13 and $D$2:$D$13) encompasses all relevant email addresses. If email3 is outside this range, it won't be considered.

Clarifying email2 Results:

Without more information about your data, it's difficult to pinpoint the exact reason why email2 might be considered a duplicate in some cases but not others. However, here are some potential explanations:

Multiple Occurrences: If email2 appears twice in Database A and once in Database C, it would be considered a duplicate based on the formula's logic, even though it appears in different databases.
Database C Filter: If Database C has a filter applied that excludes the second occurrence of email2, the formula might not be able to access it and therefore not consider it a duplicate.
Data Inconsistencies: If there are inconsistencies in how email addresses are formatted or stored, the formula might not identify them as duplicates correctly.

I'll do my best to help you understand the formula's behavior

smc1906 · ‎Feb 07 2024

@smc1906

=IF(COUNTA(UNIQUE(FILTER($B$2:$B$13,$D$2:$D$13=D2)))>1,TRANSPOSE(UNIQUE(FILTER($B$2:$B$13,$D$2:$D$13=D2))),"")

You are welcome. This formula should return the required result. For example email3 occurs two times in database A and isn't recognised as a duplicate. Email2 is twice in database A and once in database C as is returned as a duplicate.

View solution in original post

Identify duplicate values based on another column

Identify duplicate values based on another column

Re: Identify duplicate values based on another column

Re: Identify duplicate values based on another column

Re: Identify duplicate values based on another column

Re: Identify duplicate values based on another column

Re: Identify duplicate values based on another column

Re: Identify duplicate values based on another column

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Identify duplicate values based on another column