restats
Copper Contributor
Jul 30, 2023
Solved

Extracting Records

How can I extract the records that do not appear in both of two databases?

When I combine the two databases, Excel lets me remove duplicates, but can I use this to eliminate every record that appears twice, leaving only the records that are not in both?

Thanks in advance.

restats

  • restats 

    There are some addresses that should be the same but differ slightly:

    10101 NE 46th Ave  Vancouver, WA 98686 
    10101 NE 46th Ave, Vancouver, WA 98686

    One option is to add helper columns:

    AD2 contains the formula =TRIM(SUBSTITUTE(G2, ",", ""))

    This eliminates differences such as comma vs no comma and double space vs single space.

    It does not eliminate spelling differences.

    AE2 contains the formula =COUNTIF(AD:AD, AD2)=1

    This returns TRUE if the helper address is unique.

    You can then filter on TRUE and copy the selected rows to another sheet, either manually with AutoFilter, with Advanced Filter, or with the FILTER function:

    =FILTER(CJun23!A2:AD1323,CJun23!AE2:AE1323)
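
    If you would rather not add helper columns, the same idea can be sketched as a single formula in Excel 365. This is an untested sketch under assumptions: it supposes the original data occupies A2:AC1323 on the CJun23 sheet with the address in column G, and the names data, addr and hits are only illustrative.

    ```excel
    =LET(
        data, CJun23!A2:AC1323,
        addr, TRIM(SUBSTITUTE(CJun23!G2:G1323, ",", "")),
        hits, BYROW(addr, LAMBDA(a, SUM(--(addr = a)))),
        FILTER(data, hits = 1, "No unique addresses")
    )
    ```

    BYROW counts how many normalized addresses equal each row's address, and FILTER keeps the rows where that count is 1. As with the helper columns, spelling differences are not caught.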

     

  • flexyourdata
    Iron Contributor

    restats 

     

    Suppose you have two tables with address information and differing attributes attached to each. If you load each as a query in Power Query (Query_Address1 and Query_Address2 in the code below), you can use a query similar to this to remove the duplicates while retaining the rows that are unique, including their additional attributes:

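    let
        // Columns that decide whether two rows count as the same record
        DeduplicateOn = {"Address","City"},
        // Stack both queries into one table
        Source = Table.Combine({Query_Address1, Query_Address2}),
        // Count how many rows share each Address/City combination
        Group = Table.Group(Source, DeduplicateOn, {{"Rows", each Table.RowCount(_)}}),
        // Keep only the combinations that occur exactly once
        Filter = Table.RemoveColumns(Table.SelectRows(Group, each [Rows]=1), {"Rows"}),
        // Join back to the stacked rows to recover the remaining columns
        Result = Table.Join(Source, DeduplicateOn, Filter, DeduplicateOn)
    in
        Result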

     


  • flexyourdata
    Iron Contributor

    restats 

     

    Given there's not much detail to go on, you can try something similar to this:

     

    =UNIQUE(VSTACK(Table1,Table2),,TRUE)

     


    The important part is the TRUE for the third argument to UNIQUE. This will ensure that you only get results that appear exactly once in the stacked tables. 
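
    Because UNIQUE compares whole rows, two records that share an address but differ in any other field each count as appearing once, so both are kept. If only the address should decide, a variant along these lines may work. It is an untested sketch that assumes the address is the first column of both tables and that both tables have the same number of columns; stacked, addr and hits are illustrative names.

    ```excel
    =LET(
        stacked, VSTACK(Table1, Table2),
        addr, CHOOSECOLS(stacked, 1),
        hits, BYROW(addr, LAMBDA(a, SUM(--(addr = a)))),
        FILTER(stacked, hits = 1, "No unique addresses")
    )
    ```

    Change the 1 in CHOOSECOLS to the position of the address column if it is not the first one.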

  • restats
    Copper Contributor
    Let me correct my question, because my database is made up of properties. What I want to eliminate are the records that have the same address, even though the other fields may have different values. I tried the UNIQUE function but learned that all of a record's fields must match for it to be eliminated.
