Home

Merging large data sets

%3CLINGO-SUB%20id%3D%22lingo-sub-771600%22%20slang%3D%22en-US%22%3EMerging%20large%20data%20sets%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-771600%22%20slang%3D%22en-US%22%3E%3CP%3EHi%20All%2C%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EPlease%20accept%20my%20apologies%20if%20there%20is%20a%20similar%20request%20asked%20already%2C%20or%20if%20this%20is%20a%20profoundly%20stupid%20question%2C%20any%20help%20would%20be%20greatly%20received.%26nbsp%3B%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EI%20have%20two%20large%20data%20sets.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3E1%20is%20all%20of%20the%20property%20sales%20in%20an%20area%2C%26nbsp%3B%3C%2FP%3E%3CP%3E2%20is%20all%20the%20properties%20epc%20ratings%20by%20area%20showing%20square%20footage%2C%20energy%20usage%20etc.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EI%20want%20to%20merge%20the%20two%20data%20sets%20into%20one%20large%20table%20so%20that%20I%20can%20chart%20the%20information.%26nbsp%3B%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EThe%20only%20common%20denominator%20between%20the%20two%20data%20sets%20is%20the%20address%20of%20the%20property.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EWhat%20would%20the%20formula%20be%20to%20use%20the%20spreadsheet%201%20(Sold%20price%20data)%20as%20my%20base%2C%20and%20add%20all%20of%20the%20information%20to%20it%20from%20spreadsheet%202%20(EPC%20Data)%3F%26nbsp%3B%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EI%20would%20like%20all%20the%20data%20from%20spreadsheet%202%20added%20to%20each%20line%20of%20spreadsheet%201.%20I%20guess%20I%20need%20a%20formula%20to%20look%20up%20and%20make%20sure%20each%20full%20address%20and%20post%20code%20matches%20(they%20are%20in%20separate%20columns)%2C%20and%20then%20add%20each%20field%20from%20spreadsheet%202%20over%20to%20spreadsheet%201%3F%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3E%3CEM%3EI%20have%20attached%20an%20example%20file%20to%20show%20the%20two%20data%20types%20I%20have%20although%20they%20will%20not%20match%20as%20I%20have%20taken%2010%20lines%20or%20so%20from%20millions%20of%20lines%20of%20data.%3C%2FEM%3E%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EThanks%20in%20advance%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-LABS%20id%3D%22lingo-labs-771600%22%20slang%3D%22en-US%22%3E%3CLINGO-LABEL%3EExcel%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E%3CLINGO-SUB%20id%3D%22lingo-sub-771615%22%20slang%3D%22en-US%22%3ERe%3A%20Merging%20large%20data%20sets%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-771615%22%20slang%3D%22en-US%22%3EIt%20does%20not%20help%20that%20the%20first%20tab%20does%20not%20have%20any%20header%20information%20in%20the%20first%20row.%20Does%20your%20real%20data%20have%20that%3F%20If%20not%2C%20we%20still%20would%20have%20to%20know%20which%20is%20which%20to%20help%20you...%3C%2FLINGO-BODY%3E%3CLINGO-SUB%20id%3D%22lingo-sub-771635%22%20slang%3D%22en-US%22%3ERe%3A%20Merging%20large%20data%20sets%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-771635%22%20slang%3D%22en-US%22%3E%3CP%3E%3CA%20href%3D%22https%3A%2F%2Ftechcommunity.microsoft.com%2Ft5%2Fuser%2Fviewprofilepage%2Fuser-id%2F22322%22%20target%3D%22_blank%22%3E%40Jan%20Karel%20Pieterse%3C%2FA%3E%26nbsp%3B%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EApologies%20and%20agreed.%26nbsp%3B%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EI%20have%20uploaded%20the%20file%20again%20with%20column%20headers.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EThank%20you%20for%20looking%20at%20this%20for%20me.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3ERegards%2C%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-SUB%20id%3D%22lingo-sub-771763%22%20slang%3D%22en-US%22%3ERe%3A%20Merging%20large%20data%20sets%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-771763%22%20slang%3D%22en-US%22%3EOK%2C%20here%20is%20what%20I'd%20do.%3CBR%20%2F%3EIn%20both%20sheets%20add%20an%20additional%2C%20calculated%20column%20which%20combines%20the%20data%20from%20the%20address%20columns%2C%20like%20so%20(example%20for%20the%20EPC%20data%20sheet)%3A%3CBR%20%2F%3E%3DUPPER(TRIM(B2)%26amp%3B%22%7C%22%26amp%3BTRIM(C2)%26amp%3B%22%7C%22%26amp%3BTRIM(D2)%26amp%3B%22%7C%22%26amp%3BTRIM(E2))%3CBR%20%2F%3EUPPER%20changes%20everything%20to%20upper%20case%20and%20trim%20removes%20leading%20and%20training%20spaces.%3CBR%20%2F%3ESuppose%20you%20inserted%20a%20similar%20formula%20in%20column%20R%20of%20the%20first%20sheet%2C%20then%20in%20the%20second%20sheet%2C%20next%20to%20your%20new%20column%20(say%20column%20CH)%20with%20the%20above%20formula%20you%20enter%20this%20formula%20in%20CI2%3A%3CBR%20%2F%3E%3DMATCH(CH2%2C'Sold%20home%20data'!%24Q%3A%24Q%2C0)%3CBR%20%2F%3EThese%20should%20return%20row%20numbers%20where%20your%20matched%20data%20is.%3CBR%20%2F%3EFinally%2C%20in%20columns%20to%20the%20right%20of%20that%20you%20can%20fetch%20the%20information%20you%20need%20with%20a%20formula%20like%3A%3CBR%20%2F%3E%3DINDEX('Sold%20home%20data'!H%3AH%2CCI2)%3C%2FLINGO-BODY%3E
BaronCabot
New Contributor

Hi All,

 

Please accept my apologies if there is a similar request asked already, or if this is a profoundly stupid question, any help would be greatly received. 

 

I have two large data sets.

 

1 is all of the property sales in an area, 

2 is all the properties epc ratings by area showing square footage, energy usage etc.

 

I want to merge the two data sets into one large table so that I can chart the information. 

 

The only common denominator between the two data sets is the address of the property.

 

What would the formula be to use the spreadsheet 1 (Sold price data) as my base, and add all of the information to it from spreadsheet 2 (EPC Data)? 

 

I would like all the data from spreadsheet 2 added to each line of spreadsheet 1. I guess I need a formula to look up and make sure each full address and post code matches (they are in separate columns), and then add each field from spreadsheet 2 over to spreadsheet 1?

 

I have attached an example file to show the two data types I have although they will not match as I have taken 10 lines or so from millions of lines of data.

 

Thanks in advance

4 Replies
Highlighted
It does not help that the first tab does not have any header information in the first row. Does your real data have that? If not, we still would have to know which is which to help you...

@Jan Karel Pieterse 

 

Apologies and agreed. 

 

I have uploaded the file again with column headers.

 

Thank you for looking at this for me.

 

Regards,

OK, here is what I'd do.
In both sheets add an additional, calculated column which combines the data from the address columns, like so (example for the EPC data sheet):
=UPPER(TRIM(B2)&"|"&TRIM(C2)&"|"&TRIM(D2)&"|"&TRIM(E2))
UPPER changes everything to upper case and trim removes leading and training spaces.
Suppose you inserted a similar formula in column R of the first sheet, then in the second sheet, next to your new column (say column CH) with the above formula you enter this formula in CI2:
=MATCH(CH2,'Sold home data'!$Q:$Q,0)
These should return row numbers where your matched data is.
Finally, in columns to the right of that you can fetch the information you need with a formula like:
=INDEX('Sold home data'!H:H,CI2)

@Jan Karel Pieterse 

 

Thank you so much! I will be trying this over the weekend and see what results we get.

Related Conversations
Tabs and Dark Mode
cjc2112 in Discussions on
46 Replies
Extentions Synchronization
Deleted in Discussions on
3 Replies
Stable version of Edge insider browser
HotCakeX in Discussions on
35 Replies
How to Prevent Teams from Auto-Launch
chenrylee in Microsoft Teams on
30 Replies
flashing a white screen while open new tab
Deleted in Discussions on
14 Replies
Security Community Webinars
Valon_Kolica in Security, Privacy & Compliance on
13 Replies