Help with reference formulas for a family tree builder.

New Contributor

Hi All!

I'm trying to make a family tree builder in excel which will eventually turn into a many tabbed family report based on CSV data converted from a Gedcom (genealogy software file).


I have attached a mock file.


Each person in the tree is referred to by a SOSA # (an individual's fathers number is their number x2, the mothers number is x2+1 and so that continues up the tree, with the home individual being #1). I'd like to use the SOSA number in my table to refer each vital fact to the tree. I've tried using HLOOKUP to refer the data, but after about 20 different variations I can't seem to find the right combo. If it doesn't give me an error, it comes back a strange number!


Likewise, Id like to find one reference formula (USING the Sosa formula from the previous individual) to just exponentially create the reference so I don't have to type  =SOSA 242,  =SOSA 484 and onward. 


Any Excel super-geniuses willing to feed me formula's? I may be asking too much, but any help or suggestions would help me work it out. Frankly I can't believe someone hasn't published a more comprehensive template to work this out. Everything seems to be manual and thats quite a bit of data entry.

Thank you in advance!

3 Replies

@MelanieBeth I'm a bit confused at what exactly you want.  I created the following formula based on the general format you have for your tree.  It assumes you start in the 2nd column with the 1st gen and the data is in the 3rd column and each generation after is in the following pair of columns.  So all you need to do is set 1 cell to the SOSA# you want to start with and it will generate all the others.  You can add additional generations if you want also:


See attached for how it works.

If you don't have array formulas you might need to enter CTRL-SHIFT-ENTER or break it into 3 formulas for Name, Birth, and Death instead of the single array I created here (basically instead of $E:$G you need to use $E:$E then $F:$F and then $G:$G).  If you don't have XLOOKUP() it can be recreated using INDEX()-MATCH().

I hope this works, but if it isn't what you wanted please explain what you need a little more.


This is great! Its exactly what I wanted!!! I knew there had to be functions to carry the data over and autogenerate. Ill, admit the formula is a bit (read: entirely) over my head, but now I know what elements to study. 

I'm sure it does work, but there so much in the formula that I don't understand. Would you mind breaking the formula down to one bit of info without the array, (ex Birth) and without the generational building? Just so I can dissect it?

 Thank you again!

@MelanieBeth  Sure, no problem.  Here is the formula:


TRANSPOSE just converts it to a vertical array instead of horizontal

XLOOKUP finds the information based on:

    Lookup_Value =  $C$14*2^((COLUMN()-3)/2)+COUNTIF(B$1:B15,"=Birth")

               this calculation is based on the COLUMN() of "this" cell the formula is in and based on the layout you are using, by subtracting 3 and dividing by 2 it gives the power to which 2 needs to be raised for that generation and multiplies that by the initial Key Value (located at $C$14).

               then I add the count of values being done above by counting how many times you have the Label "Birth" in the rows starting at B$1 up until this row (B15 in this example) to increment within that generation.

    Lookup_Array = gedcomdata!$A:$A

                simply the column to search

    Lookup_Return = gedcomdata!$E:$G

                 returns the values from column E through G of the row corresponding to where the lookup was found.  So to make this NOT an array you just need to use $E:$E for NAME, $F:$F for BIRTH, or $G:$G for DEATH (and you don't need the TRANSPOSE since it is only 1 value)

     If_not_found = "N/A" = is just what it will show if the lookup value can't be found

     Search_type = 0 = just says to only find exact matches


So here is an example of the NAME without it being the Array:




as you can see it doesn't change much which is part of why making it an array is so nice.


I hope that helps you understand the formula better.  Again, this is based on you specific layout and may need to be changed if you change your layout (e.g. if you insert additional columns between generations).  A more flexible layout/formula would be to include an "index" for each 'block' in a cell next to or above that 'block' so instead of that complicated formula I put together you just multiply by that 'index' value.  So parents would be index values of 2 and 3, next gen has 4, 5, 6, 7 and so on.  But since your 4th gen or didn't have any spaces I would have had to change your layout to do that.  Alternatively, you could just put a "parental generation" value at the top of the column and use that partially simplify the formula.  e.g. P-Gen 1, P-Gen 2 and then just use 2^P-Gen+count("BIRTH")

For your enjoyment I made this tweak in the attached file and here is the new slightly simpler equation: