Please Check My Math
OK, a bunch of you out there have done statistical stuff more recently than I have (although I did read that stats textbook a couple of months ago). Anyway, for those of you who have done stats calculations any time in the past five years or so (Chad? Greg? Randy? Chris?), could you please check this? I am really rusty at this and could easily have made a fundamental mistake...
Here is what I am trying to analyse. In the last post I mentioned that big long chain of ancestors. That long chain all hinges on a connection with a Jane Gillham born in 1773.
All the sources I can find that mention Jane Gillham being married to John Minter, and having the kids that eventually lead to me, give her birthday as April 21st 1773 in South Carolina.
Meanwhile, the document I can find linking a Jane Gillham to her parents (and through them all the way back to King Kenneth and the like) lists her birthday as October 21st 1773 in South Carolina... It makes no mention whatsoever of John Minter... but it also contains nothing else that would contradict her being the same Jane, other than the birthday.
My hypothesis is that at some point in the last 233 years, as the Gillham records showing who Jane's parents were got copied over and over again, April was miscopied as October. (Someone couldn't read it, recreated it from memory, whatever...)
Basically, I think these "two" Jane Gillhams are really the same person. But there is no proof of course, so I want to figure out the odds...
Here is the analysis I did... please check me and point out any math or logic errors I may have made:
- South Carolina population in 1773 was about 250,000 (based on the 1790 census, so this is probably bigger than the real 1773 figure)
- Live births were approximately 50 per 1000 population per year in the late 1700s (based on a stat in "Encyclopedia of the New American Nation")
- This gives about 12500 births in South Carolina in 1773.
- About 6250 of those would have been girls.
- A little under 3% (about 2.9%) of those would be named Jane (based on the Given Names Frequency Project for the 1801-1810 time period)
- That gives us about 181 Janes born in South Carolina in 1773.
- We need to multiply by the percentage of the whole South Carolina population that were Gillhams.
- I have no idea what that number is. For now I will call it "G". (As a fraction, not a percentage, to avoid the factor of 100 everywhere.)
- So the number of Jane Gillhams born in South Carolina in 1773 would be about 181*G.
- Now, we know pretty confidently that John Minter's Jane Gillham was born April 21st.
- We could figure out the odds of a second Jane Gillham being born on October 21st specifically.
- That would be 1-(364/365)^(181*G), which is our lower bound on the odds. (Using the math from Wikipedia's Birthday Paradox page.)
- But... the hypothesis is that sometime in the last 233 years someone just miscopied April as October in the Gillham family records.
- In that case we don't care specifically about October 21st, but instead just the odds of a second person being born on ANY of the 21sts other than April 21st.
- That is because our hypothetical miscopier could have switched it with any of the eleven other months, not just October.
- In that case our odds turn out to be 1-(354/365)^(181*G). This should be our upper bound on the odds.
- This gives the chance of another Jane Gillham being born on the 21st of any month other than April, given that our Jane Gillham was born on April 21st.
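For anyone who wants to rerun the numbers, here is a rough Python sketch of the steps in the list. The constants are the figures from the list; the 2.9% Jane rate is the rate implied by the figure of 181 Janes (a flat 3% would give about 188), and treating all 365 birthdays as equally likely is an assumption, not something the sources say. G = 0.1 is just an example value.

```python
# Rough inputs from the list above (all approximate).
SC_POPULATION = 250_000   # 1790 census figure, probably too high for 1773
BIRTHS_PER_1000 = 50      # late-1700s births per 1000 people per year
JANE_RATE = 0.029         # "about 3%"; 2.9% is what the 181 figure implies

births = SC_POPULATION * BIRTHS_PER_1000 / 1000   # 12,500 births
girls = births / 2                                # 6,250 girls
janes = round(girls * JANE_RATE)                  # about 181 Janes


def chance_of_another(g, target_days):
    """Chance that at least one of the janes * g Jane Gillhams lands on one of
    `target_days` particular dates, assuming all 365 days are equally likely."""
    return 1 - ((365 - target_days) / 365) ** (janes * g)


g = 0.1                              # example only: 1 in 10 South Carolinians a Gillham
lower = chance_of_another(g, 1)      # October 21st only
upper = chance_of_another(g, 11)     # the 21st of any of the other 11 months
print(f"{janes} Janes, G={g}: between {lower:.1%} and {upper:.1%}")
```

The same function covers both bounds; only the number of "target" days changes (1 for October 21st alone, 11 for every other 21st).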
So let's run this with some possible values of G:
This shows the chances (X) of a second Jane Gillham being born on the 21st of another month, and therefore probably being an actual second Jane Gillham rather than the same person with the date miscopied.
Everybody in SC is a Gillham (G=1): 99.6%
1 out of 2 is a Gillham (G=0.5): 93.7%
1 out of 5 is a Gillham (G=0.2): 67.0%
1 out of 10 is a Gillham (G=0.1): 42.5%
1 out of 20 is a Gillham (G=0.05): 24.2%
1 out of 50 is a Gillham (G=0.02): 10.5%
1 out of 100 is a Gillham (G=0.01): 5.4%
1 out of 200 is a Gillham (G=0.005): 2.7%
1 out of 500 is a Gillham (G=0.002): 1.1%
1 out of 1000 is a Gillham (G=0.001): 0.6%
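Here's a small Python sketch that regenerates the table above from the upper-bound formula, assuming the same 181 Janes and the same equally-likely-birthdays simplification. Each row just plugs in G = 1/n.

```python
JANES = 181          # estimated Janes born in South Carolina in 1773
OTHER_21STS = 11     # the 21st of every month except April


def chance_second_jane(g):
    """Upper-bound formula: 1 - (354/365)^(181*G)."""
    return 1 - ((365 - OTHER_21STS) / 365) ** (JANES * g)


for one_in in (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000):
    g = 1 / one_in
    print(f"1 out of {one_in} is a Gillham (G={g:g}): {chance_second_jane(g):.1%}")
```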
Reversing the calculation... and solving for G...
G = log(1 - X) / (181 * log(354/365))
Plugging in a few numbers there...
As long as there are fewer Gillhams than 1 in 108, you have over a 95% chance that these two Jane Gillhams are the same Jane Gillham and not separate people after all.
If there are fewer Gillhams than one in 552 then you have over a 99% chance that these are the same Jane Gillham...
(And even if there were so many Gillhams that 1 in every 8 people in SC was a Gillham, you'd still have better than even odds that this was the same Jane Gillham.)
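The reversed calculation can be sketched in Python the same way, again assuming 181 Janes. Rounding 1/G up is a choice made here so that "fewer than 1 in k" stays on the safe side of the cutoff; with that rounding it lands on the 1 in 108, 1 in 552 and 1 in 8 figures quoted above.

```python
import math

JANES = 181


def g_for_chance(x):
    """Solve X = 1 - (354/365)^(181*G) for G: G = ln(1 - X) / (181 * ln(354/365))."""
    return math.log(1 - x) / (JANES * math.log(354 / 365))


for x in (0.05, 0.01, 0.5):
    g = g_for_chance(x)
    # Round 1/G up so that "fewer than 1 in k" guarantees G is below the cutoff.
    one_in = math.ceil(1 / g)
    print(f"X = {x:.0%}: G = {g:.5f}, i.e. fewer than 1 in {one_in}")
```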
One in 108 would mean that there were about 2300 people with the surname Gillham in South Carolina around the time of the 1790 census
One in 552 would mean that there were about 450 people with the surname Gillham in South Carolina around that time.
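Turning those cutoffs into head counts is one division against the same 250,000 population estimate; a tiny sketch (the function name is just for illustration):

```python
SC_POPULATION = 250_000  # 1790-census-based estimate used throughout


def gillham_headcount(one_in):
    """Number of Gillhams implied by '1 in one_in' South Carolinians."""
    return SC_POPULATION / one_in


for one_in in (108, 552):
    print(f"1 in {one_in}: about {gillham_headcount(one_in):,.0f} Gillhams")
```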
So, this all depends on the number of Gillhams in South Carolina around 1790... but if there were fewer than 2300 or so, I'd feel really confident betting that this is only one Jane Gillham, and someone just miscopied her birthday at some point (probably on the Gillham side... although the math is the same if it went the other way).
Thoughts?
Links to the sites I got stats and math from: