(Photo by Crosa)
Genealogy is Gross
There has been a lot of hullabaloo in the genealogy blogging community recently about the large interlinked genealogy tree websites due to some recent changes made by one of their ilk concerning privileges and the like. It is not my intention with this article to single out any particular Really Big Tree Co. – I intend to dismiss them all equally.
Before I start the bashing though, and in all fairness, I should allow readers the opportunity to review my own ancestral tree. I have been presenting my tree online since 1999, and it has gone through many, many updates, revisions, guttings and skinnings. There have been times in the past when the tree has contained gross errors. If you look at it today, you will find it still contains gross errors. In fact I’d bet my shorts that there is not a single ancestral tree in existence in any database anywhere in the world that does not contain gross errors. It’s inevitable. It is just not possible to obtain all the primary (and infallible) sources necessary to support every branch of our individual trees. So it is expected that any online ancestral tree, whether isolated to a single family or interlinked with other families, will have this same deficiency. It’s important to understand that the deficiency is not necessarily reflective of the medium in which the trees are presented. Offline ancestral trees suffer from the same problems.
So as consumers of online ancestral trees, we are presented with a problem. How do we identify which claims made of individuals are valid, when we know darn well the tree has gross errors in it? We do the same thing that any good detective would do: we look at the evidence presented for each claim being made and analyze that evidence to determine its likely certainty. The level of certainty can usually be assessed by reviewing the sources referenced for that claim and by comparing the claim to other claims to determine their probability. Sources tend to fall into several categories based on three properties: the authority of the original record, how contemporary the original document was to the events it records, and how closely associated the original author was with the recorded claims. I discuss this in more detail in the FAQ. Based on these categories, a basic level of certainty can then be determined. However, once other, more certain claims are identified, the certainty may need to be adjusted. I will discuss this more below.
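To make the idea concrete, here is a minimal Python sketch of turning those three source properties into a base certainty. The 0 to 2 scale, the field names, the level names, and the "weakest property wins" rule are all assumptions made for illustration; they are not the categories described in the FAQ.

```python
from dataclasses import dataclass

# Hypothetical certainty levels, ordered from worst to best.
LEVELS = ["unreliable", "uncertain", "probable"]


@dataclass(frozen=True)
class Source:
    title: str
    authority: int        # 0 = unofficial, 2 = official record
    contemporaneity: int  # 0 = written long after the event, 2 = written at the time
    proximity: int        # 0 = hearsay, 2 = author took part in or witnessed the event


def base_certainty(source: Source) -> str:
    """Assumed rule: a source is only as good as its weakest property."""
    weakest = min(source.authority, source.contemporaneity, source.proximity)
    return LEVELS[weakest]


parish_register = Source("Parish baptism register", 2, 2, 2)
family_bible = Source("Family bible entry", 1, 2, 2)
online_tree = Source("Unsourced online tree", 0, 0, 0)

print(base_certainty(parish_register))  # probable
print(base_certainty(family_bible))     # uncertain
print(base_certainty(online_tree))      # unreliable
```

A real scheme would probably weigh the properties differently, and the result is only a starting point that may be adjusted once more certain claims turn up.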
So the problem I have with ALL online ancestral trees, with few exceptions, including those at The Really Big Tree Co., is that they usually do not provide the evidence needed by consumers to be able to assess the level of certainty of a claim. If we cannot assess the level of certainty, then the claim is in effect – worthless. Its only real value is to provide a jumping-off point for further research. Unsupported claims like these should never be presented in an online database unless it is clearly indicated that the claim is not supported. It can be argued that if no source reference is provided for a claim, then this ‘unsupported’ status IS clearly indicated. I would argue that if, like most trees, more than 80% of the claims are shown to be not supported, the tree is junk.
Back to The Really Big Tree Co. I freely admit that before the recent hullabaloo, I had zero experience with these trees. I went to some of the free websites talked about in the blog posts, and did some basic tests. As I expected, each failed miserably. I found several individuals for whom I know there is a wealth of available sources, and calculated the ‘claim-to-evidence’ ratio for each. I never expect much when I’m looking at hobbyists’ online trees, but I expected more from these large, profitable companies. Unfortunately, I only found about one source reference for every ten claims, and about half of those source references were for completely unreliable sources. When I tested lesser-known persons, the ratio was closer to zero. The other test I ran was to check if the sites provided the ability to add and show the source references for the most important genealogy claim: the parental association. Ancestral trees at their most basic level are nothing more than claims of parental associations. If there is no way to enter or display the evidence for this claim, the tree has no value. That is not to say the individual profiles are worthless. They stand independent of the tree as a whole and are supported by their individual claims.
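For anyone who wants to repeat the test, the ratio is simple arithmetic. The sketch below is one way to compute it in Python; the dictionary layout and the `reliable` flag are hypothetical, and deciding what counts as reliable is still a judgment call made by the person doing the counting.

```python
def evidence_ratio(claims):
    """Return (source references per claim, share of references that are reliable)."""
    total_refs = sum(len(claim["sources"]) for claim in claims)
    reliable_refs = sum(
        1 for claim in claims for source in claim["sources"] if source["reliable"]
    )
    per_claim = total_refs / len(claims) if claims else 0.0
    reliable_share = reliable_refs / total_refs if total_refs else 0.0
    return per_claim, reliable_share


# A made-up profile: 20 claims, only two of which cite anything at all.
profile = [{"name": f"claim {i}", "sources": []} for i in range(20)]
profile[0]["sources"].append({"title": "Parish register", "reliable": True})
profile[1]["sources"].append({"title": "Another online tree", "reliable": False})

per_claim, reliable_share = evidence_ratio(profile)
print(per_claim, reliable_share)  # 0.1 0.5
```

That made-up profile lands on the same numbers I saw: one reference for every ten claims, with half of the references being unreliable.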
That’s enough bashing. I wasn’t surprised, and I don’t think they are either. It is only natural that the primary goal of these companies, like all companies, is to maximize profits. They do this by becoming popular. They become popular by collecting as many profiles as possible so that new users can connect into the trees’ root systems readily, get excited, and tell their friends. What they don’t do is enforce good genealogy. Okay now I’m done bashing. I actually have absolutely no problem with The Really Big Tree Co. Since they are not doing genealogy, they fall outside my domain of expertise.
The Online Interlinked Shared Ancestral Tree Website Thingy
The controversy, as luck would have it, came as I was building my own online interlinked shared ancestral tree website thingy. Should The Really Big Tree Co.s be shaking their leaves? I don’t think so. The chances that I’ll ever finish mine are slim – at least not anytime soon.
I designed mine, as I do most of my genealogy projects, in my own little bubble. I did not do it for profit. I did not collect input from possible users. I did not do it ever expecting anyone to use it. I did it for me. It was designed to allow people to upload their GEDCOM files or add their profiles to the ancestral tree in an effective way so that the accuracy of the tree continually improves. The design as you might expect differs in a few critical, and by now hopefully obvious ways from The Really Big Tree Co. What follows is a description of 10 of the major design points.
The Little Tree That Might
Firstly, I should explain to those readers who are not yet aware, that the GEDCOM ‘standard’ that is used by ALL genealogy editors and applications to transfer your data between them – sucks. A couple little things that were overlooked by the creators of GEDCOM: source categorizations and parental associations. Actually it’s kind of funny, because these are the main two things that any genealogy application that hopes to help the user create accurate ancestral trees should have. I can’t even imagine how they missed these. They also completely botched the source reference certainty assessment field (they call it quality, and intermingled it with certainties and source categories). So the first thing that needs to be done is to fix the standard. One option is to add user defined fields to GEDCOM that can be used by applications that give a shit about your genealogy. Incidentally, this is exactly how the source categories, certainty assessments, and parental associations found their way onto my tree. The better way is to write a new standard that could include all sorts of other valuable genealogical information left out of GEDCOM. I made an initial attempt at this when I wrote GREnDL, The Genealogical Record Exchange and Description Language (See GREnDL 2.0 for a description of the underlying structure).
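To give a feel for the user defined field option: GEDCOM reserves tags beginning with an underscore for this purpose, so an application can tuck extra data into a file without breaking other readers. The Python sketch below picks such tags out of a small fragment. The tag names `_CERT` and `_SCAT` and their values are hypothetical, invented for this illustration, and are not the tags used on my tree.

```python
def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line such as "0 @I1@ INDI": the xref id comes before the tag.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


# Hypothetical fragment: _CERT and _SCAT are made-up user defined tags.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1820
2 SOUR @S1@
3 _CERT probable
0 @S1@ SOUR
1 TITL Parish register
1 _SCAT official contemporary witness
"""

parsed = [parse_line(line) for line in SAMPLE.splitlines()]
custom = [(tag, value) for _, _, tag, value in parsed if tag.startswith("_")]
print(custom)  # [('_CERT', 'probable'), ('_SCAT', 'official contemporary witness')]
```

The catch, of course, is that only applications that know what those tags mean will do anything with them; everyone else just ignores or drops them.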
- So the first piece of the design is to support importing and exporting of files using the new standard. It should also support importing GEDCOM 5.5 and 5.5.1 files so they can be converted for export.
- At a minimum, the new standard would need to support source categories and parental associations. Obviously, it also needs to support all of the other useful claim fields as well, like births and deaths.
- It also needs to provide the ability for users to manually add profiles and sources.
- It needs to enforce source categories for all sources. In other words when a user adds a source, they must set the categories or the source is discarded.
- It needs to enforce source references for every claim. In other words, users cannot add a claim without first having added a source and then referencing that source (claims for persons within living memory excepted). No source reference = discardation.
- When linking individuals together, such as assigning a child to a parent, it should treat this like any other claim and enforce source references as stated above.
- It needs to programmatically create initial certainty assessments for each claim based on the referenced sources’ categories and display that certainty assessment alongside each claim along with the source references.
- It needs to permit social voting for certainty assessments. Basically this allows users to vote on the certainty assessment for each claim. The certainty assessment with the most votes wins. Whenever multiple claims exist for single occurrence events (events for which only one occurrence is possible, e.g. birth, death), the claim with the highest level of certainty gets used for system wide processing. An example of system wide processing would be the calculation of improbabilities where birth and death dates are needed. Another example is the person’s name to be used throughout the system. Social voting not only gives users a vested interest in each claim, but provides a democratic method for improving the reliability of the entire tree. There are a lot of caveats that might be applied to social voting. For instance, users should only be able to vote once per claim. Disproved claims would need to be explained, reviewed and approved by designated moderators, and once approved, locked so that further votes cannot change the assessment.
- It would need to do profile synchronization and improbability checking in the background, and on claim entry. This should work well because all claims must be supported by evidence, all sources categorized, and all certainties assessed, and democratic voting applied. This narrows significantly the margin of error. It is much easier to match profiles and calculate improbabilities when the claims in each profile can be relied upon for accuracy.
- It would need user registration so that private records such as living persons could only be displayed for appropriate users. Configurable certainty thresholds would need to be allowed so that users could filter out uncertain ancestors at various levels from system wide processes. Users should also be able to disable global certainty assessment enforcement for claims (i.e. displaying ancestors in their own ancestral tree), so that only their local certainty assessments are used. This allows users to ignore what everybody else thinks, because they know they are right and everybody else is wrong. Users of The Really Big Tree Co. seem to complain about this a lot, feeling that the quality of their data is being corrupted, but I would expect this would not be as necessary in a system that used only the best claims. Various improbability thresholds should be configurable so that each user could view an improbability list specific to their tastes. If you are not familiar with improbability lists check out my new utility, Bonkers: The GEDCOM Sanity Checker. There are of course many other optional configuration items that could be provided to the user. For the profit seeking company, some of these advanced features could be available for paid subscribers only.
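Three of the points above (no source reference means discardation, one vote per user, and the most certain claim winning for single occurrence events) can be sketched in a few lines of Python. This is only an illustration under assumptions: the level names, the class and function names, and the tie-break rule are all made up here, and the living memory exception and moderator locking are left out.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical certainty levels, ordered from worst to best.
LEVELS = ["disproved", "unreliable", "uncertain", "probable", "certain"]


@dataclass
class Claim:
    event: str         # e.g. "birth"
    value: str         # e.g. a date or a name
    source_ids: list   # ids of the sources this claim references
    initial: str       # certainty the system derived from the source categories
    votes: dict = field(default_factory=dict)  # user -> certainty level

    def vote(self, user, level):
        if level not in LEVELS:
            raise ValueError(f"unknown certainty level: {level}")
        if user in self.votes:
            raise ValueError("only one vote per user per claim")
        self.votes[user] = level

    def certainty(self):
        """Most-voted level wins; with no votes, fall back to the initial assessment."""
        if not self.votes:
            return self.initial
        counts = Counter(self.votes.values())
        top = max(counts.values())
        # Assumed tie-break: prefer the more conservative (lower) level.
        return min((lvl for lvl, n in counts.items() if n == top), key=LEVELS.index)


def add_claim(profile, claim, known_sources):
    """Accept a claim only if it references at least one already-added source."""
    if not claim.source_ids or any(s not in known_sources for s in claim.source_ids):
        return False  # discardation
    profile.append(claim)
    return True


def best_claim(profile, event):
    """Pick the most certain claim for a single occurrence event."""
    candidates = [c for c in profile if c.event == event]
    return max(candidates, key=lambda c: LEVELS.index(c.certainty()), default=None)


sources = {"S1", "S2"}
profile = []
print(add_claim(profile, Claim("birth", "1820", ["S1"], "probable"), sources))   # True
print(add_claim(profile, Claim("birth", "1822", ["S2"], "uncertain"), sources))  # True
print(add_claim(profile, Claim("birth", "1819", [], "uncertain"), sources))      # False

profile[1].vote("ann", "certain")
profile[1].vote("bob", "certain")
print(best_claim(profile, "birth").value)  # 1822
```

In the example the 1822 claim starts out behind the 1820 claim, but two votes for "certain" push it ahead, so it becomes the birth date used for system wide processing.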
I hope that the advantages of this design over The Really Big Tree Co. are somewhat obvious. Unfortunately, there are some huge disadvantages that for a large company make it inoperable.
There is no new standard as of yet. Even if I spent a couple weeks updating GREnDL 2.0, I’d never be able to convince all the major genealogy applications to start using it. It would be easier, I think, to be elected president of China. Because of this limitation, users could only add their profiles manually, which wouldn’t be very popular. That and most of the users of The Really Big Tree Co. have already proved that they don’t have reliable sources for their claims, so they couldn’t add their data even if they wanted to. Until the genealogy community as a whole steps up and demands better software, the chances of getting this type of service are slim and slimmer.
For anyone interested in the features that my design provides, there is hope.
Most of these features are currently available in another one of my utilities, Adam: The GEDCOM Family Tree Builder. Adam is the family tree generator used to create my own family tree.
Edit 11/27/2012: I apologize to those who replied to this article when it was originally posted on an earlier version of this blog. Unfortunately, when transferring the content I lost all comments (I also just ran through an edit removing some of the poorly dropped sarcasm, if you can believe it). In summary, and I am obviously paraphrasing here, there were a few of you who questioned the sanity of a system that would not allow users to add claims without having sources; more specifically, if they imported their GEDCOM file, all unsourced claims would suffer discardation – including parental associations. I countered … And? We’ll just leave it at that.