Bonkers: The GEDCOM Sanity Checker will help you quickly locate this data by identifying groups of claims in your database that are inconsistent with each other. Bonkers works by running your GEDCOM file through a battery of recursive, multipass, deep scanning algorithms designed to rat out even the most elusive errors.
Deep Scanning option is enabled. See the Vivify page for a discussion of its configurable options. Bonkers has several other configurable options that can also be adjusted to fine-tune your results.
Dead if Born Before Year is the earliest year in which a living person could have been born; a living person born before this year is considered improbable.
Dead if Married Before Year is the earliest year in which a living person could have been married; a living person married before this year is considered improbable.
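As a rough illustration of how two thresholds like these might be applied, here is a minimal sketch. It is not the Bonkers implementation: the function and parameter names are hypothetical, and the years used when calling it are made up rather than Bonkers defaults.

```python
from typing import Optional


def presumed_dead(birth_year: Optional[int],
                  marriage_year: Optional[int],
                  dead_if_born_before: int,
                  dead_if_married_before: int) -> bool:
    """Return True when either threshold suggests the person is no longer living.

    Hypothetical helper: a missing year (None) is simply skipped.
    """
    if birth_year is not None and birth_year < dead_if_born_before:
        return True
    if marriage_year is not None and marriage_year < dead_if_married_before:
        return True
    return False
```

For example, with invented thresholds of 1900 and 1920, a person born in 1850 would be presumed dead, while a person born in 1950 and married in 1975 would not.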
John, I hope I can quote you on that. :)
Spread the word folks, there are still a few people that haven't had a chance to use it.
Tim
Thanks. That worked fine.
This is a brilliant program and I think anyone with more than just a few names needs to use it!
Regards
John
John, thanks for the feedback. I'm glad to hear the program worked well for you.
VGedX is a strict GEDCOM validator, so it requires a GEDCOM version in the file to know which dictionary to compare it against. Your GEDCOM file does not include the GEDCOM version in the header record, hence "empty". The file should have the following beneath the 0 HEAD line in your file.
1 GEDC
2 VERS 5.5
You can manually do the insertion, and then VGedX should work for you.
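If you would rather not edit the file by hand, the same insertion can be scripted. The following is only a sketch under simple assumptions: the function name is hypothetical, the file is treated as a list of text lines, and the header is taken to run from the 0 HEAD line to the next level-0 line. It adds just the two lines shown above and leaves a file that already has a 1 GEDC line untouched.

```python
def ensure_gedcom_version(lines, version="5.5"):
    """Return the lines with '1 GEDC' / '2 VERS <version>' added under '0 HEAD' if missing."""
    lines = [ln.rstrip("\r\n") for ln in lines]

    # Locate the header record (a byte-order mark may precede it).
    head = next((i for i, ln in enumerate(lines)
                 if ln.lstrip("\ufeff").strip() == "0 HEAD"), None)
    if head is None:
        return lines  # no header found, so nothing is changed

    # The header ends at the next level-0 line.
    end = next((i for i in range(head + 1, len(lines))
                if lines[i].lstrip().startswith("0 ")), len(lines))

    # Leave the file alone when a GEDC line is already present in the header.
    if any(lines[i].split()[:2] == ["1", "GEDC"] for i in range(head + 1, end)):
        return lines

    return lines[:head + 1] + ["1 GEDC", f"2 VERS {version}"] + lines[head + 1:]
```

Read the file into a list of lines, pass it through the function, and write the result to a new file so that the original export is kept as a backup.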
Bonkers, and the rest of the tools, are more relaxed in that they all use an expanded GEDCOM dictionary to compare against, so they ignore the GEDCOM version. This is the reason your file runs correctly in Bonkers.
I ran my gedcom file (16,000 people) and Bonkers gave me four A4 pages of Errors. I went through the list and made the corrections in my family history program. An excellent tool!
I have generated a new gedcom and want to rerun the file (Coldwell 4.ged) to check if I missed anything and I cannot get a result. VGedX gives the message "E028 198176 Unsupported GEDCOM version detected (empty)". I am puzzled by this because I am using identical programs and methods which worked correctly the first time. Also the gedcom file works correctly when loaded into my family history program.
I would much appreciate help sorting this out.
Regards
John
Sue, the limit is currently set to 50 MB. I recently ran a file for someone with 220K people that was 130 MB and it took 4.5 hours to complete (successfully I might add). I've only recently begun tracking times so have no data on the maximum file run so far. The service times out in 10 minutes though. I generally recommend that users with large databases export a single ancestral line that they are interested in and run that.
Is there a limit to the size of Gedcom file I can run through Bonkers? I use Legacy with currently 205208 individuals?
BTW I think you can have both "Persons Born After One of Their Parents Died" and "Persons Died Before Having Children". Some babies are born after their father died.
Colm, would it be possible for you to email me the error? I don't see any entry in my logs.
I tried loading my gedcom file but got an sql error
Thanks Jean. I have some more free GEDCOM tools coming out soon.
Extraordinary piece of work. Thank you so much. I plan to confer with your web site quite often.
Jay, I don't, but thanks for thinking it worthy.
Do you have a logo for BONKERS that I can use on my site to say to visitors that Bonkers has been used on my gedcom ?
Keith, I think you're probably right. That's pretty funny. I must have gotten carried away when I originally wrote these.
I ran Bonkers on my 4000+ person file with most of the defaults, Assume Flourishing For All Vendor Specific Event Types, Show Misformatted Dates, and Hide Improbabilities (show impossibilities only), and it picked up 25 records to check, and even after having run the file through a different plausibility check, so good job! However, some of the checks seem a bit redundant. For example, isn't "Persons Born After One of Their Parents Died" the same as "Persons Died Before Having Children?" Anyway, now I'm going to unhide the improbabilities and see what it comes up with. Thanks for the great tool!
Tony, it also takes into consideration date qualifiers, so it must sometimes make assumptions when comparing dates, i.e. how do we compare "bef 1900" with "bef 1901", or "bet 1900 and 1910" with "bef 1905"? Is one earlier than the other? This can sometimes cause results to skew, or to seem skewed, and produce false positives. If you have an example I can take a closer look at it.
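One way to see why qualified dates are hard to compare is to treat each one as a range of possible years. The sketch below is not how Bonkers does it; it is only an illustration, with hypothetical function names, that handles year-only dates with the bef, aft and bet ... and qualifiers and reports "undetermined" whenever the ranges overlap.

```python
import re
from typing import Tuple

INF = float("inf")


def year_range(text: str) -> Tuple[float, float]:
    """Map a year-only date with an optional qualifier to (earliest, latest)."""
    t = text.strip().lower()
    m = re.fullmatch(r"bet (\d{4}) and (\d{4})", t)
    if m:
        return (int(m.group(1)), int(m.group(2)))
    m = re.fullmatch(r"(bef|aft) (\d{4})", t)
    if m:
        year = int(m.group(2))
        # Assumption: "bef" and "aft" exclude the named year itself.
        return (-INF, year - 1) if m.group(1) == "bef" else (year + 1, INF)
    if re.fullmatch(r"\d{4}", t):
        return (int(t), int(t))
    raise ValueError(f"unsupported date: {text!r}")


def compare(a: str, b: str) -> str:
    """Return 'earlier', 'later', or 'undetermined' for date a relative to date b."""
    a_lo, a_hi = year_range(a)
    b_lo, b_hi = year_range(b)
    if a_hi < b_lo:
        return "earlier"
    if b_hi < a_lo:
        return "later"
    return "undetermined"
```

Under these assumptions both pairs mentioned above, "bef 1900" with "bef 1901" and "bet 1900 and 1910" with "bef 1905", come out as undetermined, which is exactly where a checker has to fall back on an assumption.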
I calculate the Julian Day Number for all dates, and my plan is to eventually use these so that when comparing exact dates, the difference can be calculated to within a single day. However, there will always be assumptions made when qualifiers are used.
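For readers unfamiliar with the Julian Day Number, it is a running count of days, so two exact dates can be compared by simple subtraction. The sketch below uses a widely published integer formula for Gregorian calendar dates. It is an illustration only, not the code Bonkers uses, and dates recorded in the older Julian calendar would need separate handling.

```python
def julian_day_number(year: int, month: int, day: int) -> int:
    """Julian Day Number for a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a             # shift the year so it starts in March
    m = month + 12 * a - 3          # March = 0 ... February = 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def days_between(earlier, later) -> int:
    """Whole days from one (year, month, day) tuple to another."""
    return julian_day_number(*later) - julian_day_number(*earlier)
```

With this, a birth on 28 February 1900 and a baptism on 1 March 1900 are one day apart, even though a year-only comparison would see them as identical.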
Tony, Yes, it does not currently resolve to months, or days.
Bonkers comes up with errors if critical dates are close together, i.e. in the same year. Does it only consider years and not months and days?
Andy, I agree. Until then I’ve made this tool available in my ongoing effort to provide free and useful tools to the genealogy community. Genealogists, researchers and family historians (whatever one wishes to call oneself) now will have an opportunity to easily identify some of the most obvious errors in their databases – and fix them once and for all. I have been using the tool myself for about 10 years and have found it to be remarkably accurate and able to pinpoint inconsistencies where I would have never expected them. It is also great for identifying areas that need to be researched further and cleaned up. Happy Bonking.
Gary, it is available online as a web service so that anyone with the internet can use it.
Too bad all online trees aren't required to go thru this before being uploaded to a site.
Think of the junkology that would never make it to the web!
Does this work with Windows 8? Tks, Gary
Outstanding! Looking forward to trying it out
Here is one more tip while I'm thinking about it.
The GEDCOM parser that Bonkers uses is the same one that is used by VGedX (http://timforsythe.com/tools/vgedx), my GEDCOM validator. I periodically look through the VGedX reports to find new event types that are being used by different genealogy vendors and add these to the parser. By doing this, Bonkers is able to use these new events in its calculations, increasing the accuracy of its reports. Unfortunately, I don't scan the VGedX reports very often anymore since I am no longer supporting the Ancestors Now Tree Ring. I am more than happy to add new event types on request though, so if you want to improve your Bonkers results, you should run your GEDCOM file through VGedX and post here any new event types that are shown in the VGedX report. I'll add these at my earliest convenience. This will not only improve your results, but also the results of anyone else who uses the same genealogy application that you do. Win, win!
Here is a strategy that you can use to increase the accuracy of your Bonkers report.
Bonkers categorizes all claims as being either Single Occurrence Events (SOEs) or otherwise. An SOE is an event that can only occur once per individual, such as their birth or death. Other types of events such as marriage and graduation can occur multiple times. When performing calculations that rely on SOEs, Bonkers uses the first record of that event type found in the GEDCOM file. So if your genealogy editor allows you to order your claims such that when you export your GEDCOM file, the record order is retained, you can move your 'best' SOE first to improve Bonkers accuracy. So, for instance, if you have multiple birth dates, move the one you have concluded is the most accurate to be first. This is actually a good general rule that can be useful for other types of genealogy applications as well.
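To make the "first record wins" rule concrete, here is a small sketch that picks the first birth record for each individual from a list of GEDCOM lines. It is an assumption-laden illustration rather than the real Bonkers parser: the function name is hypothetical and it only looks at level-1 BIRT records and their level-2 DATE lines.

```python
def first_birth_dates(lines):
    """Map each individual's xref to the DATE of the first BIRT record found."""
    births = {}
    current = None          # xref of the INDI record being read, if any
    in_first_birt = False   # True while inside that person's first BIRT
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            is_indi = len(parts) == 3 and parts[2] == "INDI"
            current = parts[1] if is_indi else None
            in_first_birt = False
        elif level == "1":
            in_first_birt = (current is not None and parts[1] == "BIRT"
                             and current not in births)
            if in_first_birt:
                births[current] = None  # claim the slot even if no DATE follows
        elif (level == "2" and in_first_birt and parts[1] == "DATE"
              and len(parts) == 3 and births[current] is None):
            births[current] = parts[2]
    return births
```

If an individual has a birth claim of 1850 followed by one of ABT 1852, the sketch returns 1850; swap the two records in the export and it returns ABT 1852 instead, which is why the ordering matters.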
Here is another tip for reducing clutter in your Bonkers output.
If you are an evidence driven genealogist rather than a conclusion driven one, you probably have multiple, but similar events in your database. For instance, you might have several birth claims for an individual, each with a different date and source reference (this is common when multiple census records are entered). Some of these claims might be incorrect, or you might suspect them as so, but you don't want to remove them from your database, because they are still valuable references. You can tell Bonkers to ignore them by entering either of the strings "[Disproved]" or "[Not Applicable]" in the claim record's CAUSe field.
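As an illustration of the idea, the sketch below groups one individual's GEDCOM lines into events and drops any event whose CAUS line contains one of the two marker strings. The function names are hypothetical, and the exact, case-sensitive match is an assumption; Bonkers itself may treat the field differently.

```python
IGNORE_MARKERS = ("[Disproved]", "[Not Applicable]")


def split_events(record_lines):
    """Group an individual's lines into level-1 events with their sub-lines."""
    events = []
    for line in record_lines:
        line = line.strip()
        if line.startswith("1 "):
            events.append([line])
        elif events and not line.startswith("0 "):
            events[-1].append(line)
    return events


def is_ignored(event):
    """True when any CAUS line of the event carries an ignore marker."""
    for line in event:
        parts = line.split(" ", 2)
        if (len(parts) == 3 and parts[1] == "CAUS"
                and any(marker in parts[2] for marker in IGNORE_MARKERS)):
            return True
    return False


def usable_events(record_lines):
    """Return the events that should still take part in the checks."""
    return [event for event in split_events(record_lines) if not is_ignored(event)]
```

The ignored claims stay in the file, sources and all; they are only left out of the comparison.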
More Hints:
Flagged items do not all have to be resolved. Sometimes they cannot be. For instance, I might have conflicting birth and baptism claims for the same individual, each from a different equally reliable source, or perhaps better said, equally unreliable sources. I cannot discount one or the other, because I don't know which of the two is incorrect. This isn't necessarily a bad thing. It gives me insight on where to concentrate my research. If you were to look closely at the Improbability List (http://timforsythe.com/tree/tjforsythe/improbs) on my personal tree you would see a lot of these types of issues - they are in limbo waiting to be resolved. The advantage of publishing an Improb List (or Bonkers List) along with your tree is that you cannot be faulted, or faulted as badly, for not informing the public (or so I wish) :).
Here's another hint for users.
I initially set the parameters too narrow for my database and then, after addressing any flagged claims, begin to widen them until the false positives increase substantially.
An example of this is that I may start with a minimum child bearing age of say 12 or 13. Something that is just not possible. Anything flagged should be addressed. Then I'll start dialing it up. Somewhere around 15 or 16, we start to get within the realm of possibility. Anything flagged at 16 years or greater should be scrutinized closely, because it may be a false positive.
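The widening strategy can be pictured with a tiny sketch. It assumes year-only comparisons and a hypothetical function that takes pairs of (parent birth year, child birth year); the sample years are invented for illustration.

```python
def flag_young_parents(pairs, min_age):
    """Return the (parent_birth_year, child_birth_year) pairs where the
    parent would have been younger than min_age when the child was born."""
    return [(parent, child) for parent, child in pairs if child - parent < min_age]


# Invented sample data: one parent born in 1800 with three recorded children.
pairs = [(1800, 1811), (1800, 1815), (1800, 1825)]

# Start narrow, fix what is flagged, then widen and look again.
for min_age in (13, 16):
    print(min_age, flag_young_parents(pairs, min_age))
```

At the narrow setting only the clearly impossible pair is flagged. Widening the threshold adds a pair that deserves a closer look but may turn out to be legitimate.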
LK>
Flourishing is the span of time when we expect to find non-vital records associated with a living person, such as graduation, military, etc. Built-in GEDCOM events are categorized as flourishing based on their type, so for instance, First Communion would not be. A vital record such as Marriage is, but Baptism is not. You get the point.
When Deep Scanning is enabled, Bonkers will first attempt to estimate birth dates for any individuals missing them, before embarking on its recursive hunt. We give up speed and accuracy (in some cases), but can gain other insights into problems, especially when birth dates cannot be estimated. These types of problems are not always obvious when doing straight comparisons. Deep Scanning will flag individuals already flagged for other reasons, so you do get some duplicates. I recommend it for anyone patient enough to wait an extra minute or two for the results.
I generally run several combinations of options. For instance, I first run while ignoring improbs and address the impossible issues when I can. Then I'll turn on improbs w/o deep scanning, and after addressing those issues, I turn on deep scanning.
Tim,
What do you mean by "flourishing"? And what will "deep scanning" do differently when checked and when unchecked?
Thank you.