I Love City Directories, But...

City Directories and the OCR Problem


City directories are an easy-to-understand and amazing resource for finding your ancestors and relatives. When they are searchable, you can quickly find hundreds of relatives across decades with a simple mouse click. You gotta love those computer programmers who made it all happen.



And as much as I LOVE city directories, there is one thing I don't care for.


OCR (optical character recognition) is a great technology for quickly scanning a printed record, extracting its information, and converting it into searchable text. I'm fairly certain that Ancestry used this technology to create its City Directory database. I can't imagine the cost of paying people to key in by hand the directories from all over the United States; it would be astronomical. OCR is awesome!

Except when it's not.

I've encountered a number of hiccups with the OCR technology making the Ancestry database a bit messier than I would like.

In the 1920s and 1930s, some city directories used ditto marks rather than waste time typing the last name Brown thousands of times. The practice made a lot of sense in the pre-computer era. Unfortunately, in my experience the OCR interprets the ditto marks as the letter M.

Sometimes my Samuel Brown relative will be recorded as M Samuel (first name M) or Samuel M (last name M). Thankfully, you can suggest an edit to the entry and then that edit becomes searchable. I have no problem with that and I'm glad that resource is there.

So, what's my problem?

Sometimes OCR skips an entire group of people, and there is no way to 'edit' Ancestry's database to insert these skipped individuals or to inform Ancestry that there are people who need to be added.

For me, it happens frequently with my relative August Hoppe, who is skipped more often than not. So, I add a "Residence" fact to August. Then I create a note on that fact with the direct link to the image where he can be found, even though the database has no entry for him. It's my working solution, but couldn't things be better?

When names are skipped, my guess is that something was wrong with the paper the OCR attempted to read. I originally thought the error was limited to the Beta version of the directories, but it has continued in the 'full' version of the collection. OCR technology has its limits, and Ancestry has made it possible to correct for those limitations to a degree. However, one more step is necessary.

I can report when an image is of poor quality or the like. However, there is no way to report that a name or group of names was not indexed, and I haven't found a way to insert an index entry for a missing name either. So, if Ancestry.com would add an option for missing or skipped names to its 'report problems' feature, I could take the 'but' out of my "I Love City Directories, But..." statement.

Comments

  1. I just sent Ancestry a note regarding this as well! I manually searched about 20 city directories this weekend, and about 2/3 of the time I had no way to link the record to my ancestor's profile, so I would pick a random index entry to link to. Not ideal, obviously, but slightly better than downloading the image and adding it as a separate source citation. As you mention, it would be great if a mechanism were available that allowed me to add or suggest the missing people. Hopefully this feature is added someday; the crowdsourcing could really improve this vast resource on the site. I am looking forward to attending one of your lab sessions at RootsTech 2017!

    1. Ellen, thanks so much for stopping by. I do wish Ancestry would add that collaborative tool to the site. I look forward to seeing you in February.

