I bought a license hoping to clean up some historic deed data (early documents typewritten on vellum, then microfilmed, then converted to JPG).
I ran the whole set through Tesseract and PICCL, but I haven't been able to get very clean text yet. Here's an example of what I have to work with:
No. 4 Allof Lot “Numbered one (1) md the Fact ote-half of Let numbered
‘Pwo (Be) in Blook twnbered Five (5) in <+- P.T. Smith's Addition to the Tom of St Johne
jand also the West One-half of Lot Numbered Two (Wy 2) in Blook Vumberee Six (6) in P.T.
Smith's Addi tion to the Town of St Jehns as s'iown and designated on the culy recorded plat
: f£ said P.T.Gmith’s Addition to the Town of St Johns, now of record with the clerk of
Henze osiad Colmty O2egon
I want to find all of the parcel descriptions (entities) like this one:
West One-half of Lot Numbered Two (Wy 2) in Blook Vumberee Six (6) in P.T. Smith's Addi tion to the Town of St Jehns
I have a list of all of the addition names to use as an ontology (in this case, "P.T. Smith's Addition to The Town of St. John's"). At this point, I assume I need to do some manual tagging and annotation.
I am fine with not correcting the text as long as I can identify the parcels.
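To make the goal concrete, here is a rough sketch of the kind of fuzzy lookup I have in mind: slide a window of words across the uncorrected OCR text and score each window against a known addition name. It uses only Python's standard `difflib`; the `ADDITIONS` list, the `normalize` and `best_match` helpers, and the `slack` parameter are names I made up for illustration, and any acceptance threshold would need tuning on real data. This is not a tested solution, just the shape of the idea.

```python
import difflib
import re

# Hypothetical ontology: in practice this would be the full list of addition names.
ADDITIONS = ["P.T. Smith's Addition to The Town of St. John's"]


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]+", "", text)
    return re.sub(r"\s+", " ", text).strip()


def best_match(ocr_text, name, slack=2):
    """Return (score, window) for the OCR word window most similar to `name`.

    Window sizes vary by +/- `slack` words around the name's own word count,
    because OCR can split or merge words (e.g. "Addi tion").
    """
    target = normalize(name)
    words = normalize(ocr_text).split()
    n_words = len(target.split())
    best = (0.0, "")
    for size in range(max(1, n_words - slack), n_words + slack + 1):
        for start in range(len(words) - size + 1):
            window = " ".join(words[start:start + size])
            score = difflib.SequenceMatcher(None, target, window).ratio()
            if score > best[0]:
                best = (score, window)
    return best


if __name__ == "__main__":
    sample = ("West One-half of Lot Numbered Two (Wy 2) in Blook Vumberee Six (6) "
              "in P.T. Smith's Addi tion to the Town of St Jehns")
    for addition in ADDITIONS:
        score, window = best_match(sample, addition)
        print(f"{addition!r}: score={score:.2f}, matched text={window!r}")
```

If something like this can locate the addition name reliably, the lot and block numbers should sit in the words just before the matched window, which would at least narrow down what has to be tagged by hand.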
All ideas appreciated!