NER model to extract addresses from text

zakaria47fs · July 24, 2020, 5:44pm

Hi folks!

I need to train a custom NER model to extract addresses from text, but after many Google searches I can't find a suitable dataset (text containing addresses).
Can anyone help with this? Where could I find such a dataset? Or do you have any other suggestions for building the model?

lejimmy · July 25, 2020, 3:41pm

Hey Zakaria,

I've been trying to build something similar on my end. I have a large corpus of legal documents that have been OCR'd (with varying scan qualities) via Tesseract and I'm looking to extract names and addresses.

I'm currently trying regular expressions to match street addresses, creating a JSONL file, and cycling through the following recipes until I get results I'm happy with:

ner.teach
ner.match
ner.batch-train
ner.train-curve

Check out the flowchart for more details.
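
For anyone wanting to try the regex-to-JSONL step, here is a minimal sketch of what it might look like. The pattern, the `ADDRESS` label, and the helper names `find_address_spans` and `to_jsonl` are all hypothetical, and the record layout (`text` plus a list of `spans` with `start`, `end`, `label`) is an assumption modeled on the usual annotation-task shape, so check it against the recipe you actually feed it to. The pattern only covers simple US-style street addresses and will need tuning for noisy OCR output.

```python
import json
import re

# Hypothetical, deliberately simple pattern: a house number, one to four
# capitalized name words, then a common street suffix (optionally with a dot).
STREET_RE = re.compile(
    r"\b\d{1,6}\s+(?:[A-Z][A-Za-z.]*\s+){1,4}"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)\b\.?"
)


def find_address_spans(text, label="ADDRESS"):
    """Return character-offset spans for every regex match in the text."""
    return [
        {"start": m.start(), "end": m.end(), "label": label}
        for m in STREET_RE.finditer(text)
    ]


def to_jsonl(texts):
    """Serialize each text and its candidate spans as one JSON object per line."""
    records = ({"text": t, "spans": find_address_spans(t)} for t in texts)
    return "\n".join(json.dumps(r) for r in records)


if __name__ == "__main__":
    sample = ["Send it to 221 Baker Street by Friday.", "No address here."]
    print(to_jsonl(sample))
```

The regex matches are only candidate suggestions: they are meant to be corrected during annotation, not used as gold labels.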

zakaria47fs · July 25, 2020, 7:40pm

Hi Jimmy, thanks for your reply.
My problem is that I haven't found any dataset of texts containing addresses. As for annotation, I'll do it manually; I have no issues with that.

ines · July 27, 2020, 9:17am

What texts do you want to process with the model later on? Can't you just use those texts? You typically want to be training your model on data that's similar to what you'll be analyzing at runtime. Publicly available datasets can be useful sometimes if they're similar enough, but it's better to use your own data.
