Can we train an NER to recognise some entities not learned from labelled examples, but a list of imported entities, such as names of areas, main roads, etc.?

calista · June 18, 2020, 12:57pm

Thank you for the awesome free spaCy library!

We are looking for an annotation tool that allows us to train our system to extract information in listings of real estate in Singapore.

Apart from using Prodigy to manually tag the entities in the listings, can we import lists of entities, such as lists of real estate developers in Singapore/main roads in Singapore/real estate projects in Singapore, so our system can recognise the entities without being trained using labelled examples?

Unfortunately our dataset isn't large enough for the system to learn to recognise and extract ALL those entities.

Thanks!

ines · June 19, 2020, 1:58pm

Hi! The main goal of training a model from labelled examples is to allow your system to generalise and extract other similar entities in similar contexts, even if it hasn't seen those during training. For example, it could recognise a road name that's not in your list because it's mentioned in a similar context as other road names in the data. If that's your goal, you should want to train a model on examples.

If you have existing lists, you can use them in Prodigy to pre-label examples for you, so you only need to correct the suggestions and fill in the blanks. That's much faster than doing everything by hand. Check out the examples of annotating named entities with patterns in the Prodigy documentation on Named Entity Recognition.
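As a rough sketch of the pre-labelling route: pattern files are JSONL, with one `{"label": ..., "pattern": ...}` entry per line, where the pattern is either an exact string or a list of token attribute dicts. The script below converts plain lists into case-insensitive token patterns. The labels `ROAD` and `DEVELOPER`, the example names, and the whitespace splitting are all assumptions for illustration; names containing punctuation may need splitting that matches spaCy's tokenizer.

```python
import json

# Hypothetical lists; in practice these would be loaded from your own files.
ENTITY_LISTS = {
    "ROAD": ["Orchard Road", "Bukit Timah Road"],
    "DEVELOPER": ["CapitaLand", "City Developments Limited"],
}


def make_patterns(entity_lists):
    """Turn {label: [names]} into token-based match patterns."""
    patterns = []
    for label, names in entity_lists.items():
        for name in names:
            # One dict per token; "lower" makes the match case-insensitive.
            # Splitting on whitespace is a simplification.
            tokens = [{"lower": word.lower()} for word in name.split()]
            patterns.append({"label": label, "pattern": tokens})
    return patterns


def to_jsonl(patterns):
    """Serialise the patterns as one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns) + "\n"


if __name__ == "__main__":
    # Print the JSONL; redirect stdout to a file such as patterns.jsonl.
    print(to_jsonl(make_patterns(ENTITY_LISTS)), end="")
```

The resulting file can then be passed to a recipe that accepts a `--patterns` argument, such as `ner.manual`, so that list matches show up as pre-highlighted suggestions.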

If you don't want to train a model and just want to recognise whatever is in your lists, you can use spaCy's Matcher, PhraseMatcher or EntityRuler and load in your lists. spaCy's documentation on rule-based matching covers all three.
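A minimal sketch of the list-only approach, assuming spaCy v3, where `nlp.add_pipe("entity_ruler")` returns the ruler (in the v2 API that was current when this thread was written, you would construct `EntityRuler(nlp)` and add it to the pipeline instead). The labels, names and example sentence are made up for illustration.

```python
import spacy

# Hypothetical lists; replace with your own imported entities.
ROADS = ["Orchard Road", "Bukit Timah Road"]
DEVELOPERS = ["CapitaLand", "City Developments Limited"]

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Match on lowercase token text so "orchard road" is also found.
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"})
ruler.add_patterns(
    [{"label": "ROAD", "pattern": name} for name in ROADS]
    + [{"label": "DEVELOPER", "pattern": name} for name in DEVELOPERS]
)

doc = nlp("Spacious condo along orchard road, developed by CapitaLand.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Nothing is learned here: a road or developer that is missing from the lists will not be tagged, which is the trade-off against a trained model.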

You can also combine this approach with a model later on, so you can have a system that generalises, but also reliably tags whatever is in your list. Definitely run the rule-based approach first and evaluate it, because that gives you a baseline accuracy. (For instance, if you get to 95% using only your lists, it's very likely that a model won't be able to beat that.)
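One way to get that baseline number is to compare the spans produced by the rules against a small hand-annotated evaluation set. The sketch below assumes exact-match scoring over `(example_id, start_char, end_char, label)` tuples; the spans and the `PROJECT` label are invented for illustration and are not results from any real dataset.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F-score over two sets of spans."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical spans: (example_id, start_char, end_char, label).
gold = {(0, 21, 33, "ROAD"), (0, 48, 58, "DEVELOPER"), (1, 10, 24, "PROJECT")}
rules = {(0, 21, 33, "ROAD"), (0, 48, 58, "DEVELOPER")}

p, r, f = prf(gold, rules)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Scoring a trained model's predictions with the same function on the same evaluation set makes the two approaches directly comparable. When combining them in spaCy v3, placing the `entity_ruler` before the `ner` component in the pipeline means the model has to respect the spans the rules have already set.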

calista · June 21, 2020, 8:43am

Thanks for your kind answer and advice!
