First, thanks a lot for the great support for spaCy and Prodigy.
I'd like to know how using custom lemmas changes model training in spaCy. Does the model always train on the token text, or does it use the lemma when the doc has one? How does the lemma help spaCy make better predictions?
Example:

```python
from spacy.attrs import ORTH, LEMMA, TAG

ner_model.tokenizer.add_special_case(".NET", [{ORTH: ".NET", LEMMA: ".net", TAG: "PROPN"}])
```

Our project is to identify entities such as ".NET developer" (.NET -> the specialism of a job).
Hi! The short answer is: it doesn’t. The features (currently) used in the model are the norm, shape, prefix and suffix. So what you’re trying to do makes sense – but instead of the LEMMA, you want to be using the NORM (available as token.norm_).
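If you want to see what the model receives for each token, here's a minimal sketch (assuming you have an English model like en_core_web_sm installed) that prints those lexical attributes:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We are hiring a developer")
for token in doc:
    # norm, shape, prefix and suffix are the lexical features
    # the statistical model uses
    print(token.text, token.norm_, token.shape_, token.prefix_, token.suffix_)
```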
In your example, it might not make a big difference, since the norm already defaults to the lowercase form of the token text. But in other cases like spelling variations, it can have a bigger impact and will ensure that tokens with the same norm receive similar representations, even if one of them is much less frequent in the training data than the other. Also see spaCy’s norm exceptions for more background on this.
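So for your use case, something along these lines should work. This is just a sketch: the "dotNET" string is a made-up spelling variant for illustration, and the ".net" norm value is an assumption about how you want these tokens normalized.

```python
import spacy
from spacy.attrs import ORTH, NORM

nlp = spacy.load("en_core_web_sm")

# Keep ".NET" as a single token and set its norm explicitly
nlp.tokenizer.add_special_case(".NET", [{ORTH: ".NET", NORM: ".net"}])
# Map a hypothetical spelling variant onto the same norm
nlp.tokenizer.add_special_case("dotNET", [{ORTH: "dotNET", NORM: ".net"}])

doc = nlp("We need a dotNET developer with .NET experience.")
for token in doc:
    print(token.text, token.norm_)
```

Both spellings now share the norm ".net", so they'll receive similar representations in the model, even if one of them is rare in your training data.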
Btw, if you haven’t seen it yet, you might also find this video helpful: