First, thanks a lot for the great support for spaCy and Prodigy.
I'd like to know how using custom lemmas changes model training in spaCy. Does the model always train on the token text, or does it use the lemma when the doc has one? How does the lemma help spaCy make better predictions?
Example:

```python
from spacy.attrs import ORTH, LEMMA, TAG

ner_model.tokenizer.add_special_case(".NET", [{ORTH: ".NET", LEMMA: ".net", TAG: "PROPN"}])
```

Our project is to identify entities such as ".NET developer" (.NET -> the specialism of a job).
Hi! The short answer is: it doesn’t. The features (currently) used in the model are the norm, shape, prefix and suffix. So what you’re trying to do makes sense – but instead of the LEMMA, you want to be using the NORM (available as token.norm_).
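If you want to see what the model receives for each token, here's a minimal sketch (assuming you have an English model like en_core_web_sm installed) that prints those lexical attributes:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We are hiring a developer")
for token in doc:
    # norm, shape, prefix and suffix are the lexical features
    # the statistical model uses
    print(token.text, token.norm_, token.shape_, token.prefix_, token.suffix_)
```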
In your example, it might not make a big difference, since the norm already defaults to the lowercase form of the token text. But in other cases like spelling variations, it can have a bigger impact and will ensure that tokens with the same norm receive similar representations, even if one of them is much less frequent in the training data than the other. Also see spaCy’s norm exceptions for more background on this.
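So for your use case, something along these lines should work. This is just a sketch: the "dotNET" string is a made-up spelling variant for illustration, and the ".net" norm value is an assumption about how you want these tokens normalized.

```python
import spacy
from spacy.attrs import ORTH, NORM

nlp = spacy.load("en_core_web_sm")

# Keep ".NET" as a single token and set its norm explicitly
nlp.tokenizer.add_special_case(".NET", [{ORTH: ".NET", NORM: ".net"}])
# Map a hypothetical spelling variant onto the same norm
nlp.tokenizer.add_special_case("dotNET", [{ORTH: "dotNET", NORM: ".net"}])

doc = nlp("We need a dotNET developer with .NET experience.")
for token in doc:
    print(token.text, token.norm_)
```

Both spellings now share the norm ".net", so they'll receive similar representations in the model, even if one of them is rare in your training data.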
Btw, if you haven’t seen it yet, you might also find this video helpful: