textcat affects NER

I used textcat to train a new model based on en_core_web_lg.
The new model does text classification nicely, but its NER is really bad.

Is it possible that the NER model is affected by TextCategorizer?

Hmm, it shouldn’t be!

The NER and textcat models shouldn’t share any weights except the pre-trained vectors, but the pre-trained vectors should be static. So updates to the model shouldn’t be changing the NER. My guess is that the NER model is being updated during the textcat updates, even though that shouldn’t happen.
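
As a quick sanity check, you could compare the two pipelines and their entities side by side, something like this (the path to the new model is just a placeholder):

import spacy

# "/path/to/textcat_model" is a placeholder for wherever the trained model was saved.
nlp_orig = spacy.load("en_core_web_lg")
nlp_new = spacy.load("/path/to/textcat_model")

# If "ner" is missing from the new pipeline, the component was dropped entirely rather than degraded.
print("original pipeline:", nlp_orig.pipe_names)
print("new pipeline:     ", nlp_new.pipe_names)

text = "Apple is opening a new office in Berlin."
print("original entities:", [(ent.text, ent.label_) for ent in nlp_orig(text).ents])
print("new entities:     ", [(ent.text, ent.label_) for ent in nlp_new(text).ents])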

For sanity, could you try copying the textcat model directory into the directory of the original model? Assuming this works, it also gives you a workaround until we figure out what's wrong.
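
If it helps, here's a rough sketch of that copy in Python (all three paths are placeholders, so adjust them to your actual directories):

import shutil
from pathlib import Path

# Placeholder paths: the model produced by textcat training, the unpacked
# en_core_web_lg package directory, and a new directory for the merged model.
textcat_model = Path("/path/to/textcat_model")
base_model = Path("/path/to/en_core_web_lg")
merged_model = Path("/path/to/merged_model")

# Start from a copy of the original model so its NER weights stay intact,
# then drop the trained textcat weights in next to them.
shutil.copytree(base_model, merged_model)
shutil.copytree(textcat_model / "textcat", merged_model / "textcat")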

I just noticed that the ner directory is actually missing in my new model. 🤦
Sorry for the bad problem explanation...

Did you mean copying the textcat from the created model into the original en_core_web_lg?
I tried it, but the model doesn't recognize it.

Ah, this explains a lot – it looks like the textcat training process disabled the ner pipeline component, so when you save out the model, it's saved without the NER weights. Will have a look at this issue!

Edit: Just had a look and the most likely explanation is the following line in the textcat.teach recipe:

if input_model is not None:
    nlp = spacy.load(input_model, disable=['ner'])

I think we originally added this for efficiency, since the recipe always keeps a serialized copy of the best model, which can take a long time for large models. But maybe this is a bad default, since it's pretty unintuitive. You should be able to change this behaviour by removing disable=['ner']. (If you do so, keep us updated on the speed and performance!)
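
To illustrate the difference outside the recipe, here's a small sketch that uses en_core_web_lg to stand in for input_model:

import spacy

input_model = "en_core_web_lg"  # stands in for the model path passed to the recipe

# With the current default, the ner component is left out of the pipeline,
# so it never gets serialized with the best model:
nlp_without_ner = spacy.load(input_model, disable=['ner'])
print(nlp_without_ner.pipe_names)  # no 'ner' in this list

# Without the disable argument, all components stay in the pipeline and get saved:
nlp_full = spacy.load(input_model)
print(nlp_full.pipe_names)  # includes 'ner'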

Yes – but since the original model didn't have a text classifier, you'll also have to add "textcat" to the pipeline in the model's meta.json. For example:

{
    "pipeline": ["parser", "tagger", "ner", "textcat"]
}
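
If you'd rather script that edit than do it by hand, here's a minimal sketch (the model path is a placeholder for the directory you copied the textcat folder into):

import json
from pathlib import Path

# Placeholder path to the model directory that now contains the copied textcat folder.
meta_path = Path("/path/to/merged_model") / "meta.json"

meta = json.loads(meta_path.read_text())
if "textcat" not in meta["pipeline"]:
    meta["pipeline"].append("textcat")
meta_path.write_text(json.dumps(meta, indent=4))

After that, loading the directory with spacy.load should list all four components in nlp.pipe_names.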

Perfect, thanks a lot.

Will do!