How do I disable some entity types in spaCy, e.g. MONEY? I can’t seem to find it anywhere.
There’s not really an easy way to stop an existing trained model from predicting a label, because all its weights are based on the presence of all the labels it was trained on. When you get the predictions, you can obviously filter the entities to only include the labels you’re interested in – but these predictions will still be based on the entirety of the model’s weights, including all other labels. For example, a model trained with a label scheme that distinguishes between MONEY, ORDINAL and CARDINAL will make very different predictions compared to a model trained with only NUMBER for all numbers.
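The filtering step could look roughly like the sketch below. It only relies on each entity having a label_ attribute and on doc.ents being writable, which is how spaCy's Doc and Span objects behave. The helper names filter_entities and remove_labels are made up for illustration, and the model name in demo() is an assumption about what you have installed.

```python
def filter_entities(ents, excluded_labels):
    """Return only the entities whose label is not in excluded_labels."""
    return [ent for ent in ents if ent.label_ not in excluded_labels]


def remove_labels(doc, excluded_labels=("MONEY",)):
    """Overwrite doc.ents so later code never sees the excluded labels.

    This does not change what the model predicts. It only hides the
    unwanted labels after the prediction has been made.
    """
    doc.ents = filter_entities(doc.ents, excluded_labels)
    return doc


def demo():
    # Assumption: spaCy and the en_core_web_sm model are installed.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = remove_labels(nlp("Apple paid $5 million for its second startup."))
    print([(ent.text, ent.label_) for ent in doc.ents])
```

Calling demo() prints whatever entities remain once MONEY is dropped. The remaining predictions are still the ones made by a model that knows about MONEY.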
If you have the original training corpus, you can “disable” a label by retraining the model without annotations for that label. If not, you could try updating the model with more examples in which the spans it previously predicted as that label are explicitly marked as “not an entity”. But this might cause unintended side effects, because the updates you’re making are pretty significant.
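If you do have the annotations, dropping a label before retraining could be sketched like this. It assumes the data is a list of (text, {"entities": [(start, end, label), ...]}) tuples with character offsets; your corpus may well be stored differently. strip_label and the toy example are hypothetical.

```python
def strip_label(train_data, label):
    """Return a copy of train_data without entity spans tagged `label`."""
    cleaned = []
    for text, annotations in train_data:
        kept = [span for span in annotations.get("entities", []) if span[2] != label]
        # Copy the annotation dict so the original data stays untouched.
        cleaned.append((text, {**annotations, "entities": kept}))
    return cleaned


# Hypothetical toy example with character offsets.
TRAIN_DATA = [
    (
        "It cost $20 on the second day.",
        {"entities": [(8, 11, "MONEY"), (19, 25, "ORDINAL")]},
    ),
]

# The MONEY span is removed; the ORDINAL span is kept.
TRAIN_DATA_NO_MONEY = strip_label(TRAIN_DATA, "MONEY")
```

You would then train a fresh model on the cleaned data, so the weights are never fitted to the removed label in the first place.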
I see. Is the corpus for the en models available, so it’s easy to retrain the pretrained models from scratch with these configurations set instead?
The OntoNotes 5 corpus is available for research, but if you want to use it commercially, you need a membership (which costs around 25k, so it’s not exactly cheap). The corpus comes in its own format, so you’ll need to run a bunch of conversion scripts to get tokens plus entity labels out. It’s doable, but if you haven’t done this kinda stuff before, it’s probably not “easy”.
So you’d probably be better off creating your own corpus. It might not be as large (OntoNotes has about 2m words), but at least you can make it more specific to your use case.
Alright. Thanks for your quick response!