Adding Custom Features to Train a NER spaCy Model


I have been training a NER model with spaCy and the results are pretty good 😄 I really enjoy doing NLP with spaCy.

While thinking about how to further improve the NER model's accuracy, I wanted to ask whether I can add custom features to the training data.

For instance, I am working on a merchant name recognition task. Following the documentation, my input data currently looks like this:

```python
[('Amazon co ca', {'entities': [(0, 6, 'BRD')]}),
 ('AMZNMKTPLACE AMAZON CO', {'entities': [(13, 19, 'BRD')]})]
```
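In this format each entity is a `(start, end, label)` character span, so a quick way to catch off-by-one mistakes is to print the substring each span covers. A minimal plain-Python sketch over the two examples above; the `entity_texts` helper is hypothetical and not part of spaCy:

```python
# Training data in the (text, {"entities": [(start, end, label)]}) format.
TRAIN_DATA = [
    ("Amazon co ca", {"entities": [(0, 6, "BRD")]}),
    ("AMZNMKTPLACE AMAZON CO", {"entities": [(13, 19, "BRD")]}),
]


def entity_texts(data):
    """Return, per example, the (substring, label) pairs its spans cover."""
    result = []
    for text, annotations in data:
        spans = [(text[start:end], label) for start, end, label in annotations["entities"]]
        result.append(spans)
    return result


for spans in entity_texts(TRAIN_DATA):
    print(spans)  # [('Amazon', 'BRD')] then [('AMAZON', 'BRD')]
```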

However, it would also be very helpful if the model could take additional informative features into account for merchant recognition, such as the country or the transaction amount. So far, I haven't seen any articles that add custom features to the training data, so I wanted to post here and ask whether this is feasible.


Hi @TanjirouNezuko,

This is a very reasonable request, but unfortunately there's no good solution for spaCy v2, which is used by the current version of Prodigy. spaCy v3 does open up possibilities for this, but we haven't experimented with it or written a guide yet. It's definitely on our list, though!
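Without a built-in way to pass extra features, one workaround that is sometimes tried is to encode the metadata as extra text in front of the merchant string and shift the entity offsets to match. This is a hypothetical sketch, not a spaCy feature: the `cty_`/`amt_` tokens, the bucket thresholds, and the `amount_bucket`/`with_metadata` helpers are all assumptions, and whether this actually improves accuracy would need to be measured.

```python
# Hypothetical workaround, not a spaCy feature: encode transaction metadata as
# extra text in front of the merchant string and shift entity offsets to match.


def amount_bucket(amount):
    """Map a transaction amount to a coarse token (thresholds are arbitrary)."""
    if amount < 10:
        return "amt_small"
    if amount < 100:
        return "amt_medium"
    return "amt_large"


def with_metadata(text, annotations, country, amount):
    """Prepend country/amount tokens and shift every (start, end, label) span."""
    prefix = f"cty_{country} {amount_bucket(amount)} | "
    shift = len(prefix)
    entities = [
        (start + shift, end + shift, label)
        for start, end, label in annotations["entities"]
    ]
    return prefix + text, {"entities": entities}


new_text, new_ann = with_metadata("Amazon co ca", {"entities": [(0, 6, "BRD")]}, "CA", 42.5)
print(new_text)             # cty_CA amt_medium | Amazon co ca
print(new_ann["entities"])  # [(20, 26, 'BRD')]
```

The same prefix would have to be added at prediction time, and any predicted span that falls inside the prefix discarded. Because the metadata is just ordinary text here, the model can only learn from it the way it learns from any other words; it is not equivalent to a dedicated feature input.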