I’m currently dealing with short text observations (up to 140 characters), essentially representing bank transactions. I am trying to extract relevant entities from the text, using spaCy’s NER.
My current workflow is as follows:
- Create a blank model
- Create a pattern file, providing some initial knowledge (limited to exact matches)
- Collect the training data using ner.teach (the source data contains 1000 observations/transactions)
- Annotate until “No tasks available” (In my case 1500 annotations)
- ner.batch-train (using default settings) and produce the output model
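To make the patterns step concrete, here is a minimal sketch of how such a file could be generated. It assumes the JSONL patterns format with token-based patterns, where each entry has a "label" and a "pattern" list. The merchant names and the helper `make_patterns` are made up for illustration. Using the "lower" attribute makes the match case-insensitive, which may be more forgiving than exact string matches.

```python
import json

# Hypothetical merchant names; replace with names from your own transactions.
merchants = ["Walmart", "Whole Foods", "Tesco"]

def make_patterns(names, label):
    """Build one token-based pattern per name, matching case-insensitively."""
    patterns = []
    for name in names:
        # One {"lower": ...} dict per whitespace-separated token.
        # Assumption: whitespace splitting agrees with the tokenizer,
        # which may not hold for names containing punctuation.
        tokens = [{"lower": tok.lower()} for tok in name.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns

patterns = make_patterns(merchants, "MERCHANT")

# Patterns files are JSONL: one JSON object per line.
jsonl = "\n".join(json.dumps(p) for p in patterns)
print(jsonl)
```

The printed lines can be saved to a .jsonl file and passed to the recipe as the patterns file.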
After training, I give the model some observations it has already seen and recognized correctly during the teaching step, but the results are poor.
I have defined two custom entity types for the model to recognize; let’s call them MERCHANT and TRXTYPE (transaction type).
TRXTYPE is easily recognized, since there is a fixed number of transaction types.
However, the MERCHANT label is rarely assigned, even though the patterns file provides certain names that occur quite often (supermarkets, for instance).
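One way to put a number on the difference between the two labels is per-label recall on texts whose correct entities are known. The sketch below is only an illustration: the function `recall_per_label` and the example spans are made up, and it assumes entities are available as (start, end, label) character spans. With spaCy, the predicted spans could be collected as `[(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]`.

```python
from collections import Counter

def recall_per_label(gold, predicted):
    """Share of gold spans per label that the model reproduced exactly.

    gold / predicted: one list of (start, end, label) spans per text.
    """
    found, total = Counter(), Counter()
    for gold_spans, pred_spans in zip(gold, predicted):
        pred_set = set(pred_spans)
        for span in gold_spans:
            label = span[2]
            total[label] += 1
            if span in pred_set:  # exact match on boundaries and label
                found[label] += 1
    return {label: found[label] / total[label] for label in total}

# Made-up spans for two transactions, e.g. "POS PURCHASE TESCO LONDON"
# and "TRANSFER WALMART"; not real data.
gold = [
    [(0, 12, "TRXTYPE"), (13, 18, "MERCHANT")],
    [(0, 8, "TRXTYPE"), (9, 16, "MERCHANT")],
]
predicted = [
    [(0, 12, "TRXTYPE"), (13, 18, "MERCHANT")],
    [(0, 8, "TRXTYPE")],  # MERCHANT missed in the second text
]

scores = recall_per_label(gold, predicted)
print(scores)  # {'TRXTYPE': 1.0, 'MERCHANT': 0.5}
```

A low MERCHANT recall on texts seen during annotation would suggest the problem lies in training rather than in generalization to new merchants.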
Maybe I am doing something wrong. Could you please take a look and shed some light on it?
Thank you in advance!