Hi,
I am new to Prodigy and am working on an NER model with multiple new labels. I am wondering how best to optimise the creation of such a model.
This model should have between 10 and 15 labels, mostly derived from spaCy's native named entities (PERSON, ORG, NORP, ...). Some labels should be exactly as in the spaCy model (e.g. PERSON), while others (e.g. NORP) should be split into different labels (e.g. NAT for nationalities, POA for political affiliations).
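Writing the label scheme down explicitly makes it easier to see which labels can be reused from spaCy as they are and which ones are new splits. A minimal sketch, assuming a hypothetical `LABEL_SOURCES` dictionary that records the spaCy label each target label comes from (only PERSON, ORG, NAT and POA are shown; the remaining labels would follow the same pattern):

```python
# Hypothetical label scheme: target label -> spaCy label it derives from.
LABEL_SOURCES = {
    "PERSON": "PERSON",  # reused unchanged from spaCy
    "ORG": "ORG",        # reused unchanged from spaCy
    "NAT": "NORP",       # nationalities, split out of NORP
    "POA": "NORP",       # political affiliations, split out of NORP
}


def unchanged_labels(sources: dict[str, str]) -> set[str]:
    """Target labels identical to a spaCy label."""
    return {target for target, src in sources.items() if target == src}


def split_labels(sources: dict[str, str]) -> dict[str, list[str]]:
    """Group new labels by the spaCy label they split, e.g. NORP -> [NAT, POA]."""
    groups: dict[str, list[str]] = {}
    for target, src in sources.items():
        if target != src:
            groups.setdefault(src, []).append(target)
    return groups
```

The unchanged labels are the ones where spaCy's existing predictions might be reused; the split labels are the ones that need new annotations.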
I first tried to manually annotate sentences from a large dataset with all labels, using a patterns JSON file with patterns for my set of labels. But I soon realised that I was getting confused and making mistakes (missing labels and changing my rules over time). Also, my large dataset was not particularly well suited to some labels.
So my second approach was the following: annotate each label manually using a dedicated dataset (so that I would find named entities in almost every example). In short, for each label, I used ner.manual, then trained a model, then used ner.correct until I reached the F-score I wanted (>~80%).
That worked well and gave me good precision and recall for each model (one model per label).
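For reference, the precision, recall and F-score mentioned above are usually computed at the entity level. A minimal sketch, assuming exact-match scoring (a prediction counts only if start, end and label all match a gold span) and a hypothetical `prf` helper; Prodigy's training output reports these scores itself, so this only illustrates what the numbers mean:

```python
from typing import Optional

Span = tuple[int, int, str]  # (start, end, label)


def prf(gold: set[Span], pred: set[Span],
        label: Optional[str] = None) -> tuple[float, float, float]:
    """Entity-level precision, recall and F-score with exact span matching.

    If `label` is given, only spans with that label are scored, which gives
    per-label figures like the ones for each single-label model.
    """
    if label is not None:
        gold = {s for s in gold if s[2] == label}
        pred = {s for s in pred if s[2] == label}
    tp = len(gold & pred)  # predictions matching a gold span exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

With gold spans `{(0, 6, "PERSON"), (20, 26, "NAT"), (30, 40, "POA")}` and predictions `{(0, 6, "PERSON"), (20, 26, "POA"), (30, 40, "POA")}`, two of three predictions are correct, so precision, recall and F-score are all 2/3; restricted to POA, precision is 0.5 and recall is 1.0.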
What is the best way to combine my single-label models into one multi-label model?
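One way to picture "combining" at prediction time is to run every single-label model on the same text, pool their predicted spans and drop overlaps. A minimal sketch, assuming each model's output has been collected as `(start_token, end_token, label)` tuples; `merge_predictions` is a hypothetical helper that follows the same rule as `spacy.util.filter_spans` (longest span wins, earlier start on ties) but works on plain tuples:

```python
Span = tuple[int, int, str]  # (start_token, end_token_exclusive, label)


def merge_predictions(*outputs: list[Span]) -> list[Span]:
    """Pool spans from several single-label models and drop overlapping ones."""
    pooled = [span for out in outputs for span in out]
    # Longest spans first; for equal length, the one starting earlier wins.
    pooled.sort(key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: list[Span] = []
    taken: set[int] = set()  # token indices already covered by a kept span
    for start, end, label in pooled:
        tokens = set(range(start, end))
        if tokens & taken:
            continue  # overlaps a span that was already kept
        kept.append((start, end, label))
        taken |= tokens
    return sorted(kept)
```

The length rule is a blunt tie-breaker: each model only ever saw its own label, so none of them can choose between, say, NAT and POA for the same words. A single model trained on all the annotations can learn that choice instead.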
Or, if I had trained a model with, say, 5 new entity labels and later created a new label, how could I "add it" to my existing model?
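Adding a label usually means retraining on the old and new annotations together, which requires them to be combined into one set of multi-label examples. A minimal sketch, assuming examples in Prodigy's task format (a `"text"` plus `"spans"` with character `"start"`, `"end"` and `"label"`) and using the text itself as the merge key (Prodigy uses input hashes, and its own `train` and `data-to-spacy` commands may already do this merging; check their documentation). `merge_datasets` is hypothetical:

```python
def merge_datasets(*datasets: list[dict]) -> list[dict]:
    """Merge single-label datasets into one example per text with all spans."""
    merged: dict[str, dict] = {}       # text -> merged example
    seen: dict[str, set] = {}          # text -> (start, end, label) already added
    for dataset in datasets:
        for example in dataset:
            text = example["text"]
            entry = merged.setdefault(text, {"text": text, "spans": []})
            keys = seen.setdefault(text, set())
            for span in example.get("spans", []):
                key = (span["start"], span["end"], span["label"])
                if key not in keys:    # skip duplicates across datasets
                    keys.add(key)
                    entry["spans"].append(
                        {"start": key[0], "end": key[1], "label": key[2]}
                    )
    for entry in merged.values():
        entry["spans"].sort(key=lambda s: (s["start"], s["end"]))
    return list(merged.values())
```

One caveat worth checking: a sentence from the PERSON dataset was only annotated for PERSON, so a nationality in it stays unlabeled, and a model trained on the merged data may treat that absence as a negative example.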
Or perhaps, assuming I was entirely satisfied with the spaCy NER model's performance for the label PERSON but wanted to use my trained model for ORG (for instance), would there be a way to do so?
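At the level of predictions, this amounts to trusting a different model for each label. (Within spaCy v3 itself, `nlp.add_pipe` accepts a `source` argument for copying a trained component from another pipeline, but the sketch below sidesteps pipeline details.) A minimal sketch, assuming each model's predicted spans are available as `(start_token, end_token, label)` tuples; `pick_by_label` and the model names `"base"` and `"custom"` are hypothetical:

```python
Span = tuple[int, int, str]  # (start_token, end_token_exclusive, label)


def pick_by_label(sources: dict[str, list[Span]],
                  trusted: dict[str, str]) -> list[Span]:
    """Keep each span only if it comes from the model trusted for its label.

    sources: model name -> spans predicted by that model
    trusted: label -> name of the model whose predictions to use for it
    Overlaps between the kept spans are not resolved here.
    """
    picked = [
        span
        for model_name, spans in sources.items()
        for span in spans
        if trusted.get(span[2]) == model_name
    ]
    return sorted(picked)
```

For example, with `sources = {"base": [(0, 2, "PERSON"), (5, 7, "ORG")], "custom": [(5, 8, "ORG")]}` and `trusted = {"PERSON": "base", "ORG": "custom"}`, the base model's PERSON span and the custom model's ORG span are kept, while the base model's ORG span is dropped.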
Thanks in advance for your help.
PaulineB