We’re just getting started with Prodigy and spaCy, and it’s working great so far!
We’re training an NER model to understand music data such as artist names, labels, genres and music venue names. Is it a better strategy to extend the existing labels, for example adding artists to PERSON and record labels to ORG? We have a fairly diverse set of artist names, ranging from obscure ones like Ø [Phase], &ME or SHXCXCHCXSH to ordinary human names like Carl Cox or Adam Beyer.
Would we be going against the existing model’s understanding of what a person is if we added artists under a new ARTIST label instead?
If I may chime in: how well does the base model already recognize artists as PERSON entities and record labels as ORG? If it’s already doing a good job, then further training the “base” model would make sense. If it isn’t, then a new label trained from scratch probably makes more sense.
That’s roughly what I experienced myself. That’s what’s nice about Prodigy: in only a couple of hours, a few simple experiments give you a good idea of which direction to go.
In my data, I found that creating a new label for JOB TITLE was more work than just training the ORG label further, since the model was already labeling many job titles as ORG. So I just trained it further and got better results that way.
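For what it’s worth, here is a rough sketch of the kind of quick check described above: take a handful of artist names you already know and see which label, if any, the base model gives them. The function name `label_breakdown` and the sample data are made up for illustration. In practice the `(text, label)` pairs would come from spaCy’s `doc.ents` (via `ent.text` and `ent.label_`).

```python
from collections import Counter


def label_breakdown(known_names, predicted_entities):
    """Count which label the model assigned to each known name.

    known_names: iterable of artist names you already trust.
    predicted_entities: iterable of (text, label) pairs from the model.
    Names the model did not tag at all are counted as "MISSED".
    """
    label_by_text = {}
    for text, label in predicted_entities:
        # Keep the first label seen for each (case-insensitive) entity text.
        label_by_text.setdefault(text.lower(), label)
    return Counter(
        label_by_text.get(name.lower(), "MISSED") for name in known_names
    )


# Hypothetical example data, not real model output.
known = ["Carl Cox", "Adam Beyer", "SHXCXCHCXSH"]
predicted = [("Carl Cox", "PERSON"), ("Adam Beyer", "PERSON"), ("Berlin", "GPE")]
breakdown = label_breakdown(known, predicted)
```

If most known artists come back as PERSON, further training the existing label looks reasonable. If many are missed or scattered across labels, that would point towards a separate ARTIST label.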
You’ll likely be better off starting a new model. You can bootstrap with the existing model to get started, though. You might want to try running the model over a lot of your text and then sorting the entities by frequency. Then you can label the most frequent PERSON and ORG entities according to whether they’re actually bands. This should let you build up a patterns dictionary, which makes it easier to get started.
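A minimal sketch of that bootstrapping idea, under a few assumptions: the helper names (`spacy_entities`, `rank_entities`, `to_pattern_lines`) are hypothetical, `en_core_web_sm` is just an example model, and the output uses the JSONL pattern format with a `label` and a `pattern` key, where a plain string pattern is treated as an exact phrase match.

```python
import json
from collections import Counter


def spacy_entities(texts, model_name="en_core_web_sm"):
    """Yield (text, label) pairs from a pretrained spaCy pipeline.

    Assumes spaCy and the named model are installed; the import is
    done lazily so the other helpers work without spaCy.
    """
    import spacy

    nlp = spacy.load(model_name)
    for doc in nlp.pipe(texts):
        for ent in doc.ents:
            yield ent.text, ent.label_


def rank_entities(entities, labels=("PERSON", "ORG")):
    """Count (text, label) pairs for the given labels, most frequent first."""
    counts = Counter(
        (text, label) for text, label in entities if label in labels
    )
    return counts.most_common()


def to_pattern_lines(accepted_names, label="ARTIST"):
    """Turn manually accepted names into JSONL pattern lines."""
    return [
        json.dumps({"label": label, "pattern": name}, ensure_ascii=False)
        for name in accepted_names
    ]


# Hypothetical entities, standing in for spacy_entities(your_texts).
entities = [
    ("Carl Cox", "PERSON"),
    ("Drumcode", "ORG"),
    ("Carl Cox", "PERSON"),
    ("Berlin", "GPE"),
]
ranked = rank_entities(entities)
# After reviewing the ranked list by hand, keep the ones that are artists.
lines = to_pattern_lines(["Carl Cox"])
```

Writing `lines` to a `.jsonl` file, one per line, gives you a patterns file to start from. The manual review step in the middle is the important part: the frequency list only tells you where to look first.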