Training strategies, extending or replacing labels

jaywalk · October 23, 2018, 7:28pm

Hi,
We’re just getting started using Prodigy and spaCy, and it’s working great so far!

We’re working on training an NER model to understand music data such as artist names, labels, genres and music venue names. Is it a better strategy to extend existing labels, e.g. add artists to PERSON and record labels to ORG? We have a fairly diverse set of artist names, ranging from obscure ones like Ø [Phase], &ME or SHXCXCHCXSH to ordinary human names like Carl Cox or Adam Beyer.
Are we going against the existing model’s understanding of what a person is by adding artists under a new label, ARTIST, instead?

etlweather · October 23, 2018, 9:22pm

If I may chime in, how does the base model perform as it is in recognizing artists as PERSON entities and labels as ORGs? If it’s already doing a good job, then further training the “base” model would make sense. If it is not, then perhaps a new label starting from scratch makes more sense.

That’s roughly what I experienced myself. That’s what’s nice about Prodigy: in only a couple of hours, you can get a good idea of which direction to go by running simple experiments.

In my data, I found that creating a new JOB TITLE label was more work than training the ORG label further, since the model was already labeling many job titles as ORG. So I just trained it further and got better results that way.
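
One rough way to run the quick check described above is to take a handful of names you already know and see which labels the base model gives them. The sketch below is only an illustration: `label_breakdown` is a hypothetical helper, not part of spaCy or Prodigy, and it assumes `docs` are spaCy `Doc` objects (for example from `nlp.pipe(texts)`) and that matching on the lowercased entity text is good enough for a first look.

```python
from collections import Counter


def label_breakdown(docs, known_names):
    """Count which labels the model assigned to a list of known names.

    `docs` is any iterable of objects with an `.ents` attribute whose items
    expose `.text` and `.label_` (spaCy Doc objects satisfy this).
    Returns (label counts, sorted list of lowercased names never tagged).
    """
    wanted = {name.lower() for name in known_names}
    found = Counter()
    seen = set()
    for doc in docs:
        for ent in doc.ents:
            key = ent.text.lower()
            if key in wanted:
                found[ent.label_] += 1  # e.g. PERSON, ORG, or something else
                seen.add(key)
    # Names the model never recognized as any entity at all
    missed = sorted(wanted - seen)
    return found, missed
```

If most known artists come back as PERSON, training the existing label further looks plausible; if many are missed or scattered across labels, that points toward a new label.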

honnibal · October 25, 2018, 1:12pm

You’ll likely be better off starting a new model. You can bootstrap with the existing model to get started though. You might want to try running the model over a lot of your text, and then sorting the entities by frequency. Then you can label the most frequent PERSON and ORG entities according to whether they’re actually bands. This should let you build up a patterns dictionary, to make it easier to get started.
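
A minimal sketch of that bootstrapping workflow, under some assumptions: the helper names (`run_base_model`, `count_entities`, `to_pattern_lines`) are hypothetical, `en_core_web_sm` stands in for whichever base model you use, and ARTIST is just the example label from this thread. It writes exact-match string patterns as JSONL lines; check the Prodigy docs for the pattern format your version expects.

```python
import json
from collections import Counter


def run_base_model(texts, model_name="en_core_web_sm"):
    """Run an existing spaCy model over raw texts (model name is an assumption)."""
    import spacy  # imported lazily so the helpers below work without spaCy

    nlp = spacy.load(model_name)
    return nlp.pipe(texts)


def count_entities(docs, labels=("PERSON", "ORG")):
    """Count (entity text, label) pairs for the given labels across all docs."""
    counts = Counter()
    for doc in docs:
        for ent in doc.ents:
            if ent.label_ in labels:
                counts[(ent.text, ent.label_)] += 1
    return counts


def to_pattern_lines(accepted, label="ARTIST"):
    """Turn hand-checked entity strings into JSONL lines, one pattern per line."""
    return [
        json.dumps({"label": label, "pattern": text}, ensure_ascii=False)
        for text in accepted
    ]
```

In use, you would look through `count_entities(run_base_model(texts)).most_common(200)`, keep only the entries that really are artists, and write the output of `to_pattern_lines` to a `.jsonl` file to use as the patterns dictionary.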
