NER with large set of labels

Iyad · October 22, 2020, 2:27pm

Is there a way to load 300 labels from a CSV file and use them to annotate a dataset?
Any idea how I can make this user-friendly, for example with a filterable drop-down menu that supports multiple selection?

honnibal · October 23, 2020, 1:39pm

Hi Iyad,

On a technical level, you could read in the label set using a custom recipe: instead of passing the labels on the command line, you'd read them from the file in the recipe script and work with them from there.
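To illustrate the idea, here is a minimal sketch of the file-reading part, using only the standard library. The CSV layout (one label per row in the first column) and the helper names `load_labels` and `recipe_components` are assumptions for illustration, not part of Prodigy. In a real custom recipe you would call something like this inside the recipe function and return the resulting dictionary; please check the exact keys against the documentation for your version.

```python
import csv
from pathlib import Path


def load_labels(csv_path, column=0, has_header=False):
    """Read label names from one CSV column, skipping blanks and duplicates."""
    labels = []
    seen = set()
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # discard the header row
        for row in reader:
            if len(row) <= column:
                continue  # empty or short row
            label = row[column].strip()
            if label and label not in seen:
                seen.add(label)
                labels.append(label)
    return labels


def recipe_components(dataset, stream, labels):
    """Build the kind of dictionary a custom recipe function returns.

    Assumption: the manual NER interface reads its label set from
    config["labels"]; verify this for your Prodigy version.
    """
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "ner_manual",
        "config": {"labels": labels},
    }
```

Keeping the labels in file order means you can control how they appear in the interface by reordering the CSV.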

The real challenge is usability and model performance, though. You'll need a huge number of annotations to recognise all 300 entity types. I think a much more promising approach is to do the typing in a separate step: first recognise that some phrase is an entity, then assign the type in a second step. This should be much more efficient in terms of annotations, because the NER model only needs to worry about one generic entity label, and when you're making the typing decisions, the model can consider more information.
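As a rough sketch of the second step, the function below turns an example that was annotated with a single generic label into one multiple-choice task per span. The task layout (`text`, `spans` with `start`/`end`/`label`, and `options` with `id`/`text`) follows the JSON format Prodigy uses for span and choice tasks as far as I know, but the `ENTITY` label, the `mention` key and the function name are hypothetical.

```python
GENERIC_LABEL = "ENTITY"  # hypothetical name for the single untyped label


def make_typing_tasks(example, type_labels):
    """Turn one span-annotated example into one multiple-choice task per span.

    `example` is assumed to look like
    {"text": ..., "spans": [{"start": ..., "end": ..., "label": ...}]},
    with every span carrying the generic label from the first pass.
    """
    text = example["text"]
    options = [{"id": label, "text": label} for label in type_labels]
    tasks = []
    for span in example.get("spans", []):
        tasks.append({
            "text": text,                # full context for the annotator
            "spans": [dict(span)],       # highlight only the span being typed
            "mention": text[span["start"]:span["end"]],  # surface form
            "options": options,          # candidate types to choose from
        })
    return tasks
```

Each task shows the whole sentence with one highlighted mention, so the annotator decides one type at a time with the context in view.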

This structure will probably also be easier to annotate. You could sort the mentions so that all mentions of the same text (e.g. all mentions of "America") are processed together, so the annotator can click through them more quickly. You could also regroup the categories into a hierarchical scheme. For instance, you might have 3 top-level categories, each with 10 subcategories, each of which has a further 10 leaf categories (3 × 10 × 10 = 300). You would do the top-level classification first, and then work within that category in the next pass.
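Both ideas can be sketched in a few lines of plain Python. This assumes each typing task carries its surface form under a `mention` key, and the tiny `HIERARCHY` below is made up purely for illustration; neither is a Prodigy convention.

```python
from collections import defaultdict


def group_by_mention(tasks):
    """Order tasks so identical mention texts are adjacent, most frequent first."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task["mention"].lower()].append(task)  # case-insensitive grouping
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [task for _, group in ordered for task in group]


# Hypothetical label hierarchy: top level -> subcategory -> leaf labels.
HIERARCHY = {
    "ORG": {"COMPANY": ["CORPORATION", "STARTUP"]},
    "PLACE": {
        "CITY": ["CAPITAL", "TOWN"],
        "COUNTRY": ["FEDERATION", "ISLAND_NATION"],
    },
}


def options_at(hierarchy, path=()):
    """Return the labels to offer once the choices in `path` have been made."""
    node = hierarchy
    for choice in path:
        node = node[choice]
    return sorted(node) if isinstance(node, dict) else list(node)
```

With 3 × 10 × 10 labels, each pass would then show at most 10 options instead of 300, for example `options_at(HIERARCHY, ("PLACE",))` for the second pass over mentions already classified as `PLACE`.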
