Is there a way to load 300 labels from a CSV file and use them to annotate a dataset?
Any idea how I can make this user-friendly, e.g. a filterable drop-down menu with multiple selection?
Hi Iyad,
On a technical level, you could read in the label set using a custom recipe: instead of passing the labels on the command line, you'd read them from the file in the recipe script and work with them from there.
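Here is a minimal sketch of the label-loading part. It assumes the CSV holds one label per row in its first column, below a header row. `load_labels` and its parameters are hypothetical names for illustration, not part of Prodigy. It only uses the standard-library `csv` module, so you can call it from inside any recipe script.

```python
import csv
from pathlib import Path


def load_labels(path, column=0, has_header=True):
    """Read one label per row from a CSV file (hypothetical helper).

    Assumes the labels live in a single column (`column`), optionally
    below a header row. Blank cells are skipped, surrounding whitespace
    is stripped, and duplicates are dropped, keeping first-seen order.
    """
    labels = []
    seen = set()
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row, if there is one
        for row in reader:
            if len(row) <= column:
                continue  # empty line or row too short to hold a label
            label = row[column].strip()
            if label and label not in seen:
                seen.add(label)
                labels.append(label)
    return labels
```

Inside a custom recipe you might then call something like `labels = load_labels(labels_csv)`, where `labels_csv` is a hypothetical recipe argument holding the file path. You would then use `labels` wherever the command-line label list would otherwise go. Because the function keeps first-seen order, the labels appear in the same order as in the CSV.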
The real challenges, though, are usability and model performance. You'll need a huge number of annotations for a model to learn to recognise all 300 entity types. I think a more promising approach is to do the typing in a separate step: first recognise that some phrase is an entity, and then, in a second step, assign its type. This should be much more efficient in terms of annotations, because the NER model only needs to learn a single entity label. Also, when you're making the typing decisions, the model can consider more information, since the whole mention and its context are already known.
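The two-step workflow could be wired up roughly as follows. This is a sketch under some assumptions. Examples are dicts with `"text"` and a `"spans"` list of character offsets (`"start"`, `"end"`, `"label"`), which as far as I know is the shape Prodigy's NER tasks use. The generic label `ENTITY` and the helper names are hypothetical.

`collapse_labels` turns fine-grained annotations into single-label data for the first model. `make_typing_tasks` turns each found span into a second-stage task with one option per type. Each option uses the `{"id", "text"}` format of a choice-style interface.

```python
def collapse_labels(examples, generic_label="ENTITY"):
    """Stage 1: replace every fine-grained span label with one generic
    label, so the NER model only has to learn where entities are."""
    out = []
    for eg in examples:
        spans = [{**s, "label": generic_label} for s in eg.get("spans", [])]
        out.append({**eg, "spans": spans})  # copies; input is left unchanged
    return out


def make_typing_tasks(ner_examples, type_labels):
    """Stage 2: create one typing task per entity span.

    Each task keeps the full text (so the annotator sees the context),
    highlights only its own span, stores the mention string under a
    hypothetical "mention" key, and offers every type label as an option.
    """
    options = [{"id": label, "text": label} for label in type_labels]
    tasks = []
    for eg in ner_examples:
        text = eg["text"]
        for span in eg.get("spans", []):
            tasks.append({
                "text": text,
                "spans": [dict(span)],  # highlight just this mention
                "mention": text[span["start"]:span["end"]],
                "options": options,
            })
    return tasks
```

With 300 types, a flat option list like this is still long. That is why grouping the mentions and splitting the labels into levels helps on the annotation side.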
This structure will probably also be easier to annotate. You could sort the mentions so that all mentions of the same text (e.g. all mentions of "America") are processed together, which lets the annotator click through them more quickly. You could also regroup the categories into a hierarchical scheme. For instance, you might have 3 top-level categories, each with 10 subcategories, each of which has 10 leaf categories (3 × 10 × 10 = 300). You would do the top-level classification first, and then work within that category in the next pass.
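Both ideas are easy to prototype in plain Python. The sketch below makes a few assumptions. Each typing task carries the mention text under a `"mention"` key and the choices made so far under a `"path"` list. The CSV could be extended to three columns (top-level, subcategory, leaf). All of these field names and helpers are hypothetical.

- `sort_by_mention` groups identical mentions together.
- `build_hierarchy` nests the labels into a tree.
- `next_pass` attaches only the options for the next level, so each screen shows a handful of choices instead of 300.

```python
def sort_by_mention(tasks):
    """Order tasks so that all mentions of the same text are adjacent.

    Case-insensitive; sorted() is stable, so tasks with the same mention
    keep their original relative order.
    """
    return sorted(tasks, key=lambda t: t["mention"].lower())


def build_hierarchy(rows):
    """Nest (top, sub, leaf) rows into {top: {sub: [leaf, ...]}}."""
    tree = {}
    for top, sub, leaf in rows:
        tree.setdefault(top, {}).setdefault(sub, []).append(leaf)
    return tree


def options_for(tree, path=()):
    """Return the labels to offer at the level below `path`.

    ()          -> top-level categories
    (top,)      -> that category's subcategories
    (top, sub)  -> that subcategory's leaf labels
    """
    if len(path) == 0:
        return sorted(tree)
    if len(path) == 1:
        return sorted(tree[path[0]])
    if len(path) == 2:
        return list(tree[path[0]][path[1]])
    raise ValueError("the hierarchy only has three levels")


def next_pass(tasks, tree):
    """Attach the options for the next level, based on each task's "path"."""
    out = []
    for task in tasks:
        labels = options_for(tree, tuple(task["path"]))
        out.append({**task, "options": [{"id": label, "text": label} for label in labels]})
    return out
```

Running `next_pass` once per level gives exactly the flow described above. The first pass uses an empty path and shows the 3 top-level categories. Each later pass narrows to the 10 subcategories, and then the 10 leaves, of whatever was chosen before.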