Present span labels in groups in span classification task


I have too many span labels, which makes it hard to use the span classification interface. I would like to group my span labels into 5 groups and present them to the annotators in 5 drop-down lists. Is this possible through custom recipes?


Wouldn't it make sense to limit the spans from the command line? Maybe via something like:

python -m prodigy ner.manual <dataset> <spacy_model> examples.jsonl --label subset_a,subset_b,subset_c

This way, you can define the subset of spans upfront.

Personal Experience

I would personally be a bit careful with an annotation interface that offers many labels to attach. You might spend a lot of time clicking around in submenus and less time producing annotations.

Would it make sense to annotate fewer labels per Prodigy run? If not, could you elaborate on the task that you're dealing with? If I understand the application, I might be able to give more bespoke advice.

We developed a hierarchical classification system for our spans. We did an initial set of annotations and trained a spancat model that predicts the level 1 labels in our hierarchy. For example, one of the level 1 span labels is "Equipment", and our current model is pretty good at identifying different pieces of equipment in the text. Now we want to implement the level 2 labels. Some examples of level 2 labels under "Equipment" are "Heater", "Valve", and "Engine". Note that the text will be pre-labelled: the current model will annotate it with level 1 labels to make the annotators' job easier.

My idea was to present the level 2 labels in drop-down lists whose titles are the level 1 labels. This would make everything much cleaner for annotators.
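To make the hierarchy concrete, here is a minimal sketch in plain Python of how the level 1 to level 2 mapping could be stored and used to work out which level 2 labels are relevant for a pre-labelled example. The `HIERARCHY` dict and the `level2_candidates` helper are hypothetical names for illustration, not part of Prodigy; the task dict only assumes the usual "text" and "spans" keys with "start", "end" and "label".

```python
from typing import Dict, List

# Hypothetical hierarchy: level 1 label -> its level 2 labels.
# The label names are the ones mentioned in this thread.
HIERARCHY: Dict[str, List[str]] = {
    "Equipment": ["Heater", "Valve", "Engine"],
}


def level2_candidates(task: dict, hierarchy: Dict[str, List[str]]) -> List[str]:
    """Collect the level 2 labels that belong to the level 1 spans in a task."""
    candidates: List[str] = []
    for span in task.get("spans", []):
        # Level 1 labels without an entry in the hierarchy contribute nothing.
        for label in hierarchy.get(span["label"], []):
            if label not in candidates:
                candidates.append(label)
    return candidates


# A pre-labelled example, as the level 1 model might produce it.
task = {
    "text": "The valve next to the heater is leaking.",
    "spans": [
        {"start": 4, "end": 9, "label": "Equipment"},
        {"start": 22, "end": 28, "label": "Equipment"},
    ],
}

print(level2_candidates(task, HIERARCHY))
```

With a mapping like this, only the level 2 labels under the level 1 spans that actually occur in an example would need to be shown for it.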

As you mentioned, one solution is to fire up a separate Prodigy run for each level 1 span label. This is not ideal for us for several reasons, which is why it is not my first choice. The main reason is that we added a custom login layer to the Prodigy web app to manage our annotators, and we would need to modify the user management if we had to deploy multiple Prodigy web apps. It would also cost more to have multiple Prodigy web apps running.

If there were any way to modify the spancat interface and present span labels as drop-down lists, it would be a neat solution.

Dropdown lists are currently not supported. I can imagine that we might revisit them when we start working on Prodigy v2, but for the short term I don't see them getting introduced natively.

However, if only as a temporary trick, you might consider using a custom recipe with a text input?

As the docs show, you could pre-populate it with text values that the user might select. Would that be an alternative for now? You could have text items like "Level 1: Equipment, Level 2: Engine" and "Level 1: Equipment, Level 2: Valve" that the user could still select.
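As a rough sketch of that idea, the combined text values could be generated from the hierarchy and attached to each task before it is streamed from a custom recipe. Everything here is plain Python: `HIERARCHY`, `combined_labels`, `parse_combined` and `with_suggestions` are hypothetical helpers, and the "field_id" and "field_suggestions" keys are my assumption about what the text input interface expects, so please check them against the docs for your Prodigy version.

```python
from typing import Dict, List, Tuple

# Hypothetical hierarchy using the label names from this thread.
HIERARCHY: Dict[str, List[str]] = {
    "Equipment": ["Heater", "Valve", "Engine"],
}

PREFIX = "Level 1: "
SEPARATOR = ", Level 2: "


def combined_labels(hierarchy: Dict[str, List[str]]) -> List[str]:
    """Build one selectable text value per (level 1, level 2) pair."""
    return [
        f"{PREFIX}{level1}{SEPARATOR}{level2}"
        for level1, level2s in hierarchy.items()
        for level2 in level2s
    ]


def parse_combined(value: str) -> Tuple[str, str]:
    """Split a combined value back into its (level 1, level 2) parts."""
    level1_part, level2 = value.split(SEPARATOR, 1)
    return level1_part[len(PREFIX):], level2


def with_suggestions(task: dict, options: List[str]) -> dict:
    """Return a copy of the task carrying the text input settings.

    The key names below are assumptions, not confirmed API.
    """
    return {**task, "field_id": "level2", "field_suggestions": options}


options = combined_labels(HIERARCHY)
task = with_suggestions({"text": "The valve is leaking."}, options)
print(task["field_suggestions"])
print(parse_combined(options[1]))
```

Parsing the selected value back into its two levels after annotation keeps the stored data structured, even though the annotator only sees one flat list of text values.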

Thank you for your suggestion @koaning. I'll give this a try. I also revisited our custom login page and it seems it is not that hard to modify it to handle multiple Prodigy instances. We may also consider your initial suggestion to fire up separate Prodigy web apps for separate Level 1 span titles. It is never a bad idea to make annotation tasks smaller.

Happy to hear it!

Let me know if you have extra questions :slightly_smiling_face: