What would be the best approach to create a searchable list of labels for the "image_manual" recipe, or any recipe really, including custom recipes? I have on the order of 100 labels, so if I populate them the usual way, they'll clutter the UI and be hard to find.
Thanks for your question and for joining the Prodigy community!
This question ("handling many labels") is a common one, so here are a few good ideas to get started.
The first thing to consider is the annotators, from a "human-centered" perspective. Let's forget the UI design for a minute and pretend that a UI had the real estate to present that many labels. Could you reasonably expect a human annotator to keep 100 categories in mind? Think of the cognitive cost of context switching.
This is why the first key question is whether you have some prior information that would let you break up your labels into subtasks. Our text classification docs provide some ideas on how to handle many labels:
If you’re working on a task that involves more than 10 or 20 labels, it’s often better to break the annotation task up a bit more, so that annotators don’t have to remember the whole annotation scheme. Remembering and applying a complicated annotation scheme can slow annotation down a lot, and lead to much less reliable annotations. Because Prodigy is programmable, you don’t have to approach the annotations the same way you want your models to work. You can break up the work so that it’s easy to perform reliably, and then merge everything back later when it’s time to train your models.
If your annotation scheme is mutually exclusive (that is, texts receive exactly one label), you’ll often want to organize your labels into a hierarchy, grouping similar labels together. For instance, let’s say you’re working on a chat bot that supports 200 different intents. Choosing between all 200 intents will be very difficult, so you should do a first pass where you annotate much more general categories. You’d then take all the texts annotated for some general type, such as information, and set up a new annotation task to sort them into more specific subtypes. This lets the annotators study up on that part of the annotation scheme, so they can make more reliable decisions.
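As a rough illustration of the two-pass idea described in the docs, here is a minimal sketch in plain Python. The label hierarchy and the function names are hypothetical, invented for this example. It assumes choice-style tasks where each task carries an "options" list of {"id", "text"} dicts and each annotated example stores the selected option IDs under "accept"; please check this against the data your own recipe produces.

```python
# Hypothetical two-level scheme: coarse group -> fine-grained labels.
LABEL_HIERARCHY = {
    "information": ["opening_hours", "pricing", "location"],
    "account": ["reset_password", "close_account"],
}


def make_options(labels):
    """Turn label names into option dicts for a choice-style task."""
    return [{"id": label, "text": label} for label in labels]


def first_pass_tasks(examples):
    """First pass: offer only the coarse groups as options."""
    options = make_options(sorted(LABEL_HIERARCHY))
    for eg in examples:
        yield {**eg, "options": options}


def second_pass_tasks(annotated, group):
    """Second pass: keep examples accepted as `group` and offer
    only that group's fine-grained labels."""
    options = make_options(LABEL_HIERARCHY[group])
    for eg in annotated:
        if eg.get("answer") == "accept" and group in eg.get("accept", []):
            # Drop the first-pass annotation so the task starts clean.
            task = {
                key: value
                for key, value in eg.items()
                if key not in ("accept", "answer", "options")
            }
            yield {**task, "options": options}
```

Each pass shows annotators only a handful of options, and the fine-grained answers from the second pass can be merged back into a flat label set when it's time to train.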
But as mentioned above, hierarchical approaches may be worth considering if you can nest your labels within groups. Here are some ideas of how to nest labels using the
You may also find the text_input interface and its field_suggestions setting helpful, instead of listing out all sub-labels in the choice interface. Adding field_suggestions gives you a text box with auto-suggest and auto-complete for filling in answers.
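To make this a bit more concrete, here is a minimal sketch of what the config for such a setup might look like. It assumes a custom recipe that uses the blocks interface to combine image_manual with a text_input block and returns this dict as its "config". The helper name build_blocks_config, the "fine_label" field ID and the example labels are all made up for illustration, so please double-check the settings against the custom interfaces docs.

```python
def build_blocks_config(coarse_labels, all_labels):
    """Build a config dict that pairs image_manual with a text_input
    block whose suggestions act as a searchable label list."""
    blocks = [
        {"view_id": "image_manual"},
        {
            "view_id": "text_input",
            # Hypothetical key under which the typed answer is stored.
            "field_id": "fine_label",
            "field_label": "Fine-grained label",
            "field_placeholder": "Start typing to search labels",
            # Sorted so the suggestions appear in a predictable order.
            "field_suggestions": sorted(all_labels),
        },
    ]
    # "labels" holds the short list shown for drawing shapes; the long
    # list lives only in the auto-suggest box.
    return {"labels": list(coarse_labels), "blocks": blocks}


config = build_blocks_config(["OBJECT"], ["zebra", "apple", "mango"])
```

In a custom recipe, you would return this as the "config" alongside "view_id": "blocks", your dataset and your stream. Note that a free-text field accepts anything the annotator types, so you may want to validate the answers against your label list afterwards.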
Last, to your question on image_manual, here's a related post that incorporates a few of the ideas previously mentioned:
One other outside-the-box idea is bulk labeling. This may help if you need many categories or hierarchies but don't know a priori what good hierarchies or clusters (partitions) of your data would look like. You can project your data into a lower-dimensional space (say, 2D with UMAP), visualize it, and manually select clusters with a visualization tool. My colleague, @koaning, has a wonderful tutorial where he shows how to use bulk labeling:
He also has an accompanying repo with a related project and code:
Hope this helps!