I'm trying to set up Prodigy to annotate texts for several classification tasks (e.g. one task is just binary classification: is the text about topic x or not; others are multilabel classification: which of methods a to d are used). Currently, I have a custom recipe with a choice block that covers all classification tasks (basically just one long flattened list of all labels of all classification tasks), which is obviously not optimal.
Having seen other forum posts, I was thinking of breaking it down into several tasks to make it less tedious and error-prone for the annotators, i.e. the annotators would see the text several times, each time annotating for a different task (sometimes single, sometimes multiple choice, depending on the task). Do you have any guidance on how to set this up? I was looking at task routing, but that seems to be more about coordinating several annotators; we only have one annotator at a time.
Welcome to the forum @vera-bernhard
You're definitely right about separating the different classification tasks into separate annotation workflows. If there's not much dependency between the binary and multiple choice decisions, the easiest way to set up the annotation in your case would be to run one textcat.manual session with the binary classification task, i.e. specifying just one label, then stop the Prodigy server and run another textcat.manual session with the multiple choice classification, storing the examples in a different dataset to keep your annotations in order.
This way you would be able to use out-of-the-box recipes and train by specifying the corresponding datasets for the textcat and textcat-multilabel components. Prodigy's train command (as well as data-to-spacy) will take care of merging the annotations for the spaCy training function (which is used under the hood).
There's actually quite some dependency between the classification tasks; it would be better if the same sample could be seen several times, first for classification task a, then task b and so on. Otherwise, quite a bit of overhead is introduced if the annotator has to familiarize themselves with the sample several times.
I've tried setting it up as explained in my other forum post "Keeping Duplicates in Stream", but I can't prevent Prodigy from deduplicating. Are there any other ways to set it up?
Hi @vera-bernhard,
If the dependency between tasks consists only in the annotator having to answer different questions about the same input, while their answer to one question does not affect the formulation of the next question, it should be fairly easy to implement, especially since you have just one annotator (for multiple annotators the function would be more complex, to ensure they each see all questions for a given input).
In Prodigy, the task stream can consist of tasks with different view IDs, which you can define at the task level, so you could design your stream to show each input multiple times, each time with a different annotation task. That's pretty much what you shared in Keeping Duplicates in Stream, I think.
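To make the idea concrete, here is a minimal sketch of such a stream in plain Python. It only builds the task dictionaries and does not call Prodigy itself. The task names, the labels, and the "task_name" key are placeholders I made up for illustration, and I'm assuming the choice interface picks up "choice_style" from a per-task "config" key, so please check that against the docs for your Prodigy version.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical task definitions: one single-choice (binary) question and one
# multiple-choice question. Names and labels are placeholders.
TASKS: List[Dict] = [
    {"name": "topic_x", "style": "single",
     "labels": ["about topic x", "not about topic x"]},
    {"name": "methods", "style": "multiple",
     "labels": ["method a", "method b", "method c", "method d"]},
]


def make_stream(texts: Iterable[str], tasks: List[Dict] = TASKS) -> Iterator[Dict]:
    """Yield every text once per task, so the questions about one text
    appear back to back."""
    for text in texts:
        for task in tasks:
            yield {
                "text": text,
                # Hypothetical custom key identifying the question being asked.
                "task_name": task["name"],
                # View ID set at the task level.
                "view_id": "choice",
                "options": [{"id": label, "text": label} for label in task["labels"]],
                # Assumption: per-task override of single vs. multiple choice.
                "config": {"choice_style": task["style"]},
            }
```

In a custom recipe you would return this generator as the stream. Because the repeated tasks share the same text, they would also need distinct task hashes to survive deduplication (for example by including the custom key when the hashes are set), which is the part covered in the other thread.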
Please see my answer there re the deduplication issue and let's see if that solves the problem.