How to set up Prodigy for several tasks on the same data points

vera-bernhard · April 2, 2024, 4:26pm

I'm trying to set up Prodigy to annotate texts for several classification tasks (e.g. one task is binary classification: is it about topic x or not; others are multilabel classification: which of methods a to d is/are used). Currently, I have a custom recipe with a choice block that covers all classification tasks (basically just one long flattened list of all labels of all classification tasks), which obviously is not optimal.
Having seen other forum posts, I was thinking of breaking it down into several tasks to make it less tedious and error-prone for the annotators, i.e. the annotators see the text several times, each time annotating for a different task (sometimes single choice, sometimes multiple choice, depending on the task). Do you have any guidance on how to set this up? I was looking at task routing, but that seems to be more about coordinating several annotators; we only have one annotator at a time.

magdaaniol · April 3, 2024, 11:27am

Welcome to the forum @vera-bernhard

You're definitely right about separating the different classification tasks into different annotation workflows. If there's not much dependency between the binary and multiple choice decisions, the easiest way to set up the annotation in your case would be to run one textcat.manual session for the binary classification task, i.e. specifying just one label, then stop the Prodigy server and run another textcat.manual session for the multiple choice classification, storing the examples in a different dataset to keep your annotations in order.
This way you would be able to use the out-of-the-box recipes and train by specifying the corresponding datasets for the textcat and textcat-multilabel components. The Prodigy train command (as well as data-to-spacy) will take care of merging the annotations for the spaCy training function (which is used under the hood).

vera-bernhard · April 23, 2024, 4:23pm

There's actually quite a bit of dependency between the classification tasks; it would be better if the same sample could be seen several times in a row, first for classification task a, then for task b, and so on. Otherwise, quite a bit of overhead is introduced if the annotator has to familiarize themselves with the sample several times.

I've tried setting it up as explained in my other forum post "Keeping Duplicates in Stream", but I can't prevent Prodigy from deduplicating. Are there any other ways to set it up?

magdaaniol · April 25, 2024, 7:48am

Hi @vera-bernhard,

If the dependency between tasks consists only in the annotator having to answer different questions about the same input, and their answer to one question does not affect the formulation of the next question, it should be fairly easy to implement, especially since you have just one annotator (for multiple annotators the function would be more complex, to ensure they see all questions for a given input).
In Prodigy, the task stream can consist of tasks with different view IDs, which you can define at the task level, so you could design your stream to show each input multiple times, each time with a different annotation task. That's pretty much what you were sharing in Keeping Duplicates in Stream, I think.
Please see my answer there re the deduplication issue and let's see if that solves the problem.
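
To illustrate the idea of showing each input once per task, here is a minimal sketch in plain Python that only builds the task dictionaries. The task names, the labels and the make_stream helper are hypothetical. It also assumes that the recipe uses the choice interface and that a task-level "config" entry with "choice_style" is honored for each task; please check that against your Prodigy version. Deduplication is not handled here, since that part is covered in the other thread.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical task definitions: one single-choice (binary) task and one
# multiple-choice task. Names and labels are placeholders for illustration.
TASKS: List[Dict] = [
    {
        "name": "topic_x",
        "style": "single",
        "labels": ["about topic x", "not about topic x"],
    },
    {
        "name": "methods",
        "style": "multiple",
        "labels": ["method a", "method b", "method c", "method d"],
    },
]


def make_stream(texts: Iterable[str], tasks: List[Dict] = TASKS) -> Iterator[Dict]:
    """Yield every text once per task, so the tasks for one text are adjacent."""
    for text in texts:
        for task in tasks:
            yield {
                "text": text,
                # Custom key so the task can be identified in the saved annotations.
                "task_name": task["name"],
                # Options shown by the choice interface for this task only.
                "options": [{"id": label, "text": label} for label in task["labels"]],
                # Assumption: a per-task config override switches between
                # single and multiple choice.
                "config": {"choice_style": task["style"]},
            }


if __name__ == "__main__":
    for example in make_stream(["First text", "Second text"]):
        print(example["task_name"], "|", example["text"])
```

In a custom recipe, a generator like this would be returned as the stream, so the annotator reads a text once and then answers the questions for it one after another.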
