Is it possible to display different sets of labels based on a meta value in the stream?
I have a number of paragraphs that I would like to parse. Each of the paragraphs belongs to one of 8 different classes. Is it possible to display a different set of labels based on the class to which the paragraph belongs? Or would I need to create 8 different recipes?
Also, if the above is possible, is there a way to save the answers into 8 different datasets or does it always have to save to the same one?
I think in your case, it’d probably be better and more efficient to start multiple sessions. Each session can have its own source file, label set and dataset that the annotations are saved to. For example:
prodigy ner.manual dataset1 en_core_web_sm data1.jsonl --label A,B,C
prodigy ner.manual dataset2 en_core_web_sm data2.jsonl --label D,E,F
# and so on...
This will likely also make the annotation process more pleasant – if both the text and the label scheme change on each example, your brain has to refocus constantly, which can introduce more human errors, too.
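Since your paragraphs currently live in one stream with the class in the meta, you'd first need to split that source into one file per class. Here's a minimal sketch using jq (assuming jq is installed, and assuming each task stores its class at .meta.class – adjust the filter to wherever your data keeps it; the split_by_class name is just for illustration):

```shell
# split_by_class: split one JSONL stream into one source file per class
# usage: split_by_class data.jsonl A B C ...
split_by_class() {
  src=$1; shift
  for cls in "$@"; do
    # keep only the tasks whose meta class matches, one output file per class
    jq -c --arg cls "$cls" 'select(.meta.class == $cls)' "$src" > "${src%.jsonl}_$cls.jsonl"
  done
}
```

Running split_by_class data.jsonl A B C would then give you data_A.jsonl, data_B.jsonl and data_C.jsonl, which you can pass to the separate ner.manual sessions above.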
Thank you for your reply.
Is there any way I can start all of them at the same time? One person is writing the recipes and some other team members are doing the labeling, so it would be useful if it could all be started at the same time, or combined into one recipe.
Yes, if you want to do it all on one machine, you could just spin up each session on a different port by setting the PRODIGY_PORT environment variable. For example:
PRODIGY_PORT=8080 prodigy ner.manual dataset1 en_core_web_sm data1.jsonl --label A,B,C
PRODIGY_PORT=1234 prodigy ner.manual dataset2 en_core_web_sm data2.jsonl --label D,E,F
The first Prodigy server will then be started on localhost:8080, and the second on localhost:1234. If you’re running it yourself in your terminal, you’ll either need a new terminal session per command (e.g. a new window or tab), or use something like tmux. You can also make the whole thing more elegant by wrapping it in a shell script so you only have to run one command to start everything.
Thank you for your reply.
I’m still wondering if there is any way it can all be combined into one recipe, e.g. the label set is generated based on the properties of each task in the stream?
No, by design, Prodigy expects you to define one label set per annotation session.
The label scheme is one of the most important parts of an NLP application and model, so allowing too much arbitrary variance during annotation can easily lead to many other problems down the line. Similarly, changing the annotation objective completely on a per-task basis really goes against Prodigy’s UX philosophy, and I can’t think of many cases where this would be beneficial compared to an approach that uses multiple, dedicated sessions per label scheme.
That said, I understand that there are always exceptions and special use cases where you might want to do things differently. I still think running 8 concurrent sessions makes the most sense here – especially since you want to save the annotations to different datasets as well. Running concurrent sessions works out of the box and can be easily automated.
Do you have an example of one of those classes and the corresponding labels? (If you can’t share the exact details, maybe you can come up with a similar-ish example?)