We would like to run multiple recipes on the same port and switch between them similar to how you switch between interfaces in the live demo. Is this possible?
Prodigy will start each task in a separate process that provides the model in the loop (if required), web application and REST API. Each task has its own app, to allow running more than one task independently – that’s why they need to be served on separate ports. There’s not really an easy way to serve several independent web applications on the same port.
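As a rough sketch of that one-process-per-task setup: the snippet below builds one launch command per task, each with its own port. It assumes the port can be set through a `PRODIGY_PORT` environment variable (check the docs for your version; it may need to go in `prodigy.json` instead). The dataset names, input files and label are purely hypothetical.

```python
import os
import subprocess

# Hypothetical tasks: recipe name, recipe arguments, and a distinct port each.
TASKS = [
    {"recipe": "ner.teach",
     "args": ["ner_dataset", "en_core_web_sm", "news.jsonl"],
     "port": 8080},
    {"recipe": "textcat.teach",
     "args": ["textcat_dataset", "en_core_web_sm", "reviews.jsonl",
              "--label", "POSITIVE"],
     "port": 8081},
]


def build_launch(task, base_env=None):
    """Return (command, environment) for one Prodigy task on its own port."""
    env = dict(os.environ if base_env is None else base_env)
    env["PRODIGY_PORT"] = str(task["port"])  # assumption: port set via env var
    command = ["prodigy", task["recipe"], *task["args"]]
    return command, env


def launch_all(tasks):
    """Start one separate process per task; each serves its own app and API."""
    return [subprocess.Popen(cmd, env=env)
            for cmd, env in map(build_launch, tasks)]


if __name__ == "__main__":
    # Only print what would be started; call launch_all(TASKS) to run it.
    for task in TASKS:
        cmd, env = build_launch(task)
        print(env["PRODIGY_PORT"], " ".join(cmd))
```

Each annotator then opens whichever port belongs to the task they are working on.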
There’s also not really a good case for why you’d want to switch between interfaces, as this would introduce all sorts of problems and open questions: You’d need to make sure that the answers are stored with the correct task, make sure to re-fetch unanswered questions that were sent out etc. (The demo app is just a demo, so it’ll discard all questions and answers when you switch.)
So if I understand correctly, there is currently no way to create labels in multiple tasks through one interface (in the way it looks within the demo)? This is a bit disappointing, since there are many use cases where you have people label data for more than one task at a time.
Is this something that will be possible with Prodigy Scale?
So just to clarify the sort of workflow you’d like for the annotator:
- Annotator sees some text
- Select which text cat label to apply
- Select NER from the task dropdown
- Choose the NER label
- Accept all annotations, go to next text
We don’t have plans to support this workflow, as it would require the tool to be quite different, and we think it’s actually significantly worse than doing one task at a time in almost all situations.
If your annotation tasks get too complicated, your annotation speed and accuracy will go down a lot. Your annotators will perform much better if you let them concentrate on one thing at a time. It doesn’t really matter that they have to move through the text in multiple passes. If they’re making a simple decision on each piece of text, they’ll be able to do that very quickly, often without really reading. If you ask them to remember a lot of policies at once, their behaviour will shift subtly as they move through the text, as at times they’ll forget parts of the schema.
Here’s one way to think about it. You basically have two loops: the loop over texts, and the loop over tasks. So you can do things two ways:
    # 1. Do one task at a time, visit texts multiple times. What Prodigy does.
    for task in tasks:
        for text in texts:
            ...

    # 2. Do one text at a time, visit tasks multiple times. What you're suggesting.
    for text in texts:
        for task in tasks:
            ...
The information in your tasks’ annotation schemes is going to be much bigger than the information in a single text. This means that if you’re iterating over the tasks for each text, you’re paging a much larger amount of information in and out of memory. Now, obviously your brain doesn’t work like your computer’s RAM… But information is information, and some of the principles are sort of similar.
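To make the difference concrete, here is a toy count of how often the annotator has to switch to a different schema under each ordering. It’s only an illustration: the task names and the 100 placeholder texts are made up, and counting switches is a crude stand-in for the mental cost of reloading a schema.

```python
def schema_switches(order):
    """Count how often consecutive (task, text) items change task."""
    switches = 0
    current = None
    for task, _text in order:
        if task != current:
            switches += 1
            current = task
    return switches


tasks = ["textcat", "ner"]                  # hypothetical task names
texts = [f"text {i}" for i in range(100)]   # placeholder texts

# 1. One task at a time, visiting every text per task.
task_first = [(task, text) for task in tasks for text in texts]
# 2. One text at a time, visiting every task per text.
text_first = [(task, text) for text in texts for task in tasks]

print(schema_switches(task_first))  # 2
print(schema_switches(text_first))  # 200
```

With the first ordering the number of switches stays equal to the number of tasks, while with the second it grows with the number of texts.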
We’ve used the one-task-per-server constraint to give Prodigy a much more direct design. You only have to select one dataset when you load up the server, because you’re only ever creating one type of annotation at a time. You can have a model in the loop because the ner recipe is distinct from the textcat recipe. The frontend is faster because it has fewer abstractions, as it knows it’s always serving questions for one type of task at a time.
The online demo actually shows the task-switching behaviour as well. When you select a new task from the dropdown, you get a new feed of questions. It doesn’t give you the option of moving over the texts, and for each text, moving over the tasks.
@honnibal I’m assuming that was directed at me - it’s not quite the workflow I am picturing. In my case, the tasks are completely separate, just as they are in the demo. Imagine something more like this:
I have two tasks, one is to classify a bunch of sentences into binary categories, the other is to do slot filling annotations (let’s say on a completely different set of sentences).
Now as I understand, currently I would just start two separate prodigy servers / Docker containers, one for each task. So far so good.
My problem now is this: we need to dynamically add new tasks to these on the fly. So let’s say tomorrow I want to add another binary classification task, with its own data source. Right now, the way to do that is to just start another process/container, map the port somewhere so it’s accessible to labellers, etc.
What we’d really want is to simply have one prodigy server, some sort of registry of tasks, and labellers simply select from a drop-down the task they are currently going to work on. Creating new tasks via a UI would be nice, but to be honest it’s not a very high priority (we could write our own UI for that pretty easily).
My currently planned workaround is to have our own server for the task-management part of this, which spins up new prodigy containers and automatically configures a reverse proxy for them so they look as though they’re all part of the same app. This should work, but it seems like quite a hassle at the same time. Since the web demo actually already has almost the exact look and feel of what we are trying to achieve, I thought I’d ask whether you’re planning to implement something like this (e.g. in Prodigy Scale).
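Roughly, the task-management part could look like the sketch below. Everything in it is hypothetical (the `TaskRegistry` class, the `/tasks/<name>/` URL scheme, the starting port). It only tracks which task lives on which port and produces a prefix-to-upstream table that a reverse proxy could be configured from. Starting the containers is left out, and I haven’t verified that the Prodigy app works behind a path prefix.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRegistry:
    """Maps task names to ports so a reverse proxy can route to them."""
    first_port: int = 8080
    ports: dict = field(default_factory=dict)

    def register(self, name):
        """Assign the lowest free port to a new task and return it."""
        if name in self.ports:
            raise ValueError(f"task {name!r} is already registered")
        used = set(self.ports.values())
        port = self.first_port
        while port in used:
            port += 1
        self.ports[name] = port
        return port

    def unregister(self, name):
        """Free the port of a task that has been shut down."""
        del self.ports[name]

    def proxy_routes(self, host="127.0.0.1"):
        """Return {url_prefix: upstream_url} for the reverse proxy config."""
        return {
            f"/tasks/{name}/": f"http://{host}:{port}/"
            for name, port in sorted(self.ports.items())
        }


registry = TaskRegistry()
registry.register("sentiment")   # binary classification task
registry.register("slots")       # slot-filling task
print(registry.proxy_routes())
```

Adding a task tomorrow would then be one `register` call, starting the container on the returned port, and a proxy reload.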
Thanks for the quick reply by the way!
@phdowling Really sorry I missed your reply before!
I understand what you want a bit better now, but unfortunately this isn’t something we can provide in the current standalone version of Prodigy.
New tasks really require a new process to be spawned, so there has to be this extra layer of task scheduling. That moves the software out of standalone-library territory, towards something that needs to be a bit more operationally complex. This is what we’re doing for Prodigy Scale.
If you need to develop a lightweight solution yourself, I would look at using Kubernetes as the task scheduler, especially if you’re already using Google Cloud Platform. Prodigy Scale will definitely meet your requirements once it’s released though.