New Prodigy user here with plenty of questions, but let's start with just one: is there a built-in mechanism for forwarding annotation tasks from one interface to another?
For example, if one interface is used to annotate text spans, could the resulting spans be sent automatically to a second interface where they would be classified?
Essentially, I'm asking whether Prodigy supports building more complex annotation pipelines that allow data to stream from one interface to another.
Being able to add annotations in layers is actually one of Prodigy's most important design features. We usually recommend it over combining different annotation types into a single interface, which results in a more complex and therefore more error-prone annotation process.
As far as I understand your use case, you'd have one group of annotators doing span annotation (to follow your example) and then another group adding textcat labels to the span-annotated text. You want to parallelize the work as much as possible, either so that the textcat annotators don't have to wait until the span annotators are done with the entire dataset, or because the input data is streaming in constantly.
In that case, you'd need to set up two Prodigy servers: one with the spancat recipe and another with the textcat recipe. The textcat recipe should contain a custom loader that would pull the data from the spancat output dataset.
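As a rough sketch of what such a custom loader could do, the function below turns accepted spancat examples into one classification task per span for Prodigy's `choice` interface. It assumes the standard Prodigy task format (`text`, `spans` with `start`/`end`/`label`, `answer`, `_input_hash`); the label set, the `CATEGORY_OPTIONS` name and the one-task-per-span design are hypothetical. In an actual custom recipe, `examples` would be read from the spancat dataset through Prodigy's database API (e.g. `connect()` from `prodigy.components.db`, then `get_dataset_examples` in recent versions), and the generator would be returned as the recipe's `stream` with `view_id` set to `"choice"`.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical label set for the second (textcat) layer.
CATEGORY_OPTIONS: List[Dict[str, str]] = [
    {"id": "POSITIVE", "text": "Positive"},
    {"id": "NEGATIVE", "text": "Negative"},
]


def spans_to_textcat_tasks(examples: Iterable[dict]) -> Iterator[dict]:
    """Yield one choice task per span in the accepted spancat examples."""
    for eg in examples:
        # Skip rejected or ignored spancat annotations.
        if eg.get("answer") != "accept":
            continue
        text = eg["text"]
        for span in eg.get("spans", []):
            yield {
                # Show only the span text to the textcat annotator.
                "text": text[span["start"]:span["end"]],
                "options": CATEGORY_OPTIONS,
                # Keep a pointer back to the source example for merging later.
                "meta": {
                    "source_input_hash": eg.get("_input_hash"),
                    "span_label": span["label"],
                    "start": span["start"],
                    "end": span["end"],
                },
            }
```

Showing only the span text keeps the classification task focused. If annotators need context, an alternative is to keep the full text and pass the span along in the task instead.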
One thing worth considering is that with this automatic forwarding you're not performing any quality checks on your span annotations, which can usually be done once a representative sample of annotations has been collected. Of course, this only matters if the textcat annotations depend on the spancat annotations, which I assume they do; otherwise, you could simply run both tasks in parallel and merge the datasets afterwards.
So, one solution to consider would be an intermediate script that pools spancat annotations in batches, performs your custom quality checks and, if the quality is good enough, makes the data available to the textcat recipe (e.g. by storing it in a dedicated dataset). Poor-quality spancat annotations, on the other hand, should be sent back for revision.
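A minimal sketch of such an intermediate step: it splits spancat annotations into fixed-size batches and routes each batch into a "forward" list and a "revise" list. The per-example check (accepted, at least one span, offsets in bounds, label from a known set) and the batch-level pass-rate threshold are hypothetical placeholders for your own quality criteria, as are `BATCH_SIZE`, `MIN_PASS_RATE` and `ALLOWED_LABELS`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical settings; tune them to your own project.
BATCH_SIZE = 50
MIN_PASS_RATE = 0.9
ALLOWED_LABELS = {"ASPECT", "PRODUCT"}


def batches(examples: Iterable[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` examples."""
    it = iter(examples)
    while batch := list(islice(it, size)):
        yield batch


def passes_quality_check(eg: dict) -> bool:
    """Placeholder check: accepted, with in-bounds spans that use known labels."""
    if eg.get("answer") != "accept":
        return False
    spans = eg.get("spans", [])
    if not spans:
        return False  # nothing for the textcat layer to classify
    text_len = len(eg["text"])
    for span in spans:
        if span["label"] not in ALLOWED_LABELS:
            return False
        if not 0 <= span["start"] < span["end"] <= text_len:
            return False
    return True


def route_batch(batch: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (examples to forward, examples to send back for revision)."""
    if not batch:
        return [], []
    passed = [eg for eg in batch if passes_quality_check(eg)]
    failed = [eg for eg in batch if not passes_quality_check(eg)]
    if len(passed) / len(batch) < MIN_PASS_RATE:
        # Too many problems in this batch: hold all of it back for revision.
        return [], batch
    return passed, failed
```

The forwarded list could then be written to the dedicated dataset the textcat recipe reads from, and the revision list to a dataset used for a review session.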
Do let me know, though, if I've misunderstood your use case, and we'll take it from there.
Hi @magdaaniol, thank you for your informative reply!
I gave a very simple example, but we will naturally implement quality control mechanisms between the tasks as well.
I do have a follow-up question: if we were to build a complex annotation pipeline, we would essentially have to launch a Prodigy server for each annotation task.
How is this typically achieved? Perhaps using some batch script that launches the required number of Prodigy instances? And how is the data forwarding between annotation interfaces configured?
If you can point us towards any documentation or examples about this, I would be very grateful.
There's a fair amount of flexibility when it comes to deploying services, including Prodigy. At its core, each Prodigy instance is just a web server, so you can launch multiple instances by starting them on different ports and/or hosts. You can also put a reverse proxy such as nginx in front of them to give them specific subdomains and add load balancing. The details will always depend on your infrastructure (on-premises? serverless cloud solutions?). But yes, a minimal solution for what you're describing should be possible to implement with Bash or Python scripts, or with Docker containers.
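As a rough illustration of the Python-script route, the launcher below starts one Prodigy process per annotation task, each on its own port. It is a sketch under several assumptions: Prodigy is installed in the same environment and invoked as `python -m prodigy`, each instance's port is set through the `PRODIGY_PORT` environment variable, and the dataset names, source files and labels are made up (`./span_tasks.jsonl` is a hypothetical file standing in for whatever the spancat layer hands over).

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

# (recipe arguments, port) for one Prodigy instance.
InstanceSpec = Tuple[List[str], int]

# Hypothetical two-layer pipeline; dataset names, files and labels are made up.
INSTANCES: List[InstanceSpec] = [
    (["spans.manual", "reviews_spans", "blank:en", "./reviews.jsonl",
      "--label", "ASPECT"], 8081),
    (["textcat.manual", "reviews_cats", "./span_tasks.jsonl",
      "--label", "POSITIVE,NEGATIVE"], 8082),
]


def build_commands(instances: List[InstanceSpec]) -> List[Tuple[List[str], Dict[str, str]]]:
    """Return the command line and environment for each Prodigy instance."""
    commands = []
    for recipe_args, port in instances:
        # Run the Prodigy CLI with the current Python interpreter.
        cmd = [sys.executable, "-m", "prodigy", *recipe_args]
        # Copy the parent environment and override only the port.
        env = {**os.environ, "PRODIGY_PORT": str(port)}
        commands.append((cmd, env))
    return commands


def launch(instances: List[InstanceSpec]) -> List[subprocess.Popen]:
    """Start every instance as a background process and return the handles."""
    return [subprocess.Popen(cmd, env=env) for cmd, env in build_commands(instances)]
```

Calling `launch(INSTANCES)` from your own script would start both servers in the background, and calling `terminate()` on each returned handle stops them again. In a Docker setup, each entry would more likely become its own container.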
To start a Prodigy server programmatically, you might want to check out the prodigy.serve function.
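If you'd rather stay entirely in Python, a launcher can be built on `prodigy.serve`. This sketch assumes the signature documented for recent versions, `prodigy.serve(command, **config)`, where `command` is the recipe call as a string and keyword arguments such as `port` override config settings. Since `serve` blocks while the server runs, each instance gets its own `multiprocessing.Process`. The recipe commands are hypothetical, and `prodigy` is imported inside the worker only so the module can be loaded without it.

```python
from multiprocessing import Process
from typing import Dict, List

# Hypothetical recipe commands, keyed by the port each instance should use.
COMMANDS: Dict[int, str] = {
    8081: "spans.manual reviews_spans blank:en ./reviews.jsonl --label ASPECT",
    8082: "textcat.manual reviews_cats ./span_tasks.jsonl --label POSITIVE,NEGATIVE",
}


def run_instance(command: str, port: int) -> None:
    """Serve one recipe; blocks until the server stops."""
    import prodigy  # imported here so this module loads without Prodigy installed

    prodigy.serve(command, port=port)


def start_all(commands: Dict[int, str]) -> List[Process]:
    """Start each Prodigy instance in its own process and return the handles."""
    procs = [Process(target=run_instance, args=(cmd, port)) for port, cmd in commands.items()]
    for proc in procs:
        proc.start()
    return procs
```

On platforms that spawn rather than fork new processes (Windows, and macOS by default), `start_all(COMMANDS)` should be called under an `if __name__ == "__main__":` guard.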
I also recommend checking out our deployment docs for some considerations related to remote deployments in multi-user scenarios.
There isn't really a way to configure data forwarding between instances via the Prodigy API. Since it is a very use-case-specific procedure, it would have to be a custom service built around Prodigy data loaders and the Prodigy DB API. This service could pull the data from instance A and, after optionally applying some processing, store it in an intermediate database or table where the loader of instance B can find it.
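A minimal sketch of such a bridge, assuming both ends are datasets. Storage is abstracted behind a small hypothetical `ExampleStore` interface (not a Prodigy API) so the logic can run against an in-memory stand-in; a real implementation could wrap the `Database` object returned by `connect()` from `prodigy.components.db`, whose `get_dataset_examples`, `add_dataset` and `add_examples(examples, datasets=[...])` methods cover these operations in recent versions (worth checking against the DB API docs for your version). Dataset names, the `transform` hook and the polling interval are illustrative. Examples are tracked by their `_task_hash` so repeated polls don't forward duplicates.

```python
import time
from typing import Callable, Dict, List, Optional, Protocol, Set


class ExampleStore(Protocol):
    """Minimal storage interface the bridge needs (hypothetical, not a Prodigy API)."""

    def get_examples(self, dataset: str) -> List[dict]: ...

    def add_examples(self, examples: List[dict], dataset: str) -> None: ...


class InMemoryStore:
    """Dict-backed stand-in for testing; a real store would wrap the Prodigy DB."""

    def __init__(self) -> None:
        self.data: Dict[str, List[dict]] = {}

    def get_examples(self, dataset: str) -> List[dict]:
        return list(self.data.get(dataset, []))

    def add_examples(self, examples: List[dict], dataset: str) -> None:
        self.data.setdefault(dataset, []).extend(examples)


class DataBridge:
    """Forwards new examples from a source dataset to a target dataset."""

    def __init__(
        self,
        store: ExampleStore,
        source: str,
        target: str,
        transform: Callable[[dict], Optional[dict]] = lambda eg: eg,
    ) -> None:
        self.store = store
        self.source = source
        self.target = target
        self.transform = transform  # return None to drop an example
        self.seen: Set[int] = set()  # _task_hash values already handled

    def sync_once(self) -> int:
        """Forward examples not handled before; return how many were written."""
        new = []
        for eg in self.store.get_examples(self.source):
            task_hash = eg["_task_hash"]
            if task_hash in self.seen:
                continue
            self.seen.add(task_hash)
            out = self.transform(eg)
            if out is not None:
                new.append(out)
        if new:
            self.store.add_examples(new, self.target)
        return len(new)

    def run_forever(self, interval: float = 30.0) -> None:
        """Poll the source dataset every `interval` seconds."""
        while True:
            self.sync_once()
            time.sleep(interval)
```

The `transform` hook is where a quality check or format conversion could plug in. Because `seen` lives in memory, a restarted bridge would need to rebuild it, for example from the hashes already in the target dataset.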
This data bridge and the instance manager for spinning up and shutting down instances (not shown in the image) would be your custom services.
This case study with Posh contains only a very high-level description of their deployment, which uses additional FastAPI services to manage multiple Prodigy instances, but I think it can give you an idea of how flexible the solution can be and of the range of options available.
Deployment patterns are usually fairly specific, and I'm afraid I don't have a concrete example implementation handy, but we're happy to support you as you go!