Hi, we are using Prodigy mainly to annotate data for a text classification task, and we need multiple annotators to work on the dataset. So we have split the dataset and hosted each part on a separate port for a different annotator. For example, if we have 500 samples to annotate and 2 annotators, we split them into 2 datasets of 250 samples each and host them on two different ports. The problem we are facing is that after some time, one or both of the ports go down and we have to start the process all over again. Any help in this regard is highly appreciated.
Hi! Do you have some more details on what exactly happens? How does it "go down", is there an error message? Does the machine run out of memory, and is there anything else running on it? Do you see anything in the logs that could be relevant?
Under the hood, Prodigy is a fairly straightforward Python service with a REST API powered by FastAPI. There are many different reasons why a service like this could die – memory constraints, an actual error in the Python process, something else interfering with the service, or something completely different.
There is no error message. It seems that the process just quits, possibly because the machine is running out of memory. There is nothing in the logs that indicates why the application crashed.
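If the process quits without leaving anything in the logs, one thing that might help narrow it down is starting it from a small Python wrapper and looking at the exit code. This is only a sketch: `describe_exit` is a made-up helper name, and it relies on the standard `subprocess` behaviour on Linux/macOS where a negative return code means the process was killed by a signal. A SIGKILL is often (but not always) what you see when the kernel's out-of-memory killer steps in.

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    # A positive code usually means the program itself exited with an error.
    return f"exited with error code {returncode}"
```

For example, `describe_exit(subprocess.Popen(["prodigy", ...]).wait())` with your usual arguments would tell you whether the server stopped on its own or was killed from outside.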
I ran into a problem like that a few days ago. In my case, I was loading improperly encoded/escaped tasks as jsonl. These caused the server to crash only once the broken tasks were being served. Is that similar to your situation?
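In case it's useful, here is a rough sketch of how the input file could be checked before starting the server, so broken tasks show up early instead of when they are served. The function name `find_bad_jsonl_lines` is made up for this example, and it assumes each task is a JSON object with a "text" key, which may not match your data.

```python
import json
from pathlib import Path


def find_bad_jsonl_lines(path, required_key="text"):
    """Return (line_number, reason) pairs for lines that would not load cleanly."""
    problems = []
    # Read raw bytes so one badly encoded line does not abort the whole scan.
    with Path(path).open("rb") as f:
        for line_number, raw in enumerate(f, start=1):
            if not raw.strip():
                continue  # ignore blank lines
            try:
                task = json.loads(raw.decode("utf-8"))
            except UnicodeDecodeError as err:
                problems.append((line_number, f"not valid UTF-8: {err}"))
                continue
            except json.JSONDecodeError as err:
                problems.append((line_number, f"not valid JSON: {err}"))
                continue
            if not isinstance(task, dict):
                problems.append((line_number, "not a JSON object"))
            elif required_key not in task:
                # Assumption: every task needs this key (here "text").
                problems.append((line_number, f"missing key {required_key!r}"))
    return problems
```

Calling `find_bad_jsonl_lines("tasks.jsonl")` on your file (the file name is just a placeholder) returns an empty list if every line parses.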
I have a similar problem (I think).
I have multiple annotators who will not make annotations simultaneously (I can tell them to). However, I cannot run two different scripts simultaneously (I don't know how).
We have a number of labels, and it is not sustainable for me to change the label each time they are "done" with one. So, is there a trick that lets an annotator select between the following?
prodigy textcat.manual dataset-01 data-01.txt --label LABEL-01
prodigy textcat.manual dataset-02 data-02.txt --label LABEL-02
What exactly is not working for you? You can just run them with different ports configured via the PRODIGY_PORT env variable, and if the ports are open and nothing else is running on them, they'll be accessible.
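As a rough illustration of the idea, here is a small Python sketch that starts both commands from the question, each with its own PRODIGY_PORT. The helper names `build_instance` and `start_all` are made up for this example, and ports 8080 and 8081 are arbitrary choices that assume nothing else is using them on your machine.

```python
import os
import subprocess


def build_instance(dataset, source, label, port):
    """Build the command and environment for one textcat.manual instance."""
    command = ["prodigy", "textcat.manual", dataset, source, "--label", label]
    # Copy the current environment and set the port for this instance only.
    env = dict(os.environ, PRODIGY_PORT=str(port))
    return command, env


def start_all(instances):
    """Start every instance as its own process and return the process handles."""
    return [subprocess.Popen(command, env=env) for command, env in instances]


# Example values taken from the commands above; the ports are assumptions.
INSTANCES = [
    build_instance("dataset-01", "data-01.txt", "LABEL-01", 8080),
    build_instance("dataset-02", "data-02.txt", "LABEL-02", 8081),
]
```

Calling `start_all(INSTANCES)` would then launch both servers, and each annotator opens the port for the label they are working on. Running the two commands in two separate terminals with PRODIGY_PORT set differently in each works just as well.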