Hello Prodigy Team, Alex from AXSMARINE here.
I wanted to discuss an issue we've been running into when distributing Prodigy tasks among our annotators, and to share what we've tried so far. We use named sessions (e.g. "/?session=alex") to distribute the work.
Currently, we've been using the following configuration settings:
Prodigy version: v1.11.11
Options used in prodigy.json:
{
"card_css": {
"textAlign": "left",
"fontSize": 16
},
"custom_theme": {
"cardMaxWidth": 1200
},
"force_stream_order": true,
"feed_overlap": false,
"allow_work_stealing": false,
"db_settings": {
"mysql": {
"db": "my_db"
}
}
}
Recipes used:
"ner.manual"
"spans.manual"
"textcat.manual"
We deploy the Prodigy instance in our Kubernetes cluster.
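Because the instance runs in Kubernetes, it may be worth confirming that the running pod actually reads the prodigy.json shown above, with the task-routing values we intend. The following is a minimal sketch, assuming you know the path the container reads the file from. The function names are hypothetical, and the check only compares the three settings from our config that govern how tasks are split between sessions.

```python
import json
from typing import Dict, List

# Task-routing settings from our prodigy.json and the values we intend to run with.
EXPECTED_SETTINGS = {
    "feed_overlap": False,
    "force_stream_order": True,
    "allow_work_stealing": False,
}


def check_overlap_settings(config: Dict) -> List[str]:
    """Describe every mismatch; an empty list means all settings match."""
    problems = []
    for key, expected in EXPECTED_SETTINGS.items():
        if key not in config:
            problems.append(f"{key} is missing (expected {expected!r})")
        elif config[key] != expected:
            problems.append(f"{key} is {config[key]!r} (expected {expected!r})")
    return problems


def check_config_file(path: str) -> List[str]:
    """Load a prodigy.json file (path is up to you) and check its routing settings."""
    with open(path, encoding="utf8") as f:
        return check_overlap_settings(json.load(f))
```

For example, running `check_config_file("/path/to/prodigy.json")` inside the pod (the path is a placeholder) should return an empty list if the mounted file matches our intended settings.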
Problem description:
These settings initially worked as intended, but we have observed occasional task overlaps when Prodigy stays running for an extended period. So far we have worked around this by having our labeling operators cross-reference the document IDs provided in the metadata.
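The manual cross-referencing could be automated. Below is a minimal sketch, assuming the annotations are exported with `prodigy db-out <dataset> > annotations.jsonl`, that each record carries the `_session_id` Prodigy stores for named sessions, and that the document ID sits under a `meta` key called `doc_id`. That key name is hypothetical, so substitute the one actually used in our data. Documents annotated more than once are reported together with every session that labeled them, which also catches the same session receiving a document twice.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_docs(records: Iterable[dict], id_key: str = "doc_id") -> Dict[str, List[str]]:
    """Map each document ID annotated more than once to the sessions that annotated it."""
    sessions_by_doc = defaultdict(list)
    for rec in records:
        doc_id = rec.get("meta", {}).get(id_key)  # id_key is an assumed meta field name
        if doc_id is None:
            continue  # skip records that carry no document ID
        # Normalise IDs to strings so int and str IDs group and sort consistently.
        sessions_by_doc[str(doc_id)].append(rec.get("_session_id", "<unknown>"))
    return {doc: sessions for doc, sessions in sessions_by_doc.items() if len(sessions) > 1}


def load_jsonl(path: str) -> List[dict]:
    """Read a JSONL export (one JSON object per line), ignoring blank lines."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `find_duplicate_docs(load_jsonl("annotations.jsonl"))` would list the recurring IDs directly instead of having operators compare them by hand.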
Overlaps seem to occur mostly when an annotator steps away from their workstation for a long time, typically at the end of the workday. The next day, we often find several documents that have already been labeled, which creates redundant work.
Interestingly, the recurring documents usually have the same IDs, which suggests that the same tasks keep cycling back in a seemingly random order.
Restarting the Prodigy instance temporarily resolves this problem, but we are looking for a more sustainable solution to prevent feed overlaps from recurring.
I would greatly appreciate any suggestions or recommendations you may have to mitigate this issue and ensure smoother task allocation and management.
Best regards.