Hello, I've posted about this before, but since I'm using a different setup and version, here I go again.
I'm using Prodigy 1.11.8, dockerized and deployed on Amazon ECS, with an external Postgres database and a multi-session setup. This is a very simple classification task. In a script, I pull texts from the DB before calling prodigy.serve, and then I override the stream function.
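The override is essentially this, a simplified, framework-free sketch rather than my exact code: a plain sha1 digest stands in for Prodigy's `set_hashes`/`_task_hash` machinery, and `deduped_stream` filters out examples whose hash has already been served.

```python
# Simplified sketch of a de-duplicating stream override (illustrative only).
# A plain hashlib digest stands in for Prodigy's task hashes; a real recipe
# would use prodigy.set_hashes and the hashes already stored in the dataset.
import hashlib

def task_hash(example):
    """Stable hash of the example's text (stand-in for Prodigy's _task_hash)."""
    return hashlib.sha1(example["text"].encode("utf-8")).hexdigest()

def deduped_stream(examples, seen_hashes):
    """Yield only examples whose hash hasn't been served or annotated yet."""
    seen = set(seen_hashes)
    for eg in examples:
        h = task_hash(eg)
        if h in seen:
            continue  # skip repeats, within and across sessions
        seen.add(h)
        yield eg

texts = [{"text": "a"}, {"text": "b"}, {"text": "a"}]
unique = list(deduped_stream(texts, seen_hashes=[]))
# the second "a" is filtered out; only "a" and "b" remain
```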
But examples are repeating (same task hash), and a lot: sometimes even 13 times for the same session. I wonder if I'm doing something wrong, or maybe Prodigy simply isn't meant to be used like this. I've read other posts here and tried other configs, but nothing works.
In the logs I'm seeing the "re adding to stream" message.
This is the config I'm passing to prodigy.serve (I've tried bigger batches; it's worse):
Is there a reason why you're still using 1.11.8 and not the experimental alpha? I saw you posted there in May 2022 but seemed to have issues in this post:
We're very close to releasing Prodigy v1.12, which implements this new database refactoring. As mentioned in the alpha, one of the key new features is the Feed table, which will track annotation statuses (e.g., answered, sent, cancelled, unsent) along with timestamps for when each task was sent and when its status last changed.
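To make the idea concrete, here's a hypothetical illustration of a feed-style status table, not the actual v1.12 schema: each (task, session) pair keeps a single row whose status and timestamp are updated in place, so a task can't be re-queued as a duplicate.

```python
# Hypothetical sketch of a feed-style status table (NOT Prodigy's real schema):
# one row per (task_hash, session_id), updated in place on status changes.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE feed (
           task_hash  INTEGER,
           session_id TEXT,
           status     TEXT CHECK (status IN ('unsent', 'sent', 'answered', 'cancelled')),
           updated_at REAL,
           PRIMARY KEY (task_hash, session_id)
       )"""
)

def set_status(task_hash, session_id, status):
    """Insert or update a task's status, refreshing its timestamp."""
    conn.execute(
        "INSERT OR REPLACE INTO feed VALUES (?, ?, ?, ?)",
        (task_hash, session_id, status, time.time()),
    )

set_status(1, "alice", "sent")
set_status(1, "alice", "answered")  # same task: row updated, not duplicated
row = conn.execute("SELECT status FROM feed WHERE task_hash = 1").fetchone()
# row[0] == "answered", and there's still exactly one row for task 1
```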
It's worth noting that even the most recent alpha has changed (we found additional fixes), so you may want to wait for v1.12.
Once we release, we plan to have a few more engineers help out with support, since the new refactoring also requires a migration for datasets. That would be a perfect time to iterate/debug, and we'd appreciate feedback in case there are still problems.
I understand you probably need v1.12 as quickly as possible. I can tell you our dev team is working very hard, but we don't want to release until all of the tests pass. I'll let you know as soon as we have a concrete date for v1.12. Thank you for your patience!
Can you provide the prodigy.json file? This would help to ensure there isn't a problem with the database config.
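For reference, a prodigy.json pointing at an external Postgres database looks something like this (placeholder values, not your actual credentials):

```json
{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "host": "my-db-host",
      "dbname": "prodigy",
      "user": "prodigy_user",
      "password": "change-me"
    }
  }
}
```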
I've posted an internal note for the dev team. I'll post back if I hear suggestions, but I know several of them will be off for the holidays, so it may not be until early January. I'll also play around some more to see if I can find anything.
Maybe. Python 3.8 is generally the minimum, but since this is experimental, we didn't do robust testing to check for conflicts. I would try at least Python 3.9.