Example repeated/duplicated within and across sessions

raulsperoni · December 12, 2022, 12:09pm

Hello, I've posted about this before, but since I'm using a different setup and version, here I go again.

I'm using Prodigy 1.11.8, dockerized and deployed on Amazon ECS, with an external Postgres database and a multi-session setup. This is a very simple classification task. In a script, I pull texts from the DB before calling prodigy.serve, and then I override the stream function.

But examples are repeating (same task hash), and a lot! Sometimes even 13 times for the same session. I wonder if I'm doing something wrong, or maybe Prodigy simply isn't meant to be used like this. I've read other posts here, I've tried other configs but nothing works.

In the logs I'm seeing the "re adding to stream" message....
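For anyone wanting to quantify repeats like the ones described here, a minimal sketch follows. It assumes the annotations have been exported to a list of dicts that each carry a `_task_hash` key and a `_session_id` key; the `count_repeats` helper and the sample records are hypothetical and only illustrate the counting, not how Prodigy stores or serves tasks.

```python
from collections import Counter


def count_repeats(examples, by_session=True):
    """Return the keys that occur more than once, with their counts.

    With by_session=True the key is (session id, task hash), which shows
    repeats within a session. With by_session=False the key is the task
    hash alone, which shows repeats across all sessions.
    """
    if by_session:
        keys = ((eg.get("_session_id"), eg["_task_hash"]) for eg in examples)
    else:
        keys = (eg["_task_hash"] for eg in examples)
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical records standing in for annotations exported from a dataset.
examples = [
    {"text": "a", "_task_hash": 111, "_session_id": "ds-alice"},
    {"text": "a", "_task_hash": 111, "_session_id": "ds-alice"},
    {"text": "a", "_task_hash": 111, "_session_id": "ds-bob"},
    {"text": "b", "_task_hash": 222, "_session_id": "ds-bob"},
]

within = count_repeats(examples, by_session=True)
across = count_repeats(examples, by_session=False)
print("within sessions:", within)
print("across sessions:", across)
```

Comparing the two results helps separate tasks that one annotator saw repeatedly from tasks that were handed to several annotators.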

This is the config I'm passing to prodigy.serve (I've tried bigger batches, and it gets worse):

{
"host": "0.0.0.0", 
"buttons": ["accept", "ignore", "undo"], 
"batch_size": 5, 
"exclude_by": "task", 
"show_stats": True, 
"choice_style": "single", 
"feed_overlap": False, 
"auto_count_stream": True, 
"choice_auto_accept": True, 
"auto_exclude_current": True
}

Any ideas will be much appreciated. I'm almost giving up on this.

Thank you!!

ryanwesslen · December 14, 2022, 4:03pm

Thanks for your post!

Is there a reason why you're still using 1.11.8 and not the experimental alpha? I saw you posted there in May 2022 but seemed to have issues in this post:

We're very close to releasing Prodigy v1.12, which implements this new database refactoring. As mentioned in the alpha, one of the key new features is the Feed table, which will track annotation statuses (e.g., answered, sent, cancelled, unsent) along with timestamps for when a task was sent and when its status last changed.

It is worth noting even the most recent alpha version available has changed too (we found additional fixes) so you may want to wait for v1.12.

Once we release, we plan to have a few more engineers help out with support as the new refactoring also requires a migration for datasets. That would be a perfect time to iterate/debug and we'd appreciate feedback in case there are still problems.

I understand you probably need v1.12 as quickly as possible. I can tell you our dev team is working very hard but we don't want to release until we can perform successfully on all of the tests. I'll let you know as soon as we have a concrete date for v1.12. Thank you for your patience!

raulsperoni · December 15, 2022, 12:35pm

Ryan thank you so much for your answer!

Honestly, I thought that I was already on a stable version that included the experimental feed. I didn't read the versions carefully.

But regrettably I'm getting the same error that I got months ago:

ei_annotation  | 12:51:56: VALIDATE: Creating validator for view ID 'choice'
ei_annotation  | 12:51:56: VALIDATE: Validating Prodigy and recipe config
ei_annotation  | 12:51:56: DB: Initializing database SQLite
ei_annotation  | 12:51:56: DB: Connecting to database SQLite
ei_annotation  | 12:51:56: DB: Creating dataset '2022-12-15_12-51-56'
ei_annotation  | 12:51:56: FEED: Initializing from controller
ei_annotation  | Traceback (most recent call last):
ei_annotation  |   File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
ei_annotation  |     return _run_code(code, main_globals, None,
ei_annotation  |   File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
ei_annotation  |     exec(code, run_globals)
ei_annotation  |   File "/app/recipe.py", line 53, in <module>
ei_annotation  |     prodigy.serve(
ei_annotation  |   File "/usr/local/lib/python3.8/site-packages/prodigy/__init__.py", line 49, in serve
ei_annotation  |     controller = loaded_recipe(*recipe_args, config=config)
ei_annotation  |   File "cython_src/prodigy/core.pyx", line 436, in prodigy.core.recipe.recipe_decorator.recipe_proxy
ei_annotation  |   File "cython_src/prodigy/core.pyx", line 80, in prodigy.core.Controller.from_components
ei_annotation  |   File "cython_src/prodigy/core.pyx", line 183, in prodigy.core.Controller.__init__
ei_annotation  |   File "cython_src/prodigy/components/feed_v2.pyx", line 139, in prodigy.components.feed_v2.Feed.__init__
ei_annotation  | TypeError: add_dataset() got an unexpected keyword argument 'feed'
ei_annotation exited with code 1

This error happens with both the external Postgres connection (fresh database) and SQLite (also fresh).

Python is 3.8, could this be the issue?

Again, thank you @ryanwesslen, and please let me know if there is something else I can try. I'm running out of time for this. Thanks!

ryanwesslen · December 16, 2022, 5:44pm

Thank you @raulsperoni!

Can you provide the prodigy.json file? This would help to ensure there isn't a problem with the database config.

I've posted an internal note for the dev team. I'll post back if I hear suggestions but I know several will be off for the holidays so it may be until early January. I'll also play around some more to see if I can find anything.

Regarding Python 3.8: maybe. Python 3.8 is generally the minimum, but since this is experimental, we didn't do robust testing to see if there's a conflict. I would try at least Python 3.9.
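A quick way to confirm which interpreter the container is actually running before retrying is sketched here. The `meets_minimum` helper is hypothetical, and the 3.9 threshold simply mirrors the suggestion in this reply rather than a documented requirement.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Print the running interpreter version and whether it meets the threshold.
print(sys.version.split()[0], meets_minimum())
```

Running this inside the same image that starts the recipe avoids guessing from the Dockerfile alone.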

raulsperoni · December 16, 2022, 7:04pm

@ryanwesslen thank you, I'm only overriding these options through prodigy.serve:

{
"host": "0.0.0.0", 
"buttons": ["accept", "ignore", "undo"], 
"batch_size": 5, 
"exclude_by": "task", 
"show_stats": True, 
"choice_style": "single", 
"feed_overlap": False, 
"auto_count_stream": True, 
"choice_auto_accept": True, 
"auto_exclude_current": True
}

ryanwesslen · December 20, 2022, 11:06pm

Have you tried to reduce your batch_size to 1?

This may fix the duplication problem, but at the expense of losing "undo" for annotations that are still in the browser and not yet sent to the database.

This isn't a permanent fix, but it may work as a short-term workaround until we can release v1.12.
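As a minimal sketch of that workaround, the config posted earlier in the thread can be copied with only `batch_size` changed. The `with_batch_size` helper is hypothetical, and the resulting dict would be passed to prodigy.serve in the same way the original script already passes its config.

```python
# Config as posted earlier in the thread.
base_config = {
    "host": "0.0.0.0",
    "buttons": ["accept", "ignore", "undo"],
    "batch_size": 5,
    "exclude_by": "task",
    "show_stats": True,
    "choice_style": "single",
    "feed_overlap": False,
    "auto_count_stream": True,
    "choice_auto_accept": True,
    "auto_exclude_current": True,
}


def with_batch_size(config, batch_size):
    """Return a copy of `config` with only `batch_size` replaced."""
    return {**config, "batch_size": batch_size}


# Workaround config: one task per batch, everything else unchanged.
workaround_config = with_batch_size(base_config, 1)
print(workaround_config["batch_size"])
```

Building a copy rather than editing the dict in place makes it easy to switch back to the original batch size once v1.12 is available.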
