Refresh browser fix with force_stream_order

No problem at all!


Re :smile: So I tested the latest release v1.9.10 and the beta v1.10. Here's some feedback:

1/ With both versions, Prodigy stops sending tasks once a batch is finished; I had to refresh the browser for it to fetch new ones. Is this expected behavior? It's not an important matter; I just want to be sure, because in previous versions this transition was automatic.

2/ With v1.9.10, I ran the following experiment: the built-in pos.correct recipe, 60 examples, 2 sessions, a batch size of 10, feed_overlap=false and force_stream_order=true. The expected scenario was for session1 to start with example1 and session2 with example11. It still didn't work: both sessions started with example1. I clicked through both sessions, switching between them every once in a while, and ended up with 63 annotations in the database. Duplicates still exist, but they are indeed rarer than before.
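
For reference, the setup was roughly this (dataset and file names are mine):

prodigy pos.correct pos_demo en_core_web_sm examples.jsonl

with "batch_size": 10, "feed_overlap": false and "force_stream_order": true in prodigy.json, and the two sessions opened as http://localhost:8080/?session=session1 and http://localhost:8080/?session=session2.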

3/ With v1.10beta I set up the exact same workflow. There were still some duplicates in the tasks, but none of them were saved to the database; I had exactly 60 annotations. For me, this solved the problem. We might have spent a little extra time annotating, but the results are clean. Great workaround!


Great to hear about the 1.10 release. Is this bug fixed there? Based on @Kairine's report, it seems not?

This is not the expected behavior and hasn't been reported by other users who tested 1.10. If you have to refresh the page to get each batch of questions, that's a bug, and you should create a new thread with reproduction info so we can help you resolve it.

You should not need to put any specific delay between answering questions with the latest version.

Please try the build for yourself and see if your specific problems are solved. The original issue in this thread has been solved and confirmed by multiple users. If you have another issue, please open a new thread and include reproduction steps. :pray:

@snd507, @dshefman, @cgreco, @Kairine thanks for helping test the preview build and for confirming that the latest version mitigates the duplicates that were ending up in your db when using force_stream_order.


The duplication error with named sessions (?session=user) still happens!
Using config "force_stream_order": True, "feed_overlap": False

Custom recipe (a very basic one): the stream is a generator yielding one example at a time. Roughly like this (names are placeholders and the file handling is simplified):
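
import json
import prodigy

@prodigy.recipe("my-basic-recipe")
def my_basic_recipe(dataset, source):
    def get_stream():
        # a generator yielding one example at a time
        with open(source, encoding="utf8") as f:
            for line in f:
                yield json.loads(line)
    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "text",
        "config": {"feed_overlap": False, "force_stream_order": True},
    }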

I don't want to have to write workarounds for something that should be basic functionality.
[feed_overlap bug?]
Version 1.10.1

Hi @snu-ceyda, welcome! Sorry to hear you're having trouble, please try the latest version 1.10.2, which was released today.

Unfortunately I still get duplicates with v1.10.2, even after adding hashes to my stream, which yields objects like the following:

{
  "text": "jasdnja  njkadf.",
  "image": "smt.jpg",
  "_input_hash": -2106113696,
  "_task_hash": -427767885,
  "meta": {"pattern": "2197"}
}

The hashes are correctly generated. I add them along these lines, with the input hash based on my data's own fields:
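
from prodigy import set_hashes

# rehash each example on the fields that identify the input (text + image)
stream = (set_hashes(eg, input_keys=("text", "image")) for eg in stream)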

@snu-ceyda thanks for the follow-up. Since you're using a custom recipe, it's a bit hard to tell where things are going wrong. I'm sure it's a simple recipe as you say, but can you provide some more information to help me understand it?

What kind of tasks are you annotating? Are they text-based, images, or both? Are you using the prefer_uncertain, prefer_high_scores or prefer_low_scores functions in your recipe?

Also, when you see duplicate tasks shown in the frontend, do multiple entries end up in the database after answering them? You can test this by annotating a few duplicates, then saving and exporting the database table to see if the answers are there twice.
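
For example, something along these lines (dataset name is a placeholder):

from collections import Counter
from prodigy.components.db import connect

db = connect()  # reads the db settings from your prodigy.json
examples = db.get_dataset("my_dataset")
counts = Counter(eg["_task_hash"] for eg in examples)
print(len(examples), "rows,", sum(1 for n in counts.values() if n > 1), "duplicated task hashes")

Or export with prodigy db-out my_dataset and inspect the file directly.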

EDIT: I found an issue where duplicates were not filtered if you use exclude_by=input with 1.10.2. If you're willing to try out a beta build that fixes this issue to see if it solves your problem, send me an email at justin@explosion.ai
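
For context, that setting lives in the recipe's returned "config" block (or in prodigy.json), e.g. "exclude_by": "input" instead of the default "task".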

Hi @justindujardin! I have the exact same problem, and I am using version 1.9.10. The problem is that I do not have access to version 1.10 with my license. Will this be fixed for 1.9 at some point? Is there a workaround?

For my use case, after about 25 samples it loops around on those samples and never shows anything else. The dataset has 500 samples in total.

Here is my prodigy.json

{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "dbname": ...,
      "user": ...,
      "password": ...,
      "host": ...
    }
  },
  "host": "0.0.0.0",
  "feed_overlap": true,
  "choice_auto_accept": true,
  "batch_size": 10,
  "force_stream_order": true
}
This issue, combined with the one described here, basically makes multi-session annotation unusable for me: Multi-session - annotators do not receive all tasks with feed_overlap with textcat.manual recipe - #3 by jalpaca

Thanks a lot in advance for your help!
Cheers