ner.correct --exclude not excluding duplicate tasks


@ines I seem to be experiencing a related issue, using Prodigy 1.10.4 on macOS Catalina with Python 3.8.5. I had the same problem as in the OP, with repeated tasks during ner.correct --exclude, found this thread, and applied the feed_overlap workaround. That seemed to fix it ... at least until I got about 25-30 annotations into my next batch. Then Prodigy started repeatedly feeding me the same tasks, as if it were starting over from the beginning of those 25 tasks and looping through them again. When I see a task again, the annotations I made on it are gone. The only way I can stop it from looping through the same set of 25-30 tasks is to kill Prodigy and re-run the same recipe. Then I get a new batch of about 25-30 tasks, and the fun begins again.

When I dump the dataset with db-out, it looks like the annotations I made are being saved, but my confidence is a bit shaken.
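For reference, this is how I'm checking (standard db-out usage; the dataset name is a placeholder for mine):

```sh
# Export the dataset to JSONL to eyeball the saved annotations.
prodigy db-out my_ner_dataset > my_ner_dataset.jsonl
```

Two further points: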

1) I'm using --unsegmented because I'm correcting with a model I trained on a cold-start dataset (following the general process you put forth in this video). The recipe was complaining that the model didn't set sentence boundaries, which is how I came upon the --unsegmented option. I'm having the same problem with or without it, though.
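In case the exact invocation matters, it's along these lines (model path, source file, and labels are placeholders for mine):

```sh
# Placeholder model/source/label names standing in for my real ones.
prodigy ner.correct my_ner_dataset ./cold_start_model ./source.jsonl \
  --label PERSON,ORG --unsegmented --exclude my_ner_dataset
```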

2) I'm probably missing something, but the docs say the _input_hash and _task_hash values are supposed to be uint32, no? When I look at my db-out output, a lot of the hashes are negative integers:

... "_input_hash":-705417333,"_task_hash":-803297770 ...
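Could these just be the signed interpretation of the same 32 bits? A quick sketch of what I mean (this is my assumption, not something I've confirmed in the Prodigy internals):

```python
# Assumption: the hashes are 32-bit values that get stored/printed in
# signed form, so the signed and unsigned views are the same bits.

def to_uint32(h: int) -> int:
    """View a (possibly negative) 32-bit hash as unsigned."""
    return h & 0xFFFFFFFF

def to_int32(h: int) -> int:
    """View an unsigned 32-bit value as a signed integer."""
    h &= 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

print(to_uint32(-705417333))  # 3589549963 -> same bits, unsigned view
print(to_int32(3589549963))   # -705417333 -> round-trips back
```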

Any ideas?
