ner.correct --exclude not excluding duplicate tasks


@ines I seem to be experiencing a related issue, using Prodigy 1.10.4 on macOS Catalina with Python 3.8.5. I had the same problem as in the OP, with repeated tasks during ner.correct --exclude, found this thread, and applied the feed_overlap workaround. That seemed to fix it ... at least until I got about 25-30 annotations into my next batch. Then Prodigy started repeatedly feeding me the same tasks, as if it were starting over from the beginning of those 25 tasks and looping through them again. When I see a task again, the annotations I made on it are gone. The only way I can stop it from looping through the same set of 25-30 tasks is to kill Prodigy and re-run the same recipe. Then I get a new batch of about 25-30 tasks, and the fun begins again.

When I dump the dataset with db-out, it looks like the annotations I made are being saved, but my confidence is a bit shaken.
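For reference, this is how I'm checking (standard db-out usage; the dataset name is a placeholder for mine):

```sh
# Export the dataset to JSONL to eyeball the saved annotations.
prodigy db-out my_ner_dataset > my_ner_dataset.jsonl
```

Two further points: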

1) I'm using --unsegmented because I'm correcting with a model I trained on a cold-start dataset (following the general process you put forth in this video). The recipe was complaining that the model didn't set sentence boundaries, which is how I came upon the --unsegmented option. I'm having the same problem with or without it, though.
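In case the exact invocation matters, it's along these lines (model path, source file, and labels are placeholders for mine):

```sh
# Placeholder model/source/label names standing in for my real ones.
prodigy ner.correct my_ner_dataset ./cold_start_model ./source.jsonl \
  --label PERSON,ORG --unsegmented --exclude my_ner_dataset
```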

2) I'm probably missing something, but the docs say the _input_hash and _task_hash values are supposed to be uint32, no? When I look at my db-out output, a lot of the hashes are negative integers:

... "_input_hash":-705417333,"_task_hash":-803297770 ...
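Could these just be the signed interpretation of the same 32 bits? A quick sketch of what I mean (this is my assumption, not something I've confirmed in the Prodigy internals):

```python
# Assumption: the hashes are 32-bit values that get stored/printed in
# signed form, so the signed and unsigned views are the same bits.

def to_uint32(h: int) -> int:
    """View a (possibly negative) 32-bit hash as unsigned."""
    return h & 0xFFFFFFFF

def to_int32(h: int) -> int:
    """View an unsigned 32-bit value as a signed integer."""
    h &= 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

print(to_uint32(-705417333))  # 3589549963 -> same bits, unsigned view
print(to_int32(3589549963))   # -705417333 -> round-trips back
```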

Any ideas?
